[ 
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690533#comment-16690533
 ] 

Duo Zhang commented on HBASE-21490:
-----------------------------------

OK, the root cause is a bug in RecoverStandByProcedure: an NPE is thrown when 
loading it, which brings the master down. But after two restarts, the file 
that contains the procedures is deleted.

{noformat}
2018-11-16,20:43:37,454 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    
ugi=hbase_tst/[email protected] (auth:KERBEROS)      ip=/10.132.16.33        
cmd=create      
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   
perm=hbase_tst:supergroup:rw-r-----        proto=rpc
2018-11-16,21:05:58,652 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    
ugi=hbase_tst/[email protected] (auth:KERBEROS)      ip=/10.132.16.34        
cmd=open        
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   proto=rpc
2018-11-16,21:05:58,747 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    
ugi=hbase_tst/[email protected] (auth:KERBEROS)      ip=/10.132.16.34        
cmd=open        
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   proto=rpc
2018-11-16,21:06:04,196 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    
ugi=hbase_tst/[email protected] (auth:KERBEROS)      ip=/10.132.16.34        
cmd=open        
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   proto=rpc
2018-11-16,21:06:04,305 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    
ugi=hbase_tst/[email protected] (auth:KERBEROS)      ip=/10.132.16.34        
cmd=open        
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   proto=rpc
2018-11-16,21:06:04,669 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    
ugi=hbase_tst/[email protected] (auth:KERBEROS)      ip=/10.132.16.34        
cmd=rename      
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log   
dst=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log        
perm=hbase_tst:supergroup:rw-r-----     proto=rpc
2018-11-16,21:07:12,776 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    
ugi=hbase_tst/[email protected] (auth:KERBEROS)      ip=/10.132.16.34        
cmd=delete      src=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log     
{noformat}
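
For anyone who wants to trace a proc wal file through a larger audit log, here is a minimal Python sketch. It is not part of HBase or HDFS. The names parse_audit and lifecycle are made up for illustration, and it assumes every record starts with a timestamp in the format shown above and may be wrapped over several lines. The sample reuses the rename and delete records from the excerpt.

```python
import re
from datetime import datetime

# Assumption: a record starts with a timestamp such as "2018-11-16,21:06:04,669".
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2},\d{2}:\d{2}:\d{2},\d{3}\b", re.MULTILINE)
# Only the fields needed to follow a file: the command and the paths.
FIELD = re.compile(r"\b(cmd|src|dst)=(\S+)")


def parse_audit(text):
    """Return one dict per audit record, tolerating wrapped records."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    events = []
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        record = " ".join(text[begin:end].split())  # undo the line wrapping
        event = dict(FIELD.findall(record))
        event["time"] = datetime.strptime(record[:23], "%Y-%m-%d,%H:%M:%S,%f")
        events.append(event)
    return events


def lifecycle(events, filename):
    """Time-ordered (time, cmd) pairs for events whose src or dst ends with filename."""
    hits = [e for e in events
            if e.get("src", "").endswith(filename)
            or e.get("dst", "").endswith(filename)]
    return [(e["time"], e.get("cmd")) for e in sorted(hits, key=lambda e: e["time"])]


SAMPLE = """\
2018-11-16,21:06:04,669 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/[email protected] (auth:KERBEROS)      ip=/10.132.16.34
cmd=rename
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log
dst=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log
perm=hbase_tst:supergroup:rw-r-----     proto=rpc
2018-11-16,21:07:12,776 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/[email protected] (auth:KERBEROS)      ip=/10.132.16.34
cmd=delete      src=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log
"""

if __name__ == "__main__":
    for when, cmd in lifecycle(parse_audit(SAMPLE), "pv2-00000000000000000185.log"):
        print(when.strftime("%H:%M:%S"), cmd)
```

Matching on the file name rather than the full path keeps the rename to oldWALs and the later delete in the same timeline.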

Let me check what is going on here...

> WALProcedure may remove proc wal files still with active procedures
> -------------------------------------------------------------------
>
>                 Key: HBASE-21490
>                 URL: https://issues.apache.org/jira/browse/HBASE-21490
>             Project: HBase
>          Issue Type: Sub-task
>          Components: proc-v2
>            Reporter: Duo Zhang
>            Priority: Major
>
> This has happened to me several times. After a master restart, all the 
> procedures are gone.
> The proc wal files were deleted before the restart; I see this in the 
> master's log:
> {noformat}
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Remove all 
> state logs with ID less than 184, since all the active procedures are in the 
> latest log
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread] 
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile: Archiving 
> hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000184.log
>  to hdfs://c4tst-xiaomi/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000184.log
> {noformat}
> Let me dig...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
