[
https://issues.apache.org/jira/browse/HBASE-21490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690533#comment-16690533
]
Duo Zhang commented on HBASE-21490:
-----------------------------------
OK, the root cause is a bug in RecoverStandByProcedure: an NPE is thrown when
loading it, which brings the master down. But after two restarts, the file
that contains the procedures is deleted.
{noformat}
2018-11-16,20:43:37,454 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/[email protected] (auth:KERBEROS) ip=/10.132.16.33
cmd=create
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log
perm=hbase_tst:supergroup:rw-r----- proto=rpc
2018-11-16,21:05:58,652 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/[email protected] (auth:KERBEROS) ip=/10.132.16.34
cmd=open
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log proto=rpc
2018-11-16,21:05:58,747 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/[email protected] (auth:KERBEROS) ip=/10.132.16.34
cmd=open
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log proto=rpc
2018-11-16,21:06:04,196 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/[email protected] (auth:KERBEROS) ip=/10.132.16.34
cmd=open
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log proto=rpc
2018-11-16,21:06:04,305 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/[email protected] (auth:KERBEROS) ip=/10.132.16.34
cmd=open
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log proto=rpc
2018-11-16,21:06:04,669 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/[email protected] (auth:KERBEROS) ip=/10.132.16.34
cmd=rename
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log
dst=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log
perm=hbase_tst:supergroup:rw-r----- proto=rpc
2018-11-16,21:07:12,776 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
ugi=hbase_tst/[email protected] (auth:KERBEROS) ip=/10.132.16.34
cmd=delete src=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log
{noformat}
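To make the create/open/rename/delete sequence for one file easier to read, the wrapped audit entries can be folded back into records and filtered by file name. This is only a rough sketch: the helper names (parse_audit, file_timeline) are made up for illustration, the sample is abridged from the log above, and it assumes every record starts with a timestamp of the form shown there.

```python
import re

# Assumption: a record starts with a timestamp such as "2018-11-16,20:43:37,454".
TS = re.compile(r"^\d{4}-\d{2}-\d{2},\d{2}:\d{2}:\d{2},\d{3}\b")
# key=value fields; a value runs up to the next whitespace.
FIELD = re.compile(r"(\w+)=(\S+)")


def parse_audit(text):
    """Join wrapped lines into records and return one field dict per record."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if TS.match(line):
            # A timestamp opens a new record.
            current = [line]
            records.append(current)
        elif current is not None:
            # Continuation of the current (wrapped) record.
            current.append(line)
    events = []
    for parts in records:
        joined = " ".join(parts)
        event = dict(FIELD.findall(joined))
        event["time"] = joined.split()[0]
        events.append(event)
    return events


def file_timeline(events, name):
    """Return (time, cmd) pairs for events whose src or dst ends with name."""
    return [
        (e["time"], e.get("cmd"))
        for e in events
        if e.get("src", "").endswith(name) or e.get("dst", "").endswith(name)
    ]


# Abridged from the audit log above (ugi, ip, perm and proto fields omitted).
SAMPLE = """\
2018-11-16,20:43:37,454 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
cmd=create
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log
2018-11-16,21:06:04,669 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
cmd=rename
src=/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000185.log
dst=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log
2018-11-16,21:07:12,776 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true
cmd=delete src=/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000185.log
"""

timeline = file_timeline(parse_audit(SAMPLE), "pv2-00000000000000000185.log")
for when, cmd in timeline:
    print(when, cmd)
```

Running the same filter over the full NameNode audit log should show every operation on pv2-00000000000000000185.log in order, including the opens between the create and the rename.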
Let me check what is going on here...
> WALProcedure may remove proc wal files still with active procedures
> -------------------------------------------------------------------
>
> Key: HBASE-21490
> URL: https://issues.apache.org/jira/browse/HBASE-21490
> Project: HBase
> Issue Type: Sub-task
> Components: proc-v2
> Reporter: Duo Zhang
> Priority: Major
>
> This has happened to me several times. After a master restart, all the
> procedures are gone.
> The proc WAL files were deleted before the restart; I see this in the
> master's log:
> {noformat}
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread]
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore: Remove all
> state logs with ID less than 184, since all the active procedures are in the
> latest log
> 2018-11-16,20:57:40,177 INFO [WALProcedureStoreSyncThread]
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFile: Archiving
> hdfs://c4tst-xiaomi/hbase/c4tst-sync1/MasterProcWALs/pv2-00000000000000000184.log
> to hdfs://c4tst-xiaomi/hbase/c4tst-sync1/oldWALs/pv2-00000000000000000184.log
> {noformat}
> Let me dig...