[ https://issues.apache.org/jira/browse/HBASE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Stack updated HBASE-24585:
----------------------------------
Description:
(This description got redone after I figured out what was going on. Previously
it was just a litany of me banging around trying to learn procedure-based WAL
splitting and hbase.wal.split.to.hfile; no one needs to read that; hence the
refactor).
HBASE-24574 procedure-based distributed WAL splitting is enabled, and
split-to-hfile too. A forced crash requires recovery with ServerCrashProcedure
splitting old WALs on restart. The recovery fails because we get stuck. The
Master can't assign meta because it is being recovered. The recovery can't make
progress because it is asking for a table descriptor for meta -- needed by the
hbase.wal.split.to.hfile feature -- and the master is not yet initialized.
After the default timeout, Master shuts down because it can't initialize.
{code}
2020-06-18 19:53:54,175 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not initialized after 200000ms
    at org.apache.hadoop.hbase.util.JVMClusterUtil.waitForEvent(JVMClusterUtil.java:232)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:200)
    at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:430)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:232)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3059)
{code}
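The circular wait described above can be illustrated with a small toy model. This is only a sketch in Python, not HBase code: the function name, the event names, and the `descriptor_needs_master` switch are all hypothetical, and the switch merely shows that removing one edge of the cycle lets startup finish. It is not a claim about how the issue is or should be fixed.

```python
import threading


def simulate_startup(init_timeout_ms=200, descriptor_needs_master=True):
    """Toy model of the startup cycle (hypothetical names, not HBase APIs)."""
    master_initialized = threading.Event()
    wal_split_done = threading.Event()
    meta_assigned = threading.Event()
    shutting_down = threading.Event()
    result = {"split_status": None, "error": None}

    def split_meta_wal():
        # The split wants the meta table descriptor; in this model that is
        # only available once the master is initialized.
        while not shutting_down.is_set():
            if not descriptor_needs_master or master_initialized.wait(0.05):
                wal_split_done.set()
                result["split_status"] = "DONE"
                return
        # Interrupted by the master shutting down.
        result["split_status"] = "RESIGNED"

    def assign_meta():
        # Meta cannot be assigned while its WAL is still being recovered.
        while not shutting_down.is_set():
            if wal_split_done.wait(0.05):
                meta_assigned.set()
                return

    workers = [
        threading.Thread(target=split_meta_wal),
        threading.Thread(target=assign_meta),
    ]
    for worker in workers:
        worker.start()

    # The master only counts as initialized once meta is assigned.
    if meta_assigned.wait(init_timeout_ms / 1000):
        master_initialized.set()
    else:
        result["error"] = f"Master not initialized after {init_timeout_ms}ms"

    shutting_down.set()
    for worker in workers:
        worker.join()
    return result
```

With the defaults, each of the three steps waits on the next, so the call times out and the split is reported as interrupted. Calling it with `descriptor_needs_master=False` breaks the cycle in the model and all three steps complete.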
The abort of the Master interrupts other ongoing actions, so later in the log
we'll see the WAL split show as interrupted:
{code}
2020-06-17 21:20:37,472 ERROR [RS_LOG_REPLAY_OPS-regionserver/localhost:16020-0] handler.RSProcedureHandler: Error when call RSProcedureCallable:
java.io.IOException: Failed WAL split, status=RESIGNED, wal=file:/Users/stack/checkouts/hbase.apache.git/tmp/hbase/WALs/localhost,16020,1592440848604-splitting/localhost%2C16020%2C1592440848604.meta.1592440852959.meta
    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.splitWal(SplitWALCallable.java:106)
    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:86)
    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:49)
    at org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:49)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
This issue becomes how to make hbase.wal.split.to.hfile work in standalone mode.
was:
(This description got redone after I figured out what was going on. Previously
it was just a litany of me banging around trying to learn procedure-based WAL
splitting and hbase.wal.split.to.hfile; no one needs to read that; hence the
refactor).
HBASE-24574 procedure-based distributed WAL splitting is enabled, and
split-to-hfile too. A forced crash requires recovery with ServerCrashProcedure
splitting old WALs on restart. The recovery fails because we get stuck. The
Master can't assign meta because it is being recovered. The recovery can't make
progress because it is asking for a table descriptor for meta -- needed by the
hbase.wal.split.to.hfile feature -- and the master is not yet initialized.
After the default timeout, Master shuts down because it can't initialize.
The abort of the Master interrupts other ongoing actions, so later in the log
we'll see the WAL split show as interrupted:
{code}
2020-06-17 21:20:37,472 ERROR [RS_LOG_REPLAY_OPS-regionserver/localhost:16020-0] handler.RSProcedureHandler: Error when call RSProcedureCallable:
java.io.IOException: Failed WAL split, status=RESIGNED, wal=file:/Users/stack/checkouts/hbase.apache.git/tmp/hbase/WALs/localhost,16020,1592440848604-splitting/localhost%2C16020%2C1592440848604.meta.1592440852959.meta
    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.splitWal(SplitWALCallable.java:106)
    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:86)
    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:49)
    at org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:49)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
{code}
This issue becomes how to make hbase.wal.split.to.hfile work in standalone mode.
> Failed start recovering crash in standalone mode if procedure-based
> distributed WAL split & hbase.wal.split.to.hfile=true
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-24585
> URL: https://issues.apache.org/jira/browse/HBASE-24585
> Project: HBase
> Issue Type: Bug
> Reporter: Michael Stack
> Priority: Major
>