[
https://issues.apache.org/jira/browse/HBASE-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang resolved HBASE-21165.
-------------------------------
Assignee: (was: Michael Stack)
Resolution: Won't Fix
We use RegionProcedureStore now, close the WALProcedureStore related issues as
Won't Fix.
> During ProcedureStore load, there is no listing of progress...
> --------------------------------------------------------------
>
> Key: HBASE-21165
> URL: https://issues.apache.org/jira/browse/HBASE-21165
> Project: HBase
> Issue Type: Bug
> Components: amv2, Operability
> Reporter: Michael Stack
> Priority: Major
> Attachments: HBASE-21165.branch-2.1.001.patch
>
>
> I have a Master that crashed on a large cluster with hundreds of outstanding
> Procedure WALs and (probably --TBD) a few million Procedures to load. It is
> taking a long time (two hours)...
> There were STUCK procedures that were preventing clean-up of the old WALs.
> I can tell we are making progress by enabling TRACE on the Procedure Store.
> Better would be an emission as we made progress through the files with an
> output after every so many procedures loaded.
> Then, post-load, there is a long time spent sorting-out the Procedure
> image... We are in finish for ages doing stuff like:
> {code}
> "master/vc0207:22001:becomeActiveMaster" #98 daemon prio=5 os_prio=0
> tid=0x0000000000d31800 nid=0x1efc0 runnable [0x00007f0a3c17d000]
> java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$WalProcedureMap.removeFromMap(ProcedureWALFormatReader.java:837)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$WalProcedureMap.fetchReady(ProcedureWALFormatReader.java:614)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:201)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:94)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:426)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:382)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:663)
> at
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1335)
> at
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:878)
> at
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2119)
> at
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:567)
> at
> org.apache.hadoop.hbase.master.HMaster$$Lambda$42/1930759883.run(Unknown
> Source)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> and....
> {code}
> "master/vc0207:22001:becomeActiveMaster" #98 daemon prio=5 os_prio=0
> tid=0x0000000000d31800 nid=0x1efc0 runnable [0x00007f0a3c17d000]
> java.lang.Thread.State: RUNNABLE
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at java.security.Provider$Service.newInstance(Provider.java:1595)
> at sun.security.jca.GetInstance.getInstance(GetInstance.java:236)
> at sun.security.jca.GetInstance.getInstance(GetInstance.java:164)
> at java.security.Security.getImpl(Security.java:695)
> at java.security.MessageDigest.getInstance(MessageDigest.java:167)
> at org.apache.hadoop.hbase.util.MD5Hash.getMD5AsHex(MD5Hash.java:59)
> at
> org.apache.hadoop.hbase.client.RegionInfo.createRegionName(RegionInfo.java:560)
> at
> org.apache.hadoop.hbase.client.RegionInfo.createRegionName(RegionInfo.java:490)
> at
> org.apache.hadoop.hbase.client.RegionInfoBuilder$MutableRegionInfo.<init>(RegionInfoBuilder.java:243)
> at
> org.apache.hadoop.hbase.client.RegionInfoBuilder.build(RegionInfoBuilder.java:120)
> at
> org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.toRegionInfo(ProtobufUtil.java:3132)
> at
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.deserializeStateData(ServerCrashProcedure.java:335)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:283)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$Entry.convert(ProcedureWALFormatReader.java:359)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$EntryIterator.next(ProcedureWALFormatReader.java:410)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:460)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:76)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:391)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore$2.load(WALProcedureStore.java:441)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:202)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:94)
> at
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:426)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:382)
> at
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:663)
> at
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1335)
> at
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:878)
> at
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2119)
> at
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:567)
> at
> org.apache.hadoop.hbase.master.HMaster$$Lambda$42/1930759883.run(Unknown
> Source)
> {code}
> In this case, it took 1.25 hours before we starting showing progress in the
> log. Let me add progress output (Separately am looking at how to do speedup).
--
This message was sent by Atlassian Jira
(v8.20.10#820010)