[ 
https://issues.apache.org/jira/browse/HBASE-21165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21165:
--------------------------
    Description: 
I have a Master that crashed on a large cluster with hundreds of outstanding 
Procedure WALs and ~1M Procedures to load. It is taking a long time (two 
hours) to load... 

There were STUCK procedures that were preventing clean-up of the old WALs.

I can tell we are making progress by enabling TRACE on the Procedure Store. 
Better would be a log line emitted as we make progress through the files, 
say after every so many procedures loaded.
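
A rough sketch of the kind of periodic progress emission meant here. This is not HBase code: the class name LoadProgressReporter, its methods, and the WAL file names are all made up for illustration, and the message sink stands in for whatever logger the Procedure Store would really use.

```java
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of HBase): counts loaded procedures and
 * emits one progress message every `interval` procedures.
 */
final class LoadProgressReporter {
  private final long interval;
  private final Consumer<String> sink;
  private long loaded;

  LoadProgressReporter(long interval, Consumer<String> sink) {
    if (interval <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    this.interval = interval;
    this.sink = sink;
  }

  /** Call once per procedure read; walName is the file currently being read. */
  void procedureLoaded(String walName) {
    loaded++;
    if (loaded % interval == 0) {
      sink.accept("Loaded " + loaded + " procedures so far; current WAL=" + walName);
    }
  }

  long getLoaded() {
    return loaded;
  }

  public static void main(String[] args) {
    // Assumed file names, for the demo only.
    String[] wals = {"pv2-0001.log", "pv2-0002.log"};
    LoadProgressReporter reporter = new LoadProgressReporter(4, System.out::println);
    for (String wal : wals) {
      for (int i = 0; i < 5; i++) {
        reporter.procedureLoaded(wal);
      }
    }
  }
}
```

In a real loader the interval would probably want to be large (tens of thousands) and the sink an INFO-level log call, so that progress shows up without having to turn on TRACE.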

Seems like post-load, there is a long time spent sorting out the Procedure 
image... We are in here for ages:

{code}
"master/vc0207:22001:becomeActiveMaster" #98 daemon prio=5 os_prio=0 tid=0x0000000000d31800 nid=0x1efc0 runnable [0x00007f0a3c17d000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$WalProcedureMap.removeFromMap(ProcedureWALFormatReader.java:837)
        at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader$WalProcedureMap.fetchReady(ProcedureWALFormatReader.java:614)
        at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormatReader.finish(ProcedureWALFormatReader.java:201)
        at org.apache.hadoop.hbase.procedure2.store.wal.ProcedureWALFormat.load(ProcedureWALFormat.java:94)
        at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.load(WALProcedureStore.java:426)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:382)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:663)
        at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1335)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:878)
        at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2119)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:567)
        at org.apache.hadoop.hbase.master.HMaster$$Lambda$42/1930759883.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:748)
{code}



  was:
I  have a Master that crashed on a large cluster with hundreds of ourstanding 
Procedure WALs and millions of Procedures to load. It is taking a long time 
(hours) to load... There were STUCK procedures that were preventing clean-up of 
the old WALs.

I can tell we are making progress by enabling TRACE on the Procedure Store. 
Better would be an emission as we made progress through the files with an 
emission after every so many procedures loaded.




> During ProcedureStore load, there is no listing of progress...
> --------------------------------------------------------------
>
>                 Key: HBASE-21165
>                 URL: https://issues.apache.org/jira/browse/HBASE-21165
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2, Operability
>            Reporter: stack
>            Assignee: stack
>            Priority: Major



