bolao created HBASE-26302:
-----------------------------

             Summary: Due to many procedures reload from master:store, hmaster 
takes too much time to initialize
                 Key: HBASE-26302
                 URL: https://issues.apache.org/jira/browse/HBASE-26302
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 2.3.5
            Reporter: bolao
         Attachments: image-2021-09-28-11-33-23-375.png, 
image-2021-09-28-11-33-41-612.png

When HBase restarts, we found that HMaster takes a long time to initialize. 
We added some logging to the jars and found that it is stuck reloading procedures from 
master:store in ProcedureExecutor's init method.

 
{panel:title=1. The ProcedureExecutor logs contain only the following}
2021-09-24 11:22:13 [master/fx-hd-sc-hbase-backup-0:16000:becomeActiveMaster] 
INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(569) -Starting 
30 core workers (bigger of cpus/4 or 16) with max (burst) worker count=300
2021-09-24 11:22:13 [master/fx-hd-sc-hbase-backup-0:16000:becomeActiveMaster] 
INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(589) -Recovered 
RegionProcedureStore lease in 1 msec
There are no log lines for the load step:

[https://github.com/apache/hbase/blob/cbebf85b3cfefc443ac8592908e8a6e95b020611/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java#L602]

 
{panel}
2. We added some logging to 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore#load, like this:

 
{code:java}
    loader.setMaxProcId(maxProcId);
    LOG.info("there are {} procedures load from master:store", procs.size());
    ProcedureTree tree = ProcedureTree.build(procs);
    loader.load(tree.getValidProcs());
    loader.handleCorrupted(tree.getCorruptedProcs());
{code}
Grepping the log then shows:

2021-09-24 11:23:16 [master/fx-hd-sc-hbase-backup-0:16000:becomeActiveMaster] 
INFO 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(294) 
-there are 3357861 procedures load form master:store

3. We added some logging to 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor#restoreLocks():
{code:java}
  private void restoreLocks() {
    Set<Long> restored = new HashSet<>();
    Deque<Procedure<TEnvironment>> stack = new ArrayDeque<>();
    AtomicInteger num = new AtomicInteger();
    procedures.values().forEach(proc -> {
      for (;;) {
        LOG.info("this is num {}", num.incrementAndGet());
        if (restored.contains(proc.getProcId())) {
          restoreLocks(stack, restored);
          return;
        }
        if (!proc.hasParent()) {
          restoreLock(proc, restored);
          restoreLocks(stack, restored);
          return;
        }
        stack.push(proc);
        proc = procedures.get(proc.getParentProcId());
      }
    });
  }
{code}
We found that by the time num reached 160,000 (16W), about 20 minutes had elapsed.
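
To put that rate in perspective, here is a rough extrapolation, reading 16W as 160,000 loop iterations in 20 minutes and using the 3,357,861 procedures reported by the load log. It is only a sketch under two assumptions: the rate stays constant, and every procedure costs at least one loop iteration, so the result is a lower bound for the instrumented run. The class and method names are made up for illustration, and the per-iteration LOG.info call itself adds overhead, so an uninstrumented master may be faster.

```java
// Hypothetical helper for a back-of-the-envelope estimate; not part of HBase.
final class RestoreLocksEstimate {

  /**
   * Extrapolates total minutes from an observed sample, assuming the
   * iteration rate stays constant for the whole run.
   */
  static double estimateMinutes(long observedIterations, double observedMinutes,
      long totalIterations) {
    double iterationsPerMinute = observedIterations / observedMinutes;
    return totalIterations / iterationsPerMinute;
  }

  public static void main(String[] args) {
    // 160,000 iterations in about 20 minutes, 3,357,861 procedures to walk.
    double minutes = estimateMinutes(160_000L, 20.0, 3_357_861L);
    System.out.printf("at least ~%.0f minutes (~%.1f hours)%n", minutes, minutes / 60.0);
  }
}
```

At the observed rate this comes to roughly 420 minutes, about 7 hours, for restoreLocks alone.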

4. According to the HFile metadata, the earliest timestamp is 28 June.

!image-2021-09-28-11-33-41-612.png!

5. Reviewing the source code, the master:store TTL is left at the default 
value (HConstants.FOREVER):

[https://github.com/apache/hbase/blob/fd3fdc08d1cd43eb3432a1a70d31c3aece6ecabe/hbase-server/src/main/java/org/apache/hadoop/hbase/master/region/MasterRegionFactory.java#L82]

The scan of master:store does not use a filter either:

[https://github.com/apache/hbase/blob/cbebf85b3cfefc443ac8592908e8a6e95b020611/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure2/store/region/RegionProcedureStore.java#L263]

 

So we have some questions:

1. Is it reasonable for the master:store TTL to be HConstants.FOREVER?
2. Can we keep master:store small by deleting some historical 
procedures?
Looking forward to your reply. Thanks!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
