[ 
https://issues.apache.org/jira/browse/HBASE-24090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072462#comment-17072462
 ] 

Pankaj Kumar commented on HBASE-24090:
--------------------------------------

{quote}Can you confirm whether the RS 'RS-IP,RS-Port,1585060304365' is still 
alive?
{quote}
Yes, server was online.
{quote}We will remove the region from the RIT map when finishing the TRSP, 
where we will call RegionStateNode.unsetProcedure.
{quote}
Yeah, but AM#handleRegionOverStuckWarningThreshold() and the HM web UI refer 
to RegionStates#getRegionsInTransition().
{quote}And IIRC, we will not use the OFFLINE state any more, unless you call 
offlineRegion explicitly, so why the region is in OFFLINE state when restarting
{quote}
I also observed it; the region state is not set to OFFLINE explicitly.
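
For anyone triaging the same symptom, here is a minimal sketch (not part of
HBase) that pulls the STUCK Region-In-Transition warnings out of an HM log and
groups them by region. It assumes the WARN line layout shown in the issue
description below, with wrapped lines already joined into one line per log
entry. The names STUCK_RE and stuck_regions are hypothetical helpers
introduced only for this illustration.

```python
import re
from collections import defaultdict

# Assumption: one log entry per line, in the layout quoted in this issue:
# <timestamp> | WARN  | ProcExecTimeout | STUCK Region-In-Transition rit=..., ...
STUCK_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \| WARN\s+\| "
    r"ProcExecTimeout \| STUCK Region-In-Transition rit=(?P<state>\w+), "
    r"location=(?P<location>.+?), table=(?P<table>[^,]+), "
    r"region=(?P<region>[0-9a-f]+)"
)


def stuck_regions(lines):
    """Return {region: [(timestamp, state, location), ...]} for STUCK warnings."""
    hits = defaultdict(list)
    for line in lines:
        m = STUCK_RE.search(line)
        if m:
            hits[m["region"]].append((m["ts"], m["state"], m["location"]))
    return dict(hits)
```

A region that keeps showing up with rit=OPEN after the corresponding
TransitRegionStateProcedure has logged "Finished ... state=SUCCESS" is a
candidate for the behaviour described in this issue.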

> Regions Stuck in RIT in OPEN state
> ----------------------------------
>
>                 Key: HBASE-24090
>                 URL: https://issues.apache.org/jira/browse/HBASE-24090
>             Project: HBase
>          Issue Type: Bug
>          Components: amv2
>            Reporter: Pankaj Kumar
>            Priority: Major
>
> Observed a few regions stuck in RIT in OPEN state in a cluster restart 
> scenario.
> Analysis:
> 1. All RS were killed abruptly.
> 2. HMaster started SCP and initiated region assignments
> {noformat}
> 2020-03-24 22:27:08,821 | INFO  | PEWorker-20 | Initialized subprocedures=[
> {pid=49703, ppid=46611, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=usertable18, 
> region=75a79e978362d6f4ee1a3e27dfc5d4b6, ASSIGN},...] | 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1697)
> {noformat}
> 3. But HMaster failover happened before it completed.
> 4. The new active master loaded the previous procedures and restored the RIT
> {noformat}
> 2020-03-24 22:30:04,815 | INFO  | master/HM-IP:HM-PORT:becomeActiveMaster | 
> Attach pid=49703, ppid=46611, 
> state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=usertable18, 
> region=75a79e978362d6f4ee1a3e27dfc5d4b6, ASSIGN to rit=OFFLINE, 
> location=null, table=usertable18, region=75a79e978362d6f4ee1a3e27dfc5d4b6 to 
> restore RIT | 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.lambda$setupRIT$0(AssignmentManager.java:280)
> ---
> 2020-03-24 22:32:52,153 | WARN  | ProcExecTimeout | STUCK 
> Region-In-Transition rit=OPEN, location=RS-IP,RS-Port,1585057875346, 
> table=usertable18, region=75a79e978362d6f4ee1a3e27dfc5d4b6 | 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.handleRegionOverStuckWarningThreshold(AssignmentManager.java:1340)
> ---
> 2020-03-24 22:41:51,837 | WARN  | master/HM-IP:HM-PORT.Chore.1 | 
> unknown_server=RS-IP,RS-Port,1585057875346/usertable01,user10268,1585053943990.871858cf2ef25a9e0e6b4f022a16ebc9.,....
>  | org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:181)
> {noformat}
> Region assignment was slow as we are testing with a huge number of regions 
> per RS, so the RIT WARN message was logged.
> 5. Finally the region was assigned.
> HM log:
> {noformat}
> 2020-03-24 22:42:26,386 | INFO  | PEWorker-11 | Took xlock for pid=49703, 
> ppid=46611, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE; 
> TransitRegionStateProcedure table=usertable18, 
> region=75a79e978362d6f4ee1a3e27dfc5d4b6, ASSIGN | 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureScheduler.waitRegions(MasterProcedureScheduler.java:737)
> 2020-03-24 22:42:26,446 | INFO  | PEWorker-11 | Starting pid=49703, 
> ppid=46611, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, 
> locked=true; TransitRegionStateProcedure table=usertable18, 
> region=75a79e978362d6f4ee1a3e27dfc5d4b6, ASSIGN; rit=OPEN, location=null; 
> forceNewPlan=true, retain=false | 
> org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure.queueAssign(TransitRegionStateProcedure.java:189)
> 2020-03-24 22:42:26,717 | INFO  | PEWorker-17 | pid=49703 updating hbase:meta 
> row=75a79e978362d6f4ee1a3e27dfc5d4b6, regionState=OPENING, 
> regionLocation=RS-IP,RS-Port,1585060304365 | 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:201)
> 2020-03-24 22:42:27,439 | INFO  | PEWorker-19 | pid=49703 updating hbase:meta 
> row=75a79e978362d6f4ee1a3e27dfc5d4b6, regionState=OPEN, openSeqNum=5, 
> regionLocation=RS-IP,RS-Port,1585060304365 | 
> org.apache.hadoop.hbase.master.assignment.RegionStateStore.updateUserRegionLocation(RegionStateStore.java:201)
> 2020-03-24 22:42:27,701 | INFO  | PEWorker-19 | Finished subprocedure 
> pid=73705, resume processing parent pid=49703, ppid=46611, 
> state=RUNNABLE:REGION_STATE_TRANSITION_CONFIRM_OPENED, locked=true; 
> TransitRegionStateProcedure table=usertable18, 
> region=75a79e978362d6f4ee1a3e27dfc5d4b6, ASSIGN | 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.countDownChildren(ProcedureExecutor.java:1837)
> 2020-03-24 22:42:27,821 | INFO  | PEWorker-15 | Finished pid=49703, 
> ppid=46611, state=SUCCESS; TransitRegionStateProcedure table=usertable18, 
> region=75a79e978362d6f4ee1a3e27dfc5d4b6, ASSIGN in 15mins, 18.888sec | 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1427)
> {noformat}
> RS Log:
> {noformat}
> 2020-03-24 22:42:27,230 | INFO  | 
> RS_OPEN_REGION-regionserver/RS-IP:RS-Port-34 | Open 
> usertable18,user29616,1585055007688.75a79e978362d6f4ee1a3e27dfc5d4b6. | 
> org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:123)
> 2020-03-24 22:42:27,241 | INFO  | 
> StoreOpener-75a79e978362d6f4ee1a3e27dfc5d4b6-1 | Created cacheConfig: 
> cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, 
> cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, 
> prefetchOnOpen=false for family {NAME => 'family', VERSIONS => '1', 
> EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', 
> DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
> REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 
> 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
> PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 
> 'true', BLOCKSIZE => '65536'} with blockCache=LruBlockCache{blockCount=0, 
> currentSize=5.74 MB, freeSize=7.64 GB, maxSize=7.65 GB, heapSize=5.74 MB, 
> minSize=7.27 GB, minFactor=0.95, multiSize=3.63 GB, multiFactor=0.5, 
> singleSize=1.82 GB, singleFactor=0.25} | 
> org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:174)
> 2020-03-24 22:42:27,242 | INFO  | 
> StoreOpener-75a79e978362d6f4ee1a3e27dfc5d4b6-1 | size [128 MB, 8.00 EB, 8.00 
> EB); files [6, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 
> 1610612736; major period 604800000, major jitter 0.500000, min locality to 
> compact 0.000000; tiered compaction: max_age 9223372036854775807, incoming 
> window min 6, compaction policy for tiered window 
> org.apache.hadoop.hbase.regionserver.compactions.ExploringCompactionPolicy, 
> single output for minor true, compaction window factory 
> org.apache.hadoop.hbase.regionserver.compactions.ExponentialCompactionWindowFactory
>  | 
> org.apache.hadoop.hbase.regionserver.compactions.CompactionConfiguration.<init>(CompactionConfiguration.java:147)
> 2020-03-24 22:42:27,243 | INFO  | 
> StoreOpener-75a79e978362d6f4ee1a3e27dfc5d4b6-1 | Store=family,  memstore 
> type=DefaultMemStore, storagePolicy=HOT, verifyBulkLoads=false, 
> parallelPutCountPrintThreshold=50, encoding=NONE, compression=NONE | 
> org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:335)
> 2020-03-24 22:42:27,252 | INFO  | 
> RS_OPEN_REGION-regionserver/RS-IP:RS-Port-34 | Opened 
> 75a79e978362d6f4ee1a3e27dfc5d4b6; next sequenceid=5 | 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1067)
> 2020-03-24 22:42:27,254 | INFO  | 
> RS_OPEN_REGION-regionserver/RS-IP:RS-Port-34 | Post open deploy tasks for 
> usertable18,user29616,1585055007688.75a79e978362d6f4ee1a3e27dfc5d4b6., 
> openProcId=73705, masterSystemTime=1585060947225 | 
> org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:2379)
> 2020-03-24 22:42:27,320 | INFO  | 
> RS_OPEN_REGION-regionserver/RS-IP:RS-Port-34 | Opened 
> usertable18,user29616,1585055007688.75a79e978362d6f4ee1a3e27dfc5d4b6. | 
> org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:141)
> {noformat}
> 6. Even though the region was opened successfully, it is still in RIT, in 
> OPEN state
> {noformat}
> 2020-03-24 22:49:05,432 | WARN  | ProcExecTimeout | STUCK 
> Region-In-Transition rit=OPEN, location=RS-IP,RS-Port,1585060304365, 
> table=usertable18, region=75a79e978362d6f4ee1a3e27dfc5d4b6 | 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.handleRegionOverStuckWarningThreshold(AssignmentManager.java:1340)
> {noformat}
> This WARN message keeps occurring in the HM log.
> HBase version: 2.2.3



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
