stack created HBASE-20824:
-----------------------------

             Summary: IllegalArgumentException: 
state=WAITING:SPLIT_TABLE_REGION_CREATE_DAUGHTER_REGIONS 
SplitTableRegionProcedure
                 Key: HBASE-20824
                 URL: https://issues.apache.org/jira/browse/HBASE-20824
             Project: HBase
          Issue Type: Bug
          Components: amv2
            Reporter: stack


Running tests to check HBASE-20796 STUCK RIT though region successfully 
assigned, I came across this case.

Log is throwing out these:
{code}
STUCK Region-In-Transition rit=CLOSED, 
location=ve0536.X.Y.Z.com,16020,1530233509248, 
table=IntegrationTestBigLinkedList, region=18643f59d7db3689eeb133062ae4bc4f
{code}

Backup up I got this:

{code}
2018-06-28 17:57:32,744 INFO  [PEWorker-3] procedure.MasterProcedureScheduler: 
pid=2716, state=WAITING:SPLIT_TABLE_REGION_CREATE_DAUGHTER_REGIONS; 
SplitTableRegionProcedure table=IntegrationTestBigLinkedList, 
parent=18643f59d7db3689eeb133062ae4bc4f, 
daughterA=6b0d2a0d22113c529dd49fc923a35ad9, 
daughterB=be72b5916a6a11d9ddce7d8af24f129f checking lock on 
18643f59d7db3689eeb133062ae4bc4f
2018-06-28 17:57:32,745 WARN  [PEWorker-3] procedure2.ProcedureExecutor: Worker 
terminating UNNATURALLY null                                                    
                                                                                
                       java.lang.IllegalArgumentException: pid=2716, 
state=WAITING:SPLIT_TABLE_REGION_CREATE_DAUGHTER_REGIONS; 
SplitTableRegionProcedure table=IntegrationTestBigLinkedList, 
parent=18643f59d7db3689eeb133062ae4bc4f, 
daughterA=6b0d2a0d22113c529dd49fc923a35ad9, 
daughterB=be72b5916a6a11d9ddce7d8af24f129f                                      
                                                                                
                                                                                
                                     at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
                                                                                
                                                                              
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1449)
                                                                                
                                                                                
     at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
                                                                                
                                                                                
  at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
                                                                                
                                                                                
          at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
{code}

Just before this I see the below:
{code}
2018-06-28 17:57:32,604 WARN  [master/ve0524:16000.Chore.1] 
master.CatalogJanitor: Failed initial scan of catalog table                     
                                                                                
                                           
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing             
                                                                                
                                                                                
                         at 
org.apache.hadoop.hbase.master.HMaster.isInMaintenanceMode(HMaster.java:2823)   
                                                                                
                                                                                
                    at 
org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:281)     
                                                                                
                                                                                
                    at 
org.apache.hadoop.hbase.master.CatalogJanitor.initialChore(CatalogJanitor.java:81)
  at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:184)        
                                                                                
                                                                                
                         at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)           
                                                                                
                                                                                
                         at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
  at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
                                                                                
                                                                      at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:111)
                                                                                
                                   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
                                                                                
                                                                                
                    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
                                                                                
                                                                                
                    at java.lang.Thread.run(Thread.java:748)
{code}

We are failing this check

    Preconditions.checkArgument(procedure.getState() == ProcedureState.RUNNABLE,
        procedure.toString());

Oh, almost in the same millisecond, we get this:

2018-06-28 17:57:32,743 INFO  [master/ve0524:16000] master.HMaster: Master has 
completed initialization 13.246sec

Is it a race? The RUNNING state has not been set yet?

Here was what was in meta. We'd just loaded the region:
2018-06-28 17:57:32,443 INFO  [master/ve0524:16000] 
assignment.RegionStateStore: Load hbase:meta entry 
region=18643f59d7db3689eeb133062ae4bc4f, regionState=CLOSED, 
lastHost=ve0536.halxg.cloudera.com,16020,1530233509248, 
regionLocation=ve0536.X.Y.Z,16020,1530233509248, openSeqNum=488250

And before the crash, last thing was

2018-06-28 17:57:08,609 INFO  [PEWorker-16] assignment.RegionStateStore: 
pid=2717 updating hbase:meta row=18643f59d7db3689eeb133062ae4bc4f, 
regionState=CLOSED

Before that...
2018-06-28 17:57:05,457 INFO  [PEWorker-10] procedure.MasterProcedureScheduler: 
pid=2717, ppid=2716, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
UnassignProcedure table=IntegrationTestBigLinkedList, 
region=18643f59d7db3689eeb133062ae4bc4f, 
server=ve0536.X.Y.Z.com,16020,1530233509248 checking lock on 
18643f59d7db3689eeb133062ae4bc4f

pid=2716 is the split.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to