stack created HBASE-20978:
-----------------------------

             Summary: [amv2] UNATURAL! termination of worker during 
MoveRegionProcedure
                 Key: HBASE-20978
                 URL: https://issues.apache.org/jira/browse/HBASE-20978
             Project: HBase
          Issue Type: Bug
          Components: amv2
    Affects Versions: 2.0.1
            Reporter: stack
            Assignee: stack
             Fix For: 2.0.2


Testing tip of branch-2.0, ran into this:

{code}
2018-07-29 01:45:33,002 INFO  [master/ve0524:16000] master.HMaster: Master has 
completed initialization 13.854sec                                              
                                                             2018-07-29 
01:45:33,003 INFO  [PEWorker-4] procedure.MasterProcedureScheduler: pid=1820, 
state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure 
hri=533fb79ba23b27e9e0715b51daeb30c1, 
source=ve0538.halxg.cloudera.com,16020,1532847421672, 
destination=ve0540.halxg.cloudera.com,16020,1532853151031 checking lock on 
533fb79ba23b27e9e0715b51daeb30c1                                                
                                                  2018-07-29 01:45:33,003 WARN  
[PEWorker-4] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
java.lang.IllegalArgumentException: pid=1820, state=WAITING:MOVE_REGION_ASSIGN; 
MoveRegionProcedure hri=533fb79ba23b27e9e0715b51daeb30c1, 
source=ve0538.halxg.cloudera.com,16020,1532847421672, 
destination=ve0540.halxg.cloudera.com,16020,1532853151031
  at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
                                                                                
                                   at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1458)
  at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1249)
                                                                                
                                       at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
  at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1763)
{code}

It then shows as the below in the UI:
{code}

Id      Parent  State   Owner   Type    Start Time      Last Update     Errors  
Parameters
1820            WAITING stack   MoveRegionProcedure 
hri=533fb79ba23b27e9e0715b51daeb30c1, 
source=ve0538.halxg.cloudera.com,16020,1532847421672, 
destination=ve0540.halxg.cloudera.com,16020,1532853151031       Sun Jul 29 
01:33:37 PDT 2018    Sun Jul 29 01:33:38 PDT 2018            [ { state => [ 
'1', '2' ] }, { regionId => '1532851768240', tableName => { namespace => 
'ZGVmYXVsdA==', qualifier => 'SW50ZWdyYXRpb25UZXN0QmlnTGlua2VkTGlzdA==' }, 
startKey => 'VttDLvXHdcmzwqNdrNoUFg==', endKey => 'WGFV8k+hFqhcIJGiKZ8L4Q==', 
offline => 'false', split => 'false', replicaId => '0' }, { sourceServer => { 
hostName => 've0538.halxg.cloudera.com', port => '16020', startCode => 
'1532847421672' }, destinationServer => { hostName => 
've0540.halxg.cloudera.com', port => '16020', startCode => '1532853151031' } } ]
{code}

This is what we'd just read from hbase:meta:

{code}
2018-07-29 01:45:32,802 INFO  [master/ve0524:16000] 
assignment.RegionStateStore: Load hbase:meta entry 
region=533fb79ba23b27e9e0715b51daeb30c1, regionState=CLOSED, 
lastHost=ve0538.halxg.cloudera.com,16020,1532847421672, 
regionLocation=ve0538.halxg.cloudera.com,16020,1532847421672, openSeqNum=1544600
{code}

Before this, we'd just logged this:

2018-07-29 01:33:39,786 INFO  [PEWorker-14] assignment.RegionStateStore: 
pid=1823 updating hbase:meta row=533fb79ba23b27e9e0715b51daeb30c1, 
regionState=CLOSED


Going back in history, we do the above each time the Master gets restarted so 
the region is offlined and never brought back online.

It is failing here:

{code}
  private void execProcedure(final RootProcedureState procStack,
      final Procedure<TEnvironment> procedure) {
    Preconditions.checkArgument(procedure.getState() == ProcedureState.RUNNABLE,
        procedure.toString());
{code}

Its the parent move region that is trying to run and failing. It is not 
RUNNABLE? Because the subprocedure was 'done' but not fully?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to