stack created HBASE-20366:
-----------------------------

Summary: Procedure State != ProcedureState.RUNNABLE; IllegalArgumentException
Key: HBASE-20366
URL: https://issues.apache.org/jira/browse/HBASE-20366
Project: HBase
Issue Type: Bug
Components: amv2
Reporter: stack
A PE worker dies and the region is left offline because the procedure is not RUNNABLE when the worker goes to run it. It looks like this:

{code}
2018-04-07 19:58:50,589 INFO [PEWorker-5] procedure.MasterProcedureScheduler: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184 checking lock on 187ee18fb3dd1a7ac1f9f2b667160729
2018-04-07 19:58:50,589 INFO [PEWorker-14] procedure.MasterProcedureScheduler: pid=8302, state=RUNNABLE:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,\xEC0\x83\x96*\x86Qsh\xD82\x1E\xAB\x06$\x89,1523151456082.84e97ce42aeb78a2abaf8f17a278b735., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184 checking lock on 84e97ce42aeb78a2abaf8f17a278b735
2018-04-07 19:58:50,591 WARN [PEWorker-5] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
java.lang.IllegalArgumentException: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184
        at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1430)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
{code}

This killed my job because it offlined a region.

Narrative:
* The balancer moves this region.
* The move procedure dispatches the unassign.
* Suspiciously, the close comes in unannounced; it is as though it were a close from another procedure:
{code}
2018-04-07 19:58:24,296 INFO [PEWorker-9] assignment.RegionStateStore: pid=8305 updating hbase:meta row=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., regionState=CLOSED
{code}
* The master is killed by the chaos monkey.
* Recovery. The region is in CLOSED state.
* We go to schedule the move region procedure again. Its state (still WAITING) must not have been updated when the master crashed:
{code}
2018-04-07 19:58:50,589 INFO [PEWorker-5] procedure.MasterProcedureScheduler: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184 checking lock on 187ee18fb3dd1a7ac1f9f2b667160729
{code}
* And then we get:
{code}
2018-04-07 19:58:50,591 WARN [PEWorker-5] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
java.lang.IllegalArgumentException: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184
        at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1430)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
{code}
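The failure mode in the narrative can be modeled in a few lines. This is a hypothetical, simplified sketch, not HBase code: `RunnableCheckSketch`, `Proc`, `runWorker`, and the local `checkArgument` are invented for illustration, and only the state names are meant to mirror HBase's `ProcedureState`. What it takes from the report is that `ProcedureExecutor.execProcedure` guards with `Preconditions.checkArgument` on the procedure being RUNNABLE, so a procedure re-queued after master recovery while still recorded as WAITING throws `IllegalArgumentException` and the worker exits.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of the HBASE-20366 failure (not HBase's implementation).
 * Only the ProcState names are meant to mirror HBase's ProcedureState.
 */
public class RunnableCheckSketch {

    enum ProcState { INITIALIZING, RUNNABLE, WAITING, WAITING_TIMEOUT, ROLLEDBACK, SUCCESS, FAILED }

    static final class Proc {
        final long pid;
        ProcState state;

        Proc(long pid, ProcState state) {
            this.pid = pid;
            this.state = state;
        }

        @Override
        public String toString() {
            return "pid=" + pid + ", state=" + state;
        }
    }

    // Local stand-in for Guava's Preconditions.checkArgument(boolean, Object).
    static void checkArgument(boolean expression, Object errorMessage) {
        if (!expression) {
            throw new IllegalArgumentException(String.valueOf(errorMessage));
        }
    }

    // Stand-in for the guard in execProcedure: only RUNNABLE procedures may execute.
    static void execProcedure(Proc proc) {
        checkArgument(proc.state == ProcState.RUNNABLE, proc);
        // A real executor would run one step of the procedure here.
    }

    // Stand-in for a worker loop: an unchecked exception ends the worker,
    // analogous to "Worker terminating UNNATURALLY" in the log above.
    static String runWorker(Deque<Proc> queue) {
        try {
            while (!queue.isEmpty()) {
                execProcedure(queue.poll());
            }
            return "worker finished normally";
        } catch (IllegalArgumentException e) {
            return "worker terminating UNNATURALLY: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Deque<Proc> queue = new ArrayDeque<>();
        // pid=8302 is RUNNABLE; pid=8304 was re-queued after recovery while still WAITING.
        queue.add(new Proc(8302, ProcState.RUNNABLE));
        queue.add(new Proc(8304, ProcState.WAITING));
        System.out.println(runWorker(queue));
    }
}
```

Running `main` prints `worker terminating UNNATURALLY: pid=8304, state=WAITING`: pid=8302 passes the guard and pid=8304 does not. In the real system the consequence is worse than a dead worker, because the region that pid=8304 was moving stays offline.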