stack created HBASE-20366:
-----------------------------
Summary: Procedure State != ProcedureState.RUNNABLE; IllegalArgumentException
Key: HBASE-20366
URL: https://issues.apache.org/jira/browse/HBASE-20366
Project: HBase
Issue Type: Bug
Components: amv2
Reporter: stack
A PEWorker dies and the region is left offline because the procedure is not runnable when the worker goes to run it. It looks like this:
{code}
2018-04-07 19:58:50,589 INFO [PEWorker-5] procedure.MasterProcedureScheduler: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184 checking lock on 187ee18fb3dd1a7ac1f9f2b667160729
2018-04-07 19:58:50,589 INFO [PEWorker-14] procedure.MasterProcedureScheduler: pid=8302, state=RUNNABLE:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,\xEC0\x83\x96*\x86Qsh\xD82\x1E\xAB\x06$\x89,1523151456082.84e97ce42aeb78a2abaf8f17a278b735., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184 checking lock on 84e97ce42aeb78a2abaf8f17a278b735
2018-04-07 19:58:50,591 WARN [PEWorker-5] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
java.lang.IllegalArgumentException: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184
    at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1430)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
{code}
This killed my job because it offlined a region.
Narrative:
* The balancer moves this region.
* The move procedure dispatches the unassign.
* Suspiciously, the close comes in unannounced. It is as though it were a close from another procedure:
2018-04-07 19:58:24,296 INFO [PEWorker-9] assignment.RegionStateStore: pid=8305 updating hbase:meta row=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., regionState=CLOSED
* The master is killed by the chaos monkey.
* Recovery: the region is in the CLOSED state.
* We go to schedule the move region procedure again. Its state must not have been updated when the master crashed:
2018-04-07 19:58:50,589 INFO [PEWorker-5] procedure.MasterProcedureScheduler: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184 checking lock on 187ee18fb3dd1a7ac1f9f2b667160729
* And then we get
2018-04-07 19:58:50,591 WARN [PEWorker-5] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
java.lang.IllegalArgumentException: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729., source=ve0534.halxg.cloudera.com,16020,1523153184521, destination=ve0542.halxg.cloudera.com,16020,1523155964184
    at org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1430)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
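The failure mode in the sequence above can be boiled down to a deliberately simplified Java model. It is not HBase code: `PeWorkerSketch`, `Proc`, `State`, `execProcedure` and `runWorker` are hypothetical stand-ins. Going by the summary and the `Preconditions.checkArgument` frame, it assumes that `ProcedureExecutor.execProcedure` requires the procedure to be RUNNABLE, and, going by the "Worker terminating UNNATURALLY" line, that the worker thread exits on any uncaught throwable. The pids are borrowed from the log only for illustration. Records need Java 16 or later.

{code:java}
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified model of the HBASE-20366 failure mode; not HBase code.
public class PeWorkerSketch {
    // Stand-in for ProcedureState, reduced to the two states seen in the log.
    public enum State { RUNNABLE, WAITING }

    // Stand-in for a procedure: just a pid and its (possibly stale) state.
    public record Proc(long pid, State state) {
        @Override
        public String toString() {
            return "pid=" + pid + ", state=" + state;
        }
    }

    // Assumed check, modeled on the summary: execution requires RUNNABLE.
    public static void execProcedure(Proc p) {
        if (p.state() != State.RUNNABLE) {
            throw new IllegalArgumentException(p.toString());
        }
        // Real procedure steps would run here.
    }

    // Single worker loop: an uncaught Throwable ends the loop for good,
    // so anything still queued is never run by this worker.
    public static int runWorker(Queue<Proc> queue) {
        int executed = 0;
        try {
            Proc p;
            while ((p = queue.poll()) != null) {
                execProcedure(p);
                executed++;
            }
        } catch (Throwable t) {
            System.out.println("Worker terminating UNNATURALLY " + t);
        }
        return executed;
    }

    public static void main(String[] args) {
        // After the master restart, pid=8304 is reloaded still WAITING,
        // yet it is handed to a worker as if it were runnable.
        Queue<Proc> queue = new ArrayDeque<>(List.of(
                new Proc(8304, State.WAITING),
                new Proc(8302, State.RUNNABLE)));
        int executed = runWorker(queue);
        System.out.println("executed=" + executed + ", left in queue=" + queue.size());
    }
}
{code}

Running main prints the termination line followed by "executed=0, left in queue=1": in this single-worker model the stale WAITING procedure kills the worker before the RUNNABLE one behind it is touched, and nothing moves the region out of CLOSED.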