stack created HBASE-20015:
-----------------------------
Summary: TestMergeTableRegionsProcedure and
TestRegionMergeTransactionOnCluster flakey
Key: HBASE-20015
URL: https://issues.apache.org/jira/browse/HBASE-20015
Project: HBase
Issue Type: Sub-task
Components: flakey
Reporter: stack
Assignee: stack
Fix For: 2.0.0-beta-2
MergeTableRegionsProcedure seems incomplete. The ProcedureExecutor framework can
run in a test mode in which it kills the Procedure before it can persist its
state, and it does this repeatedly to shake out places where Procedures may not
be preserving all needed state at each Procedural step. The kill causes the
Procedure to 'fail', after which the framework runs its rollback. The
MergeTableRegionsProcedure is not able to roll back the last few steps of a
merge: it throws an UnsupportedOperationException. (The hope was that the
missing rollback steps would be filled in, but they are hard to complete
because they are themselves stepped.)
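To make the failure mode concrete, here is a toy model in plain Java. It is not HBase code; the step names, `ToyMergeProcedure`, and `runWithKill` are all hypothetical. It runs a stepped procedure up to a given step, simulates the test-mode kill by treating the procedure as failed, and then rolls back the completed steps in reverse order. Because rollback only covers the early steps, a kill at or after the hypothetical `UPDATE_META` step ends in an UnsupportedOperationException, much like the log below.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not HBase code) of a stepped procedure whose
 * rollback only covers its early steps, plus a harness that mimics the
 * test-mode kill by failing the procedure after a chosen step.
 */
class KillAndRollbackSketch {
  // Hypothetical merge steps; only their ordering matters here.
  enum Step { PREPARE, CLOSE_REGIONS, CREATE_MERGED_REGION, UPDATE_META, OPEN_MERGED_REGION }

  static class ToyMergeProcedure {
    final List<String> log = new ArrayList<>();

    void execute(Step step) {
      log.add("execute " + step);
    }

    // Undo a single step. The later steps have no undo branch, so rolling
    // them back throws, mirroring the "unhandled state" failure.
    void rollbackState(Step step) {
      switch (step) {
        case PREPARE:
        case CLOSE_REGIONS:
        case CREATE_MERGED_REGION:
          log.add("rollback " + step);
          break;
        default:
          throw new UnsupportedOperationException("unhandled state=" + step);
      }
    }
  }

  /** Runs steps 0..killAfter, "kills" the procedure, then rolls back in reverse. */
  static boolean runWithKill(int killAfter) {
    ToyMergeProcedure proc = new ToyMergeProcedure();
    Step[] steps = Step.values();
    for (int i = 0; i <= killAfter; i++) {
      proc.execute(steps[i]);
    }
    // The injected kill marks the procedure failed, so undo what already ran.
    try {
      for (int i = killAfter; i >= 0; i--) {
        proc.rollbackState(steps[i]);
      }
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    for (int k = 0; k < Step.values().length; k++) {
      System.out.println("kill after " + Step.values()[k] + ": rollback "
          + (runWithKill(k) ? "ok" : "FAILED"));
    }
  }
}
{code}

Running `main` reports `ok` for the first three kill points and `FAILED` for the last two: whether the test passes depends only on where the kill happens to land.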
So....
Well, it turns out that Split has a mechanism whereby it will not fail the
Procedure if it gets to a stage from which it cannot roll back. Instead, it
just retries, and keeps retrying until it succeeds... eventually. Merge has this
facility only half-implemented, so the Merge tests are flakey. They do stuff
like this:
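A minimal sketch of the Split-style behaviour, assuming a simple cut-off in the ordering of steps: before the point of no return, a failed step fails the procedure so that rollback can run; after it, the same step is retried until it succeeds. All names here (`canRollBack`, `drive`, `flakyStep`, the `Step` values) are hypothetical illustrations, not the HBase API.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not HBase code): once a procedure is past its point
 * of no return, a failed step is retried instead of failing the procedure.
 */
class RetryPastPonrSketch {
  enum Step { PREPARE, CLOSE_REGIONS, CREATE_MERGED_REGION, UPDATE_META, OPEN_MERGED_REGION }

  enum Outcome { DONE, FAILED_WILL_ROLL_BACK }

  // Hypothetical cut-off: UPDATE_META and later steps cannot be undone.
  static boolean canRollBack(Step step) {
    return step.compareTo(Step.UPDATE_META) < 0;
  }

  /** A step body that throws on its first {@code failures} invocations. */
  static Runnable flakyStep(Step step, int failures, AtomicInteger attempts) {
    return () -> {
      if (attempts.incrementAndGet() <= failures) {
        throw new IllegalStateException("injected failure in " + step);
      }
    };
  }

  /** Drives one step, choosing between "fail and roll back" and "retry". */
  static Outcome drive(Step step, Runnable body) {
    while (true) {
      try {
        body.run();
        return Outcome.DONE;
      } catch (RuntimeException e) {
        if (canRollBack(step)) {
          // Still safe to undo: fail the procedure and let rollback run.
          return Outcome.FAILED_WILL_ROLL_BACK;
        }
        // Past the point of no return: do not fail; retry the same step.
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger early = new AtomicInteger();
    System.out.println(drive(Step.CLOSE_REGIONS, flakyStep(Step.CLOSE_REGIONS, 2, early))
        + " after " + early.get() + " attempt(s)");
    AtomicInteger late = new AtomicInteger();
    System.out.println(drive(Step.UPDATE_META, flakyStep(Step.UPDATE_META, 2, late))
        + " after " + late.get() + " attempt(s)");
  }
}
{code}

Here `main` prints `FAILED_WILL_ROLL_BACK after 1 attempt(s)` for the early step and `DONE after 3 attempt(s)` for the late one. The loop has no backoff and no retry cap; a real implementation would presumably want both.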
{code}
2018-02-17 04:04:02,999 WARN [PEWorker-1]
assignment.MergeTableRegionsProcedure(311): Failed rollback attempt step
MERGE_TABLE_REGIONS_UPDATE_META for merging the regions
[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c] in table
testRollbackAndDoubleExecution
java.lang.UnsupportedOperationException: pid=44,
state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META,
exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via
MergeTableRegionsProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException:
abort requested; MergeTableRegionsProcedure
table=testRollbackAndDoubleExecution,
regions=[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c],
forcibly=false unhandled state=MERGE_TABLE_REGIONS_UPDATE_META
    at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:291)
    at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:78)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:199)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:859)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1356)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1312)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1181)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
2018-02-17 04:04:03,007 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159):
CODE-BUG: Uncaught runtime exception for pid=44,
state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META,
exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via
MergeTableRegionsProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException:
abort requested; MergeTableRegionsProcedure
table=testRollbackAndDoubleExecution,
regions=[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c],
forcibly=false
java.lang.UnsupportedOperationException: pid=44,
state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META,
exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via
MergeTableRegionsProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException:
abort requested; MergeTableRegionsProcedure
table=testRollbackAndDoubleExecution,
regions=[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c],
forcibly=false unhandled state=MERGE_TABLE_REGIONS_UPDATE_META
    at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:291)
    at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:78)
    at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:199)
    at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:859)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1356)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1312)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1181)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
{code}
i.e. the rollback throws up its hands, which surfaces as a CODE-BUG: a
condition the framework cannot process. The test fails.
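One hedged way to catch a half-implemented rollback directly, rather than through flakey kill-and-rollback runs, is to check that every state which claims rollback support actually has a rollback branch. The sketch below is hypothetical (not HBase code): `canRollBack` deliberately claims one state too many, and `inconsistentStates` reports the mismatch. In a real procedure, invoking rollback outside of a failure would have side effects, so a check like this only makes sense against a toy or mocked rollback.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical consistency check (not HBase code): every state that claims
 * rollback support must really have a rollback branch; otherwise a kill in
 * that state ends in an unchecked exception the executor cannot handle.
 */
class RollbackCoverageCheck {
  enum Step { PREPARE, CLOSE_REGIONS, CREATE_MERGED_REGION, UPDATE_META, OPEN_MERGED_REGION }

  // Half-implemented on purpose: claims rollback up to and including UPDATE_META...
  static boolean canRollBack(Step step) {
    return step.compareTo(Step.UPDATE_META) <= 0;
  }

  // ...but only has (no-op, toy) undo logic for the first three steps.
  static void rollbackState(Step step) {
    if (step.compareTo(Step.CREATE_MERGED_REGION) > 0) {
      throw new UnsupportedOperationException("unhandled state=" + step);
    }
  }

  /** Returns the states that claim rollback support but throw on rollback. */
  static List<Step> inconsistentStates() {
    List<Step> bad = new ArrayList<>();
    for (Step step : Step.values()) {
      if (!canRollBack(step)) {
        continue; // such states are retried, never rolled back
      }
      try {
        rollbackState(step);
      } catch (UnsupportedOperationException e) {
        bad.add(step);
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    System.out.println("rollback claimed but missing for: " + inconsistentStates());
  }
}
{code}

With these toy definitions, `main` prints `rollback claimed but missing for: [UPDATE_META]`, pointing at the one state that would otherwise turn into a CODE-BUG.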