[ https://issues.apache.org/jira/browse/HBASE-20973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660244#comment-16660244 ]
Allan Yang commented on HBASE-20973:
------------------------------------
Pushed to branch-2.0+, thanks for reviewing, [~stack],[~Apache9].
> ArrayIndexOutOfBoundsException when rolling back procedure
> ----------------------------------------------------------
>
> Key: HBASE-20973
> URL: https://issues.apache.org/jira/browse/HBASE-20973
> Project: HBase
> Issue Type: Sub-task
> Components: amv2
> Affects Versions: 2.1.0, 2.0.1
> Reporter: Allan Yang
> Assignee: Allan Yang
> Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-20973.branch-2.0.001.patch,
> HBASE-20973.branch-2.0.002.patch
>
>
> Found this one while investigating HBASE-20921. After the root
> procedure (ModifyTableProcedure in this case) rolled back, an
> ArrayIndexOutOfBoundsException was thrown:
> {code}
> 2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159):
> CODE-BUG: Uncaught runtime exception for pid=5973,
> state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS,
> exception=java.lang.NullPointerException via CODE-BUG: Uncaught runtime
> exception: pid=5974, ppid=5973,
> state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED;
> ReopenTableRegionsProcedure
> table=IntegrationTestBigLinkedList:java.lang.NullPointerException;
> ModifyTableProcedure table=IntegrationTestBigLinkedList
> java.lang.UnsupportedOperationException: unhandled
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS
> at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
> at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
> at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
> at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> 2018-07-18 01:39:10,243 WARN [PEWorker-8]
> procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
> at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
> at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
> at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
> at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
> at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
> at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
> {code}
> This is a very serious condition. After this exception was thrown, the
> exclusive lock held by ModifyTableProcedure was never released, so all
> procedures against this table were blocked until the master restarted.
> Because the lock info for the procedure is not restored on restart, the
> other procedures could then proceed. It is quite embarrassing that a bug
> saved us (that bug will be fixed in HBASE-20846).
> I tried to reproduce this one using the test case in HBASE-20921, but I
> could not.
> An easy way to resolve this is to add a try/catch, making sure that no
> matter what happens, the table's exclusive lock is always released.
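
The try/catch idea above can be illustrated with a minimal, self-contained sketch. This is not the actual HBASE-20973 patch and does not use HBase classes: `RollbackLockSketch`, `TableLock`, and `rollbackAndRelease` are hypothetical names, and a plain `ReentrantLock` stands in for the table's exclusive lock. It only shows the general pattern of releasing the lock in a `finally` block so that an uncaught runtime exception during rollback cannot leave it held.

```java
import java.util.concurrent.locks.ReentrantLock;

public class RollbackLockSketch {

    /** Hypothetical stand-in for a table's exclusive lock (not an HBase class). */
    public static final class TableLock {
        private final ReentrantLock lock = new ReentrantLock();

        public void acquire() {
            lock.lock();
        }

        public void release() {
            // Only unlock if this thread actually holds the lock.
            if (lock.isHeldByCurrentThread()) {
                lock.unlock();
            }
        }

        public boolean isHeld() {
            return lock.isLocked();
        }
    }

    /**
     * Runs the rollback body and releases the lock no matter what happens.
     * Returns true if the rollback body completed without a runtime exception.
     */
    public static boolean rollbackAndRelease(TableLock lock, Runnable rollbackBody) {
        try {
            rollbackBody.run();
            return true;
        } catch (RuntimeException e) {
            // Assumption: a real implementation would log and mark the procedure failed.
            System.err.println("Rollback failed: " + e);
            return false;
        } finally {
            // Runs on both the success path and the exception path.
            lock.release();
        }
    }

    public static void main(String[] args) {
        TableLock lock = new TableLock();
        lock.acquire();
        // Simulate a rollback step that throws, as in the report above.
        boolean ok = rollbackAndRelease(lock, () -> {
            throw new ArrayIndexOutOfBoundsException(1);
        });
        System.out.println("rollback ok=" + ok + ", lock held=" + lock.isHeld());
    }
}
```

In this sketch the lock is released even though the rollback body throws, so later work against the same table would not be blocked until a restart.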