Allan Yang created HBASE-20973:
----------------------------------

             Summary: ArrayIndexOutOfBoundsException when rolling back procedure
                 Key: HBASE-20973
                 URL: https://issues.apache.org/jira/browse/HBASE-20973
             Project: HBase
          Issue Type: Sub-task
          Components: amv2
    Affects Versions: 2.0.1, 2.1.0
            Reporter: Allan Yang
            Assignee: Allan Yang


Found this one while investigating HBASE-20921. After the root
procedure (ModifyTableProcedure in this case) rolled back, an
ArrayIndexOutOfBoundsException was thrown:
{code}
2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): CODE-BUG: Uncaught runtime exception for pid=5973, state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPointerException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.lang.NullPointerException; ModifyTableProcedure table=IntegrationTestBigLinkedList
java.lang.UnsupportedOperationException: unhandled state=MODIFY_TABLE_REOPEN_ALL_REGIONS
        at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
        at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
        at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
2018-07-18 01:39:10,243 WARN  [PEWorker-8] procedure2.ProcedureExecutor(1756): Worker terminating UNNATURALLY null
java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
        at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
        at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
        at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
        at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
        at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
        at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
{code}

This is a very serious condition. After this exception was thrown, the
exclusive lock held by ModifyTableProcedure was never released, so all
procedures against this table were blocked until the master restarted. The
other procedures could only run again because the lock info for the procedure
is not restored after a restart. It is quite embarrassing that a bug saved us
(that bug will be fixed in HBASE-20846).
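To make the failure mode concrete, here is a simplified Java model; it is hypothetical and not HBase code. A one-permit `java.util.concurrent.Semaphore` stands in for the table's exclusive lock (like a procedure lock, it is not owned by the worker thread that took it), and `deleteFromStore` and `rollbackWithoutGuard` are made-up names standing in for the store delete and rollback path in the stack trace above. When the store call throws, the release is skipped and a later procedure cannot take the lock:

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model of the lock leak; names do not correspond to HBase APIs. */
public class LockLeakDemo {

    // Hypothetical stand-in for the store delete during rollback. It always
    // throws, mimicking the ArrayIndexOutOfBoundsException from the tracker.
    static void deleteFromStore(long procId) {
        throw new ArrayIndexOutOfBoundsException(1);
    }

    // Hypothetical rollback path with no exception handling: when
    // deleteFromStore throws, release() is never reached and the permit leaks.
    static void rollbackWithoutGuard(Semaphore tableLock, long procId) {
        deleteFromStore(procId);
        tableLock.release();
    }

    /** Returns true if another procedure could take the table lock after the rollback. */
    static boolean lockAvailableAfterRollback() {
        Semaphore tableLock = new Semaphore(1);
        tableLock.acquireUninterruptibly(); // the root procedure holds the exclusive lock
        try {
            rollbackWithoutGuard(tableLock, 5973L);
        } catch (RuntimeException e) {
            // In the real system the worker thread terminates at this point.
        }
        return tableLock.tryAcquire(); // a later procedure on the same table
    }

    public static void main(String[] args) {
        System.out.println("lock available: " + lockAvailableAfterRollback());
    }
}
```

Running `main` prints `lock available: false`: the only permit is still held by a procedure that has already finished rolling back.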

I tried to reproduce this one using the test case in HBASE-20921, but could
not.
An easy way to resolve this is to add a try/catch so that, no matter what
happens, the table's exclusive lock is always released.
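A minimal sketch of the proposed fix, using the same kind of hypothetical model: a one-permit `Semaphore` stands in for the table lock, and `deleteFromStore` and `rollbackWithGuard` are illustrative names, not HBase APIs. The report suggests a try/catch; this sketch also puts the release in a `finally` block so the lock is freed even if an `Error` escapes the catch:

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model of a guarded rollback; names do not correspond to HBase APIs. */
public class GuardedRollbackDemo {

    // Hypothetical stand-in for the store delete; always fails, as in the report.
    static void deleteFromStore(long procId) {
        throw new ArrayIndexOutOfBoundsException(1);
    }

    // The catch keeps the worker alive; the finally guarantees the lock is
    // released on every exit path, including Errors the catch does not handle.
    static void rollbackWithGuard(Semaphore tableLock, long procId) {
        try {
            deleteFromStore(procId);
        } catch (RuntimeException e) {
            System.err.println("rollback cleanup failed for pid=" + procId + ": " + e);
        } finally {
            tableLock.release();
        }
    }

    /** Returns true if another procedure could take the table lock after the rollback. */
    static boolean lockAvailableAfterRollback() {
        Semaphore tableLock = new Semaphore(1);
        tableLock.acquireUninterruptibly(); // the root procedure holds the exclusive lock
        rollbackWithGuard(tableLock, 5973L);
        return tableLock.tryAcquire(); // a later procedure on the same table
    }

    public static void main(String[] args) {
        System.out.println("lock available: " + lockAvailableAfterRollback());
    }
}
```

Here `main` prints `lock available: true`. Catching the exception only keeps the lock state consistent; a real fix would presumably still need to surface the failure so the underlying store tracker problem can be diagnosed.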



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)