[ https://issues.apache.org/jira/browse/HBASE-14016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612715#comment-14612715 ]
Stephen Yuan Jiang commented on HBASE-14016:
--------------------------------------------
[~mbertozzi] I think we should do something like the following:
{code}
  public boolean tryAcquireTableWrite(final TableName table, final String purpose) {
    boolean lockAcquired = false;
    lock.lock();
    try {
      lockAcquired = getRunQueueOrCreate(table).tryWrite(lockManager, table, purpose);
    } finally {
      lock.unlock();
    }
    return lockAcquired;
  }
{code}
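To illustrate why holding the queue lock around tryWrite() closes the window, here is a minimal, self-contained sketch. It is not HBase code: TableQueueSketch, RunQueue, and the String table name are hypothetical stand-ins, the ZooKeeper lock is omitted, and it assumes markTableAsDeleted takes the same lock before it checks isLocked().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified stand-in for the procedure queue; not HBase code. */
class TableQueueSketch {
  /** Stand-in for TableRunQueue: it only tracks the write-lock flag. */
  static final class RunQueue {
    private boolean wlock = false;

    boolean isLocked() { return wlock; }

    boolean isEmpty() { return true; } // no procedures are queued in this sketch

    boolean tryWrite() {
      if (wlock) {
        return false;
      }
      // In HBase the table lock is acquired at this point, before wlock is set.
      wlock = true;
      return true;
    }

    void releaseWrite() { wlock = false; }
  }

  private final ReentrantLock lock = new ReentrantLock();
  private final Map<String, RunQueue> fairq = new HashMap<>();

  /** Lookup-or-create and tryWrite run as one step under the queue lock. */
  boolean tryAcquireTableWrite(String table) {
    lock.lock();
    try {
      return fairq.computeIfAbsent(table, t -> new RunQueue()).tryWrite();
    } finally {
      lock.unlock();
    }
  }

  void releaseTableWrite(String table) {
    lock.lock();
    try {
      // This lookup would throw an NPE if the queue had been removed meanwhile.
      fairq.get(table).releaseWrite();
    } finally {
      lock.unlock();
    }
  }

  /** Assumption: the delete path checks and removes under the same lock. */
  boolean markTableAsDeleted(String table) {
    lock.lock();
    try {
      RunQueue queue = fairq.get(table);
      if (queue == null) {
        return true; // nothing to remove
      }
      if (queue.isEmpty() && !queue.isLocked()) {
        fairq.remove(table);
        return true;
      }
      return false; // a writer holds the queue; keep it
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    TableQueueSketch q = new TableQueueSketch();
    System.out.println(q.tryAcquireTableWrite("t1")); // true
    System.out.println(q.markTableAsDeleted("t1"));   // false: queue is locked
    q.releaseTableWrite("t1");                        // queue still exists, no NPE
    System.out.println(q.markTableAsDeleted("t1"));   // true: queue removed
  }
}
```

Because both paths take the same lock, markTableAsDeleted can only observe the queue before tryWrite() starts or after wlock is already set, never in between.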
> Procedure V2: NPE in a delete table follow by create table closely
> ------------------------------------------------------------------
>
> Key: HBASE-14016
> URL: https://issues.apache.org/jira/browse/HBASE-14016
> Project: HBase
> Issue Type: Bug
> Components: proc-v2
> Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0
> Reporter: Stephen Yuan Jiang
> Assignee: Stephen Yuan Jiang
>
> In our internal testing of HBase 1.1, we found a race condition: a delete
> table followed closely by a create table leaks the ZooKeeper lock because of
> an NPE in ProcedureFairRunQueues
> {noformat}
> Exception in thread "ProcedureExecutorThread-0" java.lang.NullPointerException
>     at org.apache.hadoop.hbase.master.procedure.MasterProcedureQueue.releaseTableWrite(MasterProcedureQueue.java:279)
>     at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:280)
>     at org.apache.hadoop.hbase.master.procedure.CreateTableProcedure.releaseLock(CreateTableProcedure.java:58)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:674)
> {noformat}
> Here is the code that causes the race condition:
> {code}
> protected boolean markTableAsDeleted(final TableName table) {
>   TableRunQueue queue = getRunQueue(table);
>   if (queue != null) {
>     ...
>     if (queue.isEmpty() && !queue.isLocked()) {
>       fairq.remove(table);
>     ...
> }
> public boolean tryWrite(final TableLockManager lockManager,
>     final TableName tableName, final String purpose) {
>   ...
>   tableLock = lockManager.writeLock(tableName, purpose);
>   try {
>     tableLock.acquire();
>     ...
>     wlock = true;
>     ...
> }
> {code}
> The root cause: wlock is set too late, so it does not protect the queue from
> being deleted.
> - Thread 1: create table is running and the queue is empty; tryWrite() acquires
> the lock (wlock is still false)
> - Thread 2: markTableAsDeleted sees that the queue is empty and wlock = false
> - Thread 1: sets wlock = true, which is too late
> - Thread 2: deletes the queue
> - Thread 1: can never release the lock; it hits an NPE when trying to get the queue