Andrew Purtell created HBASE-20445:
--------------------------------------
Summary: Defer work when a row lock is busy
Key: HBASE-20445
URL: https://issues.apache.org/jira/browse/HBASE-20445
Project: HBase
Issue Type: Improvement
Reporter: Andrew Purtell
Instead of blocking on row locks, defer the call and make the call runner
available so it can service other activity. Have runners pick up deferred calls
in the background after servicing the other request.
Spin briefly on tryLock() wherever we are now using lock() to acquire a row
lock. Introduce two new configuration parameters: one for the amount of time to
wait between lock acquisition attempts, and another for the total number of
times we wait before deferring the work. If the lock cannot be acquired, put
the call back into the call queue. Call queues therefore should be priority
queues sorted by deadline. Currently they are implemented with
LinkedBlockingQueue (which isn't a priority queue), or
AdaptiveLifoCoDelCallQueue (which is) if the CoDel scheduler is enabled.
Perhaps we could just require use of
AdaptiveLifoCoDelCallQueue. Runners will be picking up work from the head of
the queues as long as they are not empty, so deferred calls will be serviced
again, or dropped if the deadline has passed.
Implementing continuations for simple operations should be straightforward.
Batch mutations try to acquire as many row locks as they can, then do the
partial batch over the successfully locked rows, then loop back to attempt the
remaining work. This is a partial implementation of what we need so we can
build on it. Rather than loop around, save the partial batch completion state
and put a pointer to it along with the call back into the RPC queue.
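
One way the saved batch state could look, as a rough sketch. BatchProgress and
its members are hypothetical names, rows are plain strings, and the predicate
stands in for a non-blocking row lock attempt. Each call to runPass() is one
visit by a runner; between visits the object would travel with the requeued
call instead of the runner looping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical continuation state for a partially completed batch. */
final class BatchProgress {
  private final List<String> pendingRows;
  final List<String> appliedRows = new ArrayList<>();

  BatchProgress(List<String> rows) {
    this.pendingRows = new ArrayList<>(rows);
  }

  boolean isComplete() {
    return pendingRows.isEmpty();
  }

  /**
   * Runs one pass over the rows still pending. Rows whose lock attempt
   * succeeds are applied; the rest stay pending for a later pass.
   * Returns true once every row has been applied.
   */
  boolean runPass(Predicate<String> tryLockRow) {
    List<String> stillPending = new ArrayList<>();
    for (String row : pendingRows) {
      if (tryLockRow.test(row)) {
        appliedRows.add(row); // stand-in for applying the mutation
      } else {
        stillPending.add(row);
      }
    }
    pendingRows.clear();
    pendingRows.addAll(stillPending);
    return isComplete();
  }
}
```

If runPass() returns false, the call and its BatchProgress would go back into
the queue together; if it returns true, the response can be sent.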
For scans where allowPartialResults has been set to true we can simply complete
the call at the point we fail to acquire a row lock. The client will handle the
rest. For scans where allowPartialResults is false we have to save the scanner
state and partial results, and put a pointer to this state along with the call
back into the queue.
We could approach this in phases:
Phase 0 - Sort out the call queuing details. Do we require
AdaptiveLifoCoDelCallQueue? Certainly we can make use of it. Can we also have
RWQueueRpcExecutor create queues as PriorityBlockingQueue instead of
LinkedBlockingQueue? There must be a reason why this is not already the case.
Phase 1 - Implement deferral of simple ops only. (Batch mutations and scans
will still block on row locks.)
Phase 2 - Implement deferral of batch mutations. (Scans will still block on
row locks.)
Phase 3 - Implement deferral of scans where allowPartialResults is false.
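
For the Phase 0 question, a deadline-ordered queue could be sketched with
java.util.concurrent.PriorityBlockingQueue as below. DeferredCall,
DeadlineQueueSketch and pollLive() are hypothetical names used only for
illustration; how this would fit into RWQueueRpcExecutor is left open.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

/** Hypothetical call record carrying an absolute deadline. */
final class DeferredCall {
  final String name;
  final long deadlineMillis;

  DeferredCall(String name, long deadlineMillis) {
    this.name = name;
    this.deadlineMillis = deadlineMillis;
  }
}

/** Hypothetical sketch of a call queue sorted by deadline. */
final class DeadlineQueueSketch {
  /** Creates a queue whose head is the call with the earliest deadline. */
  static PriorityBlockingQueue<DeferredCall> newQueue() {
    return new PriorityBlockingQueue<>(
        16, Comparator.comparingLong((DeferredCall c) -> c.deadlineMillis));
  }

  /**
   * Returns the next call whose deadline has not passed, silently
   * dropping expired calls, or null if the queue is empty.
   */
  static DeferredCall pollLive(PriorityBlockingQueue<DeferredCall> queue,
                               long nowMillis) {
    DeferredCall call;
    while ((call = queue.poll()) != null) {
      if (call.deadlineMillis >= nowMillis) {
        return call;
      }
      // Deadline already passed: drop the call and keep looking.
    }
    return null;
  }
}
```

A deferred call that is put back keeps its original deadline, so it sorts
ahead of newer work and is either retried soon or dropped once it expires.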