[ https://issues.apache.org/jira/browse/HBASE-20445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441535#comment-16441535 ]
Andrew Purtell commented on HBASE-20445:
----------------------------------------
I'm not fully familiar with the state of things in trunk. This is written from
a branch-1 perspective and is meant for brainstorming and discussion; it is not
a complete proposal. Ideally the results can be ported back to a branch-1 minor
release. The description is subject to change as the idea is thought through.
> Defer work when a row lock is busy
> ----------------------------------
>
> Key: HBASE-20445
> URL: https://issues.apache.org/jira/browse/HBASE-20445
> Project: HBase
> Issue Type: Improvement
> Reporter: Andrew Purtell
> Priority: Major
>
> Instead of blocking on row locks, defer the call and make the call runner
> available so it can service other activity. Have runners pick up deferred
> calls in the background after servicing the other request.
> Spin briefly on tryLock() wherever we are now using lock() to acquire a row
> lock. Introduce two new configuration parameters: one for the amount of time
> to wait between lock acquisition attempts, and another for the total number
> of times we wait before deferring the work. If the lock cannot be acquired,
> put the call back into the call queue. Call queues therefore should be
> priority queues sorted by deadline. Currently they are implemented with
> LinkedBlockingQueue (which isn't), or AdaptiveLifoCoDelCallQueue (which is)
> if the CoDel scheduler is enabled. Perhaps we could just require use of
> AdaptiveLifoCoDelCallQueue. Runners will be picking up work from the head of
> the queues as long as they are not empty, so deferred calls will be serviced
> again, or dropped if the deadline has passed.
> Implementing continuations for simple operations should be straightforward.
> Batch mutations try to acquire as many rowlocks as they can, then do the
> partial batch over the successfully locked rows, then loop back to attempt
> the remaining work. This is a partial implementation of what we need so we
> can build on it. Rather than loop around, save the partial batch completion
> state and put a pointer to it along with the call back into the RPC queue.
> For scans where allowPartialResults has been set to true we can simply
> complete the call at the point we fail to acquire a row lock. The client will
> handle the rest. For scans where allowPartialResults is false we have to save
> the scanner state and partial results, and put a pointer to this state along
> with the call back into the queue.
> We could approach this in phases:
> Phase 0 - Sort out the call queuing details. Do we require
> AdaptiveLifoCoDelCallQueue? Certainly we can make use of it. Can we also have
> RWQueueRpcExecutor create queues as PriorityBlockingQueue instead of
> LinkedBlockingQueue? There must be a reason why not already.
> Phase 1 - Implement deferral of simple ops only. (Batch mutations and scans
> will still block on rowlocks.)
> Phase 2 - Implement deferral of batch mutations. (Scans will still block on
> rowlocks.)
> Phase 3 - Implement deferral of scans where allowPartialResults is false.
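
The spin-then-defer behavior described above for simple operations could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not HBase code: DeferSketch, Call, Outcome, runOrDefer,
LOCK_WAIT_MICROS and MAX_LOCK_ATTEMPTS are all hypothetical names, a plain
ReentrantLock stands in for the row lock, and a java.util.concurrent
PriorityBlockingQueue ordered by deadline stands in for the call queue.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class DeferSketch {
  /** Hypothetical stand-in for an RPC call; sorts by deadline, earliest first. */
  static final class Call implements Comparable<Call> {
    final String name;
    final long deadlineMillis;

    Call(String name, long deadlineMillis) {
      this.name = name;
      this.deadlineMillis = deadlineMillis;
    }

    @Override
    public int compareTo(Call other) {
      return Long.compare(deadlineMillis, other.deadlineMillis);
    }
  }

  enum Outcome { DONE, DEFERRED, DROPPED }

  // Hypothetical values for the two proposed configuration parameters.
  static final long LOCK_WAIT_MICROS = 100; // wait between acquisition attempts
  static final int MAX_LOCK_ATTEMPTS = 5;   // attempts before deferring the call

  // Stand-in for a call queue kept as a priority queue sorted by deadline.
  static final PriorityBlockingQueue<Call> callQueue = new PriorityBlockingQueue<>();

  /** Runs the work under the row lock, or defers the call if the lock stays busy. */
  static Outcome runOrDefer(Call call, ReentrantLock rowLock, Runnable work)
      throws InterruptedException {
    if (System.currentTimeMillis() > call.deadlineMillis) {
      return Outcome.DROPPED; // deadline has passed, do not requeue
    }
    for (int attempt = 1; attempt <= MAX_LOCK_ATTEMPTS; attempt++) {
      if (rowLock.tryLock()) { // non-blocking, unlike lock()
        try {
          work.run();
          return Outcome.DONE;
        } finally {
          rowLock.unlock();
        }
      }
      if (attempt < MAX_LOCK_ATTEMPTS) {
        TimeUnit.MICROSECONDS.sleep(LOCK_WAIT_MICROS);
      }
    }
    callQueue.put(call); // lock still busy: put the call back into the queue
    return Outcome.DEFERRED;
  }

  public static void main(String[] args) throws InterruptedException {
    ReentrantLock rowLock = new ReentrantLock();
    Call call = new Call("put", System.currentTimeMillis() + 60_000);
    System.out.println(runOrDefer(call, rowLock, () -> { }));
  }
}
```

A runner that gets DEFERRED back is free to take the next call from the head
of the queue. The deferred call comes up again in deadline order and is
dropped on a later pass once its deadline has expired.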