[ 
https://issues.apache.org/jira/browse/HBASE-20445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442657#comment-16442657
 ] 

Mike Drob commented on HBASE-20445:
-----------------------------------

My initial concern is the increased memory footprint: if we pause requests to 
service others, everything we have paused ends up on the heap. I think this 
could make failures worse. If one row lock is stuck for whatever reason and 
multiple requests come in that need the same row, they will all pile up 
holding partial results. Maybe that's not possible, though; I'm speaking in 
design hypotheticals.

> Defer work when a row lock is busy
> ----------------------------------
>
>                 Key: HBASE-20445
>                 URL: https://issues.apache.org/jira/browse/HBASE-20445
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
>            Priority: Major
>
> Instead of blocking on row locks, defer the call and make the call runner 
> available so it can service other activity. Have runners pick up deferred 
> calls in the background after servicing the other request. 
> Spin briefly on tryLock() wherever we are now using lock() to acquire a row 
> lock. Introduce two new configuration parameters: one for the amount of time 
> to wait between lock acquisition attempts, and another for the total number 
> of times we wait before deferring the work. If the lock cannot be acquired, 
> put the call back into the call queue. Call queues therefore should be 
> priority queues sorted by deadline. Currently they are implemented with 
> LinkedBlockingQueue (which isn't), or AdaptiveLifoCoDelCallQueue (which is) 
> if the CoDel scheduler is enabled. Perhaps we could just require use of 
> AdaptiveLifoCoDelCallQueue. Runners will be picking up work from the head of 
> the queues as long as they are not empty, so deferred calls will be serviced 
> again, or dropped if the deadline has passed.
> Implementing continuations for simple operations should be straightforward. 
> Batch mutations try to acquire as many row locks as they can, then do the 
> partial batch over the successfully locked rows, then loop back to attempt 
> the remaining work. This is a partial implementation of what we need so we 
> can build on it. Rather than loop around, save the partial batch completion 
> state and put a pointer to it along with the call back into the RPC queue.
> For scans where allowPartialResults has been set to true we can simply 
> complete the call at the point we fail to acquire a row lock. The client will 
> handle the rest. For scans where allowPartialResults is false we have to save 
> the scanner state and partial results, and put a pointer to this state along 
> with the call back into the queue. 
> We could approach this in phases:
> Phase 0 - Sort out the call queuing details. Do we require 
> AdaptiveLifoCoDelCallQueue? Certainly we can make use of it. Can we also have 
> RWQueueRpcExecutor create queues as PriorityBlockingQueue instead of 
> LinkedBlockingQueue? There must be a reason why not already.
> Phase 1 - Implement deferral of simple ops only. (Batch mutations and scans 
> will still block on row locks.)
> Phase 2 - Implement deferral of batch mutations. (Scans will still block on 
> row locks.)
> Phase 3 - Implement deferral of scans where allowPartialResults is false.
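
To make the quoted proposal concrete, here is a minimal Java sketch of the 
"spin briefly on tryLock(), then defer" idea for a simple operation. It is 
not HBase code. DeferralSketch, Call, Outcome, handle(), and the two fields 
standing in for the proposed configuration parameters are all hypothetical 
names, and a plain PriorityBlockingQueue ordered by deadline stands in for 
whatever call queue is eventually chosen.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; none of these names are real HBase classes. */
class DeferralSketch {
  enum Outcome { COMPLETED, DEFERRED, DROPPED }

  /** A queued call carrying an absolute deadline in epoch milliseconds. */
  static final class Call {
    final String name;
    final long deadlineMillis;
    int deferrals; // how many times this call was put back in the queue

    Call(String name, long deadlineMillis) {
      this.name = name;
      this.deadlineMillis = deadlineMillis;
    }
  }

  // Assumed stand-ins for the two proposed configuration parameters.
  final long waitBetweenAttemptsMicros;
  final int maxAttempts;

  // Call queue sorted by deadline, earliest first.
  final PriorityBlockingQueue<Call> callQueue =
      new PriorityBlockingQueue<>(16, Comparator.comparingLong((Call c) -> c.deadlineMillis));

  DeferralSketch(long waitBetweenAttemptsMicros, int maxAttempts) {
    this.waitBetweenAttemptsMicros = waitBetweenAttemptsMicros;
    this.maxAttempts = maxAttempts;
  }

  /** What a call runner might do with one call that needs one row lock. */
  Outcome handle(Call call, ReentrantLock rowLock) {
    if (System.currentTimeMillis() > call.deadlineMillis) {
      return Outcome.DROPPED; // deadline has passed, do not service
    }
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (rowLock.tryLock()) {
        try {
          // The actual row operation would run here.
          return Outcome.COMPLETED;
        } finally {
          rowLock.unlock();
        }
      }
      // Brief pause between lock acquisition attempts.
      LockSupport.parkNanos(TimeUnit.MICROSECONDS.toNanos(waitBetweenAttemptsMicros));
    }
    // Lock still busy: put the call back and free this runner for other work.
    call.deferrals++;
    callQueue.put(call);
    return Outcome.DEFERRED;
  }
}
```

While another thread holds the row lock, handle() returns DEFERRED and 
callQueue.peek() yields the deferred call with the earliest deadline. 
Everything sitting in callQueue stays on the heap until it is serviced or 
dropped, which is where the memory concern above would show up.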



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
