[jira] [Commented] (HBASE-17924) Consider sorting the row order when processing multi() ops before taking rowlocks

Allan Yang (JIRA) Sun, 21 May 2017 00:18:07 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-17924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16018713#comment-16018713
 ]


Allan Yang commented on HBASE-17924:
------------------------------------

{quote}
Thanks for pointing me here from HBASE-18074 Allan Yang. Did you find any perf 
benefit sorting? 
{quote}
No, we haven't run some performance data for the patch yet, I will provide some 
if I have time to do so. But the patch should help with situation which 
parallel threads are executing multi with disordered keys(some kyes are the 
same). 
{quote}
The sort costs but knowing that mutations coming in are sorted, perhaps in 
doMiniBatch I could have a single lock cover all mutations on the same row that 
follow (but this would involve a bunch of row compares).
{quote}
Now, row lock is a ReadWrite lock. It is a reentrantlock. I don't know which is 
more costing, lock the same row or compare the rows, avoiding lock the same row 
twice. If locking the locked reentrantlock is more costing than compare the 
rows in bytes, then I think your idea is doable。


> Consider sorting the row order when processing multi() ops before taking 
> rowlocks
> ---------------------------------------------------------------------------------
>
>                 Key: HBASE-17924
>                 URL: https://issues.apache.org/jira/browse/HBASE-17924
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.0, 1.1.8
>            Reporter: Andrew Purtell
>            Assignee: Allan Yang
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-17924.patch, HBASE-17924.v0.patch, 
> HBASE-17924.v2.patch, HBASE-17924.v3.patch, HBASE-17924.v4.patch, 
> HBASE-17924.v5.patch
>
>
> When processing a batch mutation, we take row locks in whatever order the 
> mutations were added to the multi op by the client.
>  
> {noformat}
> RSRpcServices#multi -> RSRpcServices#mutateRows -> HRegion#mutateRow -> 
> HRegion#mutateRowsWithLocks -> HRegion#processRowsWithLocks
> {noformat}
> Or
> {noformat}
> RSRpcServices#multi -> RSRpcServices#doNonAtomicRegionMutation ->
>       HRegion#get 
>     | HRegion#append 
>     | HRegion#increment 
>     | HRegionServer#doBatchOp -> HRegion#batchMutate -> 
> HRegion#doMiniBatchMutation
> {noformat}
>  
> multi() is fed by client APIs that accept a RowMutations object containing 
> actions for multiple rows. The container for ops inside RowMutations is an 
> ArrayList, which doesn't change the ordering of objects added to it. The 
> protobuf implementation of the messages for multi ops do not reorder the list 
> of actions. When processing multi ops we iterate over the actions in the 
> order rehydrated from protobuf.
> We should discuss sorting the order of ops by row key when processing multi() 
> ops before taking row locks. Does this make lock ordering more predictable 
> for server side operations? Yes, but potentially surprising for the client, 
> right? Is there any legitimate reason we should take locks out of row key 
> sorted order because the client has structured the request as such?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (HBASE-17924) Consider sorting the row order when processing multi() ops before taking rowlocks

Reply via email to