[ 
https://issues.apache.org/jira/browse/HBASE-8806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700110#comment-13700110
 ] 

Dave Latham commented on HBASE-8806:
------------------------------------

{quote}The changes in pom.xml are not needed for this fix, right ?{quote}
That's right, sorry about that.  Just helps get m2eclipse to compile everything 
correctly.

{quote}We need change all the CP hooks signatures? CPs will(should) be able to 
acquire locks and release?{quote}
The CP changes required would be in RegionObserver:
 - {{preBatchMutate(...,MiniBatchOperationInProgress<Pair<Mutation, Integer>> 
miniBatchOp)}} becomes
 - {{preBatchMutate(...,MiniBatchOperationInProgress<Mutation> miniBatchOp)}}
 - {{postBatchMutate(...,MiniBatchOperationInProgress<Pair<Mutation, Integer>> 
miniBatchOp)}} becomes
 - {{postBatchMutate(...,MiniBatchOperationInProgress<Mutation> miniBatchOp)}}
I don't have much experience with coprocessor usage, but it seems unlikely to 
me someone would require manipulating the set of row locks in the middle of a 
miniBatch mutation.  Would it be prudent to poll the mailing list first?

Re: patch v5
Looks good to me.  Anoop's notion of favoring the case where the lock is new 
versus the lock is already owned is an interesting one.  Two possibilities of 
code with two cases:
Code A:  patch v5 - first check rowsAlreadyLocked then getLock
 - Case 1 (lock is new): hash comparison in rowsAlreadyLocked likely returns 
!contains quickly.  getLock likely also quickly finds no hash match in 
concurrent hashmap and inserts.
 - Case 2 (lock is already owned): hashCode match then bytewise equals 
comparison in rowsAlreadyLocked.contains.  getLock not called

Code B: first try getLock, then if unable to getLock check rowsAlreadyLocked
 - Case 1 (lock is new): getLock quickly finds no hash match in concurrent 
hashmap and inserts.
 - Case 2 (lock is already owned): getLock checks concurrent hashmap finds 
hashCode match then does bytewise equals comparison.  then rowsAlreadyLocked 
also gets hashCode match then bytewise equals comparison

So Code A has an extra hash check for Case 1 and Code B has an extra hash check 
and bytewise equal compare for Case 2.  I'd favor current patch (Code A) as 
it's a smaller and constant time addition and I think the code is a bit more 
intuitive, but like the property in Code B that it has no impact on Case 1 
compared to the existing 0.94 releases.  I'd be content if Rahul's tests on the 
latest patch show imapct is small.

{quote}Is it ok if we sort it out on the client side itself? {quote}
Not for 0.94 as it would be changing the contract on existing clients which may 
suddenly fail to respect row locks if they aren't updated.  Though it would be 
interesting to ask clients to sort for performance gains and then re-sort with 
a stable sort on the server.  But if people are happy with the other approaches 
discussed, they sound good to me.

{quote}Should we commit something like the 0.94 patch to trunk as well, for 
now; and then regroup and add re-entrant locks in a different jira?{quote}
+1 so long as people are comfortable with performance.  Let's give Rahul a 
chance to run his perf test on this patch if no one's in a hurry.

I will try to update my reentrant locks patch for trunk.  After posting it up I 
was considering introducing that lock groups idea too and using it for 
checkAndPut.  May play with the code there unless someone beats me to it.

Thanks to everyone for the attention on this issue!
                
> Row locks are acquired repeatedly in HRegion.doMiniBatchMutation for 
> duplicate rows.
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-8806
>                 URL: https://issues.apache.org/jira/browse/HBASE-8806
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.5
>            Reporter: rahul gidwani
>            Priority: Critical
>             Fix For: 0.95.2, 0.94.10
>
>         Attachments: 8806-0.94-v4.txt, 8806-0.94-v5.txt, 
> HBASE-8806-0.94.10.patch, HBASE-8806-0.94.10-v2.patch, 
> HBASE-8806-0.94.10-v3.patch, HBASE-8806.patch, 
> HBASE-8806-threadBasedRowLocks.patch
>
>
> If we already have the lock in the doMiniBatchMutation we don't need to 
> re-acquire it. The solution would be to keep a cache of the rowKeys already 
> locked for a miniBatchMutation and If we already have the 
> rowKey in the cache, we don't repeatedly try and acquire the lock.  A fix to 
> this problem would be to keep a set of rows we already locked and not try to 
> acquire the lock for these rows.  
> We have tested this fix in our production environment and has improved 
> replication performance quite a bit.  We saw a replication batch go from 3+ 
> minutes to less than 10 seconds for batches with duplicate row keys.
> {code}
> static final int ACQUIRE_LOCK_COUNT = 0;
>   @Test
>   public void testRedundantRowKeys() throws Exception {
>     final int batchSize = 100000;
>     
>     String tableName = getClass().getSimpleName();
>     Configuration conf = HBaseConfiguration.create();
>     conf.setClass(HConstants.REGION_IMPL, MockHRegion.class, HeapSize.class);
>     MockHRegion region = (MockHRegion) 
> TestHRegion.initHRegion(Bytes.toBytes(tableName), tableName, conf, 
> Bytes.toBytes("a"));
>     List<Pair<Mutation, Integer>> someBatch = Lists.newArrayList();
>     int i = 0;
>     while (i < batchSize) {
>       if (i % 2 == 0) {
>         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(0)), 
> null));
>       } else {
>         someBatch.add(new Pair<Mutation, Integer>(new Put(Bytes.toBytes(1)), 
> null));
>       }
>       i++;
>     }
>     long startTime = System.currentTimeMillis();
>     region.batchMutate(someBatch.toArray(new Pair[0]));
>     long endTime = System.currentTimeMillis();
>     long duration = endTime - startTime;
>     System.out.println("duration: " + duration + " ms");
>     assertEquals(2, ACQUIRE_LOCK_COUNT);
>   }
>   @Override
>   public Integer getLock(Integer lockid, byte[] row, boolean waitForLock) 
> throws IOException {
>     ACQUIRE_LOCK_COUNT++;
>     return super.getLock(lockid, row, waitForLock);
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to