[jira] [Commented] (HBASE-4195) Possible inconsistency in a memstore read after a reseek, possible performance improvement

nkeywal (JIRA) Fri, 16 Sep 2011 12:06:35 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106718#comment-13106718
 ]


nkeywal commented on HBASE-4195:
--------------------------------

Thank you. What is the next step? Should I write a new patch taking into 
account all the different comment?

> Possible inconsistency in a memstore read after a reseek, possible 
> performance improvement
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4195
>                 URL: https://issues.apache.org/jira/browse/HBASE-4195
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.90.4
>         Environment: all
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: 20110824_4195_MemStore.patch, 
> 20110824_4195_TestHRegion.patch, 20110915_4195_4188_MemStore.patch, 
> 4195-v2.txt
>
>
> This follows the dicussion around HBASE-3855, and the random errors (20% 
> failure on trunk) on the unit test 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting
> I saw some points related to numIterReseek, used in the 
> MemStoreScanner#getNext (line 690):
> {noformat}679     protected KeyValue getNext(Iterator it) {
> 680         KeyValue ret = null;
> 681         long readPoint = ReadWriteConsistencyControl.getThreadReadPoint();
> 682         //DebugPrint.println( " MS@" + hashCode() + ": threadpoint = " + 
> readPoint);
> 683    
> 684         while (ret == null && it.hasNext()) {
> 685           KeyValue v = it.next();
> 686           if (v.getMemstoreTS() <= readPoint) {
> 687             // keep it.
> 688             ret = v;
> 689           }
> 690           numIterReseek--;
> 691           if (numIterReseek == 0) {
> 692             break;
> 693            }
> 694         }
> 695         return ret;
> 696       }{noformat}
> This function is called by seek, reseek, and next. The numIterReseek is only 
> usefull for reseek.
> There are some issues, I am not totally sure it's the root cause of the test 
> case error, but it could explain partly the randomness of the error, and one 
> point is for sure a bug.
> 1) In getNext, numIterReseek is decreased, then compared to zero. The seek 
> function sets numIterReseek to zero before calling getNext. It means that the 
> value will be actually negative, hence the test will always fail, and the 
> loop will continue. It is the expected behaviour, but it's quite smart.
> 2) In "reseek", numIterReseek is not set between the loops on the two 
> iterators. If the numIterReseek is equals to zero after the loop on the first 
> one, the loop on the second one will never call seek, as numIterReseek will 
> be negative.
> 3) Still in "reseek", the test to call "seek" is (kvsetNextRow == null && 
> numIterReseek == 0). In other words, if kvsetNextRow is not null when 
> numIterReseek equals zero, numIterReseek will start to be negative at the 
> next iteration and seek will never be called.
> 4) You can have side effects if reseek ends with a numIterReseek > 0: the 
> following calls to the "next" function will decrease numIterReseek to zero, 
> and getNext will break instead of continuing the loop. As a result, later 
> calls to next() may return null or not depending on how is configured the 
> default value for numIterReseek.
> To check if the issue comes from point 4, you can set the numIterReseek to 
> zero before returning in reseek:
> {noformat}      numIterReseek = 0;
>       return (kvsetNextRow != null || snapshotNextRow != null);
>     }{noformat}
> On my env, on trunk, it seems to work, but as it's random I am not really 
> sure. I also had to modify the test (I added a loop) to make it fails more 
> often, the original test was working quite well here.
> It has to be confirmed that this totally fix (it could be partial or 
> unrelated) 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting 
> before implementing a complete solution.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4195) Possible inconsistency in a memstore read after a reseek, possible performance improvement

Reply via email to