[
https://issues.apache.org/jira/browse/HBASE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089407#comment-13089407
]
nkeywal commented on HBASE-4195:
--------------------------------
Update: I have two other scenarios for failure, both linked to a flush occurring
during a get.
1) With the current/optimized implementation of reseek, the snapshot and kvset
sets can be changed by the thread doing the flush right in the middle of the
reseek. This leads to an inconsistent state.
A second effect of the flush process creating a new kvset is that later
writes may not be seen by the MemStore scanner, as it will still be connected
to the previous kvset.
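To make this first scenario concrete, here is a minimal sketch of the interleaving
(simplified stand-in types, not the actual MemStore code):
{noformat}
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListSet;

public class FlushRaceSketch {
  // stand-ins for MemStore.kvset and MemStore.snapshot
  volatile ConcurrentSkipListSet<Long> kvset = new ConcurrentSkipListSet<Long>();
  volatile ConcurrentSkipListSet<Long> snapshot = new ConcurrentSkipListSet<Long>();

  // What the flush thread does, conceptually: move kvset to snapshot and
  // start an empty kvset for new writes.
  void snapshotForFlush() {
    snapshot = kvset;
    kvset = new ConcurrentSkipListSet<Long>();
  }

  // What a reseek does, conceptually: re-position an iterator on each set.
  void reseekLikeOperation(Long key) {
    Iterator<Long> kvIt = kvset.tailSet(key).iterator();
    // if snapshotForFlush() runs here, the two iterators belong to different
    // "generations" of the memstore: the scanner sees an inconsistent state,
    // and writes that go to the new kvset are missed
    Iterator<Long> snapIt = snapshot.tailSet(key).iterator();
  }
}
{noformat}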
2) More important, and actually not linked to the reseek optimization itself:
the following scenario makes a write on multiple families appear as non
atomic.
t1 : put, it finishes at t2. Write a single row with multiple families.
t3 : get starts, it finishes at t10
t4 : the get continues, reads the value for the first family (scanner)
t5 : put, it finishes at t6. Change the values of the row previously written.
t7 : flush start, it finishes at t8.
t9 : the get continues, reads the value for the second family (scanner) from
the StoreFileScanner
t10: get finishes
In this case, the get returns the values of the first write for the first
families, and the values of the second write for the last families.
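As an illustration only, a sketch of this interleaving against an HRegion
(assuming the 0.90-era HRegion API: put(Put), get(Get, lockid), flushcache();
the names row, fam1, fam2, qual1 and the values are illustrative, and the bad
interleaving is timing dependent, so this does not fail deterministically):
{noformat}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.util.Bytes;

public class NonAtomicGetSketch {
  static void run(final HRegion region) throws Exception {
    final byte[] row = Bytes.toBytes("row1");
    final byte[] fam1 = Bytes.toBytes("fam1");
    final byte[] fam2 = Bytes.toBytes("fam2");
    final byte[] qual1 = Bytes.toBytes("qual1");

    // t1-t2: first put, same row, two families
    Put put1 = new Put(row);
    put1.add(fam1, qual1, Bytes.toBytes("v1"));
    put1.add(fam2, qual1, Bytes.toBytes("v1"));
    region.put(put1);

    // t3: the get runs in another thread; internally there is one scanner per family
    Thread getter = new Thread(new Runnable() {
      public void run() {
        try {
          Result r = region.get(new Get(row), null);
          // with the bad interleaving, r holds "v1" for fam1 and "v2" for fam2
        } catch (IOException e) {
          // ignored in this sketch
        }
      }
    });
    getter.start();

    // t5-t8: second put overwrites the row, then a flush runs, while the get
    // above is between its fam1 and fam2 scanners
    Put put2 = new Put(row);
    put2.add(fam1, qual1, Bytes.toBytes("v2"));
    put2.add(fam2, qual1, Bytes.toBytes("v2"));
    region.put(put2);
    region.flushcache();

    getter.join();
  }
}
{noformat}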
This is due to the fact that the flush process creates a file and notifies the
scanners. The scanners then refresh their view. The notification is "java
synchronized" with next() in the StoreFileScanner, so it does not happen
during a scan within a family, but it can occur between the families within a
read, as there is one scanner per family. If you add a read lock in next()
(on HRegion.updatelock), the problem does not occur, as the flush will not take
place during a read.
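For reference, that experiment looked roughly like this (a sketch only, assuming
the region's updates lock is a ReentrantReadWriteLock whose write lock is taken
by the flush before snapshotting the memstore; nextInternal() stands here for
the original body of next()):
{noformat}
// inside HRegion.RegionScanner (sketch, not a proposed fix)
public boolean next(List<KeyValue> outResults) throws IOException {
  updatesLock.readLock().lock();   // the flush takes the write lock, so it
  try {                            // cannot snapshot the memstore mid-read
    return nextInternal(outResults);
  } finally {
    updatesLock.readLock().unlock();
  }
}
{noformat}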
As it's a random bug, there can be other scenarios. In my environment, when
there is a failure:
- it's always with a KV with a memstoreTS equal to 0
- the column is always "qual1"
As said, this second scenario is actually not linked to the modifications on
"MemStoreScanner#reseek", but to the flush/get parallel execution. I would
tend to think that the issue happens in production as well.
> Possible inconsistency in a memstore read after a reseek, possible
> performance improvement
> ------------------------------------------------------------------------------------------
>
> Key: HBASE-4195
> URL: https://issues.apache.org/jira/browse/HBASE-4195
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.90.4
> Environment: all
> Reporter: nkeywal
> Priority: Critical
>
> This follows the discussion around HBASE-3855, and the random errors (20%
> failure on trunk) on the unit test
> org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting
> I saw some points related to numIterReseek, used in the
> MemStoreScanner#getNext (line 690):
> {noformat}
> 679  protected KeyValue getNext(Iterator it) {
> 680    KeyValue ret = null;
> 681    long readPoint = ReadWriteConsistencyControl.getThreadReadPoint();
> 682    //DebugPrint.println( " MS@" + hashCode() + ": threadpoint = " + readPoint);
> 683
> 684    while (ret == null && it.hasNext()) {
> 685      KeyValue v = it.next();
> 686      if (v.getMemstoreTS() <= readPoint) {
> 687        // keep it.
> 688        ret = v;
> 689      }
> 690      numIterReseek--;
> 691      if (numIterReseek == 0) {
> 692        break;
> 693      }
> 694    }
> 695    return ret;
> 696  }
> {noformat}
> This function is called by seek, reseek, and next. The numIterReseek is only
> useful for reseek.
> There are some issues. I am not totally sure they are the root cause of the
> test case error, but they could partly explain the randomness of the error,
> and one point is for sure a bug.
> 1) In getNext, numIterReseek is decreased, then compared to zero. The seek
> function sets numIterReseek to zero before calling getNext. It means that the
> value will actually be negative, hence the test will always fail and the
> loop will continue. It is the expected behaviour, but it is rather subtle.
> 2) In "reseek", numIterReseek is not set between the loops on the two
> iterators. If the numIterReseek is equals to zero after the loop on the first
> one, the loop on the second one will never call seek, as numIterReseek will
> be negative.
> 3) Still in "reseek", the test to call "seek" is (kvsetNextRow == null &&
> numIterReseek == 0). In other words, if kvsetNextRow is not null when
> numIterReseek equals zero, numIterReseek will start to be negative at the
> next iteration and seek will never be called.
> 4) You can have side effects if reseek ends with a numIterReseek > 0: the
> following calls to the "next" function will decrease numIterReseek to zero,
> and getNext will break instead of continuing the loop. As a result, later
> calls to next() may or may not return null depending on how the default
> value for numIterReseek is configured.
> To check if the issue comes from point 4, you can set the numIterReseek to
> zero before returning in reseek:
> {noformat}
>   numIterReseek = 0;
>   return (kvsetNextRow != null || snapshotNextRow != null);
> }
> {noformat}
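> To make point 4 concrete, a walk-through of the counter across calls (the
> numbers are illustrative only, assuming the reseek budget starts at 3):
> {noformat}
> reseek(key)  // sets numIterReseek = 3; getNext() uses 2 iterations -> numIterReseek == 1
> next()       // getNext() decrements to 0 -> breaks out of the loop and may
>              // return null even though more KVs are visible at the read point
> next()       // numIterReseek keeps going negative, so the loop behaves normally again
> {noformat}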
> On my env, on trunk, it seems to work, but as it's random I am not really
> sure. I also had to modify the test (I added a loop) to make it fail more
> often; the original test was working quite well here.
> It has to be confirmed that this totally fixes (the fix could be partial or
> unrelated)
> org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting
> before implementing a complete solution.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira