[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

Hadoop QA (JIRA) Wed, 30 Jul 2014 18:27:07 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14080367#comment-14080367
 ]


Hadoop QA commented on HBASE-11591:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658788/HBASE-11591.patch
  against trunk revision .
  ATTACHMENT ID: 12658788

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       
org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
                  org.apache.hadoop.hbase.migration.TestNamespaceUpgrade
                  org.apache.hadoop.hbase.regionserver.TestScannerWithBulkload
                  
org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas
                  org.apache.hadoop.hbase.regionserver.TestRegionReplicas
                  
org.apache.hadoop.hbase.mapreduce.TestImportTSVWithVisibilityLabels
                  org.apache.hadoop.hbase.client.TestReplicasClient
                  org.apache.hadoop.hbase.master.TestRestartCluster
                  org.apache.hadoop.hbase.TestIOFencing

     {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s): 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/10236//console

This message is automatically generated.

> Scanner fails to retrieve KV  from bulk loaded file with highest sequence id 
> than the cell's mvcc in a non-bulk loaded file
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11591
>                 URL: https://issues.apache.org/jira/browse/HBASE-11591
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.99.0
>
>         Attachments: HBASE-11591.patch, TestBulkload.java
>
>
> See discussion in HBASE-11339.
> When we have a case where there are same KVs in two files one produced by 
> flush/compaction and the other thro the bulk load.
> Both the files have some same kvs which matches even in timestamp.
> Steps:
> Add some rows with a specific timestamp and flush the same.  
> Bulk load a file with the same data.. Enusre that "assign seqnum" property is 
> set.
> The bulk load should use HFileOutputFormat2 (or ensure that we write the 
> bulk_time_output key).
> This would ensure that the bulk loaded file has the highest seq num.
> Assume the cell in the flushed/compacted store file is 
> row1,cf,cq,ts1, value1  and the cell in the bulk loaded file is
> row1,cf,cq,ts1,value2 
> (There are no parallel scans).
> Issue a scan on the table in 0.96. The retrieved value is 
> row1,cf1,cq,ts1,value2
> But the same in 0.98 will retrieve row1,cf1,cq,ts2,value1. 
> This is a behaviour change.  This is because of this code 
> {code}
>     public int compare(KeyValueScanner left, KeyValueScanner right) {
>       int comparison = compare(left.peek(), right.peek());
>       if (comparison != 0) {
>         return comparison;
>       } else {
>         // Since both the keys are exactly the same, we break the tie in favor
>         // of the key which came latest.
>         long leftSequenceID = left.getSequenceID();
>         long rightSequenceID = right.getSequenceID();
>         if (leftSequenceID > rightSequenceID) {
>           return -1;
>         } else if (leftSequenceID < rightSequenceID) {
>           return 1;
>         } else {
>           return 0;
>         }
>       }
>     }
> {code}
> Here  in 0.96 case the mvcc of the cell in both the files will have 0 and so 
> the comparison will happen from the else condition .  Where the seq id of the 
> bulk loaded file is greater and would sort out first ensuring that the scan 
> happens from that bulk loaded file.
> In case of 0.98+ as we are retaining the mvcc+seqid we are not making the 
> mvcc as 0 (remains a non zero positive value).  Hence the compare() sorts out 
> the cell in the flushed/compacted file.  Which means though we know the 
> lateset file is the bulk loaded file we don't scan the data.
> Seems to be a behaviour change.  Will check on other corner cases also but we 
> are trying to know the behaviour of bulk load because we are evaluating if it 
> can be used for MOB design.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

Reply via email to