[
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105185#comment-14105185
]
ramkrishna.s.vasudevan commented on HBASE-11591:
------------------------------------------------
[~jeffreyz]
First of all thanks a lot for taking a look at this issue and providing a patch.
I debugged this issue with 2 cases - with my patch and with Jeffrey's patch.
Observed the following things
-> The testcase that was added as part of this testcase is for same KVs in the
Store file and the bulk loaded Store file and it was specific for that issue.
-> After Jeffrey's first patch few testcases failed and those were not having
this case of same KVs. All were different row keys. Things were working fine
because it was purely based on row key comparison and no mvcc would have even
come into it. (I mean even before the patch). I think that exposed some of the
bug that was inside.
-> Another important observation is that when we are scanning the KVs in the
bulk loaded file (atleast those created new LoadIncrementalHFile cases) there
is no mvcc info added to the metadata also. So
{code}
return new StoreFileScanner(this,
getScanner(cacheBlocks, pread, isCompaction),
!isCompaction, reader.hasMVCCInfo(), readPt);
{code}
will say has mvccInfo as false and hence skipNewerThanReadPoint() would never
be called because
{code}
if (hasMVCCInfo)
skipKVsNewerThanReadpoint();
{code}
So before the patch too, the scenario in the failed test case
TestWALReplay.testCompactedBulkLoadedFiles() though our seqID for the bulk
loaded files were 5, and the read point for all the scanners created in the
test case was 4 - we were trying to read the bulk loaded file also. But we
were not able to skip the kvs in the bulk loaded file just because hasMvccInfo
was false. So the tests were passing.
Ok so what happens after Jeffrey's patch(the first patch without HREgion's
change) is that on seeing any bulk loaded file we just assign the file's seqid
to the KV's seqId. And so after compaction still the read pt is not modified
to the latest (ie 5) and hence all the KVs that were written to the compacted
file from the bulk loaded files were missing.
I think the change in HRegion.java to set the write Sequence number is a bug
fix? I still feel the patch would cause issue in the following scenario after
the above changes
-> Assume a scan started and the read point is 20 at that time
-> Bulk load is just getting completed and the scanner heap gets reset. The
new bulk loaded file with seqId 22 (for eg) gets added now to the scanner heap.
But remember that the read point is still 20.
-> After this change we would just set the bulk load file's seqId to all its
KVs which is 22.
-> Because there is no mvcc info in this bulk loaded file the scan would not be
able to skipTheKvsWithNewerReadPt() and hence the scan would still see the Kvs
with 22 as the seqId though the intention is to see only KVs with seqID 20.
I may be wrong. Am I missing something here? I may be wrong because for bulk
loaded files because there is no mvcc we are allowed to read anything in that
irrespective of the read pt?
> Scanner fails to retrieve KV from bulk loaded file with highest sequence id
> than the cell's mvcc in a non-bulk loaded file
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-11591
> URL: https://issues.apache.org/jira/browse/HBASE-11591
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.99.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Fix For: 0.99.0
>
> Attachments: HBASE-11591.patch, HBASE-11591_1.patch,
> HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java,
> hbase-11591-03-02.patch, hbase-11591-03-jeff.patch
>
>
> See discussion in HBASE-11339.
> When we have a case where there are same KVs in two files one produced by
> flush/compaction and the other thro the bulk load.
> Both the files have some same kvs which matches even in timestamp.
> Steps:
> Add some rows with a specific timestamp and flush the same.
> Bulk load a file with the same data.. Enusre that "assign seqnum" property is
> set.
> The bulk load should use HFileOutputFormat2 (or ensure that we write the
> bulk_time_output key).
> This would ensure that the bulk loaded file has the highest seq num.
> Assume the cell in the flushed/compacted store file is
> row1,cf,cq,ts1, value1 and the cell in the bulk loaded file is
> row1,cf,cq,ts1,value2
> (There are no parallel scans).
> Issue a scan on the table in 0.96. The retrieved value is
> row1,cf1,cq,ts1,value2
> But the same in 0.98 will retrieve row1,cf1,cq,ts2,value1.
> This is a behaviour change. This is because of this code
> {code}
> public int compare(KeyValueScanner left, KeyValueScanner right) {
> int comparison = compare(left.peek(), right.peek());
> if (comparison != 0) {
> return comparison;
> } else {
> // Since both the keys are exactly the same, we break the tie in favor
> // of the key which came latest.
> long leftSequenceID = left.getSequenceID();
> long rightSequenceID = right.getSequenceID();
> if (leftSequenceID > rightSequenceID) {
> return -1;
> } else if (leftSequenceID < rightSequenceID) {
> return 1;
> } else {
> return 0;
> }
> }
> }
> {code}
> Here in 0.96 case the mvcc of the cell in both the files will have 0 and so
> the comparison will happen from the else condition . Where the seq id of the
> bulk loaded file is greater and would sort out first ensuring that the scan
> happens from that bulk loaded file.
> In case of 0.98+ as we are retaining the mvcc+seqid we are not making the
> mvcc as 0 (remains a non zero positive value). Hence the compare() sorts out
> the cell in the flushed/compacted file. Which means though we know the
> lateset file is the bulk loaded file we don't scan the data.
> Seems to be a behaviour change. Will check on other corner cases also but we
> are trying to know the behaviour of bulk load because we are evaluating if it
> can be used for MOB design.
--
This message was sent by Atlassian JIRA
(v6.2#6252)