huaxiang sun commented on HBASE-16578:

Hi [~jingcheng...@intel.com], thanks for the reply. I did not do enough 
thinking yesterday.
The case I described is invalid as you mentioned that the compacted new 
reference file will get a bigger seqId.

Your patch looks good to me, so +1 from me.

Looking through the code, I found that the following sequence is possible 
and could cause an issue.

1. Put mob cell r1 and flush; this creates ref1 and mobFile1.
2. Put mob cell r2 and flush; this creates ref2 and mobFile2.
3. Put normal cell r3; do not flush.
4. Run mob compaction; it flushes r3 to hfile1 and creates a new reference file.
   In this case, the maxSeqId in hfile1 is the same as the seqId in the new 
   reference file, let's say 10.
5. In step 4, the flush happens before the hfile is bulkloaded. After the 
   flush, a normal compaction may kick in and compact ref1, ref2, and hfile1 
   into hfile2 (with maxSeqId 10).
6. The bulkload finishes and creates *_seqId_10_.
7. In this case, the references in hfile2 and *_seqId_10_ have the same seqId 
   and may get mixed up.
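
To illustrate why equal seqIds are a problem, here is a minimal toy sketch (not HBase code). The names SeqIdTieDemo, FileInfo, orderBySeqId and hasSeqIdTie are hypothetical, and the sketch assumes files are ranked by maxSeqId alone. Under that assumption, two files with seqId 10 come out in whatever order they went in, so the seqId cannot say which one is newer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these are hypothetical names, not HBase classes.
class SeqIdTieDemo {
    static final class FileInfo {
        final String name;
        final long maxSeqId;

        FileInfo(String name, long maxSeqId) {
            this.name = name;
            this.maxSeqId = maxSeqId;
        }
    }

    // Orders files from oldest to newest using maxSeqId as the only sort key.
    // List.sort is stable, so files with equal seqIds keep their input order.
    static List<String> orderBySeqId(List<FileInfo> files) {
        List<FileInfo> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((FileInfo f) -> f.maxSeqId));
        List<String> names = new ArrayList<>();
        for (FileInfo f : sorted) {
            names.add(f.name);
        }
        return names;
    }

    // True when at least two files share a maxSeqId, so seqId cannot rank them.
    static boolean hasSeqIdTie(List<FileInfo> files) {
        Set<Long> seen = new HashSet<>();
        for (FileInfo f : files) {
            if (!seen.add(f.maxSeqId)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FileInfo hfile2 = new FileInfo("hfile2", 10);
        FileInfo bulk = new FileInfo("bulkloaded_seqId_10_", 10);
        // Same two files, different input order, different "newest" file.
        System.out.println(orderBySeqId(Arrays.asList(hfile2, bulk)));
        System.out.println(orderBySeqId(Arrays.asList(bulk, hfile2)));
        System.out.println(hasSeqIdTie(Arrays.asList(hfile2, bulk)));
    }
}
```

In this toy model the result depends only on the order in which the files are listed, which is the ambiguity that steps 5 to 7 describe.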

I think we need to change the following line:
 it needs to be applied to mob bulkloaded files as well to avoid this case.

> Mob data loss after mob compaction and normal compaction
> --------------------------------------------------------
>                 Key: HBASE-16578
>                 URL: https://issues.apache.org/jira/browse/HBASE-16578
>             Project: HBase
>          Issue Type: Bug
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: Jingcheng Du
>         Attachments: HBASE-16578-V2.patch, HBASE-16578.patch, 
> TestMobCompaction.java, TestMobCompaction.java
> StoreFileScanners on MOB cells rely on the scannerOrder to find the latest 
> cells after mob compaction. The scannerOrder is assigned according to the 
> order of the StoreFiles' maxSeqId, and maxSeqId is set only after the 
> reader of the StoreFile is created.
> In {{Compactor.compact}}, the compacted store files are cloned and their 
> readers are not created. And in {{StoreFileScanner.getScannersForStoreFiles}} 
> the StoreFiles are sorted before the readers are created, and at that time the 
> maxSeqId for each file is -1 (the default value). This leads to an incorrect 
> scanner order in the following normal compaction, so some older cells might 
> be chosen during the normal compaction.
> We need to create the readers either before the sorting in the method 
> {{StoreFileScanner.getScannersForStoreFiles}}, or just after the store files 
> are cloned in {{Compactor.compact}}.
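
The ordering problem in the description above can be reproduced with a small toy model (not the actual HBase classes). ToyStoreFile, openReader and sortedNames are hypothetical names. The sketch assumes only what the description states: maxSeqId stays at -1 until the reader is opened, and files are sorted by maxSeqId.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model only: hypothetical names, not the HBase StoreFile API.
class LazySeqIdSortDemo {
    static final class ToyStoreFile {
        final String name;
        private final long seqIdOnDisk;
        private long maxSeqId = -1; // default value until the reader is opened

        ToyStoreFile(String name, long seqIdOnDisk) {
            this.name = name;
            this.seqIdOnDisk = seqIdOnDisk;
        }

        // Opening the reader is what makes the real maxSeqId visible.
        void openReader() {
            maxSeqId = seqIdOnDisk;
        }

        long getMaxSeqId() {
            return maxSeqId;
        }
    }

    // Sorts by maxSeqId, optionally opening the readers before the sort.
    static List<String> sortedNames(List<ToyStoreFile> files, boolean openReadersFirst) {
        List<ToyStoreFile> copy = new ArrayList<>(files);
        if (openReadersFirst) {
            for (ToyStoreFile f : copy) {
                f.openReader();
            }
        }
        copy.sort(Comparator.comparingLong(ToyStoreFile::getMaxSeqId));
        // Opening the readers here is too late to influence the order.
        for (ToyStoreFile f : copy) {
            f.openReader();
        }
        List<String> names = new ArrayList<>();
        for (ToyStoreFile f : copy) {
            names.add(f.name);
        }
        return names;
    }

    static List<ToyStoreFile> clonedFiles() {
        // Freshly cloned files, listed newest first, with no readers opened yet.
        return Arrays.asList(new ToyStoreFile("newer", 20), new ToyStoreFile("older", 5));
    }

    public static void main(String[] args) {
        // All keys are -1, so the sort leaves the input order untouched.
        System.out.println(sortedNames(clonedFiles(), false));
        // Readers opened first: files are ordered by their real maxSeqId.
        System.out.println(sortedNames(clonedFiles(), true));
    }
}
```

In this sketch, sorting before the readers are opened compares -1 with -1 for every pair, so the result just mirrors the input order; opening the readers first restores the seqId ordering.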

This message was sent by Atlassian JIRA