[ https://issues.apache.org/jira/browse/HBASE-29716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040452#comment-18040452 ]

Kodey Converse commented on HBASE-29716:
----------------------------------------

I have a patch available [here|https://github.com/apache/hbase/pull/7480].
After digging further into the incremental backup restore process, I don't
believe this is a problem there, because we bulk load the incremental HFiles in
order. So the only place this causes a problem is in tooling such as the
ClientSideRegionScanner.
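The reasoning above, that loading the incremental HFiles in order keeps restore correct, can be sketched with a small Java model. It is hypothetical and is not HBase's restore code: `BackupHFile`, `LoadedFile`, and `bulkLoadInOrder` are illustrative names. The sketch assumes that each bulk load is assigned a fresh sequence ID higher than any earlier load's, so for a cell with the same row, column, and timestamp, the file from the newer backup image always carries the higher ID and wins.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not HBase code): restoring incremental backup images by
 * bulk loading their HFiles oldest-first, where each load is assumed to be
 * assigned a fresh, strictly increasing sequence ID.
 */
public class InOrderRestoreModel {

  /** An HFile from one backup image holding a single version of one cell. */
  record BackupHFile(String image, String value) {}

  /** A loaded file plus the sequence ID its (modeled) bulk load was assigned. */
  record LoadedFile(BackupHFile file, long loadSeqId) {}

  /** Bulk loads files in the given order, assigning strictly increasing IDs. */
  static List<LoadedFile> bulkLoadInOrder(List<BackupHFile> oldestFirst, long startSeqId) {
    List<LoadedFile> loaded = new ArrayList<>();
    long next = startSeqId;
    for (BackupHFile f : oldestFirst) {
      loaded.add(new LoadedFile(f, next++));
    }
    return loaded;
  }

  /** For a same row/column/timestamp cell, the highest load seq ID wins. */
  static String read(List<LoadedFile> loaded) {
    LoadedFile winner = loaded.get(0);
    for (LoadedFile l : loaded) {
      if (l.loadSeqId() > winner.loadSeqId()) {
        winner = l;
      }
    }
    return winner.file().value();
  }

  public static void main(String[] args) {
    List<BackupHFile> images = List.of(
        new BackupHFile("incr-1", "old"),
        new BackupHFile("incr-2", "new"));
    System.out.println(read(bulkLoadInOrder(images, 100L))); // prints "new"
  }
}
```

In this model the winning ID comes from load order, not from the cells or the file sizes, so the reset per-cell IDs never matter. A reader that opens the backup HFiles directly, such as the ClientSideRegionScanner, gets no such ordering, which is why the comment singles it out.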


> Incremental backup does not properly preserve sequence IDs
> ----------------------------------------------------------
>
>                 Key: HBASE-29716
>                 URL: https://issues.apache.org/jira/browse/HBASE-29716
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore
>    Affects Versions: 3.0.0, 2.5.13, 2.6.5
>            Reporter: Kodey Converse
>            Priority: Minor
>              Labels: pull-request-available
>
> When an incremental backup is taken, WAL files are rewritten as HFiles using
> WALPlayer. These HFiles are not formatted properly: the sequence IDs for
> cells (which are required for correctness) are ignored by the RegionScanner.
> This is a follow-up to HBASE-27649; that fix plumbed sequence IDs from the
> WAL to the HFiles generated by WALPlayer. However, the HFiles generated by
> WALPlayer are marked to be bulk loaded [by metadata on the
> HFile|https://github.com/apache/hbase/blob/b8d803c0f1156219cc965e4c749e7ab7c9a65f31/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L461],
> and RegionScanner [will reset cell-level sequence
> IDs|https://github.com/apache/hbase/blob/b8d803c0f1156219cc965e4c749e7ab7c9a65f31/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStoreFile.java#L427-L450]
> for HFiles with this metadata, relying instead on the sequence ID generated
> at bulk load time (which will never happen for these HFiles, since they are
> intended for incremental backups).
> The result is that reading a cell that has been overwritten (and therefore
> relies on sequence IDs for correctness) will return an incorrect value,
> whether the read goes through HBase or through tooling such as the
> ClientSideRegionScanner. Instead of the sequence ID, I believe the returned
> value is decided by [sorting the HFiles by their
> size|https://github.com/apache/hbase/blob/b8d803c0f1156219cc965e4c749e7ab7c9a65f31/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileComparators.java#L36-L39].
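The failure mode described in the issue can be reproduced in a small, self-contained Java model. This is not HBase code: `ModelCell`, `ModelStoreFile`, `effectiveSeqId`, and `FILE_ORDER` are hypothetical simplifications of the linked `HStoreFile` and `StoreFileComparators` logic. The model assumes two versions of the same row, column, and timestamp, so only the sequence ID should decide which one is visible. It further assumes that bulk-load-marked files report one file-level ID for every cell (the same placeholder, 0, for both files, since no bulk load ever assigned them one), that files are ordered by that ID and then by size with larger files first (the direction of the size tie-break is an assumption), and that the file sorting later wins a tie.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical model (not HBase code) of how resetting per-cell sequence IDs
 * in bulk-load-marked HFiles can make a read return an overwritten value.
 */
public class SeqIdResetModel {

  /** One version of a single row/column at a fixed timestamp. */
  record ModelCell(String value, long seqId) {}

  /** A store file holding one version of the same row/column/timestamp. */
  record ModelStoreFile(String name, long fileSeqId, long sizeBytes,
                        boolean bulkLoadMarked, ModelCell cell) {}

  /** Seq ID the reader uses: bulk-load-marked files report the file-level ID. */
  static long effectiveSeqId(ModelStoreFile f) {
    return f.bulkLoadMarked() ? f.fileSeqId() : f.cell().seqId();
  }

  /** Assumed file ordering: file seq ID ascending, then larger files first. */
  static final Comparator<ModelStoreFile> FILE_ORDER =
      Comparator.comparingLong(ModelStoreFile::fileSeqId)
          .thenComparing(Comparator.comparingLong(ModelStoreFile::sizeBytes).reversed());

  /**
   * Returns the visible value: the highest effective seq ID wins; on a tie,
   * the file that sorts later in FILE_ORDER wins (an assumption of this model).
   */
  static String read(List<ModelStoreFile> files) {
    List<ModelStoreFile> sorted = new ArrayList<>(files);
    sorted.sort(FILE_ORDER);
    ModelStoreFile winner = sorted.get(0);
    for (ModelStoreFile f : sorted) {
      if (effectiveSeqId(f) >= effectiveSeqId(winner)) {
        winner = f; // later files win ties
      }
    }
    return winner.cell().value();
  }

  public static void main(String[] args) {
    // Original write (seq ID 10) sits in a small file; the overwrite
    // (seq ID 20) sits in a larger one. Both use placeholder file ID 0.
    ModelStoreFile older = new ModelStoreFile("a", 0L, 1024L, true, new ModelCell("old", 10L));
    ModelStoreFile newer = new ModelStoreFile("b", 0L, 4096L, true, new ModelCell("new", 20L));
    System.out.println(read(List.of(older, newer))); // prints "old": IDs reset, size decides

    ModelStoreFile olderKept = new ModelStoreFile("a", 0L, 1024L, false, new ModelCell("old", 10L));
    ModelStoreFile newerKept = new ModelStoreFile("b", 0L, 4096L, false, new ModelCell("new", 20L));
    System.out.println(read(List.of(olderKept, newerKept))); // prints "new": per-cell IDs honored
  }
}
```

With per-cell IDs honored, the overwrite (seq ID 20) wins regardless of file size. Once they are reset, the outcome depends only on where the files land in the assumed size ordering, so swapping the two sizes would happen to return the correct value, which matches the description's point that the result is decided by file sorting rather than by sequence ID.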



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
