Kodey Converse created HBASE-29716:
--------------------------------------
Summary: Incremental backup does not properly preserve sequence IDs
Key: HBASE-29716
URL: https://issues.apache.org/jira/browse/HBASE-29716
Project: HBase
Issue Type: Bug
Components: backup&restore
Affects Versions: 2.5.13, 3.0.0, 2.6.5
Reporter: Kodey Converse

When an incremental backup is taken, WAL files are rewritten as HFiles using
WALPlayer. These HFiles are written with bulk-load metadata, so the per-cell
sequence IDs (which are required for correctness) are ignored by the
RegionScanner.

This is a follow-up to HBASE-27649; that fix plumbed sequence IDs from the WAL
to the HFiles generated by WALPlayer. However, the HFiles generated by
WALPlayer are marked to be bulk loaded [by metadata on the
HFile|https://github.com/apache/hbase/blob/b8d803c0f1156219cc965e4c749e7ab7c9a65f31/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat2.java#L461],
and RegionScanner [will reset cell-level sequence
IDs|https://github.com/apache/hbase/blob/b8d803c0f1156219cc965e4c749e7ab7c9a65f31/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStoreFile.java#L427-L450]
for HFiles with this metadata, relying instead on the sequence ID assigned at
bulk load time (a bulk load never happens for these HFiles, which are intended
only for incremental backups).
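
To make the effect concrete, here is a minimal sketch in plain Java with no HBase dependency. It is a simplified model, not the actual HBase read path: the class `SeqIdResetSketch`, its `Cell` type, and the methods `effectiveSeqId` and `winner` are hypothetical names invented for illustration, and the file-level ID of -1 is an arbitrary placeholder. The only behavior it assumes is the one described above, namely that a file flagged as bulk loaded has every cell's sequence ID replaced by one file-level ID.

```java
import java.util.Arrays;
import java.util.List;

public class SeqIdResetSketch {
  // One version of a cell: write timestamp, WAL sequence ID, and value.
  static final class Cell {
    final long timestamp;
    final long seqId;
    final String value;

    Cell(long timestamp, long seqId, String value) {
      this.timestamp = timestamp;
      this.seqId = seqId;
      this.value = value;
    }
  }

  // Sequence ID the reader exposes for a cell. Assumption of this sketch: when the
  // file is flagged as bulk loaded, the per-cell ID is replaced by the file-level ID.
  static long effectiveSeqId(Cell cell, boolean bulkLoaded, long fileSeqId) {
    return bulkLoaded ? fileSeqId : cell.seqId;
  }

  // Picks the winning version: newest timestamp, then highest effective sequence ID.
  // On a full tie the earlier cell in the list is kept, so the result depends on order.
  static String winner(List<Cell> cells, boolean bulkLoaded, long fileSeqId) {
    Cell best = null;
    long bestSeq = Long.MIN_VALUE;
    for (Cell c : cells) {
      long seq = effectiveSeqId(c, bulkLoaded, fileSeqId);
      boolean newer = best == null
          || c.timestamp > best.timestamp
          || (c.timestamp == best.timestamp && seq > bestSeq);
      if (newer) {
        best = c;
        bestSeq = seq;
      }
    }
    return best == null ? null : best.value;
  }

  public static void main(String[] args) {
    // Two writes to the same cell with the same timestamp; seqId 11 overwrote seqId 10.
    List<Cell> cells = Arrays.asList(new Cell(100L, 10L, "old"), new Cell(100L, 11L, "new"));
    System.out.println(winner(cells, false, -1L)); // per-cell IDs kept: prints "new"
    System.out.println(winner(cells, true, -1L));  // per-cell IDs reset: prints "old"
  }
}
```

With per-cell sequence IDs kept, the later write wins regardless of the order in which the cells are seen. Once the IDs are reset, the two versions tie and the outcome depends on something other than write order.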

The result is that reads of cells that have been overwritten (and therefore
rely on sequence IDs for correctness) can return an incorrect value, whether
performed by HBase or by tooling such as the ClientSideRegionScanner. Instead,
I believe the value that is returned is decided by [sorting the HFiles by
their
size|https://github.com/apache/hbase/blob/b8d803c0f1156219cc965e4c749e7ab7c9a65f31/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileComparators.java#L36-L39].
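
The fallback to file size can be illustrated with a small, self-contained sketch. It assumes a comparator that orders files by sequence ID first and only consults file size on a tie. The class `StoreFileOrderSketch`, the `FileInfo` type, and the `order` helper are hypothetical, and placing larger files first on a tie is an assumption of this sketch rather than something confirmed here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StoreFileOrderSketch {
  // Hypothetical stand-in for the per-file attributes used when ordering store files.
  static final class FileInfo {
    final String name;
    final long maxSeqId;
    final long sizeBytes;

    FileInfo(String name, long maxSeqId, long sizeBytes) {
      this.name = name;
      this.maxSeqId = maxSeqId;
      this.sizeBytes = sizeBytes;
    }
  }

  // Assumed ordering: sequence ID ascending, then larger files first on a tie.
  static final Comparator<FileInfo> BY_SEQ_ID_THEN_SIZE =
      Comparator.comparingLong((FileInfo f) -> f.maxSeqId)
          .thenComparing(Comparator.comparingLong((FileInfo f) -> f.sizeBytes).reversed());

  // Returns the file names in comparator order without mutating the input list.
  static List<String> order(List<FileInfo> files) {
    List<FileInfo> sorted = new ArrayList<>(files);
    sorted.sort(BY_SEQ_ID_THEN_SIZE);
    List<String> names = new ArrayList<>();
    for (FileInfo f : sorted) {
      names.add(f.name);
    }
    return names;
  }
}
```

When every backup HFile reports the same sequence ID, the first key never discriminates between files, so their relative order (and with it the version a read returns) ends up depending on size rather than on when the data was written.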
--
This message was sent by Atlassian Jira
(v8.20.10#820010)