Vladimir Rodionov created HBASE-14142:
-----------------------------------------
Summary: HBase Backup/Restore Phase 2: Cells deduplication during backup
Key: HBASE-14142
URL: https://issues.apache.org/jira/browse/HBASE-14142
Project: HBase
Issue Type: New Feature
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov
Since we do not record the last backed-up sequence ids (MVCC) and do not restore
up to those sequence ids (which is tricky to do), there will be some duplicate
KVs in store files after the first incremental restore that follows a full
backup. These duplicates result from how we perform the full backup and the
first incremental backup after it. During a full backup we perform a
distributed log roll, record the last WAL timestamp for every RS, and then take
a snapshot. The WAL that follows the recorded one goes into the next
incremental backup set, but it will contain some edits (puts, deletes) that
were already captured by the preceding snapshot. During restore, we first
restore the snapshot and then replay the WALs; this can create duplicate KVs
in different store files.
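To make that timeline concrete, here is a minimal Java (16+) sketch. It assumes a toy model in which a cell is just (row, family, qualifier, timestamp, value) and two cells count as duplicates when all of those fields match. `Cell`, `WalEdit`, `incrementalSet`, `restore`, `dedup` and the timestamps are hypothetical illustrations, not HBase classes and not the proposed implementation. An edit written after the log roll but before the snapshot lands both in the snapshot and in the next WAL, so replaying that WAL duplicates it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/**
 * Toy model of the duplicate-KV problem. Cell, WalEdit and all methods here
 * are hypothetical; they are NOT HBase classes or APIs.
 */
public class BackupDedupSketch {

    // Simplified stand-in for a KeyValue: equal fields => duplicate.
    record Cell(String row, String family, String qualifier, long timestamp, String value) {}

    // A WAL edit tagged with the timestamp of the WAL file it was written to.
    record WalEdit(long walFileTs, Cell cell) {}

    // Edits in WAL files newer than the one recorded at full-backup time
    // make up the first incremental backup set.
    static List<Cell> incrementalSet(List<WalEdit> wal, long recordedWalTs) {
        List<Cell> out = new ArrayList<>();
        for (WalEdit e : wal) {
            if (e.walFileTs() > recordedWalTs) {
                out.add(e.cell());
            }
        }
        return out;
    }

    // Restore: snapshot cells first, then replayed WAL cells, with no filtering.
    static List<Cell> restore(List<Cell> snapshot, List<Cell> replayed) {
        List<Cell> all = new ArrayList<>(snapshot);
        all.addAll(replayed);
        return all;
    }

    // Naive deduplication: drop exact repeats, keep first-seen order.
    static List<Cell> dedup(List<Cell> cells) {
        return new ArrayList<>(new LinkedHashSet<>(cells));
    }

    // Builds the full-backup / incremental timeline and returns the restored cells.
    static List<Cell> scenario() {
        Cell p1 = new Cell("r1", "f", "q", 1L, "a"); // written before the log roll
        Cell p2 = new Cell("r2", "f", "q", 2L, "b"); // after the log roll, before the snapshot
        Cell p3 = new Cell("r3", "f", "q", 3L, "c"); // after the snapshot

        long recordedWalTs = 100L; // last WAL timestamp recorded during the log roll
        List<WalEdit> wal = List.of(
            new WalEdit(100L, p1),
            new WalEdit(200L, p2),  // in the new WAL, but also captured by the snapshot
            new WalEdit(200L, p3));
        List<Cell> snapshot = List.of(p1, p2);

        return restore(snapshot, incrementalSet(wal, recordedWalTs));
    }

    public static void main(String[] args) {
        List<Cell> restored = scenario();
        System.out.println("restored: " + restored.size() + " cells");
        System.out.println("deduplicated: " + dedup(restored).size() + " cells");
    }
}
```

Running `main` prints 4 restored cells (p2 appears twice) and 3 after deduplication. Exact-equality dedup is only the simplest option; it ignores sequence ids (MVCC), which is exactly the information this issue notes we do not record.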
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)