[
https://issues.apache.org/jira/browse/HBASE-29003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911408#comment-17911408
]
Dieter De Paepe commented on HBASE-29003:
-----------------------------------------
I agree having a "table instance id" would be ideal. It would allow the backup
system to check whether it is dealing with the same table as the previous
backup. In the absence of this, using an observer to register table re-creation
(as I do in the PR) is possible. The only downside of using an observer is that
it needs to be configured by the user (and existing users might forget to do
this).
An advantage of using an observer that does bookkeeping in the backup system
table is that the same bookkeeping system can be used by HBASE-28084 (= it
should be impossible to create an incremental backup if you delete the latest
incremental backup).
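
To illustrate the "table instance id" idea, here is a minimal sketch in Python. It is a toy model and not HBase code: the Cluster and BackupSystem classes, their methods, and the integer ids are all hypothetical. It only shows the check itself: an incremental backup is allowed only if the table instance is the same one that was backed up before.

```python
# Toy model of a "table instance id" check. Not HBase code: every name
# here is hypothetical and only illustrates the idea from the comment.
import itertools


class Cluster:
    """Remembers which instance of each table name currently exists."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.tables = {}  # table name -> instance id

    def create_table(self, name):
        # Every (re-)creation gets a fresh id, even if the name is reused.
        self.tables[name] = next(self._ids)

    def drop_table(self, name):
        del self.tables[name]


class BackupSystem:
    def __init__(self, cluster):
        self.cluster = cluster
        self.last_backed_up = {}  # table name -> instance id at last backup

    def full_backup(self, name):
        self.last_backed_up[name] = self.cluster.tables[name]

    def can_take_incremental(self, name):
        # Only allow an incremental backup on top of a backup of the very
        # same table instance.
        current = self.cluster.tables.get(name)
        return current is not None and self.last_backed_up.get(name) == current
```

With this model, dropping and re-creating a table under the same name makes can_take_incremental return False until a new full backup is taken. An observer that registers table re-creation in the backup system table would serve the same purpose without needing such an id.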
> Proper bulk load tracking
> -------------------------
>
> Key: HBASE-29003
> URL: https://issues.apache.org/jira/browse/HBASE-29003
> Project: HBase
> Issue Type: Bug
> Components: backup&restore
> Affects Versions: 2.6.0, 3.0.0, 4.0.0-alpha-1, 2.6.1
> Reporter: Dieter De Paepe
> Assignee: Dieter De Paepe
> Priority: Critical
> Labels: pull-request-available
>
> As part of the incremental backup mechanism, HBase tracks which files were
> bulk-loaded (since the last backup).
> This data is stored in the backup:system_bulk table. Entries are added when a
> bulk load occurs through the BackupObserver co-processor. Entries are deleted
> when an incremental backup is completed.
> There are 2 flaws in this implementation:
> 1) Performing a full backup should clear the list. Imagine the following scenario:
> * Create a full backup B1 of table T.
> * Perform a bulk load L1.
> * Take a full backup B2 of table T.
> * Take an incremental backup of table T.
> ** The data stored for this backup will include L1, even though that data is
> already present due to B2. (This is an inefficiency, not a real error.)
> 2) Performing a table deletion should clear the list of bulk-loaded files.
> Imagine the following scenario:
> * Create a full backup of table T.
> * Perform a bulk-load B1 into T.
> * Disable, delete and recreate T.
> * Create an incremental backup (taking a full backup instead is similar to
> the previous case)
> ** The backup will contain B1, even though it doesn't belong there.
>
> Note that this *can also cause backup corruption* after a backup restore
> (which is how we encountered this issue), which makes this problem less niche
> than the above scenarios indicate. Backup restore effectively uses bulk loads
> as well, so users could run into the following scenario, where they are trying
> to recover from data corruption:
> * Create an environment with backup B1 (time t) and backup B2 (time t2 > t).
> * Users notice data corruption, and restore backup B2 after clearing the
> table
> * Users notice the data corruption is already present in B2, and restore
> backup B1 after clearing the table.
> * Users find data corruption solved, and resume regular backup cycle from
> here on.
> ** Any incremental backup taken will contain the (possibly corrupt) data
> from B2 (due to the restore operation using bulk operations). The backups
> will be affected until a FULL backup is taken after an incremental backup (so
> this could span a period of weeks assuming bi-weekly/monthly full backups).
> A minimal reproduction example:
> {code:java}
> echo "create 'table', 'cf'; put 'table', 'row1', 'cf:a', 'value1', 1400523142819" | bin/hbase shell -n
> bin/hbase backup create full file:/tmp/backup -t table -i
> echo "disable 'table'; drop 'table'" | bin/hbase shell -n
> # Empty
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> bin/hbase restore file:/tmp/backup backup_1732787972748 -t "table"
> # 1 entry
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "disable 'table'; drop 'table'" | bin/hbase shell -n
> # 1 entry
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "create 'table', 'cf'; put 'table', 'row1', 'cf:b', 'value2', 1400523142819" | bin/hbase shell -n
> bin/hbase backup create full file:/tmp/backup -t table -i
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "put 'table', 'row1', 'cf:b', 'value3', 1400523142819" | bin/hbase shell -n
> bin/hbase backup create incremental file:/tmp/backup -t table -i
> # Empty
> echo "scan 'backup:system_bulk'" | bin/hbase shell -n
> echo "disable 'table'; drop 'table'" | bin/hbase shell -n
> bin/hbase restore file:/tmp/backup backup_1732788098586 -t "table"
> # Will contain "value1" (unexpected) and "value3" (expected)
> echo "scan 'table'" | bin/hbase shell -n
> {code}
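
The two flaws in the description can also be modelled without a cluster. The following is a small in-memory sketch in Python, assuming only what the description states: entries are added on bulk load and removed when an incremental backup completes. BulkLoadTracker and its flags are hypothetical names, not part of HBase, and the flags stand in for the proposed behaviour (clearing the list on full backup and on table deletion).

```python
# Toy in-memory model of the bulk-load bookkeeping described in the issue.
# Not HBase code: class, method and flag names are made up for illustration.


class BulkLoadTracker:
    def __init__(self, clear_on_full_backup=False, clear_on_table_delete=False):
        self.pending = {}  # table -> bulk-loaded files since the last backup
        self.clear_on_full_backup = clear_on_full_backup
        self.clear_on_table_delete = clear_on_table_delete

    def bulk_load(self, table, hfile):
        # "Entries are added when a bulk load occurs."
        self.pending.setdefault(table, []).append(hfile)

    def full_backup(self, table):
        # Current behaviour (flag off): the list is left untouched.
        if self.clear_on_full_backup:
            self.pending.pop(table, None)

    def delete_table(self, table):
        # Current behaviour (flag off): the list is left untouched.
        if self.clear_on_table_delete:
            self.pending.pop(table, None)

    def incremental_backup(self, table):
        # "Entries are deleted when an incremental backup is completed."
        # Returns the bulk-loaded files this backup would include.
        return self.pending.pop(table, [])


def flaw_1(tracker):
    tracker.full_backup("T")        # B1
    tracker.bulk_load("T", "L1")
    tracker.full_backup("T")        # B2 already contains L1
    return tracker.incremental_backup("T")


def flaw_2(tracker):
    tracker.full_backup("T")
    tracker.bulk_load("T", "B1")
    tracker.delete_table("T")       # disable, delete and recreate T
    return tracker.incremental_backup("T")


print(flaw_1(BulkLoadTracker()))                            # ['L1']
print(flaw_1(BulkLoadTracker(clear_on_full_backup=True)))   # []
print(flaw_2(BulkLoadTracker()))                            # ['B1']
print(flaw_2(BulkLoadTracker(clear_on_table_delete=True)))  # []
```

The restore scenario follows the same pattern as flaw 2, since a restore registers its files as bulk loads before the table is dropped and re-created.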
--
This message was sent by Atlassian Jira
(v8.20.10#820010)