[ 
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271903#comment-16271903
 ] 

stack commented on HBASE-17852:
-------------------------------

bq. Can you post your comments on RB, Stack?

Traditionally it is the contributor's job to keep the feedback in order and to 
make sure it is all addressed, whether in JIRA or RB. Not addressing reviewers' 
feedback, or dropping it w/o comment, is a total no-no.

bq. I have explained this many times already ... 

You don't answer the question. You just assert that we have to roll back, 
w/o justification other than that backups 'become corrupt' or that a backup is 
only 'safe' if it completes. Sounds like it needs to be 'transactional' but you 
don't describe the transaction (correct me if I'm wrong). I don't get why a 
completed backup can't just write a completion marker to the backup table. W/o 
the marker, the backup is corrupt/incomplete and we just move on.
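
To make the completion-marker idea concrete, here is a minimal in-memory sketch. It is not HBase code: `backup_table`, `run_backup`, `usable_backups`, and the "RUNNING"/"COMPLETE" states are hypothetical names chosen for illustration, and a plain dict stands in for the backup table. The point it tries to show is that a failed backup needs no rollback, because it simply never receives the marker.

```python
# Hypothetical sketch: a dict stands in for the backup table.

def run_backup(backup_table, backup_id, copy_steps):
    """Run the copy steps, then write the completion marker as the last action."""
    backup_table[backup_id] = {"state": "RUNNING"}
    try:
        for step in copy_steps:
            step()
    except Exception:
        # No rollback here: the row just never gets the completion marker.
        return False
    backup_table[backup_id]["state"] = "COMPLETE"
    return True


def usable_backups(backup_table):
    """Only backups that carry the completion marker count as valid."""
    return sorted(b for b, row in backup_table.items() if row["state"] == "COMPLETE")


def _fail():
    raise IOError("simulated failure mid-copy")


table = {}
ok_1 = run_backup(table, "backup_1", [lambda: None])
ok_2 = run_backup(table, "backup_2", [lambda: None, _fail])
```

In this sketch `backup_2` stays in the table without a marker, so readers of the table skip it. Cleaning up its partial data could then happen lazily rather than as part of failure handling.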

bq. Running backup repair automatically in case of a backup failure won't hurt 
and can be incorporated into cron job

Don't follow. An operator sets up a cron job. It works great for a few days. 
Then it stops. The operator needs to figure out that he has to run a repair. 
Does the operator set up two cron jobs? Or does cron probe first for breakage...
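
The "probe first" variant could look roughly like the sketch below: a single entry point for cron that checks for breakage, repairs if needed, and only then runs the backup. The three callables (`create_backup`, `needs_repair`, `run_repair`) are hypothetical placeholders for whatever the backup tool exposes; nothing here is an actual HBase API.

```python
def nightly_backup(create_backup, needs_repair, run_repair):
    """Single cron entry point: probe for breakage, repair if needed, then back up.

    All three arguments are hypothetical callables, not real HBase calls.
    Returns the list of actions taken, in order.
    """
    actions = []
    if needs_repair():
        run_repair()
        actions.append("repair")
    create_backup()
    actions.append("create")
    return actions


# Simulated state: the previous run left the system inconsistent.
state = {"inconsistent": True}
first_run = nightly_backup(
    create_backup=lambda: None,
    needs_repair=lambda: state["inconsistent"],
    run_repair=lambda: state.update(inconsistent=False),
)
second_run = nightly_backup(
    create_backup=lambda: None,
    needs_repair=lambda: state["inconsistent"],
    run_repair=lambda: state.update(inconsistent=False),
)
```

With a wrapper like this the operator maintains one cron job instead of two, at the cost of every run paying for the probe.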



> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental 
> backup)
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-17852
>                 URL: https://issues.apache.org/jira/browse/HBASE-17852
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0-beta-1
>
>         Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch, 
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch, 
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> The rollback-via-snapshot design approach implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the backup 
> meta-table (backup system table). This procedure is lightweight because the 
> meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by 
> cleaning up partial data in the backup destination, followed by restoring the 
> backup meta-table from the snapshot. 
> # When an operation fails on the client side (abnormal termination, for 
> example), the next time the user tries create/merge/delete, they will see an 
> error message saying that the system is in an inconsistent state and repair is 
> required; they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and 
> the BackupObservers), we introduce a small table ONLY to keep the listing of 
> bulk loaded files. All backup observers will work only with this new table. 
> The reason: in case of a failure during backup create/delete/merge/restore, 
> when the system performs an automatic rollback, some data written by backup 
> observers during the failed operation may be lost. This is what we try to avoid.
> # The second table keeps only bulk load related references. We do not care 
> about the consistency of this table, because bulk load is an idempotent 
> operation and can be repeated after a failure. Partially written data in the 
> second table does not affect the BackupHFileCleaner plugin, because this data 
> (the list of bulk loaded files) corresponds to files which have not yet been 
> loaded successfully and hence are not visible to the system.
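
The server-side part of the design above (snapshot first, clean up partial data and restore on failure, bulk-load listing kept in a separate table) can be sketched in memory as follows. This is an illustration under stated assumptions, not the patch: dicts stand in for the backup meta-table and the backup destination, a list stands in for the bulk-load table, and `run_with_rollback` and `failing_backup` are hypothetical names.

```python
import copy


def run_with_rollback(meta_table, destination, operation):
    """Snapshot the meta table, run the operation, roll back on failure."""
    snapshot = copy.deepcopy(meta_table)   # step 1: snapshot before starting
    written_before = set(destination)      # paths that existed beforehand
    try:
        operation(meta_table, destination)
    except Exception:
        # Step 2: remove partial data written to the destination...
        for path in set(destination) - written_before:
            del destination[path]
        # ...then restore the meta table from the snapshot.
        meta_table.clear()
        meta_table.update(snapshot)
        return False
    return True


# The bulk-load listing lives in its own table, so restoring the meta table
# from the snapshot cannot wipe out what observers recorded meanwhile.
bulk_load_table = []


def failing_backup(meta, dest):
    meta["backup_2"] = "RUNNING"
    dest["/backup_2/part-0"] = b"partial"
    # Simulates an observer recording a bulk load while the backup is running.
    bulk_load_table.append("hfile-123")
    raise RuntimeError("simulated server-side failure")


meta = {"backup_1": "COMPLETE"}
dest = {"/backup_1/part-0": b"data"}
succeeded = run_with_rollback(meta, dest, failing_backup)
```

After the failed run, the meta table and destination look as they did before the operation started, while the entry in the separate bulk-load table survives the rollback.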



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
