[
https://issues.apache.org/jira/browse/HBASE-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271978#comment-16271978
]
Vladimir Rodionov edited comment on HBASE-17852 at 11/30/17 1:27 AM:
---------------------------------------------------------------------
{quote}
You don't answer the question
{quote}
What question? What does "corrupt" mean? Why do I need to restore the meta table? I
am afraid I can't add anything to my answers above.
{quote}
Don't follow. An operator sets up a cron job. Works great for a few days. Then
it stops. Operator needs to figure that he has to run a repair. Operator sets
up two cron jobs? Or cron probes first for breakage...
{quote}
Stops means fails. If the cron job fails, the operator will need to intervene, read
the logs and manuals, and figure out that a repair is required. Not a big deal, imo. We
clearly log a message that the repair tool has to be run. But for lazy operators I
will add an auto-repair mode of execution (see the ticket above).
Stack, can you be more technical and specific in your questions? The patch is
already at v8. Do you have any code-related questions or comments? If yes,
then RB is the right place to put them.
> Add Fault tolerance to HBASE-14417 (Support bulk loaded files in incremental
> backup)
> ------------------------------------------------------------------------------------
>
> Key: HBASE-17852
> URL: https://issues.apache.org/jira/browse/HBASE-17852
> Project: HBase
> Issue Type: Sub-task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0-beta-1
>
> Attachments: HBASE-17852-v1.patch, HBASE-17852-v2.patch,
> HBASE-17852-v3.patch, HBASE-17852-v4.patch, HBASE-17852-v5.patch,
> HBASE-17852-v6.patch, HBASE-17852-v7.patch, HBASE-17852-v8.patch
>
>
> Design approach (rollback via snapshot) implemented in this ticket:
> # Before a backup create/delete/merge starts, we take a snapshot of the backup
> meta-table (backup system table). This procedure is lightweight because the
> meta table is small and usually fits in a single region.
> # When an operation fails on the server side, we handle the failure by cleaning
> up partial data in the backup destination and then restoring the backup
> meta-table from the snapshot.
> # When an operation fails on the client side (abnormal termination, for example),
> the next time the user tries create/merge/delete, they will see an error message
> saying that the system is in an inconsistent state and that repair is required;
> they will need to run the backup repair tool.
> # To avoid multiple writers to the backup system table (the backup client and
> the BackupObservers), we introduce a small table used ONLY to keep the list of
> bulk loaded files. All backup observers will work only with this new table. The
> reason: if a failure occurs during backup create/delete/merge/restore and the
> system performs an automatic rollback, some data written by backup observers
> during the failed operation may be lost. This is what we try to avoid.
> # The second table keeps only bulk-load-related references. We do not care about
> the consistency of this table, because bulk load is an idempotent operation and
> can be repeated after a failure. Partially written data in the second table does
> not affect the BackupHFileCleaner plugin, because this data (the list of bulk
> loaded files) corresponds to files which have not yet been loaded successfully
> and hence are not visible to the system.
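
The rollback-via-snapshot approach described above can be sketched roughly as follows. This is only an illustrative, in-memory model and not the code from the attached patches: the class names MetaTable, BulkLoadTable, BackupOperation and the method runWithRollback are hypothetical, and plain Java collections stand in for the HBase tables and snapshots. It shows that a failed operation leaves the meta table exactly as it was before the operation, while entries written to the separate bulk-load table survive the rollback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotRollbackSketch {

    /** Hypothetical in-memory stand-in for the backup system (meta) table. */
    static final class MetaTable {
        private Map<String, String> rows = new HashMap<>();

        void put(String key, String value) { rows.put(key, value); }
        String get(String key) { return rows.get(key); }
        int size() { return rows.size(); }

        /** Cheap copy of the current state; the real table is small as well. */
        Map<String, String> snapshot() { return new HashMap<>(rows); }

        /** Replace the current state with a previously taken snapshot. */
        void restore(Map<String, String> snapshot) { rows = new HashMap<>(snapshot); }
    }

    /** Hypothetical stand-in for the separate bulk-load table; never snapshotted or restored. */
    static final class BulkLoadTable {
        private final List<String> files = new ArrayList<>();

        void record(String path) { files.add(path); }
        int size() { return files.size(); }
    }

    /** A backup create/delete/merge step that may fail part-way through. */
    interface BackupOperation {
        void run(MetaTable meta) throws Exception;
    }

    /** Snapshot the meta table, run the operation, and restore the snapshot on failure. */
    static boolean runWithRollback(MetaTable meta, BackupOperation op) {
        Map<String, String> snapshot = meta.snapshot();
        try {
            op.run(meta);
            return true;
        } catch (Exception e) {
            // Cleanup of partial data in the backup destination would happen here.
            meta.restore(snapshot);
            return false;
        }
    }

    public static void main(String[] args) {
        MetaTable meta = new MetaTable();
        BulkLoadTable bulkLoads = new BulkLoadTable();
        meta.put("backup_1", "COMPLETE");

        boolean ok = runWithRollback(meta, m -> {
            m.put("backup_2", "RUNNING");
            // An observer records a bulk-loaded file while the backup is in progress.
            bulkLoads.record("/hypothetical/path/hfile_1");
            throw new IllegalStateException("simulated server-side failure");
        });

        System.out.println("succeeded=" + ok
            + ", metaRows=" + meta.size()
            + ", bulkLoadEntries=" + bulkLoads.size());
    }
}
```

In this model the partial "backup_2" row disappears after the failure, while the bulk-load entry is kept, which is the reason given above for keeping observer writes out of the table that gets rolled back.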