[
https://issues.apache.org/jira/browse/HBASE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426451#comment-16426451
]
stack commented on HBASE-15227:
-------------------------------
What is 'Done'? Where is it described?
> HBase Backup Phase 3: Fault tolerance (client/server) support
> -------------------------------------------------------------
>
> Key: HBASE-15227
> URL: https://issues.apache.org/jira/browse/HBASE-15227
> Project: HBase
> Issue Type: Task
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Priority: Major
> Labels: backup
> Attachments: HBASE-15227-v3.patch, HBASE-15277-v1.patch
>
>
> The system must be tolerant of faults:
> # Backup operations MUST be atomic (no partial completion state in the backup
> system table)
> # The process must detect any type of failure that can result in data loss
> (partial backup or partial restore)
> # The backup system table state must be properly restored and cleaned up after
> a failure
> # An additional utility that repairs the backup system table and cleans up the
> corresponding file system data must be implemented
> h3. Backup
> h4. General FT framework implementation
> Before the actual backup operation starts, a snapshot of the backup system
> table is taken and the system table is updated with the *ACTIVE_SNAPSHOT*
> flag. The flag is removed upon backup completion.
> In case of *any* server-side failure, the client catches the errors/exceptions
> and handles them:
> # Cleans up the backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full
> backup we snapshot tables)
> # Restores the backup system table from the snapshot
> # Deletes the backup system table snapshot (we read the snapshot name from the
> backup system table beforehand)
> In case of *any* client-side failure:
> Before any backup or restore operation runs, we check the backup system table
> for the *ACTIVE_SNAPSHOT* flag. If the flag is present, the operation aborts
> with a message that the backup repair tool (see below) must be run.
> h4. Backup repair tool
> The command-line tool *backup repair* executes the following steps:
> # Reads information about the last failed backup session
> # Cleans up the backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full
> backup we snapshot tables)
> # Restores the backup system table from the snapshot
> # Deletes the backup system table snapshot (we read the snapshot name from the
> backup system table beforehand)
> h4. Detection of a partial loss of data
> h5. Full backup
> Export snapshot operation (?).
> We count files and check their sizes before and after the DistCp run.
> h5. Incremental backup
> Conversion of WALs to HFiles, when a WAL file is moved from the active to the
> archive directory: code is already in place to handle this situation.
> During the DistCp run (same as above).
> h3. Restore
> This operation does not modify the backup system table and is idempotent. No
> special FT is required.
>
>
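The backup flow quoted under "General FT framework implementation" and "Backup repair tool" can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the HBase implementation: the class BackupSessionSketch, the SystemTable model, and every method name are hypothetical, and the cleanup of the backup destination, temporary data, and table snapshots is only marked by a comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory model of the fault-tolerance flow; not HBase code. */
class BackupSessionSketch {

    /** Stand-in for the backup system table and its snapshot. */
    static class SystemTable {
        List<String> records = new ArrayList<>();
        boolean activeSnapshotFlag = false;   // models the ACTIVE_SNAPSHOT flag
        List<String> snapshot = null;

        void takeSnapshot() {
            snapshot = new ArrayList<>(records);
            activeSnapshotFlag = true;
        }

        void restoreFromSnapshot() {
            records = new ArrayList<>(snapshot);
        }

        void deleteSnapshot() {
            snapshot = null;
            activeSnapshotFlag = false;
        }
    }

    /** The actual backup work; may fail with any exception. */
    interface BackupWork {
        void run(SystemTable table) throws Exception;
    }

    /** Runs one backup session; returns true on success, false after a rollback. */
    static boolean runBackup(SystemTable table, BackupWork work) {
        // A leftover flag means an earlier client died mid-session.
        if (table.activeSnapshotFlag) {
            throw new IllegalStateException("ACTIVE_SNAPSHOT is set; run backup repair first");
        }
        table.takeSnapshot();
        try {
            work.run(table);
            table.deleteSnapshot();           // completion: snapshot and flag removed
            return true;
        } catch (Exception e) {
            rollback(table);
            return false;
        }
    }

    /** Repair path: same rollback steps, driven by the leftover flag. */
    static void repair(SystemTable table) {
        if (table.activeSnapshotFlag) {
            rollback(table);
        }
    }

    private static void rollback(SystemTable table) {
        // Real code would first clean the backup destination, temporary data,
        // and any snapshots of the tables being backed up.
        table.restoreFromSnapshot();
        table.deleteSnapshot();
    }
}
```

In this sketch a server-side failure is rolled back inside runBackup, while a client-side failure leaves the flag set, so the next runBackup call refuses to start until repair has been run.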