[ 
https://issues.apache.org/jira/browse/HBASE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426451#comment-16426451
 ] 

stack commented on HBASE-15227:
-------------------------------

What is 'Done'? Where is it described?

> HBase Backup Phase 3: Fault tolerance (client/server) support
> -------------------------------------------------------------
>
>                 Key: HBASE-15227
>                 URL: https://issues.apache.org/jira/browse/HBASE-15227
>             Project: HBase
>          Issue Type: Task
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Major
>              Labels: backup
>         Attachments: HBASE-15227-v3.patch, HBASE-15277-v1.patch
>
>
> The system must be tolerant to faults:
> # Backup operations MUST be atomic (no partial completion state in the backup
> system table)
> # The process must detect any type of failure that can result in data loss
> (partial backup or partial restore)
> # The system table state must be properly restored and cleaned up in case of a
> failure
> # An additional utility must be implemented to repair the backup system table
> and clean up the corresponding file system data
> h3. Backup
> h4. General FT framework implementation 
> Before the actual backup operation starts, a snapshot of the backup system
> table is taken and the system table is updated with the *ACTIVE_SNAPSHOT*
> flag. The flag is removed upon backup completion.
> In case of *any* server-side failure, the client catches errors/exceptions and
> handles them:
> # Cleans up backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full
> backup we snapshot tables)
> # Restores the backup system table from the snapshot
> # Deletes the backup system table snapshot (the snapshot name is read from the
> backup system table beforehand)
> In case of *any* client-side failure:
> Before any backup or restore operation runs, we check the backup system table
> for the *ACTIVE_SNAPSHOT* flag. If the flag is present, the operation aborts
> with a message that the backup repair tool (see below) must be run.
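
The flag-and-snapshot protocol quoted above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in the attached patches: `BackupSessionSketch`, `BackupWork`, the `Map` standing in for the backup system table, and the logged step names are all hypothetical, and no real HBase API is used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the backup system table; not an HBase API.
class BackupSessionSketch {
    static final String ACTIVE_SNAPSHOT = "ACTIVE_SNAPSHOT";

    /** The work a backup performs; it may throw to simulate a server-side failure. */
    interface BackupWork {
        void run(Map<String, String> systemTable) throws Exception;
    }

    final Map<String, String> systemTable = new HashMap<>();
    final List<String> rollbackLog = new ArrayList<>();
    private Map<String, String> snapshot;

    /** Returns true if the backup completed, false if it failed and was rolled back. */
    boolean runBackup(BackupWork work) {
        // Client-side failure detection: a leftover flag means an earlier
        // session never finished, so refuse to start a new one.
        if (systemTable.containsKey(ACTIVE_SNAPSHOT)) {
            throw new IllegalStateException(
                "ACTIVE_SNAPSHOT is set; run the backup repair tool first");
        }
        // Snapshot the system table first, then mark the session as active.
        snapshot = new HashMap<>(systemTable);
        systemTable.put(ACTIVE_SNAPSHOT, "system-table-snapshot");
        try {
            work.run(systemTable);
            systemTable.remove(ACTIVE_SNAPSHOT); // flag removed on completion
            snapshot = null;
            return true;
        } catch (Exception e) {
            rollback();
            return false;
        }
    }

    /** Mirrors the five cleanup steps; the first three are only logged here. */
    private void rollback() {
        rollbackLog.add("clean up backup destination");
        rollbackLog.add("clean up temporary data");
        rollbackLog.add("delete snapshots of tables being backed up");
        // The snapshot predates the flag, so restoring it also clears the flag.
        systemTable.clear();
        systemTable.putAll(snapshot);
        rollbackLog.add("restore backup system table from snapshot");
        snapshot = null;
        rollbackLog.add("delete backup system table snapshot");
    }
}
```

In this sketch a failed `runBackup` leaves the map exactly as it was before the call, which is the atomicity property the requirements ask for. A client crash corresponds to the flag staying in the map, so the next call aborts with the "run repair" message.
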
> h4. Backup repair tool
> The command line tool *backup repair* executes the following steps:
> # Reads the info of the last failed backup session
> # Cleans up backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full
> backup we snapshot tables)
> # Restores the backup system table from the snapshot
> # Deletes the backup system table snapshot (the snapshot name is read from the
> backup system table beforehand)
> h4. Detection of a partial loss of data
> h5. Full backup  
> Export snapshot operation (?).
> We count files and check sizes before and after the DistCp run.
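
One way to read "count files and check sizes" is as a comparison of two path-to-size manifests, one for the source and one for the destination. The sketch below works under that assumption and uses the local file system through `java.nio.file` as a stand-in for HDFS. `CopyVerificationSketch`, `manifest`, and `copyIsComplete` are hypothetical names, not part of DistCp or the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper; the local file system stands in for HDFS here.
class CopyVerificationSketch {

    /** Maps every regular file under root (by path relative to root) to its size in bytes. */
    static Map<String, Long> manifest(Path root) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                try {
                    sizes.put(root.relativize(p).toString(), Files.size(p));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return sizes;
    }

    /**
     * True only if both trees contain the same relative paths with the same sizes.
     * Equal manifests imply equal file counts, so a missing or truncated file
     * makes the check fail.
     */
    static boolean copyIsComplete(Path source, Path destination) throws IOException {
        return manifest(source).equals(manifest(destination));
    }
}
```

A size comparison catches missing and truncated files but not corrupted content of the same length. Detecting that would need checksums, which the description does not mention.
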
> h5. Incremental backup 
> Conversion of WALs to HFiles, when a WAL file is moved from the active to the
> archive directory: the code is in place to handle this situation.
> During the DistCp run: same check as above.
> h3. Restore
> This operation does not modify the backup system table and is idempotent. No
> special FT is required.
>  
>      



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
