[jira] [Commented] (HBASE-15227) HBase Backup Phase 3: Fault tolerance (client/server) support

Josh Elser (JIRA) Fri, 06 Apr 2018 14:32:34 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429033#comment-16429033
 ]


Josh Elser commented on HBASE-15227:
------------------------------------

{quote}Where is it described?
{quote}
Agreed – high-level release notes on this parent Jira issue should be present 
at a minimum.

Some limitations of the implementation are documented at 
[http://hbase.apache.org/book.html#backuprestore]. I would think that a 
sub-section which covers how administrators are expected to interact with the 
system would be best. The existing docs already cover the "why things are how 
they are", but making sure admins are heading in the right direction would be 
beneficial (e.g. high-level guidance that keeps folks from trying to use the 
feature in a way it was not designed to be used).

> HBase Backup Phase 3: Fault tolerance (client/server) support
> -------------------------------------------------------------
>
>                 Key: HBASE-15227
>                 URL: https://issues.apache.org/jira/browse/HBASE-15227
>             Project: HBase
>          Issue Type: Task
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Major
>              Labels: backup
>         Attachments: HBASE-15227-v3.patch, HBASE-15277-v1.patch
>
>
> The system must be tolerant to faults: 
> # Backup operations MUST be atomic (no partial completion state in the backup 
> system table)
> # The process must detect any type of failure that can result in data loss 
> (partial backup or partial restore) 
> # Proper system table state restore and cleanup must be done in case of a 
> failure
> # An additional utility to repair the backup system table and perform the 
> corresponding file system cleanup must be implemented
> h3. Backup
> h4. General FT framework implementation 
> Before the actual backup operation starts, a snapshot of the backup system 
> table is taken and the system table is updated with the *ACTIVE_SNAPSHOT* 
> flag. The flag will be removed upon backup completion. 
> In case of *any* server-side failure, the client catches errors/exceptions 
> and handles them:
> # Cleans up backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during full 
> backup we snapshot tables)
> # Restores backup system table from snapshot
> # Deletes backup system table snapshot (we read snapshot name from backup 
> system table before)
> In case of *any* client-side failure:
> Before any backup or restore operation runs, we check the backup system table 
> for the *ACTIVE_SNAPSHOT* flag. If the flag is present, the operation aborts 
> with a message that the backup repair tool (see below) must be run
> h4. Backup repair tool
> The command-line tool *backup repair* executes the following steps:
> # Reads info about the last failed backup session
> # Cleans up backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during full 
> backup we snapshot tables)
> # Restores backup system table from snapshot
> # Deletes backup system table snapshot (we read snapshot name from backup 
> system table before)
> h4. Detection of a partial loss of data
> h5. Full backup  
> Export snapshot operation (?).
> We count files and check sizes before and after the DistCp run
> h5. Incremental backup 
> Conversion of WAL to HFiles, when a WAL file is moved from the active to the 
> archive directory. The code is in place to handle this situation
> During the DistCp run (same as above)
> h3. Restore
> This operation does not modify the backup system table and is idempotent. No 
> special FT is required.   
>  
>      
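
For readers skimming the quoted description, the ACTIVE_SNAPSHOT flow can be sketched roughly as below. This is only an illustrative in-memory model, not HBase code: BackupSystemTable, run_backup, repair, RepairRequired and the cleanup callback are hypothetical names invented for this sketch, and the real implementation may differ.

```python
# Illustrative model only; none of these names are HBase APIs.
class RepairRequired(Exception):
    """Raised when an earlier session left the ACTIVE_SNAPSHOT flag set."""


class BackupSystemTable:
    def __init__(self):
        self.rows = {}               # backup metadata (stand-in for the system table)
        self.active_snapshot = None  # stand-in for the ACTIVE_SNAPSHOT flag
        self._snapshots = {}

    def snapshot(self, name):
        # Take the snapshot first, then record the flag.
        self._snapshots[name] = dict(self.rows)
        self.active_snapshot = name

    def restore_from_snapshot(self):
        # Read the snapshot name, restore the table, then delete the snapshot.
        name = self.active_snapshot
        self.rows = dict(self._snapshots.pop(name))
        self.active_snapshot = None

    def finish(self):
        # Successful completion: drop the snapshot and clear the flag.
        self._snapshots.pop(self.active_snapshot, None)
        self.active_snapshot = None


def run_backup(table, backup_id, do_backup, cleanup):
    # Client-side failure check: a leftover flag means a session died midway.
    if table.active_snapshot is not None:
        raise RepairRequired("previous session did not finish; run backup repair")
    table.snapshot("snapshot_" + backup_id)
    try:
        do_backup(table)
    except Exception:
        # Server-side failure: clean up partial data, roll the table back.
        cleanup()
        table.restore_from_snapshot()
        raise
    table.finish()


def repair(table, cleanup):
    # Same rollback steps, run after the fact by the repair tool.
    if table.active_snapshot is None:
        return False
    cleanup()
    table.restore_from_snapshot()
    return True
```

In this model a failure inside do_backup leaves the table exactly as it was before the session, and a client that dies after the snapshot step blocks later sessions until repair is called.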



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
