[ 
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151595#comment-13151595
 ] 

Karthik Ranganathan commented on HBASE-4655:
--------------------------------------------

<< For '...incremental backups at the Stage 1 (RBU) level', won't the time 
between steps b and d be 'large', and during the copy the list of files could 
change on you; i.e. when you go to copy a file, it may have been removed 
because it'd been compacted. What do you do in this case? (Your list may not 
include the compacted file.) >>

We look for the deleted files in .Trash and reclaim them. If they are not 
present, we fail the backup for the region. The backup job runs in loops: the 
first loop starts out with all regions. The failed regions are output, and the 
second loop works only on the failed regions. The number of loops is 
configurable; the default is 5.
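
To make the looping concrete, here is a minimal sketch of the retry scheme, not 
the actual backup job. The names run_backup_loops and backup_region are 
hypothetical; backup_region stands in for whatever copies one region's files 
(including the .Trash lookup) and reports success or failure.

```python
from typing import Callable, Iterable, Set


def run_backup_loops(
    regions: Iterable[str],
    backup_region: Callable[[str], bool],  # hypothetical: True on success
    max_loops: int = 5,                    # the comment states a default of 5
) -> Set[str]:
    """Return the regions that still failed after at most max_loops passes."""
    pending = set(regions)  # the first loop starts out with all regions
    for _ in range(max_loops):
        if not pending:
            break
        # Each later loop works only on the regions that failed in this pass.
        pending = {region for region in pending if not backup_region(region)}
    return pending
```

Any region left in the returned set has failed every pass and would need to be 
reported or handled separately.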


<< For "a.The backups rely on the clocks across the various region-servers for 
determining the point in time to which the edits are re-played", so, say a 
server is lagging the others by a good bit? When replaying the edits, you'd 
replay edits from when this lagging server said the backup began? >>

No, right now we just subtract a configurable amount of time (say 5 mins) from 
the start time of the MR job to keep things simple. We could totally do what 
you say as an enhancement.
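
As a rough illustration of that arithmetic (the function and parameter names 
here are made up for the example, and the 5-minute margin is just the value 
mentioned above):

```python
from datetime import datetime, timedelta


def backup_reference_time(
    job_start: datetime,
    margin: timedelta = timedelta(minutes=5),  # assumed configurable margin
) -> datetime:
    """Shift the MR job start time back by a fixed margin to absorb clock skew."""
    return job_start - margin
```

For example, a job started at 12:00:00 with the 5-minute margin would use 
11:55:00 as its reference point.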

<< How will you know which hlogs to replay? You'll open it and look at first 
and last edits in the file? Or should we write out metadata files for hlogs? Or 
is it enough relying on hdfs modtime? >>

The hlog files are named in the format hlog.TIMESTAMP, where TIMESTAMP is the 
time when the log was created. We look at this time to determine the file set. 
We need all files where TIMESTAMP > start time and TIMESTAMP < finish time. We 
also need the latest file where TIMESTAMP < start time, since that log may 
still hold edits written after the start time.
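
The selection rule can be sketched as follows. This is an illustration only: 
select_hlogs is a hypothetical helper, it assumes file names of exactly the 
form hlog.TIMESTAMP with an integer timestamp, and it uses the strict 
inequalities as written above (so a file stamped exactly at the start time is 
not picked up, which a real implementation would probably want to handle).

```python
from typing import Iterable, List


def select_hlogs(file_names: Iterable[str], start_ts: int, finish_ts: int) -> List[str]:
    """Pick the hlog files needed to cover the window (start_ts, finish_ts)."""
    stamped = []
    for name in file_names:
        prefix, _, suffix = name.rpartition(".")
        # Skip anything that is not of the assumed form hlog.<integer>.
        if prefix == "hlog" and suffix.isdigit():
            stamped.append((int(suffix), name))
    stamped.sort()  # oldest first

    # All files created strictly inside the window.
    in_window = [name for ts, name in stamped if start_ts < ts < finish_ts]
    # Plus the latest file created before the window opened, if there is one.
    before = [name for ts, name in stamped if ts < start_ts]
    return before[-1:] + in_window
```

The result is ordered oldest to newest, which is the order in which the edits 
would be replayed.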

                
> Document architecture of backups
> --------------------------------
>
>                 Key: HBASE-4655
>                 URL: https://issues.apache.org/jira/browse/HBASE-4655
>             Project: HBase
>          Issue Type: Sub-task
>          Components: documentation, regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
>         Attachments: HBase Backups Architecture.docx
>
>
> Basic idea behind the backup architecture for HBase

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
