[
https://issues.apache.org/jira/browse/HBASE-4655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151595#comment-13151595
]
Karthik Ranganathan commented on HBASE-4655:
--------------------------------------------
<< For '...incremental backups at the Stage 1 (RBU) level', won't the time
between steps b and d be 'large', so that during the copy the list of files
could change on you? I.e., when you go to copy a file, it may have been
removed because it had been compacted. What do you do in this case? (Your list
may not include the compacted file.) >>
We look for the deleted files in .Trash and reclaim them. If they are not
present, we fail the backup for that region. The backup job runs in loops: the
first loop starts out with all regions, the failed regions are written out, and
the second loop works only on the failed regions. The number of loops is
configurable; the default is 5.
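
A minimal sketch of the retry loop described above, in Python for brevity. The
names run_backup_loops and backup_region are hypothetical and not taken from
the actual backup job; backup_region is assumed to return True when a region
was backed up (including any files reclaimed from .Trash) and False otherwise.

```python
from typing import Callable, Iterable, List


def run_backup_loops(
    regions: Iterable[str],
    backup_region: Callable[[str], bool],  # hypothetical per-region backup step
    max_loops: int = 5,  # the comment above gives 5 as the default
) -> List[str]:
    """Back up all regions, retrying only the failed ones in later loops.

    Returns the regions that still failed after the last loop.
    """
    pending = list(regions)  # the first loop starts out with all regions
    for _ in range(max_loops):
        if not pending:
            break
        # Collect the regions that failed in this loop; only these are
        # handed to the next loop.
        pending = [region for region in pending if not backup_region(region)]
    return pending
```

A non-empty return value means some regions could not be backed up within the
configured number of loops.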
<< For "a.The backups rely on the clocks across the various region-servers for
determining the point in time to which the edits are re-played", so, say a
server is lagging the others by a good bit? When replaying the edits, you'd
replay edits from when this lagging server said the backup began? >>
No, right now we just subtract a configurable amount of time (say 5 mins) from
the start time of the MR job to keep things simple. We could totally do what
you say as an enhancement.
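
As a rough illustration of that calculation (a sketch only; replay_cutoff and
skew_margin are made-up names, and the 5 minute default is just the example
value mentioned above):

```python
from datetime import datetime, timedelta


def replay_cutoff(
    job_start: datetime,
    skew_margin: timedelta = timedelta(minutes=5),  # configurable safety margin
) -> datetime:
    """Point in time up to which edits are replayed.

    Instead of trusting each region server's clock, back off from the MR job
    start time by a fixed margin.
    """
    return job_start - skew_margin
```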
<< How will you know which hlogs to replay? You'll open it and look at first
and last edits in the file? Or should we write out metadata files for hlogs? Or
is it enough relying on hdfs modtime? >>
The hlog files are named hlog.TIMESTAMP, where TIMESTAMP is the time when the
log was created. We look at this time to determine the file set. We need all
files where TIMESTAMP > start time and TIMESTAMP < finish time, plus the latest
file where TIMESTAMP < start time, since that file may still contain edits
written after the start time.
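
The selection rule above could be sketched like this. It is an illustration,
not the actual backup code: select_hlogs is a hypothetical name, timestamps are
assumed to be plain integers in the file name suffix, and the comparisons are
kept strict exactly as stated above.

```python
from typing import Iterable, List


def select_hlogs(file_names: Iterable[str], start: int, finish: int) -> List[str]:
    """Pick the hlog files needed to replay edits between start and finish."""
    stamped = []
    for name in file_names:
        # Split "hlog.TIMESTAMP" at the last dot; skip names that do not
        # end in a numeric timestamp.
        prefix, _, suffix = name.rpartition(".")
        if prefix and suffix.isdigit():
            stamped.append((int(suffix), name))
    stamped.sort()  # oldest first

    # All files created strictly inside the backup window.
    in_window = [name for ts, name in stamped if start < ts < finish]
    # The latest file created before the window opened (if any).
    before = [name for ts, name in stamped if ts < start]
    return before[-1:] + in_window
```

For example, with files hlog.100 through hlog.500 in steps of 100, a start time
of 250 and a finish time of 450, this picks hlog.200, hlog.300 and hlog.400.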
> Document architecture of backups
> --------------------------------
>
> Key: HBASE-4655
> URL: https://issues.apache.org/jira/browse/HBASE-4655
> Project: HBase
> Issue Type: Sub-task
> Components: documentation, regionserver
> Reporter: Karthik Ranganathan
> Assignee: Karthik Ranganathan
> Attachments: HBase Backups Architecture.docx
>
>
> Basic idea behind the backup architecture for HBase