[
https://issues.apache.org/jira/browse/HBASE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166670#comment-13166670
]
Lars Hofhansl commented on HBASE-4662:
--------------------------------------
Thanks for writing this up!
I had a few questions:
* How do you currently back up your HLogs? Do you have a process that watches
.[old]logs and copies/archives every new file appearing there?
* How do you back up the HFiles? Do you issue a flush before you do this?
* The tool you mention in D is not completebulkload, right? Will that tool
deal with replaying the logs you placed in B.5?
* I found that distributed log splitting relies on region names in the HLog in
order to do the splitting. If any region splits happened after the HLog was
written, or this is a new table, the replay will fail for regions that no
longer exist. Do you plan to change the distributed log splitter to deal with
this? (It would need to map the rowkeys back to the now-current set of regions.)
* HLogs contain entries for many tables. In the approach above, whatever
replays the log would need to replay only those entries pertaining to the
HFiles copied over, right?
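
To make the last two questions concrete, here is a rough sketch (Python, not HBase code) of what a replay step might have to do: drop edits for tables that are not being restored, and route each remaining edit by row key to whichever current region covers it, instead of trusting the region name recorded in the HLog. The names `Region`, `Edit` and `route_edits` are made up for illustration, and regions are assumed to be described only by their start keys, with b"" as the start of the first region.

```python
from bisect import bisect_right
from typing import NamedTuple


class Region(NamedTuple):
    # Hypothetical stand-in for a current region, not an HBase class.
    table: str
    start_key: bytes  # inclusive; b"" for the first region of a table
    name: str


class Edit(NamedTuple):
    # Hypothetical stand-in for one HLog entry.
    table: str
    row: bytes


def route_edits(edits, regions, tables_to_restore):
    """Group edits by the current region that covers each row key."""
    # Per table: regions sorted by start key, plus the list of start keys.
    by_table = {}
    for region in sorted(regions, key=lambda r: (r.table, r.start_key)):
        by_table.setdefault(region.table, []).append(region)
    start_keys = {t: [r.start_key for r in rs] for t, rs in by_table.items()}

    routed = {}
    for edit in edits:
        # Skip entries for tables that are not part of this restore.
        if edit.table not in tables_to_restore or edit.table not in by_table:
            continue
        # Rightmost region whose start key is <= row (bytewise comparison).
        idx = bisect_right(start_keys[edit.table], edit.row) - 1
        if idx < 0:
            continue  # no region covers this row; a real tool would report it
        routed.setdefault(by_table[edit.table][idx].name, []).append(edit)
    return routed
```

With regions starting at b"" and b"m", an edit for row b"apple" lands in the first region and b"zebra" in the second, regardless of which region originally wrote it.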
Thanks again...
> Replay the required hlog edits to make the backup preserve row atomicity.
> -------------------------------------------------------------------------
>
> Key: HBASE-4662
> URL: https://issues.apache.org/jira/browse/HBASE-4662
> Project: HBase
> Issue Type: Sub-task
> Components: documentation, regionserver
> Reporter: Karthik Ranganathan
> Assignee: Karthik Ranganathan
>
> The algorithm is as follows:
> A. For HFiles:
>    1. Need to track t1,t2 for each backup (start and end times of the backup)
>    2. For point in time restore to time t, pick a HFile snapshot which has t2 < t
>    3. Copy HFile snapshot to a temp location - HTABLE_RESTORE_t
> B. For HLogs:
>    for each regionserver do
>      for .logs and .oldlogs do
>        1. log file is hlog.TIME
>        2. if (t > TIME and hlog.TIME is open for write) fail restore for t
>        3. Pick the latest HLog whose create time is < t1
>        4. Pick all HLogs whose create time is > t1 and <= t2
>        5. Copy hlogs to the right structures inside HTABLE_RESTORE_t
> C. Split logs
>    1. Enhance HLog.splitLog to take timestamp t
>    2. Enhance distributed log split tool to pass HTABLE_RESTORE_t, so that log
>       split is picked up and put in the right location
>    3. Enhance distributed log split tool to pass t so that all edits till t are
>       included and others ignored
> D. Import the directory into the running HBase with META entries, etc (this
>    already exists)
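
Steps B.2 to B.4 of the quoted description can be sketched for a single regionserver as below. This is an illustration in Python that follows the steps literally, not an existing HBase tool: `HLogFile` and `select_hlogs` are hypothetical names, and each log is assumed to be described only by its create time (the TIME in hlog.TIME) and whether it is still open for write. Note that, as written, a log created exactly at t1 is picked by neither B.3 nor B.4.

```python
from typing import NamedTuple


class HLogFile(NamedTuple):
    # Hypothetical description of one hlog.TIME file.
    create_time: int
    open_for_write: bool


def select_hlogs(logs, t1, t2, t):
    """Return the HLogs to copy for a restore to time t (steps B.2 to B.4)."""
    # B.2: fail the restore if a log created before t is still being written.
    for log in logs:
        if t > log.create_time and log.open_for_write:
            raise RuntimeError(
                f"hlog.{log.create_time} is open for write; cannot restore to {t}"
            )

    selected = []
    # B.3: the latest HLog created strictly before the backup started.
    older = [log for log in logs if log.create_time < t1]
    if older:
        selected.append(max(older, key=lambda log: log.create_time))
    # B.4: every HLog created after t1 and up to and including t2.
    during = [log for log in logs if t1 < log.create_time <= t2]
    selected.extend(sorted(during, key=lambda log: log.create_time))
    return selected
```

The returned list is what step B.5 would then copy into HTABLE_RESTORE_t; the copy itself and the loop over regionservers and over .logs/.oldlogs are left out.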