[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235640#comment-13235640 ]

Lars Hofhansl commented on HBASE-5604:
--------------------------------------

This is definitely not meant to replace distributed log splitting after a
crash; it is meant for recovering from accidentally deleted data.

Relational databases usually support point-in-time recovery (PITR) from a
backup by taking periodic baseline backups and archiving the WAL. During
recovery, the most recent base backup taken before the point in time (PIT) is
restored, and then the logs are replayed up to the desired PIT.
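
Here is a minimal sketch of the first step, picking the base backup. It assumes a hypothetical catalog that maps each baseline backup's completion time (ms since the epoch) to a made-up backup id. With a `TreeMap`, "closest before the PIT" becomes a single `floorEntry` lookup:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical backup catalog for illustration; not part of HBase.
final class BaseBackupCatalog {
    // completion time in ms -> backup id
    private final TreeMap<Long, String> backupsByCompletion = new TreeMap<>();

    void addBackup(long completedAtMs, String backupId) {
        backupsByCompletion.put(completedAtMs, backupId);
    }

    // Latest backup that completed at or before the PIT, or null if there is none.
    String chooseBaseBackup(long pitMs) {
        Map.Entry<Long, String> entry = backupsByCompletion.floorEntry(pitMs);
        return entry == null ? null : entry.getValue();
    }
}
```

For example, with backups completed at 1000, 5000 and 9000, a PIT of 7000 selects the one completed at 5000. This sketch deliberately keys on completion time rather than start time. A backup still running at the PIT may already contain edits made after it, and replaying logs cannot remove those.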

Since HBase does not have snapshots yet, any backup solution will necessarily
produce an inconsistent copy that can only be made consistent by replaying
some of the logs (covering the time the backup took).
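
In a simple model, the replay window for an inconsistent copy has to start when the backup started, not when it finished. Edits made while the copy was running may or may not have been captured. The sketch below shows this; the class and field names are illustrative only and are not HBase APIs:

```java
// Hypothetical model of one baseline backup; timestamps in ms since the epoch.
final class BaselineBackup {
    final long startedAtMs;
    final long completedAtMs;

    BaselineBackup(long startedAtMs, long completedAtMs) {
        if (completedAtMs < startedAtMs) {
            throw new IllegalArgumentException("backup cannot complete before it starts");
        }
        this.startedAtMs = startedAtMs;
        this.completedAtMs = completedAtMs;
    }
}

// Inclusive range of edit timestamps that must be replayed on top of the backup.
final class ReplayWindow {
    final long fromMs;
    final long toMs;

    private ReplayWindow(long fromMs, long toMs) {
        this.fromMs = fromMs;
        this.toMs = toMs;
    }

    static ReplayWindow forRestore(BaselineBackup base, long pitMs) {
        if (base.completedAtMs > pitMs) {
            throw new IllegalArgumentException("base backup completed after the PIT");
        }
        // Start at the backup's start: edits made during the copy may be missing from it.
        return new ReplayWindow(base.startedAtMs, pitMs);
    }

    boolean contains(long editTimestampMs) {
        return editTimestampMs >= fromMs && editTimestampMs <= toMs;
    }
}
```

Some edits in the window will already be in the copy. The sketch assumes that replayed edits keep their original timestamps. Under that assumption, writing such an edit again produces the same cell version rather than a new one, so the overlap does no harm.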

Log replay in HBase is either slow (a standalone client using the high-level
API) or usable only for crash recovery (log splitting splits the logs by region
name, so it cannot deal with regions that have been split since the logs were
written).
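
The split-region problem can be illustrated as follows. An edit logged against a parent region still carries its row key, so it can be routed to whichever region holds that row today. This is a minimal sketch under two assumptions: the current region start keys are read from a live table (that input is hypothetical here), and row keys are modeled as Strings, whereas HBase compares `byte[]` keys lexicographically:

```java
import java.util.Arrays;
import java.util.TreeSet;

// Routes a row key to the region that currently hosts it, identified by its start key.
final class RegionRouter {
    private final TreeSet<String> startKeys;

    RegionRouter(String... regionStartKeys) {
        this.startKeys = new TreeSet<>(Arrays.asList(regionStartKeys));
        // The first region of an HBase table starts at the empty key.
        if (!startKeys.contains("")) {
            throw new IllegalArgumentException("the first region must start at the empty key");
        }
    }

    // A row belongs to the region with the greatest start key <= the row key.
    String regionStartKeyFor(String rowKey) {
        return startKeys.floor(rowKey); // never null because "" sorts first
    }
}
```

For example, suppose a table with a single region is later split at `"m"`. Then `new RegionRouter("", "m").regionStartKeyFor("zebra")` returns `"m"`, even though the log only recorded the parent region. The desired splits could be taken from a live table in the same way. In HBase releases of this era, `HFileOutputFormat.configureIncrementalLoad(Job, HTable)` reads the table's region start keys and configures a `TotalOrderPartitioner`, so that each reducer writes HFiles for one region.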

This tool would take care of the log replay part of a PITR scenario.
Look at this as an M/R version of HBASE-3752.
                
> HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.
> ------------------------------------------------------------------------
>
>                 Key: HBASE-5604
>                 URL: https://issues.apache.org/jira/browse/HBASE-5604
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>
> Just an idea I had. Might be useful for restoring a backup using the HLogs.
> This could be an M/R job (with a mapper per HLog file).
> The tool would get a time range and a (set of) table(s). We'd pick the right
> HLogs based on time before the M/R job is started and then have a mapper per
> HLog file.
> The mapper would then go through the HLog, filter out all WALEdits that don't
> fall into the time range or don't belong to any of the tables, and then use
> HFileOutputFormat to generate HFiles.
> We would need to specify the region splits we want, probably taken from a
> live table.
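
The two steps in the description above can be sketched in plain Java. Everything here is a hypothetical stand-in: `LogFile` and `WalEntry` are not HBase classes. The sketch assumes that an entry's timestamp reflects when it was written. It also assumes that one region server's logs are sorted by creation time, with each log holding edits from its creation until the next roll:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one HLog file of a single region server.
final class LogFile {
    final String name;
    final long createdAtMs;

    LogFile(String name, long createdAtMs) {
        this.name = name;
        this.createdAtMs = createdAtMs;
    }
}

// Hypothetical stand-in for one logged edit.
final class WalEntry {
    final String table;
    final String row;
    final long timestampMs;

    WalEntry(String table, String row, long timestampMs) {
        this.table = table;
        this.row = row;
        this.timestampMs = timestampMs;
    }
}

final class HLogReplayPlanner {
    // Before the job: keep logs whose covered interval overlaps [fromMs, toMs].
    // Log i covers [created_i, created_{i+1}); the newest log is open-ended.
    static List<LogFile> selectLogs(List<LogFile> sortedLogs, long fromMs, long toMs) {
        List<LogFile> selected = new ArrayList<>();
        for (int i = 0; i < sortedLogs.size(); i++) {
            long start = sortedLogs.get(i).createdAtMs;
            long end = i + 1 < sortedLogs.size() ? sortedLogs.get(i + 1).createdAtMs : Long.MAX_VALUE;
            // ">=" is deliberately conservative: at worst one extra log is read.
            if (start <= toMs && end >= fromMs) {
                selected.add(sortedLogs.get(i));
            }
        }
        return selected;
    }

    // Per mapper: keep only entries in the time range that belong to a requested table.
    static boolean keep(WalEntry entry, Set<String> tables, long fromMs, long toMs) {
        return tables.contains(entry.table)
                && entry.timestampMs >= fromMs
                && entry.timestampMs <= toMs;
    }
}
```

For example, with logs created at 1000, 5000 and 9000 and a window from 6000 to 7000, only the log created at 5000 is selected. In a real job, each selected file would be one mapper's input. Entries that pass `keep` would then be converted to cells for `HFileOutputFormat`.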

