[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242067#comment-13242067
 ] 

Lars Hofhansl commented on HBASE-5604:
--------------------------------------

I have something basic working now. Needs a bunch of polishing, but it works in 
principle.

Right now I have this hooked up with Import. I.e. Import can optionally import 
from a directory that contains HLog files (at any depth).

Does it make sense to keep this with Import? The advantage is that there is one 
place for importing stuff, and things like CF mapping exist already. On the 
other hand, Import is becoming a bit overloaded with options now, and something 
like PlayLogs might be better alongside ImportTsv and Import. We can still work 
out how to share some of the code.

Also, from the name of an HLog file I can tell when the first entry was written 
to it, but not when the last entry was written. Is there a way to find this 
out? It would allow me to filter out HLog files if it is known that they do not 
fall in the requested time range.
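
To make the file-name question concrete, here is a rough sketch of the only 
pruning the name alone allows: dropping files whose first entry is already past 
the end of the range. It assumes the name ends in ".<millis>" (the time of the 
first write) and treats the range as half-open, [minTime, maxTime). 
HLogNameFilter and its methods are hypothetical names for illustration, not 
existing HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of HBase.
final class HLogNameFilter {

    // Assumption: the file name ends in ".<millis>", the time the first entry
    // was written. Returns -1 when no such suffix can be parsed.
    static long startTimeOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return -1L;
        }
        try {
            return Long.parseLong(fileName.substring(dot + 1));
        } catch (NumberFormatException e) {
            return -1L;
        }
    }

    // Keeps every file that might contain entries before maxTime (exclusive).
    // Files with an unparseable name are kept, to stay on the safe side.
    static List<String> candidates(List<String> fileNames, long maxTime) {
        List<String> keep = new ArrayList<>();
        for (String name : fileNames) {
            long start = startTimeOf(name);
            if (start < 0 || start < maxTime) {
                keep.add(name);
            }
        }
        return keep;
    }
}
```

Note that this cannot drop files that end before minTime; that is exactly the 
part that needs the time of the last entry.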

                
> HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.
> ------------------------------------------------------------------------
>
>                 Key: HBASE-5604
>                 URL: https://issues.apache.org/jira/browse/HBASE-5604
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>
> Just an idea I had. Might be useful for restore of a backup using the HLogs.
> This could be an M/R job (with a mapper per HLog file).
> The tool would get a timerange and a (set of) table(s). We'd pick the right 
> HLogs based on time before the M/R job is started and then have a mapper per 
> HLog file.
> The mapper would then go through the HLog, filter out all WALEdits that don't 
> fit into the time range or are not for any of the tables, and then use 
> HFileOutputFormat to generate HFiles.
> Would need to indicate the splits we want, probably from a live table.
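
The per-entry check the mapper in the description would apply can be sketched 
roughly as below. This is plain Java with no HBase types: the table name and 
write time are passed in directly, the time range is assumed to be half-open, 
[minTime, maxTime), and WalEntryFilter is a hypothetical name used only for 
illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical filter for illustration; not part of HBase.
final class WalEntryFilter {
    private final Set<String> tables;
    private final long minTime; // inclusive
    private final long maxTime; // exclusive

    WalEntryFilter(Set<String> tables, long minTime, long maxTime) {
        this.tables = new HashSet<>(tables); // defensive copy
        this.minTime = minTime;
        this.maxTime = maxTime;
    }

    // True if the entry belongs to a requested table and was written inside
    // the requested time range; everything else would be skipped.
    boolean accept(String table, long writeTime) {
        return tables.contains(table)
            && writeTime >= minTime
            && writeTime < maxTime;
    }
}
```

In a real mapper, only entries for which accept(...) returns true would be 
passed on to be written out as HFiles.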

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
