[jira] [Updated] (HBASE-4125) WALInputFormat

stack (JIRA) Fri, 22 Jul 2011 09:44:24 -0700

     [ https://issues.apache.org/jira/browse/HBASE-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


stack updated HBASE-4125:
-------------------------

    Description: 
A coworker suggested doing an HBase backup based on the WALs.  The 
details still need to be worked out, but here are a couple of notes:

+ Backup would not require running an additional process, with its attendant 
CPU burn and I/O load, on the cluster being backed up.  The WALs have 
already been written.
+ WALs are currently not compressed.  We could keep them compressed in the 
backup store (this input format should accept both compressed and 
uncompressed WALs).
+ The hard part is figuring out some global sequenceid, or set of sequenceids, 
from which to start replaying edits.  I'd imagine you'd want to replay a backup 
from some particular point (it could be a 'date' for the first version, but 
this is a little sloppy, especially around counters).
+ MapReduce jobs replaying WALs could be scoped to a table or even to a region 
(though we'd be looking at lots of edits if replaying all WALs from a cluster; 
perhaps we need to dump some metadata when we close WALs, e.g. the regions 
that have edits in a particular WAL).
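
To make the last two notes concrete, here is a minimal sketch of the filtering a replay job might do.  It is written in Python for brevity and is not HBase code: `WalEdit`, `select_edits`, and the per-region `start_seq_ids` map are hypothetical names, and treating the starting sequenceid as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass(frozen=True)
class WalEdit:
    """Hypothetical stand-in for one WAL entry; not an HBase class."""
    table: str
    region: str
    sequence_id: int
    payload: bytes = b""


def select_edits(
    edits: Iterable[WalEdit],
    start_seq_ids: Dict[str, int],
    table: Optional[str] = None,
    region: Optional[str] = None,
) -> Iterator[WalEdit]:
    """Yield in-scope edits at or past their region's starting sequenceid."""
    for edit in edits:
        # Optional scoping to a single table and/or a single region.
        if table is not None and edit.table != table:
            continue
        if region is not None and edit.region != region:
            continue
        # Assumption: a region with no recorded start point is replayed in full.
        if edit.sequence_id >= start_seq_ids.get(edit.region, 0):
            yield edit
```

Keeping one starting sequenceid per region is only one reading of "set of sequenceids"; a single global value would reduce the map to one number.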

This input format is needed whether we do backup or not, for replaying logs 
that may have been moved aside in an emergency while getting a cluster off 
the ground again.

We should also have a script that can use this input format to replay a 
single-digit number of WALs without resorting to MapReduce.
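
A rough sketch of what such a script could look like, again in Python and under stated assumptions: gzip stands in for whatever codec the backup store would use, and `parse` and `apply` are hypothetical callbacks standing in for the real WAL reader and the write back to the cluster.

```python
import gzip
from typing import BinaryIO, Callable, Iterable, Iterator

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def open_wal(path: str) -> BinaryIO:
    """Open a WAL file, decompressing transparently if it looks gzipped."""
    with open(path, "rb") as probe:
        compressed = probe.read(2) == GZIP_MAGIC
    return gzip.open(path, "rb") if compressed else open(path, "rb")


def replay_wals(
    paths: Iterable[str],
    parse: Callable[[BinaryIO], Iterator[object]],
    apply: Callable[[object], None],
) -> int:
    """Replay the given WALs one after another; return the number of entries applied."""
    replayed = 0
    for path in paths:
        with open_wal(path) as stream:
            for entry in parse(stream):
                apply(entry)
                replayed += 1
    return replayed
```

Processing the files sequentially in a single process is the point here: for a handful of WALs there is no job to schedule, and the same `parse` logic could later back the input format itself.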

> WALInputFormat
> --------------
>
>                 Key: HBASE-4125
>                 URL: https://issues.apache.org/jira/browse/HBASE-4125
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
