[ https://issues.apache.org/jira/browse/HBASE-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-4125:
-------------------------
Description:
A coworker suggested doing an HBase backup based on the WALs. The details
still need to be worked out, but here are a couple of notes:
+ Backup would not require running an additional process, with its attendant
CPU burn and I/O load, on the cluster being backed up. The WALs have already
been written.
+ WALs are currently not compressed. We could store them compressed in the
backup store (so this input format should accept both compressed and
uncompressed WALs).
+ The hard part is figuring out a global sequenceid, or set of sequenceids,
from which to start replaying edits. I'd imagine you'd want to replay a backup
from some particular point (a 'date' could serve for a first version, but this
is a little sloppy, especially around counters).
+ MapReduce jobs replaying WALs could be scoped to a table or even to a region
(though a scoped job would still wade through lots of edits if it had to read
every WAL from a cluster; perhaps we need to dump some metadata when we close a
WAL, e.g. the regions that have edits in that WAL).
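A minimal sketch of the scoping described in the last two notes, assuming each WAL edit exposes its table, region and sequenceid, and assuming per-WAL region metadata like that proposed above is written when a WAL is closed. `ReplayScope`, `WalEdit`, `WalSummary` and the per-region start map are hypothetical names, not HBase classes; the map is one possible reading of the "set of sequenceids" idea, with one starting point per region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: decide which WALs to read and which edits to replay.
 * WalEdit and WalSummary are illustrative types, not HBase classes.
 */
public class ReplayScope {

  /** One edit as read from a WAL (hypothetical shape). */
  public record WalEdit(String table, String region, long sequenceId) {}

  /** Metadata assumed to be dumped when a WAL is closed: regions it has edits for. */
  public record WalSummary(String walPath, Set<String> regions) {}

  private final String table;                  // only replay edits for this table
  private final Map<String, Long> startSeqIds; // per region: replay edits after this id

  public ReplayScope(String table, Map<String, Long> startSeqIds) {
    this.table = table;
    this.startSeqIds = startSeqIds;
  }

  /** Prunes WALs using the closing metadata: keep only those touching a scoped region. */
  public List<String> walsToRead(List<WalSummary> summaries) {
    List<String> out = new ArrayList<>();
    for (WalSummary s : summaries) {
      for (String region : s.regions()) {
        if (startSeqIds.containsKey(region)) {
          out.add(s.walPath());
          break; // one matching region is enough to need this WAL
        }
      }
    }
    return out;
  }

  /** True if the edit is in the scoped table and region and newer than its start point. */
  public boolean shouldReplay(WalEdit e) {
    if (!table.equals(e.table())) {
      return false;
    }
    Long start = startSeqIds.get(e.region());
    return start != null && e.sequenceId() > start;
  }
}
```

Scoping to a whole table would just mean putting every region of that table in the start map.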
This input format is needed whether or not we do backup, for replaying logs
that may have been moved aside in an emergency while getting a cluster off the
ground again.
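The notes ask that the input format take both compressed and uncompressed WALs. A minimal sketch of one way to do that, assuming the backup store uses gzip (the codec is an assumption; nothing has been decided): peek at the first two bytes and wrap the stream in a `GZIPInputStream` only when they match the gzip magic number. `WalStreams.openMaybeCompressed` is a hypothetical helper, and it also assumes an uncompressed WAL never starts with those two bytes.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch: open a WAL whether or not the backup store gzipped it. */
public final class WalStreams {
  private WalStreams() {}

  // The first two bytes of every gzip member (RFC 1952).
  private static final int GZIP_MAGIC_1 = 0x1f;
  private static final int GZIP_MAGIC_2 = 0x8b;

  public static InputStream openMaybeCompressed(InputStream raw) throws IOException {
    PushbackInputStream in = new PushbackInputStream(new BufferedInputStream(raw), 2);
    int b1 = in.read();
    int b2 = in.read();
    // Push the peeked bytes back (last one first) so the stream is read from the start.
    if (b2 != -1) {
      in.unread(b2);
    }
    if (b1 != -1) {
      in.unread(b1);
    }
    if (b1 == GZIP_MAGIC_1 && b2 == GZIP_MAGIC_2) {
      return new GZIPInputStream(in); // compressed copy from the backup store
    }
    return in; // plain WAL, returned unchanged
  }
}
```

A record reader could call this on every file it opens, so a single job could mix compressed backups with uncompressed logs that were moved aside.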
We should also have a script that can use this input format to replay a
single-digit number of WALs without resorting to MapReduce.
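A sketch of what the core of such a script might look like, with no MapReduce involved: merge the edits from a handful of WALs in sequenceid order inside one process and hand each to a sink that applies it to the cluster. It assumes each input already yields edits in ascending sequenceid order and that sequenceids are comparable across the inputs; both are assumptions, not something established here. `LocalWalReplay`, `Edit` and the sink are hypothetical.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of single-process replay for a few WALs.
 * Assumes each iterator yields edits in ascending sequence id order.
 */
public final class LocalWalReplay {
  private LocalWalReplay() {}

  /** One edit as read from a WAL (hypothetical shape). */
  public record Edit(String region, long sequenceId, String payload) {}

  /** The current head edit of one input plus the iterator it came from. */
  private record Head(Edit edit, Iterator<Edit> rest) {}

  /** K-way merges the inputs by sequence id, passes each edit to the sink, returns the count. */
  public static long replay(List<Iterator<Edit>> wals, Consumer<Edit> sink) {
    PriorityQueue<Head> heap =
        new PriorityQueue<>(Comparator.comparingLong((Head h) -> h.edit().sequenceId()));
    for (Iterator<Edit> it : wals) {
      if (it.hasNext()) {
        heap.add(new Head(it.next(), it));
      }
    }
    long applied = 0;
    while (!heap.isEmpty()) {
      Head h = heap.poll();  // smallest sequence id among all input heads
      sink.accept(h.edit()); // the sink would turn the edit into a write against the cluster
      applied++;
      if (h.rest().hasNext()) {
        heap.add(new Head(h.rest().next(), h.rest()));
      }
    }
    return applied;
  }
}
```

The heap only ever holds one edit per input WAL, so memory stays small no matter how many edits each log contains.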
> WALInputFormat
> --------------
>
> Key: HBASE-4125
> URL: https://issues.apache.org/jira/browse/HBASE-4125
> Project: HBase
> Issue Type: New Feature
> Reporter: stack