[
https://issues.apache.org/jira/browse/HBASE-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831764#action_12831764
]
Jean-Daniel Cryans commented on HBASE-2197:
-------------------------------------------
Basically I see two ways of solving this problem:
- The master cluster itself reads the old edits and streams them to the slave
cluster, using a distributed queue for all the files to process.
- The master cluster does a distcp-like operation to ship all the files to the
slave cluster, which is then responsible for replaying them.
The second solution can be implemented very easily by performing the
operations sequentially. First the user runs a jruby script on the master
cluster that moves all the files to the slave cluster one by one, then runs
another script on the slave cluster that applies all the edits from all the
files, again sequentially (a rough sketch of that replay step follows below).
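For illustration, here's roughly what that second script could look like in
plain Java (the proposal above says jruby, but a jruby script would drive the
same HBase client API). This is a minimal sketch assuming the 0.20-era log
format, where each HLog file is a SequenceFile of HLogKey/KeyValue pairs; the
class name and the /hbase-replay staging folder are assumptions, not decided
details.
{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.regionserver.HLogKey;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.SequenceFile;

public class ReplayShippedLogs {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    FileSystem fs = FileSystem.get(conf);
    Path staging = new Path("/hbase-replay");  // hypothetical staging folder
    Map<String, HTable> tables = new HashMap<String, HTable>();
    // Apply the shipped log files one at a time, in order.
    for (FileStatus log : fs.listStatus(staging)) {
      SequenceFile.Reader reader =
          new SequenceFile.Reader(fs, log.getPath(), conf);
      HLogKey key = new HLogKey();
      KeyValue kv = new KeyValue();
      while (reader.next(key, kv)) {
        String name = Bytes.toString(key.getTablename());
        HTable table = tables.get(name);
        if (table == null) {
          table = new HTable(conf, key.getTablename());
          tables.put(name, table);
        }
        // Re-apply the edit with its original timestamp so it sorts the same
        // way it did on the master cluster. Note: delete-type KeyValues would
        // need to be applied as Deletes, skipped here for brevity.
        Put put = new Put(kv.getRow());
        put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(),
            kv.getValue());
        table.put(put);
      }
      reader.close();
    }
  }
}
{code}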
An upgrade would be to copy (or move) the desired files on the master cluster
to a different folder and start a real distcp to the slave cluster. Then, on
the slave cluster, start a MapReduce job to process all those files. It's still
manual, but much faster.
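Under the same assumption about the file format, the replay half of that
upgrade could look like the map-only job sketched below (written against the
old org.apache.hadoop.mapred API); the input path and the table-handle caching
are illustrative, and the same caveat about delete-type entries applies.
{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.regionserver.HLogKey;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class ReplayLogsJob {
  public static class ReplayMapper extends MapReduceBase
      implements Mapper<HLogKey, KeyValue, NullWritable, NullWritable> {
    // One open table per table name seen in the logs.
    private final Map<String, HTable> tables = new HashMap<String, HTable>();

    public void map(HLogKey key, KeyValue kv,
        OutputCollector<NullWritable, NullWritable> out, Reporter reporter)
        throws IOException {
      String name = Bytes.toString(key.getTablename());
      HTable table = tables.get(name);
      if (table == null) {
        table = new HTable(new HBaseConfiguration(), key.getTablename());
        tables.put(name, table);
      }
      Put put = new Put(kv.getRow());
      put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(),
          kv.getValue());
      table.put(put);
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(new HBaseConfiguration(), ReplayLogsJob.class);
    job.setJobName("replay-shipped-hlogs");
    job.setMapperClass(ReplayMapper.class);
    job.setNumReduceTasks(0);                          // map-only
    job.setInputFormat(SequenceFileInputFormat.class);
    job.setOutputFormat(NullOutputFormat.class);       // all output goes to HBase
    FileInputFormat.setInputPaths(job, new Path("/hbase-replay"));
    JobClient.runJob(job);
  }
}
{code}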
The next problem to tackle is how to switch from shipping files to streaming
the edits. An easy solution would be to start the replication as soon as the
initial distcp of all the data is done, then do a second copy of all the files
in order to reapply the edits that accumulated in the meantime. There are two
issues with that:
- How do we make sure that, when we start the second distcp, all the files
we need are in the HLog archive folder? There's a possibility that some are
still "active" in some region servers' .logs folder and that we may miss them.
A workaround would be to wait long enough to make sure everything is archived;
a sketch of a stronger check appears after this list.
- Currently random reads have the problem where they can return older versions
of the data if a timestamp in the memstore is older than data in the store
files. That can easily be the case here because we stream newer edits before
applying the old ones. This problem only affects those who want to serve out of
the slave cluster as soon as possible, and they will also have to deal with
missing data (which is still in transfer), so this could even be a non-issue.
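On the first issue, something slightly stronger than "wait long enough" could
be scripted: poll the live log folders until nothing older than the cut-over
point remains unarchived. The sketch below assumes each region server keeps
its live logs under a per-server subdirectory of /hbase/.logs and uses file
modification time as a heuristic for a log's age; both are assumptions about
the layout, not guarantees.
{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class WaitForArchival {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    FileSystem fs = FileSystem.get(conf);
    Path logsDir = new Path("/hbase/.logs");  // live (unarchived) logs
    long cutover = Long.parseLong(args[0]);   // epoch millis of the first copy
    while (true) {
      boolean pending = false;
      for (FileStatus rsDir : fs.listStatus(logsDir)) {
        for (FileStatus log : fs.listStatus(rsDir.getPath())) {
          // A file still sitting in .logs that predates the cut-over point
          // holds edits the second distcp would miss. Modification time is
          // only a heuristic for a log's age.
          if (log.getModificationTime() < cutover) {
            pending = true;
          }
        }
      }
      if (!pending) {
        break;
      }
      Thread.sleep(60 * 1000);                // poll once a minute
    }
    System.out.println("No pre-cutover logs left in .logs; safe to re-copy.");
  }
}
{code}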
> Start replication from a point in time
> --------------------------------------
>
> Key: HBASE-2197
> URL: https://issues.apache.org/jira/browse/HBASE-2197
> Project: Hadoop HBase
> Issue Type: Sub-task
> Reporter: Jean-Daniel Cryans
> Fix For: 0.21.0
>
>
> One way to set up a cluster for replication is to distcp all the files and
> then start the replication. We need a way to make sure we don't miss any
> edits, by being able to start a process that reads old log files from a
> defined point in time, sends them to a specific slave cluster, and then
> catches up with normal replication.