Hi,

I am trying to understand in detail how HBase replication works.

First of all, I assume that for replication to work correctly, all edits must be replayed on the replica HBase cluster in the same order as they were executed on the source HBase cluster. Correct?

If so, I am trying to understand how that is guaranteed.

I can see that this holds trivially as long as a single region server reads the edits from its HLog in order and uses that as the source for replication.

However, what if a region is moved to another region server? Could we not end up in the following situation?

1) region A is originally hosted by region server X.
2) replication in region server X is replicating edits of region A. Say that it is lagging behind a bit, so it has a number of edits still to do.
3) region A is moved to region server Y.
4) edits for region A arrive on region server Y, and replication on region server Y starts replicating them.
5) replication in region server X is still busy with some leftover edits from region A, so these are replicated out of order.
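
To make the scenario concrete, here is a toy sketch in Python. It is not HBase code and says nothing about what HBase actually does: the two queues, the `simulate` function and the `lag_on_x` parameter are all invented for illustration, and it simply assumes that each region server ships its own log independently, with no coordination between them.

```python
from collections import deque


def simulate(lag_on_x: int = 2) -> list[int]:
    """Toy model of the steps above; returns the order edits reach the replica.

    Assumption (hypothetical, not taken from HBase): servers X and Y each
    ship their own log queue without waiting for one another.
    """
    # Edits for region A, numbered in the order the source cluster applied them.
    log_x = deque([1, 2, 3])  # written while region A was hosted on X
    log_y = deque([4, 5])     # written after region A moved to Y

    shipped: list[int] = []

    # Step 2: X ships only part of its backlog before the region moves.
    for _ in range(len(log_x) - lag_on_x):
        shipped.append(log_x.popleft())

    # Steps 4 and 5: after the move, X and Y ship concurrently.
    # Here they simply take turns, with Y going first.
    while log_x or log_y:
        if log_y:
            shipped.append(log_y.popleft())
        if log_x:
            shipped.append(log_x.popleft())

    return shipped


if __name__ == "__main__":
    order = simulate(lag_on_x=2)
    print("arrival order at replica:", order)
    print("in source order:", order == sorted(order))
```

In this model the edits arrive in source order only when X has no backlog left at the time of the move (`lag_on_x=0`); with any backlog, the later edits shipped by Y are interleaved with the older ones still queued on X.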

So the question really is whether there is a mechanism that prevents the replication source from reading edits in an HLog for a region that has meanwhile been moved to another region server.

It could be that it has something to do with log splitting and recovery, but I was under the assumption that HBase only splits logs in case of recovery and/or master restart, and not in case of region moves.

I hope somebody can shed some light on this topic.

Thanks,
Jan
