[ https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842572#comment-13842572 ]

Demai Ni commented on HBASE-9047:
---------------------------------

[~lhofhansl], many thanks for the review, and good point about the performance 
concern. As you pointed out, it is on the error path, so it only impacts a small 
portion of the code flow. Still, it would be good if I could find a way to run the 
loop only for the syncup tool. One idea (well, not the best interface design) is 
to see whether, inside ReplicationSource, the code can recognize the host name 
made up by the syncup tool and use it as an indicator.
{code:title=ReplicationSyncUp:DummyServer|borderStyle=solid}
...
 hostname = System.currentTimeMillis() + ".SyncUpTool.replication.org";
...
{code}
I will give it a try. Any suggestions are greatly appreciated.
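For illustration, here is a hypothetical sketch of that check (not the actual patch; the class and method names are made up): inside ReplicationSource, the code would test whether the server name ends with the suffix that ReplicationSyncUp's DummyServer appends.

```java
// Hypothetical sketch, not HBase code: recognize the dummy hostname that
// ReplicationSyncUp's DummyServer fabricates and use it as an indicator.
public class SyncUpDetector {

    // Suffix taken from the DummyServer hostname shown above.
    static final String SYNC_UP_SUFFIX = ".SyncUpTool.replication.org";

    // True when the hostname was made up by the syncup tool,
    // e.g. "1386000000000.SyncUpTool.replication.org".
    static boolean isSyncUpTool(String hostname) {
        return hostname != null && hostname.endsWith(SYNC_UP_SUFFIX);
    }

    public static void main(String[] args) {
        String dummy = System.currentTimeMillis() + SYNC_UP_SUFFIX;
        System.out.println(isSyncUpTool(dummy));             // true
        System.out.println(isSyncUpTool("rs1.example.org")); // false
    }
}
```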

Demai




> Tool to handle finishing replication when the cluster is offline
> ----------------------------------------------------------------
>
>                 Key: HBASE-9047
>                 URL: https://issues.apache.org/jira/browse/HBASE-9047
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 0.96.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Demai Ni
>             Fix For: 0.98.0
>
>         Attachments: HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch, 
> HBASE-9047-trunk-v1.patch, HBASE-9047-trunk-v2.patch, 
> HBASE-9047-trunk-v3.patch, HBASE-9047-trunk-v4.patch, HBASE-9047-trunk-v5.patch
>
>
> We're having a discussion on the mailing list about replicating the data on a 
> cluster that was shut down in an offline fashion. The motivation could be 
> that you don't want to bring HBase back up but still need that data on the 
> slave.
> So I have this idea of a tool that would be running on the master cluster 
> while it is down, although it could also run at any time. Basically it would 
> be able to read the replication state of each master region server, finish 
> replicating what's missing to all the slaves, and then clear that state in 
> zookeeper.
> The code that handles replication does most of that already, see 
> ReplicationSourceManager and ReplicationSource. Basically when 
> ReplicationSourceManager.init() is called, it will check all the queues in ZK 
> and try to grab those that aren't attached to a region server. If the whole 
> cluster is down, it will grab all of them.
> The beautiful thing here is that you could start that tool on all your 
> machines and the load will be spread out, but that might not be a big concern 
> if replication wasn't lagging since it would take a few seconds to finish 
> replicating the missing data for each region server.
> I'm guessing when starting ReplicationSourceManager you'd give it a fake 
> region server ID, and you'd tell it not to start its own source.
> FWIW the main difference in how replication is handled between Apache's HBase 
> and Facebook's is that the latter is always done separately from HBase itself. 
> This jira isn't about doing that.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
