[ https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766927#comment-13766927 ]
Jean-Daniel Cryans commented on HBASE-9047: ------------------------------------------- Hi Demai, First, a friendly reminder to follow the coding standards as listed in this section: http://hbase.apache.org/book.html#developing Also, when writing a tool in HBase, we try to use the Tool class. See the tail of this file for an example: https://github.com/apache/hbase/blob/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java Specific comments on the patch: If you look at our other tools, we rarely give detailed logs on what it is doing. I'd just skip that one: {noformat} + System.out.println("getting config..."); {noformat} Are you sure we should turn on replication here? Our tools rarely turn on features: {noformat} + conf.setBoolean(HConstants.REPLICATION_ENABLE_KEY, true); {noformat} Typo: {noformat} + zkw = new ZooKeeperWatcher(conf, "syncupReplcation" {noformat} I would skip the exclamation and all the dashes on the sides, same for the other messages. Also your message says "start" twice: {noformat} + System.out.println("------Start Replication Server start!------------"); {noformat} This sleep here I worry about. It's a very short time. Isn't there a better way to detect that we're done? In the description I was saying: " when starting ReplicationSourceManager you'd give it a fake region server ID, and you'd tell it not to start its own source.", so if the manager doesn't start a source and only recovers them, it seems to me that you could loop and wait until getSources() returns 0. {noformat} + Thread.sleep(30000); {noformat} Regarding your test case, I don't think we need two peer, also it should extend TestReplicationBase which will setup a lot of infrastructure for your. I know that you copied code from another class, but new classes shouldn't have this line: {noformat} + * Copyright 2011 The Apache Software Foundation {noformat} Your method names shouldn't have underscores and they don't seem to convey what's really happening in them. For example, "testSyncUpTool_AfterDelete" does that plus it shuts down and restarts clusters. And what about this? {noformat} + // utilitySource.restartHBaseCluster(1); {noformat} Thanks Demai. > Tool to handle finishing replication when the cluster is offline > ---------------------------------------------------------------- > > Key: HBASE-9047 > URL: https://issues.apache.org/jira/browse/HBASE-9047 > Project: HBase > Issue Type: New Feature > Reporter: Jean-Daniel Cryans > Assignee: Demai Ni > Attachments: HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch > > > We're having a discussion on the mailing list about replicating the data on a > cluster that was shut down in an offline fashion. The motivation could be > that you don't want to bring HBase back up but still need that data on the > slave. > So I have this idea of a tool that would be running on the master cluster > while it is down, although it could also run at any time. Basically it would > be able to read the replication state of each master region server, finish > replicating what's missing to all the slave, and then clear that state in > zookeeper. > The code that handles replication does most of that already, see > ReplicationSourceManager and ReplicationSource. Basically when > ReplicationSourceManager.init() is called, it will check all the queues in ZK > and try to grab those that aren't attached to a region server. If the whole > cluster is down, it will grab all of them. > The beautiful thing here is that you could start that tool on all your > machines and the load will be spread out, but that might not be a big concern > if replication wasn't lagging since it would take a few seconds to finish > replicating the missing data for each region server. > I'm guessing when starting ReplicationSourceManager you'd give it a fake > region server ID, and you'd tell it not to start its own source. > FWIW the main difference in how replication is handled between Apache's HBase > and Facebook's is that the latter is always done separately of HBase itself. > This jira isn't about doing that. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira