[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline

Jean-Daniel Cryans (JIRA) Fri, 13 Sep 2013 13:48:04 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766927#comment-13766927
 ]


Jean-Daniel Cryans commented on HBASE-9047:
-------------------------------------------

Hi Demai,

First, a friendly reminder to follow the coding standards as listed in this 
section: http://hbase.apache.org/book.html#developing

Also, when writing a tool in HBase, we try to use the Tool class. See the tail 
of this file for an example: 
https://github.com/apache/hbase/blob/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/HLogPerformanceEvaluation.java

Specific comments on the patch:

If you look at our other tools, we rarely give detailed logs on what it is 
doing. I'd just skip that one:

{noformat}
+               System.out.println("getting config...");
{noformat}

Are you sure we should turn on replication here? Our tools rarely turn on 
features:

{noformat}
+               conf.setBoolean(HConstants.REPLICATION_ENABLE_KEY, true);
{noformat}

Typo:

{noformat}
+               zkw = new ZooKeeperWatcher(conf, "syncupReplcation"
{noformat}

I would skip the exclamation and all the dashes on the sides, same for the 
other messages. Also your message says "start" twice:

{noformat}
+               System.out.println("------Start Replication Server 
start!------------");
{noformat}

This sleep here I worry about. It's a very short time. Isn't there a better way 
to detect that we're done? In the description I was saying: " when starting 
ReplicationSourceManager you'd give it a fake region server ID, and you'd tell 
it not to start its own source.", so if the manager doesn't start a source and 
only recovers them, it seems to me that you could loop and wait until 
getSources() returns 0.

{noformat}
+                       Thread.sleep(30000);
{noformat}

Regarding your test case, I don't think we need two peer, also it should extend 
TestReplicationBase which will setup a lot of infrastructure for your.

I know that you copied code from another class, but new classes shouldn't have 
this line:

{noformat}
+ * Copyright 2011 The Apache Software Foundation
{noformat}

Your method names shouldn't have underscores and they don't seem to convey 
what's really happening in them. For example, "testSyncUpTool_AfterDelete" does 
that plus it shuts down and restarts clusters. And what about this?

{noformat}
+               // utilitySource.restartHBaseCluster(1);
{noformat}

Thanks Demai.
                
> Tool to handle finishing replication when the cluster is offline
> ----------------------------------------------------------------
>
>                 Key: HBASE-9047
>                 URL: https://issues.apache.org/jira/browse/HBASE-9047
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Jean-Daniel Cryans
>            Assignee: Demai Ni
>         Attachments: HBASE-9047-0.94.9-v0.PATCH, HBASE-9047-trunk-v0.patch
>
>
> We're having a discussion on the mailing list about replicating the data on a 
> cluster that was shut down in an offline fashion. The motivation could be 
> that you don't want to bring HBase back up but still need that data on the 
> slave.
> So I have this idea of a tool that would be running on the master cluster 
> while it is down, although it could also run at any time. Basically it would 
> be able to read the replication state of each master region server, finish 
> replicating what's missing to all the slave, and then clear that state in 
> zookeeper.
> The code that handles replication does most of that already, see 
> ReplicationSourceManager and ReplicationSource. Basically when 
> ReplicationSourceManager.init() is called, it will check all the queues in ZK 
> and try to grab those that aren't attached to a region server. If the whole 
> cluster is down, it will grab all of them.
> The beautiful thing here is that you could start that tool on all your 
> machines and the load will be spread out, but that might not be a big concern 
> if replication wasn't lagging since it would take a few seconds to finish 
> replicating the missing data for each region server.
> I'm guessing when starting ReplicationSourceManager you'd give it a fake 
> region server ID, and you'd tell it not to start its own source.
> FWIW the main difference in how replication is handled between Apache's HBase 
> and Facebook's is that the latter is always done separately of HBase itself. 
> This jira isn't about doing that.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-9047) Tool to handle finishing replication when the cluster is offline

Reply via email to