[ https://issues.apache.org/jira/browse/HBASE-11992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135889#comment-14135889 ]

Andrew Purtell commented on HBASE-11992:
----------------------------------------

Preliminary comments on the patch, or things that should be in it.

In hbase-client, the following classes lack interface stability annotations 
and should all be marked Private: ReplicationFactory, ReplicationPeersZKImpl, 
ReplicationQueueInfo, ReplicationQueuesClient, ReplicationQueuesClientZKImpl, 
ReplicationQueuesZKImpl, ReplicationStateZKBase, ReplicationTrackerZKImpl. I 
don't think users should have expected these to be anything other than 
internal implementation, so changes to them should be fine.
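
For example (a sketch only, assuming the same audience annotations the branch 
already uses elsewhere; adjust the import to match):

{code}
// Sketch: 0.98-era code pulls InterfaceAudience from the Hadoop
// classification package; use whatever the branch already imports elsewhere.
import org.apache.hadoop.classification.InterfaceAudience;

// Internal implementation detail; no compatibility guarantees for users.
@InterfaceAudience.Private
public class ReplicationFactory {
  // existing implementation unchanged
}
{code}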

We don't have known users referencing these types. I checked Phoenix. 

Remove this TODO? We're not going to do this in 0.98, right?

{code}
@@ -80,6 +86,8 @@ public class ReplicationAdmin implements Closeable {
       .toString(HConstants.REPLICATION_SCOPE_GLOBAL);
 
   private final HConnection connection;
+  // TODO: replication should be managed by master. All the classes except ReplicationAdmin should
+  // be moved to hbase-server. Resolve it in HBASE-11392.
   private final ReplicationQueuesClient replicationQueuesClient;
   private final ReplicationPeers replicationPeers;
{code}

Synchronize methods that call reconnect() in the new class 
HBaseReplicationEndpoint:

{code}
diff --git hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java
new file mode 100644
index 0000000..4b9a28f
--- /dev/null
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java
[...]
+  @Override
+  public UUID getPeerUUID() {
+    UUID peerUUID = null;
+    try {
+      peerUUID = ZKClusterId.getUUIDForCluster(zkw);
+    } catch (KeeperException ke) {
---> +      reconnect(ke);
+    }
+    return peerUUID;
+  }
[...]
+  public List<ServerName> getRegionServers() {
+    try {
+      setRegionServers(fetchSlavesAddresses(this.getZkw()));
+    } catch (KeeperException ke) {
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("Fetch slaves addresses failed.", ke);
+      }
---> +      reconnect(ke);
+    }
+    return regionServers;
+  }
[...]
{code}
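
Concretely, marking the reconnect() callers synchronized would keep a 
reconnect from swapping the ZooKeeperWatcher out from under a concurrent 
caller; a sketch against the snippet above:

{code}
// Sketch: only one thread at a time may observe a failed ZK operation and
// trigger reconnect(), so zkw is never replaced mid-call.
@Override
public synchronized UUID getPeerUUID() {
  UUID peerUUID = null;
  try {
    peerUUID = ZKClusterId.getUUIDForCluster(zkw);
  } catch (KeeperException ke) {
    reconnect(ke);
  }
  return peerUUID;
}

public synchronized List<ServerName> getRegionServers() {
  try {
    setRegionServers(fetchSlavesAddresses(this.getZkw()));
  } catch (KeeperException ke) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Fetch slaves addresses failed.", ke);
    }
    reconnect(ke);
  }
  return regionServers;
}
{code}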

The additional state in the ReplicationPeer znode protobuf introduces 
mixed version compatibility issues:

{code}
diff --git hbase-protocol/src/main/protobuf/ZooKeeper.proto 
hbase-protocol/src/main/protobuf/ZooKeeper.proto
index 37816da..598385c 100644
--- hbase-protocol/src/main/protobuf/ZooKeeper.proto
+++ hbase-protocol/src/main/protobuf/ZooKeeper.proto
@@ -119,6 +119,9 @@ message ReplicationPeer {
   // clusterkey is the concatenation of the slave cluster's
   // hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent
   required string clusterkey = 1;
+  optional string replicationEndpointImpl = 2;
+  repeated BytesBytesPair data = 3;
+  repeated NameStringPair configuration = 4;
 }
{code}

Here, will this be enough for a mixed version deployment, where add_peer 
might be executed on an older version and won't write the new 
ReplicationPeer fields into the ZK node for the peer? Should we also be 
checking a site file setting?

{code}
@@ -372,9 +381,32 @@ public class ReplicationSourceManager implements ReplicationListener {
       LOG.warn("Passed replication source implementation throws errors, " +
           "defaulting to ReplicationSource", e);
       src = new ReplicationSource();
+    }
 
+    ReplicationEndpoint replicationEndpoint = null;
+    try {
+      String replicationEndpointImpl = peerConfig.getReplicationEndpointImpl();
+      if (replicationEndpointImpl == null) {
+        // Default to HBase inter-cluster replication endpoint
+        replicationEndpointImpl = HBaseInterClusterReplicationEndpoint.class.getName();
+      }
+      @SuppressWarnings("rawtypes")
+      Class c = Class.forName(replicationEndpointImpl);
+      replicationEndpoint = (ReplicationEndpoint) c.newInstance();
+    } catch (Exception e) {
+      LOG.warn("Passed replication endpoint implementation throws errors", e);
+      throw new IOException(e);
     }
{code}
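
On the site file question, a guard along these lines could keep custom 
endpoints disabled until the fleet is upgraded. A sketch only; the config 
key name is hypothetical, not something in the patch:

{code}
// Hypothetical guard, not in the patch: fall back to the default endpoint
// unless custom endpoints have been explicitly enabled cluster-wide.
String replicationEndpointImpl = peerConfig.getReplicationEndpointImpl();
boolean customEnabled =
    conf.getBoolean("hbase.replication.endpoint.custom.enabled", false);
if (replicationEndpointImpl == null || !customEnabled) {
  // Peers written by older code, or a fleet mid-upgrade, get the default.
  replicationEndpointImpl = HBaseInterClusterReplicationEndpoint.class.getName();
}
{code}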

Likewise, what happens when you add a peer with a custom replication endpoint 
impl setting but the work gets picked up by an older server? We can't ask a 
user or admin to disable replication while performing a rolling upgrade of 
either the source or destination cluster, so I think it's necessary to 
document that customized replication endpoints are not supported in mixed 
version deployments: the fleet must be up to date before customized peer 
settings can be put in place. If we could ask older servers that cannot honor 
the new customization settings to stop replicating and let their work be 
transferred to the newer servers, that would also work, but as far as I know 
we don't have this capability in the older code. It may still be a good idea 
to add the notion of replication work versioning in case additional changes 
happen in this area (rough sketch below), and forward port it to branch-1 and 
trunk. Thoughts?
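
To make the versioning idea concrete, something like the following; 
everything here (field, accessors, constant) is hypothetical:

{code}
// Hypothetical sketch of replication work versioning: tag the peer znode
// with a format version so a server can tell whether it understands the
// peer's settings. hasFormatVersion()/getFormatVersion() assume a new
// optional field added to the ReplicationPeer protobuf.
static final int SUPPORTED_PEER_FORMAT = 1;

boolean canHonorPeer(ZooKeeperProtos.ReplicationPeer peer) {
  // Peers written by older code carry no version field; treat them as v0.
  int peerFormat = peer.hasFormatVersion() ? peer.getFormatVersion() : 0;
  // If the format is newer than we support, leave the queue alone so an
  // up-to-date server can claim the work instead.
  return peerFormat <= SUPPORTED_PEER_FORMAT;
}
{code}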

> Backport HBASE-11367 (Pluggable replication endpoint) to 0.98
> -------------------------------------------------------------
>
>                 Key: HBASE-11992
>                 URL: https://issues.apache.org/jira/browse/HBASE-11992
>             Project: HBase
>          Issue Type: Task
>            Reporter: Andrew Purtell
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: hbase-11367_0.98.patch
>
>
> ReplicationSource tails the logs for each peer. HBASE-11367 introduces 
> ReplicationEndpoint which is customizable per peer. ReplicationEndpoint is 
> run in the same RS process and instantiated per replication peer per region 
> server. Implementations of this interface handle the actual shipping of WAL 
> edits to the remote cluster.
> This issue is for backporting HBASE-11367 to 0.98.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
