[ https://issues.apache.org/jira/browse/HBASE-11992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135889#comment-14135889 ]
Andrew Purtell commented on HBASE-11992:
----------------------------------------
Preliminary comments on the patch, or things that should be in it.
In hbase-client, the following classes don't have interface stability
annotations but should all be marked Private: ReplicationFactory,
ReplicationPeersZKImpl, ReplicationQueueInfo, ReplicationQueuesClient,
ReplicationQueuesClientZKImpl, ReplicationQueuesZKImpl,
ReplicationStateZKBase, ReplicationTrackerZKImpl. I don't think users should
have expected these to be anything other than internal implementation details.
Changes to these should be fine.
We don't have known users referencing these types. I checked Phoenix.
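Something like the following on each of those classes would do it (a sketch
only; the exact annotation import should match whatever the rest of
hbase-client uses on this branch):
{code}
import org.apache.hadoop.classification.InterfaceAudience;

// Marked Private: internal replication plumbing, not part of the public
// client API, so it carries no compatibility guarantee across releases.
@InterfaceAudience.Private
public class ReplicationFactory {
  // ... existing body unchanged ...
}
{code}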
Remove this TODO? We're not going to do this in 0.98, right?
{code}
@@ -80,6 +86,8 @@ public class ReplicationAdmin implements Closeable {
.toString(HConstants.REPLICATION_SCOPE_GLOBAL);
private final HConnection connection;
+  // TODO: replication should be managed by master. All the classes except ReplicationAdmin should
+  // be moved to hbase-server. Resolve it in HBASE-11392.
private final ReplicationQueuesClient replicationQueuesClient;
private final ReplicationPeers replicationPeers;
{code}
Synchronize methods that call reconnect() in the new class
HBaseReplicationEndpoint:
{code}
diff --git hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java
new file mode 100644
index 0000000..4b9a28f
--- /dev/null
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/replication/HBaseReplicationEndpoint.java
[...]
+ @Override
+ public UUID getPeerUUID() {
+ UUID peerUUID = null;
+ try {
+ peerUUID = ZKClusterId.getUUIDForCluster(zkw);
+ } catch (KeeperException ke) {
---> + reconnect(ke);
+ }
+ return peerUUID;
+ }
[...]
+ public List<ServerName> getRegionServers() {
+ try {
+ setRegionServers(fetchSlavesAddresses(this.getZkw()));
+ } catch (KeeperException ke) {
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Fetch salves addresses failed.", ke);
+ }
---> + reconnect(ke);
+ }
+ return regionServers;
+ }
[...]
{code}
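Concretely, something like this (a sketch of the two quoted methods with the
suggested synchronization, assuming reconnect() swaps out the shared
ZooKeeperWatcher that both methods read):
{code}
// Sketch only: serialize the methods that can trigger reconnect() so two
// threads do not both observe a dead session and race to replace the watcher.
@Override
public synchronized UUID getPeerUUID() {
  UUID peerUUID = null;
  try {
    peerUUID = ZKClusterId.getUUIDForCluster(zkw);
  } catch (KeeperException ke) {
    reconnect(ke);
  }
  return peerUUID;
}

public synchronized List<ServerName> getRegionServers() {
  try {
    setRegionServers(fetchSlavesAddresses(this.getZkw()));
  } catch (KeeperException ke) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Fetch slaves addresses failed.", ke);
    }
    reconnect(ke);
  }
  return regionServers;
}
{code}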
The additional state in the ReplicationPeer znode pbuf introduces
mixed-version-compatibility issues:
{code}
diff --git hbase-protocol/src/main/protobuf/ZooKeeper.proto
hbase-protocol/src/main/protobuf/ZooKeeper.proto
index 37816da..598385c 100644
--- hbase-protocol/src/main/protobuf/ZooKeeper.proto
+++ hbase-protocol/src/main/protobuf/ZooKeeper.proto
@@ -119,6 +119,9 @@ message ReplicationPeer {
// clusterkey is the concatenation of the slave cluster's
//   hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent
required string clusterkey = 1;
+ optional string replicationEndpointImpl = 2;
+ repeated BytesBytesPair data = 3;
+ repeated NameStringPair configuration = 4;
}
{code}
Will the defaulting below be enough for a mixed version deployment, where
add_peer might be executed on an older version and so won't write the new
ReplicationPeer fields into the ZK node for the peer? Should we also be
checking a site file setting?
{code}
@@ -372,9 +381,32 @@ public class ReplicationSourceManager implements ReplicationListener {
LOG.warn("Passed replication source implementation throws errors, " +
"defaulting to ReplicationSource", e);
src = new ReplicationSource();
+ }
+ ReplicationEndpoint replicationEndpoint = null;
+ try {
+ String replicationEndpointImpl = peerConfig.getReplicationEndpointImpl();
+ if (replicationEndpointImpl == null) {
+ // Default to HBase inter-cluster replication endpoint
+        replicationEndpointImpl = HBaseInterClusterReplicationEndpoint.class.getName();
+ }
+ @SuppressWarnings("rawtypes")
+ Class c = Class.forName(replicationEndpointImpl);
+ replicationEndpoint = (ReplicationEndpoint) c.newInstance();
+ } catch (Exception e) {
+ LOG.warn("Passed replication endpoint implementation throws errors", e);
+ throw new IOException(e);
}
{code}
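One way to make the old-writer case explicit rather than relying on a null
default would be to check the pbuf accessor directly (a sketch, assuming the
generated ZooKeeperProtos.ReplicationPeer class from the diff above; pbBytes
stands in for the raw znode bytes):
{code}
// Sketch: a peer znode written by an older release carries none of the new
// fields. Proto2 getters return "" for an unset optional string, so check
// hasReplicationEndpointImpl() explicitly before trusting the value.
// (pbBytes: raw bytes read from the peer znode; a hypothetical local here.)
ZooKeeperProtos.ReplicationPeer peer =
    ZooKeeperProtos.ReplicationPeer.parseFrom(pbBytes);
String endpointImpl = peer.hasReplicationEndpointImpl()
    ? peer.getReplicationEndpointImpl()
    : HBaseInterClusterReplicationEndpoint.class.getName();
{code}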
Likewise, what happens when you add a peer with a custom replication endpoint
impl setting but the work gets picked up by an older server? We can't ask a
user or admin to disable replication while performing a rolling upgrade of
either the source or destination clusters, so I think it's necessary to document
that customized replication endpoints are not supported in mixed version
deployments: the fleet must be up to date before customized peer settings can
be put in place. If we could ask older servers that cannot honor the new
customization settings to stop replicating and let their work be transferred to
the newer servers, that would also work, but as far as I know we do not have
this capability in the older code. It may still be a good idea to add the notion
of replication work versioning in case additional changes happen in this area,
and forward port it to branch-1 and trunk; a rough sketch of what I mean
follows. Thoughts?
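Purely illustrative, with hypothetical names that are not in the patch: a work
format version recorded next to the peer config, checked before a server takes
up the work:
{code}
// Hypothetical sketch of replication work versioning (WORK_FORMAT_VERSION and
// checkWorkVersion() do not exist in the patch). A server that finds work
// written in a newer format refuses it, instead of silently mishandling it,
// leaving the work for servers that understand the format.
private static final int WORK_FORMAT_VERSION = 1;

private void checkWorkVersion(int versionInZnode) throws IOException {
  if (versionInZnode > WORK_FORMAT_VERSION) {
    throw new IOException("Peer config uses work format " + versionInZnode
        + " but this server only understands up to " + WORK_FORMAT_VERSION);
  }
}
{code}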
> Backport HBASE-11367 (Pluggable replication endpoint) to 0.98
> -------------------------------------------------------------
>
> Key: HBASE-11992
> URL: https://issues.apache.org/jira/browse/HBASE-11992
> Project: HBase
> Issue Type: Task
> Reporter: Andrew Purtell
> Assignee: ramkrishna.s.vasudevan
> Attachments: hbase-11367_0.98.patch
>
>
> ReplicationSource tails the logs for each peer. HBASE-11367 introduces
> ReplicationEndpoint which is customizable per peer. ReplicationEndpoint is
> run in the same RS process and instantiated per replication peer per region
> server. Implementations of this interface handle the actual shipping of WAL
> edits to the remote cluster.
> This issue is for backporting HBASE-11367 to 0.98.