[
https://issues.apache.org/jira/browse/HBASE-12814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
churro morales resolved HBASE-12814.
------------------------------------
Resolution: Not A Problem
Most likely everyone is off the 94 branch.
> Zero downtime upgrade from 94 to 98
> ------------------------------------
>
> Key: HBASE-12814
> URL: https://issues.apache.org/jira/browse/HBASE-12814
> Project: HBase
> Issue Type: New Feature
> Affects Versions: 0.94.26, 0.98.10
> Reporter: churro morales
> Assignee: churro morales
> Attachments: HBASE-12814-0.94.patch, HBASE-12814-0.98.patch
>
>
> Here at Flurry we want to upgrade our HBase cluster from 94 to 98 while not
> having any downtime and maintaining master / master replication.
> Summary:
> Replication is done via thrift RPC between clusters. It is configurable on a
> peer-by-peer basis; the one caveat is that a thrift server starts up on
> every node, which proxies the request to the ReplicationSink.
> For the upgrade process:
> * in hbase-site.xml two new configuration parameters are added:
> ** *Required*
> *** hbase.replication.sink.enable.thrift -> true
> *** hbase.replication.thrift.server.port -> <thrift_server_port>
> ** *Optional*
> *** hbase.replication.thrift.protection {default: AUTHENTICATION}
> *** hbase.replication.thrift.framed {default: false}
> *** hbase.replication.thrift.compact {default: true}
> - All regionservers can be rolling restarted (no downtime); all clusters must
> have the respective patch for this to work.
> - The hbase shell add_peer command takes an additional parameter for the rpc
> protocol
> - example: {code} add_peer '1', "hbase-101:2181:/hbase", "THRIFT" {code}
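>
> As a rough sketch, the parameters listed above might appear in hbase-site.xml
> as follows. The port value 9091 is only a placeholder assumption, not a value
> from the patch; the optional properties are shown with the defaults given
> above.
> {code:xml}
> <!-- Required -->
> <property>
>   <name>hbase.replication.sink.enable.thrift</name>
>   <value>true</value>
> </property>
> <property>
>   <name>hbase.replication.thrift.server.port</name>
>   <!-- placeholder port, pick one that is free on every node -->
>   <value>9091</value>
> </property>
> <!-- Optional, shown with their defaults -->
> <property>
>   <name>hbase.replication.thrift.protection</name>
>   <value>AUTHENTICATION</value>
> </property>
> <property>
>   <name>hbase.replication.thrift.framed</name>
>   <value>false</value>
> </property>
> <property>
>   <name>hbase.replication.thrift.compact</name>
>   <value>true</value>
> </property>
> {code}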
> Now comes the fun part. When you want to upgrade your cluster from 94 to 98,
> you simply pause replication to the cluster being upgraded, do the upgrade,
> and un-pause replication. Once you have a pair of clusters replicating
> inbound and outbound only with the 98 release, you can start replicating via
> the native rpc protocol by adding the peer again without the _THRIFT_
> parameter and subsequently deleting the peer with the thrift protocol.
> Because replication is idempotent, I don't see any issues as long as you wait
> for the backlog to drain after un-pausing replication.
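>
> The sequence described above could look roughly like this in the hbase shell
> of a cluster that replicates to the one being upgraded. This is only a
> sketch: it assumes the thrift peer was added with id '1' as in the earlier
> example, and the id '2' for the native peer is a hypothetical choice.
> {code}
> # pause replication to the cluster being upgraded
> disable_peer '1'
> # ... upgrade the peer cluster from 94 to 98 ...
> # un-pause replication, then wait for the backlog to drain
> enable_peer '1'
> # once both clusters run 98: add the peer again without "THRIFT" (native rpc)
> add_peer '2', "hbase-101:2181:/hbase"
> # then delete the peer that uses the thrift protocol
> remove_peer '1'
> {code}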
> Special thanks to Francis Liu at Yahoo for laying the groundwork and Mr. Dave
> Latham for his invaluable knowledge and assistance.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)