[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

Jean-Marc Spaggiari (JIRA) Tue, 12 Nov 2013 04:06:32 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820048#comment-13820048
 ]


Jean-Marc Spaggiari commented on HBASE-9360:
--------------------------------------------

Hi [~jeffreyz], I have a 0.94.12+1.0.3 and a 0.96.0+2.2.0 clusters. I will most 
probably be able to give this a try. We can move the discussion to he mailing 
list if you want. So far I have done stopped 0.94, a distcp from 0.94 to 0.96, 
started 0.96, as a test since I'm still using 0.94 (Need to try my MR jobs in 
0.96 first). So I can clean 0.96 and give a try to the procedure above.

Only question is, if writes are happening in 0.94, how are your perfectly sure 
of the start time of the replication? Might be one ms later of before what you 
think it is. Is this information stored somewhere?

> Enable 0.94 -> 0.96 replication to minimize upgrade down time
> -------------------------------------------------------------
>
>                 Key: HBASE-9360
>                 URL: https://issues.apache.org/jira/browse/HBASE-9360
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: migration
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Jeffrey Zhong
>
> As we know 0.96 is a singularity release, as of today a 0.94 hbase user has 
> to do in-place upgrade: make corresponding client changes, recompile client 
> application code, fully shut down existing 0.94 hbase cluster, deploy 0.96 
> binary, run upgrade script and then start the upgraded cluster. You can image 
> the down time will be extended if something is wrong in between. 
> To minimize the down time, another possible way is to setup a secondary 0.96 
> cluster and then setup replication between the existing 0.94 cluster and the 
> new 0.96 slave cluster. Once the 0.96 cluster is synced, a user can switch 
> the traffic to the 0.96 cluster and decommission the old one.
> The ideal steps will be:
> 1) Setup a 0.96 cluster
> 2) Setup replication between a running 0.94 cluster to the newly created 0.96 
> cluster
> 3) Wait till they're in sync in replication
> 4) Starts duplicated writes to both 0.94 and 0.96 clusters(could stop 
> relocation now)
> 5) Forward read traffic to the slave 0.96 cluster
> 6) After a certain period, stop writes to the original 0.94 cluster if 
> everything is good and completes upgrade
> To get us there, there are two tasks:
> 1) Enable replication from 0.94 -> 0.96
> I've run the idea with [~jdcryans], [~devaraj] and [~ndimiduk]. Currently it 
> seems the best approach is to build a very similar service or on top of 
> https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep with support 
> three commands replicateLogEntries, multi and delete. Inside the three 
> commands, we just pass down the corresponding requests to the destination 
> 0.96 cluster as a bridge. The reason to support the multi and delete is for 
> CopyTable to copy data from a 0.94 cluster to a 0.96 one.
> The other approach is to provide limited support of 0.94 RPC protocol in 
> 0.96. While an issue on this is that a 0.94 client needs to talk to zookeeper 
> firstly before it can connect to a 0.96 region server. Therefore, we need a 
> faked Zookeeper setup in front of a 0.96 cluster for a 0.94 client to 
> connect. It may also pollute 0.96 code base with 0.94 RPC code.
> 2) To support writes to a 0.96 cluster and a 0.94 at the same time, we need 
> to load both hbase clients into one single JVM using different class loader.
> Let me know if you think this is worth to do and any better approach we could 
> take.
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-9360) Enable 0.94 -> 0.96 replication to minimize upgrade down time

Reply via email to