[ https://issues.apache.org/jira/browse/HBASE-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2223:
--------------------------------------

    Attachment: HBASE-2223.patch

This is the patch I've been working on for a while now, almost all new code. It 
is not in a committable state: there's a lot of debug information printed out, 
it needs an up-to-date Hadoop jar with respect to HDFS-142 and HDFS-200, it's 
missing some more unit tests, there's commented-out code all over the place, etc.

It basically works as I described in my Feb 18th comment, that is:

 - We keep track of the HLogs to replicate in ZooKeeper, in a separate folder 
for each region server. The oldest hlog's znode contains the position to seek 
to for the next batch to replicate (see the first sketch after this list).
 - The region servers tail their own HLogs in all situations, and listen to log 
rolling to figure out whether a log was archived while it still needs to be 
replicated. Since there's no real tailing in HDFS, we have to reopen and seek 
every time, and this hits the Namenode. So, in order to not DDoS it, we wait a 
second by default when no data is available for replication. Each time we hit 
an EOF, we wait a second more than the last time, up to 10 seconds (so the SLA 
here is that it takes at most 10 seconds + the time to apply the data on the 
slave cluster for the data to be available on the other end). The same kind of 
waiting happens when region servers on a slave cluster aren't reachable. The 
second sketch after this list shows the wait loop.
 - When a region server fails on the master side, its log queue is taken over 
in ZooKeeper by another RS through a race for a lock (the third sketch after 
this list). This can fail over as many times as we want, e.g. a RS could end 
up finishing the replication for a queue that was passed on 10 times. It is 
also important to note that a failed-over queue is processed in parallel: if 
you have only 1 slave cluster and a RS dies, the RS that takes over the queue 
will send edits to the slave cluster in 2 different threads. Once the 
failed-over queue is emptied, that replication stream is closed.
 - The sink side of the region server still works like in the original 
implementation; it either has to be changed to work like ReplicationSource, or 
we remove the part where we log to a file.
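
Here is a minimal sketch of the first bullet's ZooKeeper bookkeeping, written 
against the plain ZooKeeper client. The /hbase/replication/rs layout, the 
class name, and storing the offset as a decimal string are my assumptions for 
illustration, not necessarily what the patch does:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

/** Per-region-server replication queues kept in ZooKeeper (sketch). */
public class ReplicationQueuesSketch {
  // Assumed layout: /hbase/replication/rs/<server>/<hlog> -> byte offset.
  // The parent znodes are assumed to exist already.
  private static final String BASE = "/hbase/replication/rs";

  private final ZooKeeper zk;
  private final String serverName;

  public ReplicationQueuesSketch(ZooKeeper zk, String serverName) {
    this.zk = zk;
    this.serverName = serverName;
  }

  /** A freshly rolled hlog enters this RS's queue at offset 0. */
  public void addLog(String hlog) throws KeeperException, InterruptedException {
    zk.create(BASE + "/" + serverName + "/" + hlog, "0".getBytes(),
        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
  }

  /** Persist how far into the oldest hlog edits have been shipped. */
  public void setPosition(String hlog, long offset)
      throws KeeperException, InterruptedException {
    zk.setData(BASE + "/" + serverName + "/" + hlog,
        Long.toString(offset).getBytes(), -1);
  }

  /** The position to seek to for the next batch to replicate. */
  public long getPosition(String hlog)
      throws KeeperException, InterruptedException {
    return Long.parseLong(new String(
        zk.getData(BASE + "/" + serverName + "/" + hlog, false, null)));
  }
}
{code}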
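The second bullet's wait loop could look like the following sketch. Only the 
wait arithmetic comes from the description above; readBatch() and shipBatch() 
are hypothetical stubs standing in for the real hlog reader and sink client:

{code:java}
import java.util.List;

/** Sketch of the ReplicationSource loop: reopen+seek, ship, back off on EOF. */
public abstract class ReplicationSourceSketch {

  private static final long BASE_WAIT_MS = 1000;      // default wait: 1 second
  private static final long MAX_WAIT_MS = 10 * 1000;  // cap: 10 seconds

  /** Reopen the current hlog, seek to the saved position, read what's there. */
  protected abstract List<byte[]> readBatch() throws Exception;

  /** Ship a batch of edits to a region server on the slave cluster. */
  protected abstract void shipBatch(List<byte[]> batch) throws Exception;

  public void run() throws Exception {
    long wait = BASE_WAIT_MS;
    while (!Thread.currentThread().isInterrupted()) {
      List<byte[]> batch = readBatch();
      if (batch.isEmpty()) {
        // EOF: HDFS has no real tailing, so spinning on reopen+seek would
        // hammer the Namenode. Sleep, then wait one second more than last
        // time on the next EOF, capped at 10 seconds.
        Thread.sleep(wait);
        wait = Math.min(wait + 1000, MAX_WAIT_MS);
      } else {
        shipBatch(batch);     // the same kind of waiting would apply when
                              // the slave cluster's RS is unreachable
        wait = BASE_WAIT_MS;  // reset once we make progress
      }
    }
  }
}
{code}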
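And the lock race from the third bullet, sketched with a single ephemeral 
znode; the znode name and the claim step are assumptions:

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

/** Sketch of the race to take over a dead region server's log queue. */
public class QueueFailoverSketch {
  private final ZooKeeper zk;

  public QueueFailoverSketch(ZooKeeper zk) {
    this.zk = zk;
  }

  /**
   * Every surviving RS races to create the same znode under the dead RS's
   * queue folder; only one create() can succeed.
   */
  public boolean tryClaimQueue(String deadRsQueuePath)
      throws InterruptedException {
    try {
      // Ephemeral: if the winner dies too, the lock vanishes and the race
      // simply runs again, so a queue can fail over any number of times.
      zk.create(deadRsQueuePath + "/lock", new byte[0],
          Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
      return true;   // we won: process this queue in a separate thread
    } catch (KeeperException.NodeExistsException e) {
      return false;  // another RS won the race
    } catch (KeeperException e) {
      return false;  // ZK error: let someone else retry
    }
  }
}
{code}

One create() wins and every other contender gets NodeExistsException, so 
exactly one RS inherits the queue, and the winner runs it in its own 
ReplicationSource thread alongside its normal stream.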

I'd be happy to sit down with other people to review the patch.

> Handle 10min+ network partitions between clusters
> -------------------------------------------------
>
>                 Key: HBASE-2223
>                 URL: https://issues.apache.org/jira/browse/HBASE-2223
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.21.0
>
>         Attachments: HBASE-2223.patch
>
>
> We need a nice way of handling long network partitions without impacting a 
> master cluster (which pushes the data). Currently it will just retry over and 
> over again.
> I think we could:
>  - Stop replication to a slave cluster if it didn't respond for more than 10 
> minutes
>  - Keep track of the duration of the partition
>  - When the slave cluster comes back, initiate an MR job like HBASE-2221
>
> Maybe we want less than 10 minutes, maybe we want this to be all automatic or 
> just the first 2 parts. Discuss.
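
For the record, a hypothetical sketch of the bookkeeping that description 
proposes: stop pushing to a slave cluster after 10 minutes without a response 
and remember how long the partition lasted. All names here are made up for 
illustration; none of this is code from the patch:

{code:java}
import java.util.concurrent.TimeUnit;

/** Tracks a partition with a slave cluster and when to give up (sketch). */
public class PartitionTrackerSketch {
  private static final long GIVE_UP_AFTER_MS = TimeUnit.MINUTES.toMillis(10);

  private long firstFailureTs = -1;   // start of the current partition
  private boolean replicating = true;

  /** Called every time a push to the slave cluster fails. */
  public void onPushFailure(long nowMs) {
    if (firstFailureTs < 0) {
      firstFailureTs = nowMs;
    } else if (replicating && nowMs - firstFailureTs > GIVE_UP_AFTER_MS) {
      replicating = false;  // stop replicating, keep the partition start time
    }
  }

  /** Called when the slave answers again; returns the partition's duration. */
  public long onSlaveBack(long nowMs) {
    long duration = firstFailureTs < 0 ? 0 : nowMs - firstFailureTs;
    firstFailureTs = -1;
    replicating = true;
    // Knowing the duration, a catch-up MR job like HBASE-2221 could be
    // kicked off here for the missed time range.
    return duration;
  }

  public boolean isReplicating() {
    return replicating;
  }
}
{code}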

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
