[jira] [Commented] (HBASE-12386) Replication gets stuck following a transient zookeeper error to remote peer cluster

Lars Hofhansl (JIRA) Thu, 30 Oct 2014 15:35:16 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-12386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190960#comment-14190960
 ]


Lars Hofhansl commented on HBASE-12386:
---------------------------------------

{{"Current list of sinks is out of date or empty, updating"}} seems clear 
enough to me.

+1 on patch.

One thing we have to think through is what happens when the slave cluster is 
down for a bit. We'd choose sinks again on each call. I think that's OK, 
especially since we recently dialed down the retry interval to 5 minutes 
after a bit.

Also, we can still be in a bad situation where RegionServers die and restart 
at the slave cluster: we could go down to a single RS at the peers before we 
try to choose sinks again. That's for another issue.
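
To make the two cases above concrete, here is a minimal sketch of the 
"re-choose when out of date or empty" behavior. It is not the patch and not 
HBase's actual code: the class SinkChooser, its methods, and the peerLookup 
supplier (standing in for the ZooKeeper read of the peer's region servers) 
are all hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

/** Hypothetical sketch only; not the HBase replication sink code. */
class SinkChooser {
    // Assumption: stands in for reading the peer's region server list from ZK.
    private final Supplier<List<String>> peerLookup;
    private final Random random;
    private List<String> sinks = new ArrayList<>();
    private boolean outOfDate = true;

    SinkChooser(Supplier<List<String>> peerLookup, Random random) {
        this.peerLookup = peerLookup;
        this.random = random;
    }

    /** Called when we learn the peer's server list may have changed. */
    void markOutOfDate() {
        outOfDate = true;
    }

    /** Drops a sink that failed; the list is only re-chosen once it is empty. */
    void reportBadSink(String sink) {
        sinks.remove(sink);
    }

    int sinkCount() {
        return sinks.size();
    }

    /** Returns a random sink, re-choosing first if the list is stale or empty. */
    String getSink() {
        if (outOfDate || sinks.isEmpty()) {
            // "Current list of sinks is out of date or empty, updating"
            sinks = new ArrayList<>(peerLookup.get());
            outOfDate = false;
        }
        if (sinks.isEmpty()) {
            // The peer is still unreachable; the next call will look it up again.
            throw new IllegalStateException("No replication sinks are available");
        }
        return sinks.get(random.nextInt(sinks.size()));
    }
}
```

With this shape, a peer that is down causes one lookup per getSink() call 
(the first case), and a list that shrinks through reportBadSink() is not 
refreshed until it is completely empty (the second case).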

> Replication gets stuck following a transient zookeeper error to remote peer 
> cluster
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-12386
>                 URL: https://issues.apache.org/jira/browse/HBASE-12386
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.98.7
>            Reporter: Adrian Muraru
>         Attachments: HBASE-12386.patch
>
>
> Following a transient ZK error, replication gets stuck and remote peers are 
> never updated.
> Source region servers continuously report the following error in their logs:
> "No replication sinks are available"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
