[jira] [Updated] (HBASE-9591) [replication] getting "Current list of sinks is out of date" all the time when a source is recovered

stack (JIRA) Mon, 16 Dec 2013 10:44:05 -0800

     [ https://issues.apache.org/jira/browse/HBASE-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


stack updated HBASE-9591:
-------------------------

    Fix Version/s:     (was: 0.96.1)
                   0.99.0

> [replication] getting "Current list of sinks is out of date" all the time 
> when a source is recovered
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9591
>                 URL: https://issues.apache.org/jira/browse/HBASE-9591
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.0
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.99.0
>
>
> I tried killing a region server while the slave cluster was down. From that 
> point on, my log was filled with:
> {noformat}
> 2013-09-20 00:31:03,942 INFO  [regionserver60020.replicationSource,1] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating
> 2013-09-20 00:31:04,226 INFO  [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating
> {noformat}
> The first log line is from the normal source; the second is from the 
> recovered one. When we try to replicate, we call 
> replicationSinkMgr.getReplicationSink(), and if the list of machines was 
> refreshed since our last call, we call chooseSinks(), which in turn 
> refreshes the list of sinks and resets our lastUpdateToPeers. The next source 
> notices the change and calls chooseSinks() too. When the first source comes 
> around for another round, it sees that the list was refreshed and calls 
> chooseSinks() again. This repeats until the recovered queue is gone.
> We could have all the sources replicating to the same cluster share a 
> thread-safe ReplicationSinkManager. We could also manage the same cluster 
> separately for each source. Or, even simpler, if the list we get in 
> chooseSinks() is the same as the one we had before, treat it as a no-op.
> What do you think [~gabriel.reid]?
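
A minimal Java sketch of the feedback loop described in the report, under stated assumptions: `SharedPeer`, `SinkManager`, `run`, and the logical clock are hypothetical stand-ins, not HBase's actual `ReplicationSinkManager` code. The sketch assumes that a refresh triggered by any source bumps a "last change" timestamp shared by every source replicating to the same peer. It models the last proposal (treat an unchanged list as a no-op) as not bumping that timestamp when the freshly fetched list equals the stored one; this is one possible reading of the proposal.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the refresh loop described in HBASE-9591.
 * SharedPeer and SinkManager are illustrative stand-ins, not HBase classes.
 * A logical clock replaces System.currentTimeMillis() so runs are deterministic.
 */
public class SinkPingPong {

    /** Logical clock; each tick is strictly later than the previous one. */
    static long clock = 0;

    /** Peer-cluster state shared by every source replicating to that peer (assumption). */
    static class SharedPeer {
        final boolean noopIfUnchanged;
        List<String> regionServers = new ArrayList<>();
        long lastChange = 0;

        SharedPeer(boolean noopIfUnchanged) {
            this.noopIfUnchanged = noopIfUnchanged;
        }

        /** Stores the fetched list; bumps lastChange unless the no-op check applies. */
        void refresh(List<String> fetched) {
            boolean changed = !fetched.equals(regionServers);
            if (changed || !noopIfUnchanged) {
                regionServers = new ArrayList<>(fetched);
                lastChange = ++clock;
            }
        }
    }

    /** Per-source sink manager: one for the normal source, one for the recovered one. */
    static class SinkManager {
        final SharedPeer peer;
        long lastUpdateToPeers = -1; // forces a first chooseSinks()
        int refreshes = 0;

        SinkManager(SharedPeer peer) {
            this.peer = peer;
        }

        /** Re-chooses sinks if the shared peer changed after this source's last update. */
        void getReplicationSink(List<String> fetched) {
            if (peer.lastChange > lastUpdateToPeers) {
                chooseSinks(fetched);
            }
        }

        /** Refreshes the shared peer, then resets this source's lastUpdateToPeers. */
        void chooseSinks(List<String> fetched) {
            peer.refresh(fetched);
            refreshes++;
            lastUpdateToPeers = ++clock;
        }
    }

    /** Alternates a normal and a recovered source; returns their chooseSinks() counts. */
    static int[] run(boolean noopIfUnchanged, int rounds) {
        clock = 0;
        SharedPeer peer = new SharedPeer(noopIfUnchanged);
        SinkManager normal = new SinkManager(peer);
        SinkManager recovered = new SinkManager(peer);
        // Assume the fetched list of peer region servers stays the same throughout.
        List<String> slaves = List.of("rs1", "rs2", "rs3");
        for (int i = 0; i < rounds; i++) {
            normal.getReplicationSink(slaves);
            recovered.getReplicationSink(slaves);
        }
        return new int[] {normal.refreshes, recovered.refreshes};
    }

    public static void main(String[] args) {
        int[] buggy = run(false, 10);
        int[] fixed = run(true, 10);
        // Prints 10 / 10: every call by either source triggers another refresh.
        System.out.println("without no-op check: " + buggy[0] + " / " + buggy[1]);
        // Prints 1 / 1: each source calls chooseSinks() only once.
        System.out.println("with no-op check:    " + fixed[0] + " / " + fixed[1]);
    }
}
```

With the check disabled, each source's chooseSinks() pushes the shared timestamp past the other source's lastUpdateToPeers, so both keep re-choosing sinks on every call. That matches the repeated log lines above. With the check enabled, an unchanged list no longer counts as a change, and the loop stops after the initial selection.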



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)