[jira] [Commented] (HBASE-9591) [replication] getting "Current list of sinks is out of date" all the time when a source is recovered

Lars Hofhansl (JIRA) Thu, 19 Sep 2013 19:53:53 -0700

    [ https://issues.apache.org/jira/browse/HBASE-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772582#comment-13772582 ]


Lars Hofhansl commented on HBASE-9591:
--------------------------------------

Is this 0.96+ only, or a 0.94 issue as well?
                
> [replication] getting "Current list of sinks is out of date" all the time 
> when a source is recovered
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9591
>                 URL: https://issues.apache.org/jira/browse/HBASE-9591
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.96.0
>            Reporter: Jean-Daniel Cryans
>            Priority: Minor
>             Fix For: 0.96.1
>
>
> I tried killing a region server while the slave cluster was down. From that
> point on, my log was filled with:
> {noformat}
> 2013-09-20 00:31:03,942 INFO  [regionserver60020.replicationSource,1] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating
> 2013-09-20 00:31:04,226 INFO  [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating
> {noformat}
> The first log line is from the normal source; the second is from the
> recovered one. When we try to replicate, we call
> replicationSinkMgr.getReplicationSink(), and if the list of machines was
> refreshed since our last update, we call chooseSinks(), which in turn
> refreshes the list of sinks and resets our lastUpdateToPeers. The next source
> notices the change and calls chooseSinks() too. The first source then comes
> around for another round, sees that the list was refreshed, and calls
> chooseSinks() again. This repeats until the recovered queue is gone.
> We could have all the sources going to the same cluster share a thread-safe
> ReplicationSinkManager. We could also manage the same cluster separately for
> each source. Or, even easier, if the list we get in chooseSinks() is the same
> as the one we had before, consider it a no-op.
> What do you think [~gabriel.reid]?
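
The refresh loop described in the report, and the "treat an unchanged list as a
no-op" idea, can be illustrated with a small self-contained model. This is only
a sketch and not HBase code: SharedPeer, SinkManager, countRefreshes() and the
logical clock are hypothetical names made up for illustration, and it assumes
that the no-op means leaving the shared "last change" timestamp untouched when
the fetched server list equals the previous one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SinkRefreshSketch {

    /** Hypothetical stand-in for the per-cluster state shared by all sources. */
    static class SharedPeer {
        private List<String> servers = new ArrayList<>();
        private long clock = 0;       // logical clock keeps the run deterministic
        private long lastChange = 0;  // when the server list was last marked changed

        long now() { return ++clock; }
        long lastChange() { return lastChange; }

        /** Stores the fetched list; optionally treats an identical list as a no-op. */
        void refresh(List<String> fetched, boolean skipUnchanged) {
            if (skipUnchanged && fetched.equals(servers)) {
                return; // same list as before: leave the timestamp alone
            }
            servers = new ArrayList<>(fetched);
            lastChange = now();
        }
    }

    /** Hypothetical per-source manager; every source owns its own instance. */
    static class SinkManager {
        private final SharedPeer peer;
        private final boolean skipUnchanged;
        private long lastUpdateToPeers = -1;
        int refreshes = 0;

        SinkManager(SharedPeer peer, boolean skipUnchanged) {
            this.peer = peer;
            this.skipUnchanged = skipUnchanged;
        }

        /** Re-chooses sinks when the shared list looks newer than our last update. */
        void getReplicationSink(List<String> fetched) {
            if (peer.lastChange() > lastUpdateToPeers) {
                chooseSinks(fetched);
            }
        }

        private void chooseSinks(List<String> fetched) {
            refreshes++;
            peer.refresh(fetched, skipUnchanged);
            lastUpdateToPeers = peer.now();
        }
    }

    /** Runs a normal and a recovered source in turn; returns total refreshes. */
    static int countRefreshes(boolean skipUnchanged, int rounds) {
        SharedPeer peer = new SharedPeer();
        SinkManager normal = new SinkManager(peer, skipUnchanged);
        SinkManager recovered = new SinkManager(peer, skipUnchanged);
        List<String> fetched = Arrays.asList("rs1", "rs2"); // made-up server names
        for (int i = 0; i < rounds; i++) {
            normal.getReplicationSink(fetched);
            recovered.getReplicationSink(fetched);
        }
        return normal.refreshes + recovered.refreshes;
    }

    public static void main(String[] args) {
        System.out.println("always refresh: " + countRefreshes(false, 5)); // 10
        System.out.println("skip unchanged: " + countRefreshes(true, 5));  // 2
    }
}
```

In this model, two sources that always bump the shared timestamp invalidate
each other on every round, while skipping unchanged lists leaves only the
initial refresh per source. Whether the real fix should live in chooseSinks()
or in the shared peer state is left open here.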

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
