[ https://issues.apache.org/jira/browse/HBASE-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-9591:
-------------------------
Fix Version/s: 0.99.0 (was: 0.96.1)
> [replication] getting "Current list of sinks is out of date" all the time
> when a source is recovered
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-9591
> URL: https://issues.apache.org/jira/browse/HBASE-9591
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.96.0
> Reporter: Jean-Daniel Cryans
> Priority: Minor
> Fix For: 0.99.0
>
>
> I tried killing a region server while the slave cluster was down. From that
> point on, my log was filled with:
> {noformat}
> 2013-09-20 00:31:03,942 INFO [regionserver60020.replicationSource,1]
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager:
> Current list of sinks is out of date, updating
> 2013-09-20 00:31:04,226 INFO
> [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634]
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager:
> Current list of sinks is out of date, updating
> {noformat}
> The first log line is from the normal source; the second is from the recovered
> one. When we try to replicate, we call
> replicationSinkMgr.getReplicationSink(), and if the list of machines was
> refreshed since the last time, we call chooseSinks(), which in turn
> refreshes the list of sinks and resets our lastUpdateToPeers. The next source
> notices the change and calls chooseSinks() too. The first source then comes
> back for another round, sees the list was refreshed, and calls chooseSinks()
> again. This repeats forever, until the recovered queue is gone.
> We could have all the sources going to the same cluster share a thread-safe
> ReplicationSinkManager. We could also manage the same cluster separately for
> each source. Or, even easier, if the list we get in chooseSinks() is the same
> as the one we had before, consider it a no-op.
> What do you think [~gabriel.reid]?
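
To make the ping-pong described above concrete, here is a minimal, self-contained sketch of it, not HBase code. SharedPeer, SinkManager, refreshServers(), noopIfUnchanged and the logical clock are all hypothetical names invented for illustration; only getReplicationSink() and lastUpdateToPeers are borrowed from the description. It assumes, as the report implies, that re-reading the peer's server list also advances a change marker shared by every source, and it reads the "no-op" idea as "do not advance that marker when the fetched list is unchanged", which is only one possible way to apply it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class SinkRefreshSketch {

  /** Hypothetical stand-in for the peer state shared by every source of one slave cluster. */
  static class SharedPeer {
    private final boolean noopIfUnchanged;
    private List<String> cached = new ArrayList<>();
    private long clock = 0;       // logical clock instead of wall time, for determinism
    private long lastChange = 0;  // when the server list was last considered changed

    SharedPeer(boolean noopIfUnchanged) {
      this.noopIfUnchanged = noopIfUnchanged;
    }

    /** Re-reads the slave's server list (fixed here) and records when it last changed. */
    List<String> refreshServers() {
      List<String> fetched = List.of("rs1", "rs2", "rs3");
      boolean same = new HashSet<>(fetched).equals(new HashSet<>(cached));
      clock++;
      if (!(noopIfUnchanged && same)) {
        cached = new ArrayList<>(fetched);
        lastChange = clock;       // without the no-op check, every fetch counts as a change
      }
      return cached;
    }

    long now() { return clock; }
    long lastChange() { return lastChange; }
  }

  /** Hypothetical per-source sink manager; each source owns its own instance. */
  static class SinkManager {
    private final SharedPeer peer;
    private long lastUpdateToPeers = -1;
    int refreshes = 0;

    SinkManager(SharedPeer peer) {
      this.peer = peer;
    }

    void getReplicationSink() {
      if (peer.lastChange() > lastUpdateToPeers) {
        refreshes++;              // where "Current list of sinks is out of date" would be logged
        peer.refreshServers();
        lastUpdateToPeers = peer.now();
      }
    }
  }

  /** Alternates a normal and a recovered source for the given number of rounds. */
  static int totalRefreshes(boolean noopIfUnchanged, int rounds) {
    SharedPeer peer = new SharedPeer(noopIfUnchanged);
    SinkManager normal = new SinkManager(peer);
    SinkManager recovered = new SinkManager(peer);
    for (int i = 0; i < rounds; i++) {
      normal.getReplicationSink();
      recovered.getReplicationSink();
    }
    return normal.refreshes + recovered.refreshes;
  }

  public static void main(String[] args) {
    System.out.println("refresh on every fetch: " + totalRefreshes(false, 5));
    System.out.println("no-op when unchanged:   " + totalRefreshes(true, 5));
  }
}
```

In this toy model, five rounds with two sources trigger 10 refreshes when every fetch counts as a change, and 2 (one initial refresh per source) when an unchanged list is treated as a no-op. Where the equality check would actually live in HBase is left open here.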