[
https://issues.apache.org/jira/browse/HBASE-13618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529894#comment-14529894
]
Lars Hofhansl commented on HBASE-13618:
---------------------------------------
Comments? Concerns?
The issue I am trying to fix affects long-running region servers. If over
(say) a month we successfully replicate hundreds of thousands of batches but
just three batches fail due to random temporary glitches (maybe we
rolling-restarted the target cluster a few times), we'll still remove the sink.
> ReplicationSource is too eager to remove sinks
> ----------------------------------------------
>
> Key: HBASE-13618
> URL: https://issues.apache.org/jira/browse/HBASE-13618
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Priority: Minor
> Attachments: 13618.txt
>
>
> While looking at the replication code for some other reason, I noticed that
> the replication source might be a bit too eager to remove sinks from the
> list of valid sinks.
> The current logic allows a sink to fail N times (default 3), and then it is
> removed from the sinks. But note that this failure count is never reduced,
> so given enough runtime and some network glitches _every_ sink will
> eventually be removed. When all sinks are removed, the source picks new sinks
> and the counter is set to 0 for all of them.
> I think we should reset the counter each time we successfully replicate
> something to the sink (which proves the sink isn't dead). Or we could
> decrease the counter each time we successfully replicate, which might be
> better: if we consistently fail more attempts than we succeed, the sink
> should be removed.
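
The three behaviours discussed in the description (never reducing the count, resetting it on success, decrementing it on success) can be sketched roughly as follows. This is a minimal illustration only, not the actual HBase code: the class `SinkFailureTracker`, its `Policy` enum, and all method names are hypothetical, and it assumes a sink is removed on its N-th recorded failure.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-sink failure counter; names are illustrative, not HBase API. */
class SinkFailureTracker {
    /** How a successful replication affects the failure count. */
    enum Policy { NEVER_REDUCE, RESET_ON_SUCCESS, DECREMENT_ON_SUCCESS }

    private final int maxFailures;   // "N" in the description (default 3 there)
    private final Policy policy;
    private final Map<String, Integer> failures = new HashMap<>();

    SinkFailureTracker(int maxFailures, Policy policy) {
        this.maxFailures = maxFailures;
        this.policy = policy;
    }

    /** Records a failed batch; returns true if the sink should now be removed. */
    boolean reportFailure(String sink) {
        int count = failures.merge(sink, 1, Integer::sum);
        // Assumption: removal happens once the count reaches N.
        return count >= maxFailures;
    }

    /** Records a successful batch and adjusts the count according to the policy. */
    void reportSuccess(String sink) {
        switch (policy) {
            case RESET_ON_SUCCESS:
                // A success proves the sink is alive, so forget past failures.
                failures.remove(sink);
                break;
            case DECREMENT_ON_SUCCESS:
                // Each success cancels one failure; drop the entry at zero.
                failures.computeIfPresent(sink, (k, v) -> v > 1 ? v - 1 : null);
                break;
            default:
                // NEVER_REDUCE: the behaviour the issue describes as too eager.
                break;
        }
    }

    /** Current failure count for the sink (0 if none recorded). */
    int failureCount(String sink) {
        return failures.getOrDefault(sink, 0);
    }
}
```

With `NEVER_REDUCE`, three failures spread over any number of successes still trigger removal. With `DECREMENT_ON_SUCCESS`, a sink is only removed when failures outpace successes by N, which matches the "consistently fail more attempts than we succeed" criterion.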
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)