[ https://issues.apache.org/jira/browse/HBASE-12971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314429#comment-14314429 ]
Andrew Purtell commented on HBASE-12971:
----------------------------------------
We already have a ton of config parameters.
Will this be good enough?
{quote}
We can already configure replication.source.socketTimeoutMultiplier, it's just
about a good default.
In fact with that in mind maybe the socketTimeoutMultiplier should just be
maxRetriesMultiplier (we declared maxRetriesMultiplier to be a good maximum
since we configured it that way, on a socket timeout it seems good to wait for
that maximum immediately).
Everybody good with that (socketTimeoutMultiplier = maxRetriesMultiplier)?
{quote}
Because if so let's make the change, add a release note, and we are done here.
> Replication stuck due to large default value for
> replication.source.maxretriesmultiplier
> ----------------------------------------------------------------------------------------
>
> Key: HBASE-12971
> URL: https://issues.apache.org/jira/browse/HBASE-12971
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 1.0.0, 0.98.10
> Reporter: Adrian Muraru
> Fix For: 2.0.0, 1.0.1, 1.1.0, 0.94.27, 0.98.11
>
>
> In hbase-site we set {{replication.source.maxretriesmultiplier}} to its
> default value of 300, introduced in HBASE-11964.
> While this value works fine for recovering from transient errors with the
> remote ZK quorum of the peer HBase cluster, it proved to have side effects in
> the code introduced in HBASE-11367 (Pluggable replication endpoint), where the
> default is much lower (10).
> See:
> 1.
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L169
> 2.
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L79
> The two default values are definitely conflicting: when
> {{replication.source.maxretriesmultiplier}} is set in hbase-site to 300,
> this leads to a sleep time of 300*300 seconds (25h!) when a
> SocketTimeoutException is thrown.
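
To make the arithmetic in the description concrete, here is a minimal sketch that compares the socket-timeout sleep under the squared multiplier with the sleep under the proposed socketTimeoutMultiplier = maxRetriesMultiplier. It is not the HBase code: the class and method names are hypothetical, and the 1 second base sleep is an assumption inferred from the 25h figure.

```java
import java.time.Duration;

// Hypothetical illustration only; not taken from ReplicationSource or
// HBaseInterClusterReplicationEndpoint.
final class SocketTimeoutSleep {

    // Sleep duration = base sleep * multiplier (fails loudly on overflow).
    static Duration sleepFor(long baseSleepMillis, long multiplier) {
        return Duration.ofMillis(Math.multiplyExact(baseSleepMillis, multiplier));
    }

    public static void main(String[] args) {
        long baseSleepMillis = 1000L;   // assumed base sleep of 1 second
        int maxRetriesMultiplier = 300; // value set in hbase-site per the report

        // Behaviour described in the report: multiplier squared on socket timeout.
        Duration squared = sleepFor(baseSleepMillis,
                (long) maxRetriesMultiplier * maxRetriesMultiplier);

        // Proposed default: socketTimeoutMultiplier = maxRetriesMultiplier.
        Duration proposed = sleepFor(baseSleepMillis, maxRetriesMultiplier);

        System.out.println("squared multiplier:  " + squared.toHours() + " h");
        System.out.println("proposed multiplier: " + proposed.toMinutes() + " min");
    }
}
```

Under these assumptions the squared multiplier gives 25 hours, while the proposed default caps the wait at 5 minutes. With the endpoint's lower default of 10, the squared value is only 100 seconds, which may be why the problem shows up only once the multiplier is raised to 300.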