Adrian Muraru created HBASE-12971:
-------------------------------------

             Summary: Replication stuck due to large default value for 
replication.source.maxretriesmultiplier
                 Key: HBASE-12971
                 URL: https://issues.apache.org/jira/browse/HBASE-12971
             Project: HBase
          Issue Type: Bug
          Components: hbase
    Affects Versions: 0.98.10, 1.0.0
            Reporter: Adrian Muraru


In hbase-site we set {{replication.source.maxretriesmultiplier}} to 300, the 
default value introduced in HBASE-11964.

While this value works fine for recovering from transient errors with the 
remote ZK quorum of the peer HBase cluster, it has side effects in the code 
introduced in HBASE-11367 (Pluggable replication endpoint), where the default 
is much lower (10).
See:
1. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L169
2. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L79

The two default values clearly conflict: when 
{{replication.source.maxretriesmultiplier}} is set to 300 in hbase-site, 
this leads to a sleep time of 300*300 seconds (25h!) when a 
SocketTimeoutException is thrown.
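
To make the arithmetic concrete, here is a minimal sketch of how the sleep grows when the multiplier is squared. It is not the HBase code: the class and method names are hypothetical, and the 1 second base pause is an assumption inferred from the 25h figure above.

```java
// Hypothetical sketch: names are illustrative and are not HBase APIs.
final class RetrySleepSketch {

    // Assumption: base pause between retries of 1 second,
    // inferred from 300*300 being quoted as 25h.
    static final long ASSUMED_BASE_SLEEP_MS = 1_000L;

    // Sleep applied after a socket timeout, assuming the retry
    // multiplier is squared as the 300*300 figure suggests.
    static long socketTimeoutSleepMillis(int maxRetriesMultiplier, long baseSleepMs) {
        long squared = (long) maxRetriesMultiplier * maxRetriesMultiplier;
        return baseSleepMs * squared;
    }

    // Convert milliseconds to hours for readability.
    static double toHours(long millis) {
        return millis / 3_600_000.0;
    }

    public static void main(String[] args) {
        // 10 is the lower default mentioned above; 300 is the configured value.
        for (int multiplier : new int[] {10, 300}) {
            long sleepMs = socketTimeoutSleepMillis(multiplier, ASSUMED_BASE_SLEEP_MS);
            System.out.printf("multiplier=%d -> sleep=%d ms (%.2f h)%n",
                    multiplier, sleepMs, toHours(sleepMs));
        }
    }
}
```

Under these assumptions a multiplier of 10 gives a 100 second sleep, while 300 gives 90,000 seconds, i.e. 25 hours.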





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
