Adrian Muraru created HBASE-12971:
-------------------------------------

             Summary: Replication stuck due to large default value for replication.source.maxretriesmultiplier
                 Key: HBASE-12971
                 URL: https://issues.apache.org/jira/browse/HBASE-12971
             Project: HBase
          Issue Type: Bug
          Components: hbase
    Affects Versions: 0.98.10, 1.0.0
            Reporter: Adrian Muraru

We set the default value of {{replication.source.maxretriesmultiplier}} to 300 in hbase-site, as introduced in HBASE-11964. This value works fine for recovering from transient errors with the remote ZK quorum of the peer HBase cluster, but it has side effects in the code introduced in HBASE-11367 (Pluggable replication endpoint), where the default is much lower (10).

See:
1. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java#L169
2. https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/HBaseInterClusterReplicationEndpoint.java#L79

The two default values clearly conflict: when {{replication.source.maxretriesmultiplier}} is set to 300 in hbase-site, a SocketTimeoutException leads to a sleep time of 300*300 seconds (25h!).
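
To make the arithmetic above concrete, here is a minimal sketch of the sleep calculation. It is not the HBase code: the class and method names are hypothetical, and it assumes a base retry sleep of one second, with the socket-timeout path multiplying that base by the square of the multiplier (the 300*300 noted above).

```java
import java.time.Duration;

// Hypothetical helper, not part of HBase: reproduces the arithmetic from the report.
final class RetrySleepSketch {

    // Assumption: base sleep between retries is one second (in milliseconds).
    static final long ASSUMED_BASE_SLEEP_MS = 1000L;

    // Assumption taken from the report: on a socket timeout, the base sleep
    // is multiplied by maxRetriesMultiplier squared.
    static Duration socketTimeoutSleep(long baseSleepMs, int maxRetriesMultiplier) {
        long squared = (long) maxRetriesMultiplier * maxRetriesMultiplier;
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Duration.ofMillis(Math.multiplyExact(baseSleepMs, squared));
    }

    public static void main(String[] args) {
        // Compare the endpoint's lower default (10) with the configured value (300).
        for (int multiplier : new int[] {10, 300}) {
            Duration sleep = socketTimeoutSleep(ASSUMED_BASE_SLEEP_MS, multiplier);
            System.out.printf("multiplier=%d -> sleep=%d s (%d h)%n",
                    multiplier, sleep.getSeconds(), sleep.toHours());
        }
    }
}
```

Under these assumptions, a multiplier of 10 gives a sleep of under two minutes, while 300 gives the 25 hours reported here, which is why a value tuned for ZK retries is a poor fit for the socket-timeout path.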