[ https://issues.apache.org/jira/browse/HBASE-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426894#comment-16426894 ]
Hudson commented on HBASE-16499:
--------------------------------

Results for branch master [build #284 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/284/]: (x) *{color:red}-1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/284//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/284//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/284//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> slow replication for small HBase clusters
> -----------------------------------------
>
>                 Key: HBASE-16499
>                 URL: https://issues.apache.org/jira/browse/HBASE-16499
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Vikas Vishwakarma
>            Assignee: Ashish Singhi
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16499-addendum.patch, HBASE-16499.patch, HBASE-16499.patch
>
>
> For small clusters (10-20 nodes) we recently observed that replication progresses very slowly when we do bulk writes, and a lot of lag accumulates in AgeOfLastShipped / SizeOfLogQueue. From the logs we observed that the number of threads used for shipping WAL edits in parallel comes from the following code in HBaseInterClusterReplicationEndpoint:
>
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>     replicationSinkMgr.getSinks().size());
> ...
> for (int i = 0; i < n; i++) {
>   entryLists.add(new ArrayList<HLog.Entry>(entries.size()/n+1)); // <-- batch size
> }
> ...
> for (int i = 0; i < entryLists.size(); i++) {
>   ...
>   // RuntimeExceptions encountered here bubble up and are handled in ReplicationSource
>   pool.submit(createReplicator(entryLists.get(i), i)); // <-- concurrency
>   futures++;
> }
>
> maxThreads is fixed and configurable, and since we take the minimum of the three values, n is effectively decided by replicationSinkMgr.getSinks().size() whenever we have enough edits to replicate.
> replicationSinkMgr.getSinks().size() is in turn decided by
>
> int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
>
> where ratio comes from
>
> this.ratio = conf.getFloat("replication.source.ratio", DEFAULT_REPLICATION_SOURCE_RATIO);
>
> Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10%, so for small clusters of 10-20 RegionServers the value we get for numSinks, and hence n, is very small, like 1 or 2. This substantially reduces the pool concurrency used for shipping WAL edits in parallel, effectively slowing down replication for small clusters and causing a lot of lag accumulation in AgeOfLastShipped. Sometimes it takes tens of hours to clear the entire replication queue even after the client has finished writing on the source side.
> We are running tests varying replication.source.ratio and have seen a multi-fold improvement in total replication time (will update the results here). I want to propose that we also increase the default value of replication.source.ratio so that we have sufficient concurrency even for small clusters. We figured this out only after many iterations and much debugging, so a slightly higher default will probably save others the trouble.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
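
The thread-count arithmetic described in the quoted issue can be sketched as below. This is a standalone illustration, not the actual HBase source: the method names `numSinks` and `shipThreads` are hypothetical, though the formulas mirror the snippets quoted above (ceil of slave count times `replication.source.ratio`, and the three-way min over `maxThreads`, batch-derived count, and sink count).

```java
public class ReplicationConcurrencySketch {
    // Hypothetical sketch of the sink-count math from ReplicationSinkManager:
    // numSinks = ceil(slaveAddresses.size() * ratio)
    static int numSinks(int slaveCount, float ratio) {
        return (int) Math.ceil(slaveCount * ratio);
    }

    // Hypothetical sketch of the shipper thread count from
    // HBaseInterClusterReplicationEndpoint:
    // n = min(min(maxThreads, entries/100 + 1), sinkCount)
    static int shipThreads(int maxThreads, int entryCount, int sinkCount) {
        return Math.min(Math.min(maxThreads, entryCount / 100 + 1), sinkCount);
    }

    public static void main(String[] args) {
        // A 10-node peer cluster with the default ratio of 0.1 yields a
        // single sink, so even a large batch of edits ships on one thread.
        int sinks = numSinks(10, 0.1f);
        int threads = shipThreads(10, 5000, sinks);
        System.out.println("ratio=0.1: sinks=" + sinks + " threads=" + threads);

        // Raising the ratio to 0.5 yields 5 sinks and up to 5 shipper
        // threads for the same cluster and batch size.
        int moreSinks = numSinks(10, 0.5f);
        int moreThreads = shipThreads(10, 5000, moreSinks);
        System.out.println("ratio=0.5: sinks=" + moreSinks + " threads=" + moreThreads);
    }
}
```

With enough edits queued (here 5000, so the batch term is 51), the sink count is the binding minimum in both cases, which is the bottleneck the issue describes.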