Vikas Vishwakarma created HBASE-16499:
-----------------------------------------
Summary: slow replication for small HBase clusters
Key: HBASE-16499
URL: https://issues.apache.org/jira/browse/HBASE-16499
Project: HBase
Issue Type: Bug
Reporter: Vikas Vishwakarma
Assignee: Vikas Vishwakarma
For small clusters (10-20 nodes), we recently observed that replication
progresses very slowly during bulk writes, and a lot of lag accumulates in
AgeOfLastShipped / SizeOfLogQueue. From the logs we found that the number of
threads used for shipping WAL edits in parallel comes from the following
expression in HBaseInterClusterReplicationEndpoint:
int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
    replicationSinkMgr.getSinks().size());
...
for (int i=0; i<n; i++) {
  entryLists.add(new ArrayList<HLog.Entry>(entries.size()/n+1)); // <-- batch size
}
...
for (int i=0; i<entryLists.size(); i++) {
  .....
  // RuntimeExceptions encountered here bubble up and are handled in ReplicationSource
  pool.submit(createReplicator(entryLists.get(i), i)); // <-- concurrency
  futures++;
}
}
maxThreads is fixed (though configurable), and since n is the minimum of the
three values, it ends up being determined by
replicationSinkMgr.getSinks().size() whenever there are enough edits to
replicate.
replicationSinkMgr.getSinks().size() is in turn determined by
int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
where ratio is set as
this.ratio = conf.getFloat("replication.source.ratio", DEFAULT_REPLICATION_SOURCE_RATIO);
Currently DEFAULT_REPLICATION_SOURCE_RATIO is 10%, so for small clusters of
10-20 RegionServers the value of numSinks, and hence n, is very small (1 or 2).
This substantially reduces the pool concurrency used for shipping WAL edits in
parallel, which slows down replication for small clusters and causes a lot of
lag to accumulate in AgeOfLastShipped. Sometimes it takes tens of hours to
clear the entire replication queue, even after the client has finished writing
on the source side.
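To make the arithmetic concrete, here is a minimal standalone sketch that reproduces the two expressions quoted above for a few cluster sizes. It is not HBase code: the class and method names are hypothetical, and the maxThreads value of 10 and the entry count of 5000 are assumptions chosen only so that the sink count is the limiting term.

```java
// Hypothetical helper, not part of HBase; it only mirrors the quoted expressions.
class ReplicationThreadEstimate {

    // Same shape as: (int) Math.ceil(slaveAddresses.size() * ratio)
    static int numSinks(int slaveCount, float ratio) {
        return (int) Math.ceil(slaveCount * ratio);
    }

    // Same shape as: Math.min(Math.min(maxThreads, entries.size()/100+1), sinks)
    static int shipperThreads(int maxThreads, int entryCount, int sinks) {
        return Math.min(Math.min(maxThreads, entryCount / 100 + 1), sinks);
    }

    public static void main(String[] args) {
        int maxThreads = 10;   // assumed value, for illustration only
        int entryCount = 5000; // assumed: enough edits that this term is not the limit
        for (int servers : new int[] {10, 20, 100}) {
            for (float ratio : new float[] {0.1f, 0.5f}) {
                int sinks = numSinks(servers, ratio);
                int n = shipperThreads(maxThreads, entryCount, sinks);
                System.out.printf("servers=%d ratio=%.1f numSinks=%d n=%d%n",
                        servers, ratio, sinks, n);
            }
        }
    }
}
```

Under these assumptions, a ratio of 0.1 gives n = 1 for 10 RegionServers and n = 2 for 20, while the same clusters with a ratio of 0.5 reach n = 5 and n = 10. A ratio of 0.5 is used here only as a comparison point, not as a proposed default.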
We are running tests with different values of replication.source.ratio and
have seen a multi-fold improvement in total replication time (will update the
results here). I propose that we also increase the default value of
replication.source.ratio so that there is sufficient concurrency even for
small clusters. We only figured this out after a lot of iterations and
debugging, so a slightly higher default would probably save others the trouble.