[ 
https://issues.apache.org/jira/browse/HBASE-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426508#comment-16426508
 ] 

Ashish Singhi commented on HBASE-16499:
---------------------------------------

Thanks for the review. I have pushed the addendum to the master branch only.

The addendum didn't apply to branch-2 and branch-2.0 because HBASE-20273 has 
not been committed there.

> slow replication for small HBase clusters
> -----------------------------------------
>
>                 Key: HBASE-16499
>                 URL: https://issues.apache.org/jira/browse/HBASE-16499
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Vikas Vishwakarma
>            Assignee: Ashish Singhi
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16499-addendum.patch, HBASE-16499.patch, 
> HBASE-16499.patch
>
>
> For small clusters (10-20 nodes) we recently observed that replication 
> progresses very slowly during bulk writes, and that a lot of lag accumulates 
> in AgeOfLastShipped / SizeOfLogQueue. From the logs we observed that the 
> number of threads used for shipping WAL edits in parallel comes from the 
> following code in HBaseInterClusterReplicationEndpoint:
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>       replicationSinkMgr.getSinks().size());
> ... 
>       for (int i=0; i<n; i++) {
>         entryLists.add(new ArrayList<HLog.Entry>(entries.size()/n+1)); // <-- batch size
>       }
> ...
>         for (int i=0; i<entryLists.size(); i++) {
>          .....
>             // RuntimeExceptions encountered here bubble up and are handled
>             // in ReplicationSource
>             pool.submit(createReplicator(entryLists.get(i), i));  // <-- concurrency
>             futures++;
>           }
>         }
> maxThreads is fixed and configurable, and since we take the min of the three 
> values, n is determined by replicationSinkMgr.getSinks().size() when we have 
> enough edits to replicate.
> replicationSinkMgr.getSinks().size() is in turn determined by 
> int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
> where ratio is this.ratio = conf.getFloat("replication.source.ratio", 
> DEFAULT_REPLICATION_SOURCE_RATIO);
> Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10%, so for small 
> clusters of 10-20 RegionServers the value we get for numSinks, and hence n, 
> is very small (1 or 2). This substantially reduces the pool concurrency used 
> for shipping WAL edits in parallel, effectively slowing down replication for 
> small clusters and causing a lot of lag to accumulate in AgeOfLastShipped. 
> Sometimes it takes tens of hours to clear the entire replication queue, even 
> after the client has finished writing on the source side. 
> We are running tests by varying replication.source.ratio and have seen a 
> multi-fold improvement in total replication time (will update the results 
> here). I propose that we also increase the default value of 
> replication.source.ratio so that we have sufficient concurrency even for 
> small clusters. We figured this out only after a lot of iterations and 
> debugging, so a slightly higher default would probably save others the 
> trouble. 
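
To make the arithmetic in the description above concrete, here is a minimal 
sketch that reproduces the two formulas in isolation. It is not the actual 
HBase code: the class and method names are hypothetical, and maxThreads = 10 
and a batch of 5000 entries are illustrative assumptions rather than values 
taken from the issue. The 0.5 ratio is likewise only an example of a higher 
setting, not a proposed default.

```java
// Hypothetical stand-alone sketch of the sink/thread arithmetic; not HBase code.
public class SinkConcurrencySketch {

    // Mirrors: int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
    static int numSinks(int slaveCount, float ratio) {
        return (int) Math.ceil(slaveCount * ratio);
    }

    // Mirrors: Math.min(Math.min(maxThreads, entries.size()/100+1), sinks.size())
    static int shipperThreads(int maxThreads, int entryCount, int sinkCount) {
        return Math.min(Math.min(maxThreads, entryCount / 100 + 1), sinkCount);
    }

    public static void main(String[] args) {
        int maxThreads = 10;   // assumed value, for illustration only
        int entryCount = 5000; // assumed batch size: enough edits that sinks are the limit

        for (float ratio : new float[] {0.1f, 0.5f}) {
            for (int servers : new int[] {10, 20}) {
                int sinks = numSinks(servers, ratio);
                int n = shipperThreads(maxThreads, entryCount, sinks);
                System.out.println("ratio=" + ratio + " servers=" + servers
                    + " numSinks=" + sinks + " n=" + n);
            }
        }
    }
}
```

With a ratio of 0.1, 10 and 20 peer RegionServers give n = 1 and n = 2 
respectively, matching the "1 or 2" figure in the description; with a ratio 
of 0.5 the same inputs give n = 5 and n = 10.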



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
