[ https://issues.apache.org/jira/browse/HBASE-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424167#comment-16424167 ]
Hadoop QA commented on HBASE-16499:
-----------------------------------
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 2s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 55s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 20m 20s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}117m 51s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 54s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-16499 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12917354/HBASE-16499.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 8b3fa7e43440 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 219625233c |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/12274/testReport/ |
| Max. process+thread count | 4181 (vs. ulimit of 10000) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/12274/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
> slow replication for small HBase clusters
> -----------------------------------------
>
> Key: HBASE-16499
> URL: https://issues.apache.org/jira/browse/HBASE-16499
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Vikas Vishwakarma
> Assignee: Ashish Singhi
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-16499.patch, HBASE-16499.patch
>
>
> For small clusters (10-20 nodes), we recently observed that replication
> progresses very slowly during bulk writes, and that a lot of lag accumulates
> in AgeOfLastShipped / SizeOfLogQueue. From the logs we observed that the
> number of threads used for shipping WAL edits in parallel comes from the
> following computation in HBaseInterClusterReplicationEndpoint:
> int n = Math.min(Math.min(this.maxThreads, entries.size()/100+1),
>     replicationSinkMgr.getSinks().size());
> ...
> for (int i=0; i<n; i++) {
>   entryLists.add(new ArrayList<HLog.Entry>(entries.size()/n+1)); // <-- batch size
> }
> ...
> for (int i=0; i<entryLists.size(); i++) {
>   .....
>   // RuntimeExceptions encountered here bubble up and are handled
>   // in ReplicationSource
>   pool.submit(createReplicator(entryLists.get(i), i)); // <-- concurrency
>   futures++;
> }
> }
> maxThreads is fixed and configurable, and since we take the minimum of the
> three values, n is determined by replicationSinkMgr.getSinks().size() when we
> have enough edits to replicate.
> replicationSinkMgr.getSinks().size() is in turn determined by
> int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
> where ratio is this.ratio = conf.getFloat("replication.source.ratio",
> DEFAULT_REPLICATION_SOURCE_RATIO);
> Currently DEFAULT_REPLICATION_SOURCE_RATIO is set to 10%, so for small
> clusters of 10-20 RegionServers the value we get for numSinks, and hence
> for n, is very small, like 1 or 2. This substantially reduces the pool
> concurrency used for shipping WAL edits in parallel, effectively slowing down
> replication for small clusters and causing a lot of lag to accumulate in
> AgeOfLastShipped. Sometimes it takes tens of hours to clear the entire
> replication queue even after the client has finished writing on the source
> side.
> We are running tests that vary replication.source.ratio and have seen a
> multi-fold improvement in total replication time (will update the results
> here). I propose that we also increase the default value of
> replication.source.ratio so that we have sufficient concurrency even for
> small clusters. We figured this out after a lot of iterations and debugging,
> so a slightly higher default will probably save others the trouble.
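The arithmetic in the description above can be checked with a small standalone sketch. It reuses only the two expressions quoted from the issue. The class name, method names, the maxThreads value of 10, the batch of 5000 entries, and the alternative ratio of 0.5 are all assumptions chosen for illustration and are not taken from the HBase source or from the attached patch.

```java
// Standalone sketch; does not depend on HBase. All names here are hypothetical.
public class ReplicationConcurrencySketch {

    // Same expression as the numSinks line quoted in the description.
    static int numSinks(int slaveCount, float ratio) {
        return (int) Math.ceil(slaveCount * ratio);
    }

    // Same expression as the line computing n quoted in the description.
    static int shipperThreads(int maxThreads, int entryCount, int sinkCount) {
        return Math.min(Math.min(maxThreads, entryCount / 100 + 1), sinkCount);
    }

    public static void main(String[] args) {
        int maxThreads = 10;    // assumed value, for illustration only
        int entryCount = 5000;  // assumed batch large enough that entries do not limit n
        float[] ratios = {0.1f, 0.5f};  // 10% as stated in the issue; 0.5 is a hypothetical alternative
        int[] clusterSizes = {10, 20, 100};

        for (float ratio : ratios) {
            for (int size : clusterSizes) {
                int sinks = numSinks(size, ratio);
                int n = shipperThreads(maxThreads, entryCount, sinks);
                System.out.printf("ratio=%.1f regionservers=%d numSinks=%d n=%d%n",
                        ratio, size, sinks, n);
            }
        }
    }
}
```

Under these assumed inputs, a ratio of 10% gives n = 1 for 10 RegionServers and n = 2 for 20, which matches the "1 or 2" figure in the description, while the hypothetical ratio of 0.5 gives n = 5 and n = 10 for the same cluster sizes.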
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)