[ https://issues.apache.org/jira/browse/HBASE-16721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531747#comment-15531747 ]
Hadoop QA commented on HBASE-16721:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 10s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 11s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} |
| {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 11s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 128m 23s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Timed out junit tests | org.apache.hadoop.hbase.client.TestReplicasClient |
| | org.apache.hadoop.hbase.client.TestFromClientSide |
| | org.apache.hadoop.hbase.client.TestTableSnapshotScanner |
| | org.apache.hadoop.hbase.client.TestMobCloneSnapshotFromClient |
| | org.apache.hadoop.hbase.client.TestMobSnapshotCloneIndependence |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:7bda515 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12830834/hbase-16721_v2.master.patch |
| JIRA Issue | HBASE-16721 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux da73c8acacde 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 09a31bd |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/3766/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/3766/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/3766/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/3766/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> Concurrency issue in WAL unflushed seqId tracking
> -------------------------------------------------
>
> Key: HBASE-16721
> URL: https://issues.apache.org/jira/browse/HBASE-16721
> Project: HBase
> Issue Type: Bug
> Components: wal
> Affects Versions: 1.0.0, 1.1.0, 1.2.0
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Priority: Critical
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4
>
> Attachments: hbase-16721_v1.branch-1.patch, hbase-16721_v2.branch-1.patch, hbase-16721_v2.master.patch
>
>
> I'm inspecting an interesting case where, in a production cluster, some regionservers end up accumulating hundreds of WAL files, even with force flushes going on due to max logs. This happened multiple times on that cluster, but not on other clusters. The cluster has the periodic memstore flusher disabled; however, this still does not explain why the force flush of regions due to the max logs limit is not working. I think the periodic memstore flusher just masks the underlying problem, which is why we do not see this in other clusters.
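The max-logs forced flush described above can be illustrated with a small model. This is a simplified sketch under stated assumptions, not FSHLog's actual code: `ForceFlushSelectionSketch` and `regionsToForceFlush` are hypothetical names, each region is reduced to its lowest unflushed sequence id, and the oldest WAL is reduced to the highest sequence id written to it. A region whose lowest unflushed id is at or below that value still has edits that exist only in the oldest WAL, so it must be flushed before that WAL can be archived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ForceFlushSelectionSketch {

  /**
   * Hypothetical model of max-logs flush selection. Returns the regions whose
   * lowest unflushed sequence id is at or below the highest sequence id in the
   * oldest WAL, i.e. the regions that keep that WAL from being archived.
   */
  static List<String> regionsToForceFlush(Map<String, Long> lowestUnflushedSeqIds,
                                          long oldestWalHighestSeqId,
                                          int walCount, int maxLogs) {
    List<String> result = new ArrayList<>();
    if (walCount <= maxLogs) {
      return result; // under the limit: nothing needs to be forced
    }
    for (Map.Entry<String, Long> e : lowestUnflushedSeqIds.entrySet()) {
      if (e.getValue() <= oldestWalHighestSeqId) {
        result.add(e.getKey()); // this region still has edits only in the oldest WAL
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Long> lowest = new TreeMap<>();
    lowest.put("regionA", 120L); // oldest unflushed edit has seqId 120
    lowest.put("regionB", 950L); // all unflushed edits are in newer WALs
    // Oldest WAL holds edits up to seqId 500; 33 WALs exceeds maxlogs=32.
    System.out.println(regionsToForceFlush(lowest, 500L, 33, 32));
  }
}
```

Running `main` prints `[regionA]`. The real FSHLog tracks sequence ids per region for each WAL file, so treat the single-threshold comparison here as a simplification.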
>
> The problem starts like this:
> {code}
> 2016-09-21 17:49:18,272 INFO [regionserver//10.2.0.55:16020.logRoller] wal.FSHLog: Too many wals: logs=33, maxlogs=32; forcing flush of 1 regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-21 17:49:18,273 WARN [regionserver//10.2.0.55:16020.logRoller] regionserver.LogRoller: Failed to schedule flush of d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> Then it continues until the RS is restarted:
> {code}
> 2016-09-23 17:43:49,356 INFO [regionserver//10.2.0.55:16020.logRoller] wal.FSHLog: Too many wals: logs=721, maxlogs=32; forcing flush of 1 regions(s): d4cf39dc40ea79f5da4d0cf66d03cb1f
> 2016-09-23 17:43:49,357 WARN [regionserver//10.2.0.55:16020.logRoller] regionserver.LogRoller: Failed to schedule flush of d4cf39dc40ea79f5da4d0cf66d03cb1f, region=null, requester=null
> {code}
> The problem is that region {{d4cf39dc40ea79f5da4d0cf66d03cb1f}} was already split some time ago, and it was able to flush its data and split without any problems. However, the FSHLog still thinks that there is some unflushed data for this region.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
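The logged symptom can be reproduced in a similarly simplified model. In this sketch, all names (`StaleSeqIdEntrySketch`, `roll`, the fields) are hypothetical and not part of the HBase API. A forced flush clears a region's entry only if the region is online. The entry left behind for the split parent can never be cleared, so each log roll adds a WAL and none is archived, which matches the growth from `logs=33` to `logs=721`. The sketch deliberately does not model the race that leaves the stale entry behind; it only shows why one stale entry is enough to keep every WAL from being archived.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

public class StaleSeqIdEntrySketch {
  // Hypothetical model state, not FSHLog's real fields.
  final Map<String, Long> lowestUnflushedSeqIds = new HashMap<>();
  final Set<String> onlineRegions = new HashSet<>();
  int walCount;

  /**
   * One log roll. Online regions pinning the oldest WAL are "flushed" (their
   * entry is cleared). An entry for a region that is no longer online cannot be
   * cleared, so the oldest WAL stays pinned and is never archived.
   */
  void roll(long oldestWalHighestSeqId, int maxLogs) {
    walCount++; // each roll opens a new WAL file
    if (walCount <= maxLogs) {
      return;
    }
    boolean oldestWalPinned = false;
    Iterator<Map.Entry<String, Long>> it = lowestUnflushedSeqIds.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (e.getValue() > oldestWalHighestSeqId) {
        continue; // this region's unflushed edits are all in newer WALs
      }
      if (onlineRegions.contains(e.getKey())) {
        it.remove(); // simulate a forced flush that completes and clears the entry
      } else {
        // Analogous to "Failed to schedule flush of <region>, region=null".
        oldestWalPinned = true;
      }
    }
    if (!oldestWalPinned) {
      walCount--; // oldest WAL archived
    }
  }

  public static void main(String[] args) {
    StaleSeqIdEntrySketch wal = new StaleSeqIdEntrySketch();
    wal.walCount = 32;
    // The split parent is gone, but a stale entry still claims unflushed edits.
    wal.lowestUnflushedSeqIds.put("d4cf39dc40ea79f5da4d0cf66d03cb1f", 100L);
    for (int i = 0; i < 3; i++) {
      wal.roll(500L, 32);
    }
    System.out.println("logs=" + wal.walCount + ", maxlogs=32");
  }
}
```

After three rolls, `main` prints `logs=35, maxlogs=32`. If the region were online, the forced flush would clear its entry and the roll would archive a WAL instead, keeping the count at the limit.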