[ https://issues.apache.org/jira/browse/HBASE-21259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645843#comment-16645843 ]
Hadoop QA commented on HBASE-21259: ----------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2.1 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 27s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 16s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 48s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} branch-2.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 14s{color} | {color:red} hbase-server: The patch generated 9 new + 113 unchanged - 1 fixed = 122 total (was 114) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 36s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 28s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 2s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 57m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.master.TestMasterWALManager | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 | | JIRA Issue | HBASE-21259 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943331/HBASE-21259.branch-2.1.005.patch | | Optional Tests | dupname asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 98ca7bfa7c80 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | branch-2.1 / e283963533 | | maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC3 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14641/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14641/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14641/testReport/ | | Max. process+thread count | 645 (vs. ulimit of 10000) | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14641/console | | Powered by | Apache Yetus 0.8.0 http://yetus.apache.org | This message was automatically generated. > [amv2] Revived deadservers; recreated serverstatenode > ----------------------------------------------------- > > Key: HBASE-21259 > URL: https://issues.apache.org/jira/browse/HBASE-21259 > Project: HBase > Issue Type: Bug > Components: amv2 > Affects Versions: 2.1.0 > Reporter: stack > Assignee: stack > Priority: Critical > Fix For: 2.2.0, 2.1.1, 2.0.3 > > Attachments: HBASE-21259.branch-2.1.001.patch, > HBASE-21259.branch-2.1.002.patch, HBASE-21259.branch-2.1.003.patch, > HBASE-21259.branch-2.1.004.patch, HBASE-21259.branch-2.1.005.patch > > > On startup, I see servers being revived; i.e. their serverstatenode is > getting marked online even though its just been processed by > ServerCrashProcedure. It looks like this (in a patched server that reports on > whenever a serverstatenode is created): > {code} > 2018-09-29 03:45:40,963 INFO > org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=3982597, > state=SUCCESS; ServerCrashProcedure > server=vb1442.halxg.cloudera.com,22101,1536675314426, splitWal=true, > meta=false in 1.0130sec > ... > 2018-09-29 03:45:43,733 INFO > org.apache.hadoop.hbase.master.assignment.RegionStates: CREATING! > vb1442.halxg.cloudera.com,22101,1536675314426 > java.lang.RuntimeException: WHERE AM I? > at > org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1116) > at > org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1143) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1464) > at > org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:200) > at > org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:369) > at > org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97) > at > org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1716) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1494) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75) > at > org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2022) > {code} > See how we've just finished a SCP which will have removed the > serverstatenode... but then we come across an unassign that references the > server that was just processed. The unassign will attempt to update the > serverstatenode and therein we create one if one not present. We shouldn't be > creating one. > I think I see this a lot because I am scheduling unassigns with hbck2. The > servers crash and then come up with SCPs doing cleanup of old server and > unassign procedures in the procedure executor queue to be processed still.... > but could happen at any time on cluster should an unassign happen get > scheduled near an SCP. -- This message was sent by Atlassian JIRA (v7.6.3#76005)