[
https://issues.apache.org/jira/browse/HBASE-21259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648785#comment-16648785
]
Hudson commented on HBASE-21259:
--------------------------------
Results for branch branch-2.1
[build #458 on
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/458/]:
(x) *{color:red}-1 overall{color}*
----
details (if available):
(/) {color:green}+1 general checks{color}
-- For more information [see general
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/458//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2)
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/458//JDK8_Nightly_Build_Report_(Hadoop2)/]
(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3)
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/458//JDK8_Nightly_Build_Report_(Hadoop3)/]
(/) {color:green}+1 source release artifact{color}
-- See build output for details.
(/) {color:green}+1 client integration test{color}
> [amv2] Revived deadservers; recreated serverstatenode
> -----------------------------------------------------
>
> Key: HBASE-21259
> URL: https://issues.apache.org/jira/browse/HBASE-21259
> Project: HBase
> Issue Type: Bug
> Components: amv2
> Affects Versions: 2.1.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 2.1.1, 2.0.3
>
> Attachments: HBASE-21259.branch-2.1.001.patch,
> HBASE-21259.branch-2.1.002.patch, HBASE-21259.branch-2.1.003.patch,
> HBASE-21259.branch-2.1.004.patch, HBASE-21259.branch-2.1.005.patch,
> HBASE-21259.branch-2.1.006.patch
>
>
> On startup, I see servers being revived; i.e. their serverstatenode is
> getting marked online even though it has just been processed by
> ServerCrashProcedure. It looks like this (in a patched server that reports
> whenever a serverstatenode is created):
> {code}
> 2018-09-29 03:45:40,963 INFO org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=3982597, state=SUCCESS; ServerCrashProcedure server=vb1442.halxg.cloudera.com,22101,1536675314426, splitWal=true, meta=false in 1.0130sec
> ...
> 2018-09-29 03:45:43,733 INFO org.apache.hadoop.hbase.master.assignment.RegionStates: CREATING! vb1442.halxg.cloudera.com,22101,1536675314426
> java.lang.RuntimeException: WHERE AM I?
>     at org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1116)
>     at org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1143)
>     at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1464)
>     at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:200)
>     at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:369)
>     at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:97)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:953)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1716)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1494)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:75)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2022)
> {code}
> See how we've just finished an SCP, which will have removed the
> serverstatenode... but then we come across an unassign that references the
> server that was just processed. The unassign attempts to update the
> serverstatenode, and in doing so we create one if one is not present. We
> shouldn't be creating one.
> I think I see this a lot because I am scheduling unassigns with hbck2. The
> servers crash and then come back up, with SCPs cleaning up the old servers
> while unassign procedures are still waiting in the procedure executor queue.
> But it could happen at any time on a cluster, should an unassign happen to
> get scheduled near an SCP.
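
The race described above can be illustrated with a minimal sketch. This is not the HBase code or the committed patch: ServerNodeSketch, addRegionCreating, addRegionIfPresent, removeServer and hasServer are hypothetical names, and only getOrCreateServer mirrors a method name from the stack trace. It assumes a plain map from server name to the regions recorded against that server, and contrasts a get-or-create lookup (which brings back a node that crash processing just removed) with a lookup-only variant that skips the update.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of per-server state tracking.
// It is an illustration only, not the actual RegionStates implementation.
class ServerNodeSketch {
    // server name -> regions recorded against that server
    private final Map<String, Set<String>> nodes = new ConcurrentHashMap<>();

    // Get-or-create lookup: silently creates a node when none exists.
    Set<String> getOrCreateServer(String serverName) {
        return nodes.computeIfAbsent(serverName, k -> ConcurrentHashMap.newKeySet());
    }

    // Stand-in for the end of crash processing: drop the server's node.
    void removeServer(String serverName) {
        nodes.remove(serverName);
    }

    boolean hasServer(String serverName) {
        return nodes.containsKey(serverName);
    }

    // Problematic path: a late update recreates ("revives") a removed node.
    void addRegionCreating(String serverName, String region) {
        getOrCreateServer(serverName).add(region);
    }

    // Lookup-only path: skip the update when the node is already gone.
    // Returns true if the region was recorded, false if the server is unknown.
    boolean addRegionIfPresent(String serverName, String region) {
        Set<String> regions = nodes.get(serverName);
        if (regions == null) {
            return false;
        }
        regions.add(region);
        return true;
    }
}
```

In this model, calling removeServer and then addRegionCreating for the same server leaves a node behind for the crashed server, which is the revival seen in the log. Calling addRegionIfPresent instead returns false and leaves the map unchanged.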