[ https://issues.apache.org/jira/browse/HBASE-21440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687484#comment-16687484 ]
Hudson commented on HBASE-21440: -------------------------------- Results for branch branch-2.0 [build #1085 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1085/]: (/) *{color:green}+1 overall{color}* ---- details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1085//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1085//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1085//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Assign procedure on the crashed server is not properly interrupted > ------------------------------------------------------------------ > > Key: HBASE-21440 > URL: https://issues.apache.org/jira/browse/HBASE-21440 > Project: HBase > Issue Type: Bug > Affects Versions: 2.0.2 > Reporter: Ankit Singhal > Assignee: Ankit Singhal > Priority: Major > Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2 > > Attachments: HBASE-21440.branch-2.0.001.patch, > HBASE-21440.branch-2.0.002.patch, HBASE-21440.branch-2.0.003.patch, > HBASE-21440.branch-2.0.004.patch, HBASE-21440.branch-2.0.005.patch, > HBASE-21440.branch-2.1.005.patch > > > When the server crashes, it's SCP checks if there is already a procedure > assigning the region on this crashed server. If we found one, SCP will just > interrupt the already running AssignProcedure by calling remoteCallFailed > which internally just changes the region node state to OFFLINE and send the > procedure back with transition queue state for assignment with a new plan. > But, due to the race condition between the calling of the remoteCallFailed > and current state of the already running assign > procedure(REGION_TRANSITION_FINISH: where the region is already opened), it > is possible that assign procedure goes ahead in updating the regionStateNode > to OPEN on a crashed server. > As SCP had already skipped this region for assignment as it was relying on > existing assign procedure to do the right thing, this whole confusion leads > region to a not accessible state. -- This message was sent by Atlassian JIRA (v7.6.3#76005)