[jira] [Commented] (HBASE-21440) Assign procedure on the crashed server is not properly interrupted

Hudson (JIRA) Wed, 14 Nov 2018 21:00:23 -0800


    [ https://issues.apache.org/jira/browse/HBASE-21440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687484#comment-16687484 ]


Hudson commented on HBASE-21440:
--------------------------------

Results for branch branch-2.0
        [build #1085 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1085/]: (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1085//General_Nightly_Build_Report/]




(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1085//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/1085//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Assign procedure on the crashed server is not properly interrupted
> ------------------------------------------------------------------
>
>                 Key: HBASE-21440
>                 URL: https://issues.apache.org/jira/browse/HBASE-21440
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.2
>            Reporter: Ankit Singhal
>            Assignee: Ankit Singhal
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.0.3, 2.1.2
>
>         Attachments: HBASE-21440.branch-2.0.001.patch, 
> HBASE-21440.branch-2.0.002.patch, HBASE-21440.branch-2.0.003.patch, 
> HBASE-21440.branch-2.0.004.patch, HBASE-21440.branch-2.0.005.patch, 
> HBASE-21440.branch-2.1.005.patch
>
>
> When a server crashes, its ServerCrashProcedure (SCP) checks whether there 
> is already a procedure assigning a region to the crashed server. If it 
> finds one, the SCP simply interrupts the running AssignProcedure by calling 
> remoteCallFailed, which internally just changes the region node state to 
> OFFLINE and sends the procedure back to the transition queue state for 
> assignment with a new plan. 
> However, because of a race between the call to remoteCallFailed and the 
> current state of the running assign procedure 
> (REGION_TRANSITION_FINISH, where the region is already opened), the assign 
> procedure may go ahead and update the regionStateNode to OPEN on the 
> crashed server. 
> Because the SCP has already skipped this region for assignment, relying on 
> the existing assign procedure to do the right thing, the region ends up in 
> an inaccessible state.
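
For readers who want to see the interleaving concretely, here is a minimal
sketch of the race described above. It is a toy model, not HBase code: the
class AssignRaceSketch, the RegionNode type, the three-value State enum, the
finishAssign and replay helpers, and the server name "rs1" are all
hypothetical, and the liveness check is only one plausible guard, not
necessarily what the attached patches do.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the race; these are NOT HBase's real classes.
public class AssignRaceSketch {
    enum State { OFFLINE, OPENING, OPEN }

    // Stand-in for the region state node tracked by the master.
    static class RegionNode {
        State state = State.OPENING;
        String server;
        RegionNode(String server) { this.server = server; }
    }

    // Stand-in for what the description says remoteCallFailed does:
    // mark the node OFFLINE so the region can be re-queued with a new plan.
    static void remoteCallFailed(RegionNode node) {
        node.state = State.OFFLINE;
    }

    // Stand-in for the REGION_TRANSITION_FINISH step. With checkLiveness=false
    // it marks the region OPEN unconditionally (the problematic behaviour).
    // With checkLiveness=true it refuses when the target server is not live
    // (an assumed guard, for illustration only).
    static boolean finishAssign(RegionNode node, String target,
                                Set<String> liveServers, boolean checkLiveness) {
        if (checkLiveness && !liveServers.contains(target)) {
            return false;
        }
        node.state = State.OPEN;
        node.server = target;
        return true;
    }

    // Replays the interleaving: crash, SCP interrupt, then the late finish step.
    static RegionNode replay(boolean checkLiveness) {
        Set<String> live = new HashSet<>();
        live.add("rs1");
        RegionNode node = new RegionNode("rs1");
        live.remove("rs1");      // rs1 crashes
        remoteCallFailed(node);  // SCP interrupts the running assign
        // The assign procedure was already at its finish step and runs anyway.
        finishAssign(node, "rs1", live, checkLiveness);
        return node;
    }

    public static void main(String[] args) {
        RegionNode unguarded = replay(false);
        System.out.println("unguarded: " + unguarded.state + " on " + unguarded.server);
        RegionNode guarded = replay(true);
        System.out.println("guarded: " + guarded.state + " on " + guarded.server);
    }
}
```

In the unguarded replay the node ends up OPEN on the crashed server even
though the SCP had just set it to OFFLINE, which is the stuck state the
description reports. In the guarded replay the node stays OFFLINE, so it
remains eligible for reassignment.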



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
