[ 
https://issues.apache.org/jira/browse/HBASE-22631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16911470#comment-16911470
 ] 

Wellington Chevreuil commented on HBASE-22631:
----------------------------------------------

{quote}RegionNode.offline() will set the region state to State.OFFLINE, and the 
next step, AssignmentManager().undoRegionAsOpening, will check whether the 
region is in the OPENING state.
{quote}
This is true indeed, but it is not enough on its own to trigger the condition 
of having the region brought back. The *RegionStateNode* instances held in 
*ServerStateNode.regions* are the same objects tracked in 
*RegionStates.regionsMap*. So once the region state is updated by any 
procedure, such as *AssignProcedure* or *SplitProcedure* in this case, the new 
state is visible through both, and the region will not be taken into account 
if an SCP is triggered. Also, once the SCP has completed successfully, the 
entry for this RS instance will already have been removed from 
*RegionStates.regionsMap*. I have added a test that simulates the scenario 
described here and am attaching it in a new patch. It succeeds even without 
the proposed fix, for the reasons above. Nevertheless, the fact that we may 
keep references in *RegionStates.regionsMap* for regions that are no longer on 
the RS seems like an inconsistency, so I guess we can keep the proposed fix.

 

> assign failed may make gced parent region appear again !!!
> ----------------------------------------------------------
>
>                 Key: HBASE-22631
>                 URL: https://issues.apache.org/jira/browse/HBASE-22631
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.1.1
>            Reporter: yuhuiyang
>            Priority: Major
>         Attachments: HBASE-22631-branch-2.1-01.patch, 
> HBASE-22631-branch-2.1-02.patch, assign.png, assignProcedure.txt, 
> serverCrash.png, splitAndGc.png
>
>
> When I assign a region A, the process is as follows:
> step1 : A is assigned to rs1, and rs1 fails to open it.
> step2 : AssignProcedure runs handleFailure.
> step3 : A is assigned to rs2, and rs2 opens it successfully.
> The above is the normal flow. However, when rs1 is restarted after region A 
> has been split and GCRegionProcedure has succeeded, region A appears again!
> The reason is that region A is not removed from the serverMap correctly when 
> AssignProcedure runs handleFailure: regionNode.offline() sets the 
> regionNode's regionLocation to null and its state to OFFLINE, so the 
> subsequent call to 
> env.getAssignmentManager().undoRegionAsOpening(regionNode) does nothing. 
> When the rs1 restart event then triggers a ServerCrashProcedure, it gets the 
> regions from serverMap, finds region A, and A is then assigned and its HDFS 
> directory is created.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
