[ 
https://issues.apache.org/jira/browse/HBASE-22631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906099#comment-16906099
 ] 

Wellington Chevreuil commented on HBASE-22631:
----------------------------------------------

{quote}I am sorry the rs log is lost for the reinstallation of the physical 
machine. As shown in the picture above the rs show the missing table descriptor 
exception message
{quote}
That's unfortunate; it would have been nice to take a broader look at the logs. 
The table descriptor is not stored on the RSes, but on HDFS. RS tempt20 failed 
to assign the region because of this, but tempt21 then succeeded, so it was most 
likely a temporary HDFS glitch, or RS tempt20 was having problems reading files 
from HDFS.
{quote}The incorrect message is saved in master memory structure serverMap and 
if we restart master before tempt20 crashes
{quote}
The _AssignmentManager.regionStateStore_ cache keeps two maps where region 
states are tracked: _regionsMap_ and _serverMap_. Once the region was split and 
moved to the CLOSED state, _serverMap_ would have been updated accordingly by 
the _UnassignProcedure_ (pid=3, from your logs) 
[here|https://github.com/apache/hbase/blob/branch-2.1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java#L234].
 And since the _GCRegionProcedure_ has completed successfully, _regionsMap_ 
would also have been updated at the *GC_REGION_PURGE_METADATA* 
[stage|https://github.com/apache/hbase/blob/branch-2.1/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/GCRegionProcedure.java#L115].
 So it looks more like something caused the master to load outdated 
information, such as in HBASE-21843. It's hard to confirm without the proper 
logs. Are you able to consistently reproduce it?
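
To make the bookkeeping above concrete, here is a minimal toy model in Java. It is only a sketch and not the HBase implementation: the class and method names (RegionTrackingSketch, open, unassign, gcPurge, isTracked) are hypothetical, states are plain strings, and only the two map names come from the discussion above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the two in-memory maps; hypothetical, not HBase code. */
class RegionTrackingSketch {
    // Stand-in for regionsMap: region name -> current state.
    final Map<String, String> regionsMap = new HashMap<>();
    // Stand-in for serverMap: server name -> regions it is believed to host.
    final Map<String, Set<String>> serverMap = new HashMap<>();

    /** Records a region as OPEN on a server in both maps. */
    void open(String region, String server) {
        regionsMap.put(region, "OPEN");
        serverMap.computeIfAbsent(server, s -> new HashSet<>()).add(region);
    }

    /** Assumed effect of the unassign step: CLOSED, and gone from the server's set. */
    void unassign(String region, String server) {
        regionsMap.put(region, "CLOSED");
        Set<String> hosted = serverMap.get(server);
        if (hosted != null) {
            hosted.remove(region);
        }
    }

    /** Assumed effect of the purge-metadata step: the region entry is dropped. */
    void gcPurge(String region) {
        regionsMap.remove(region);
    }

    /** True if either map still references the region. */
    boolean isTracked(String region) {
        if (regionsMap.containsKey(region)) {
            return true;
        }
        for (Set<String> hosted : serverMap.values()) {
            if (hosted.contains(region)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, once unassign and gcPurge have both run for a region, neither map references it any more, which is why a reappearing parent region points to stale information being loaded from somewhere else.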

> assign failed may make gced parent region appear again !!!
> ----------------------------------------------------------
>
>                 Key: HBASE-22631
>                 URL: https://issues.apache.org/jira/browse/HBASE-22631
>             Project: HBase
>          Issue Type: Bug
>          Components: proc-v2
>    Affects Versions: 2.1.1
>            Reporter: yuhuiyang
>            Priority: Major
>         Attachments: HBASE-22631-branch-2.1-01.patch, assign.png, 
> assignProcedure.txt, serverCrash.png, splitAndGc.png
>
>
> When I assign a region A, the process is as follows:
> step1 : A is assigned to rs1, and rs1 fails to open it.
> step2 : assignProcedure handleFailure.
> step3 : A is assigned to rs2, and rs2 succeeds in opening it.
> Above is the normal flow. However, when rs1 is restarted after region A was 
> split and the GCRegionProcedure succeeded, region A appears again!
> The reason is that region A is not removed from the serverMap correctly when 
> the assignProcedure runs handleFailure. The code regionNode.offline() sets 
> the regionNode's regionLocation to null and its state to OFFLINE, so the code 
> env.getAssignmentManager().undoRegionAsOpening(regionNode) does nothing. So 
> when the rs1 restart event triggers a serverCrashProcedure, it gets the 
> regions from serverMap, including region A; A is then assigned and its 
> hdfs dir is created.
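
The failure mode described in the report above can be sketched as a toy model in Java. This is an illustration under stated assumptions, not the HBase code: the class StaleServerMapSketch and the helpers markOpening and regionsToReassignOnCrash are hypothetical, and the bodies of offline and undoRegionAsOpening only mimic the behaviour the reporter describes (the real methods may differ).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy reproduction of the reported ordering problem; hypothetical, not HBase code. */
class StaleServerMapSketch {
    static class RegionNode {
        final String name;
        String location;          // server the region is being opened on, or null
        String state = "OFFLINE";

        RegionNode(String name) {
            this.name = name;
        }
    }

    // Stand-in for serverMap: server name -> regions it is believed to host.
    final Map<String, Set<String>> serverMap = new HashMap<>();

    /** Hypothetical helper: record that the region is being opened on a server. */
    void markOpening(RegionNode node, String server) {
        node.location = server;
        node.state = "OPENING";
        serverMap.computeIfAbsent(server, s -> new HashSet<>()).add(node.name);
    }

    /** As described in the report: clears the location and sets OFFLINE. */
    void offline(RegionNode node) {
        node.location = null;
        node.state = "OFFLINE";
    }

    /** Assumed behaviour: can only clean serverMap while the location is known. */
    void undoRegionAsOpening(RegionNode node) {
        if (node.location == null) {
            return; // nothing to look up, so the serverMap entry stays behind
        }
        Set<String> hosted = serverMap.get(node.location);
        if (hosted != null) {
            hosted.remove(node.name);
        }
    }

    /** Hypothetical helper: what a crash handler would pick up for a server. */
    Set<String> regionsToReassignOnCrash(String server) {
        return new HashSet<>(serverMap.getOrDefault(server, Collections.emptySet()));
    }
}
```

In this model, calling offline before undoRegionAsOpening leaves region A in the set for rs1, so a later crash of rs1 would pick it up again; calling them in the opposite order removes it.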



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
