[ 
https://issues.apache.org/jira/browse/HBASE-8281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-8281.
----------------------------------------
    Resolution: Incomplete

> Unassigned regions: dropped messages from Master to RS
> ------------------------------------------------------
>
>                 Key: HBASE-8281
>                 URL: https://issues.apache.org/jira/browse/HBASE-8281
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89-fb
>            Reporter: Amitanand Aiyer
>            Priority: Major
>
> We have seen a couple of scenarios where a transient network issue between 
> the RS and Master results in regions being unassigned (and staying 
> unassigned) until someone intervenes manually with hbck -fix.
> The events occur as follows. 
> RS checks in for a regionServerReport.
>   Master wants to assign a region to the RS. Hence adds a MSG_REGION_OPEN msg 
> to the return results, and marks the region as PENDING_OPEN.
>   The message from the master to the RS is not delivered due to a network 
> error. Master does not do anything to revert the state changes.
> Network heals, and the RS is able to do regionServerReports in future; it is 
> in good standing with the master. But, RS does not know that it has to open 
> the region. Master thinks that the RS is going to open the region.
> Region remains unassigned until we intervene with hbck.
> Possible fix:
>   I think it may be a mistake to unilaterally change the RegionState to 
> PENDING_OPEN once the master decides that it wants to send the message. 
> Perhaps we should create an intermediate state, where the master will keep 
> sending the OPEN message to the RS until it acks, and update the RegionState 
> to PENDING_OPEN only after the RS has acked.
> While this would fix the particular scenario that caused the unassigned 
> regions, we might want to update all the Master-RS communication (even 
> region closes?) to expect message failures, and wait for an ack before 
> updating the state in the master.
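
The proposed fix can be illustrated with a small, self-contained sketch. It is not HBase code: the class, the OPEN_REQUESTED intermediate state, and the method names are all hypothetical, and the RS ack is assumed to reach the master whenever the OPEN message itself was delivered. The point is only that the master keeps attaching the OPEN message to its replies and moves the region to PENDING_OPEN after an ack, so a dropped reply delays the assignment instead of stranding the region.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of ack-based assignment; none of these names are HBase APIs. */
final class AckedAssignmentSketch {
    /** OPEN_REQUESTED is the assumed intermediate state proposed in the issue. */
    enum RegionState { UNASSIGNED, OPEN_REQUESTED, PENDING_OPEN }

    /** Master-side bookkeeping of region states. */
    static class Master {
        private final Map<String, RegionState> states = new HashMap<>();

        /** The master decides to assign: record the intent only, not PENDING_OPEN. */
        void requestOpen(String region) {
            states.put(region, RegionState.OPEN_REQUESTED);
        }

        /** Checked on every report from the RS: should the reply carry an OPEN message? */
        boolean shouldSendOpen(String region) {
            return states.get(region) == RegionState.OPEN_REQUESTED;
        }

        /** The RS acknowledged the OPEN message; only now advance the state. */
        void onAck(String region) {
            if (states.get(region) == RegionState.OPEN_REQUESTED) {
                states.put(region, RegionState.PENDING_OPEN);
            }
        }

        RegionState stateOf(String region) {
            return states.getOrDefault(region, RegionState.UNASSIGNED);
        }
    }

    /**
     * Simulates successive report rounds. replyDelivered[i] says whether the
     * master's reply in round i reached the RS. Returns how many times the
     * OPEN message was sent.
     */
    static int simulate(Master master, String region, boolean[] replyDelivered) {
        int sends = 0;
        for (boolean delivered : replyDelivered) {
            if (!master.shouldSendOpen(region)) {
                break; // already acked, nothing left to resend
            }
            sends++;
            if (delivered) {
                // Assumption: the RS acks as soon as it receives the message.
                master.onAck(region);
            }
        }
        return sends;
    }

    public static void main(String[] args) {
        Master master = new Master();
        master.requestOpen("region-1");
        // The first two replies are lost; the third one gets through.
        int sends = simulate(master, "region-1", new boolean[] {false, false, true, true});
        System.out.println("sends=" + sends + " state=" + master.stateOf("region-1"));
    }
}
```

With the first two replies dropped, the sketch resends on each later report and ends with the region in PENDING_OPEN after three sends. In the behavior described in the issue, the state would already have been PENDING_OPEN after the first (lost) send, with nothing left to trigger a retry.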



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
