[ 
https://issues.apache.org/jira/browse/HBASE-25059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198508#comment-17198508
 ] 

Nick Dimiduk commented on HBASE-25059:
--------------------------------------

Thanks [~zhangduo]. [~stack] also mentioned this to me as being a core design 
assumption in AM. I'm very confused by this decision and would like to 
understand it better. Is there some design document I can reference, or 
comments in a part of the code where this is explained in more detail? It seems 
to me that we should have a better fencing mechanism than this for avoiding 
double-assign of a region.

> TransitionRegionStateProcedure should timeout, rollback, retry instead of 
> waiting infinitely on CONFIRMED_OPEN
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-25059
>                 URL: https://issues.apache.org/jira/browse/HBASE-25059
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 2.3.2
>            Reporter: Nick Dimiduk
>            Priority: Major
>
> Testing 2.3.2RC1 with ITBLL. The region server assigned to open meta locked 
> up due to HBASE-24896. Meanwhile, the master waits indefinitely on a 
> procedure {{pid=176583, ppid=176532, 
> state=WAITING:REGION_STATE_TRANSITION_CONFIRM_OPENED; 
> TransitRegionStateProcedure table=hbase:meta, region=1588230740, ASSIGN}}.
> AssignmentManager needs a way to rescind assignment when a RS fails to 
> complete within a reasonable timeout window, roll back the procedure, and try 
> again with a new target.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
