[jira] [Commented] (HBASE-4015) Refactor the TimeoutMonitor to make it less racy

ramkrishna.s.vasudevan (JIRA) Wed, 10 Aug 2011 02:16:24 -0700

    [ https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082242#comment-13082242 ]


ramkrishna.s.vasudevan commented on HBASE-4015:
-----------------------------------------------

@Stack, 
{noformat}
Do we need new RS_ALLOCATE state? Could we just have OFFLINE plus your 
suggestion of adding RS name so it's OFFLINE+RS_TO_OPEN_REGION_ON?
{noformat}
Yes, this is also possible. But we chose to introduce a new state so that there 
is a clear distinction between a region that has been reallocated and one that 
has not; handling a new state may also be cleaner than changing the behaviour of 
the existing OFFLINE state.
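A minimal sketch of the distinction being discussed, assuming a simplified znode payload that carries a state plus the target region server. The enum `ZkRegionState`, the class `RegionZNodeData`, and both helper methods are hypothetical illustrations, not HBase's real znode data classes. With a dedicated RE_ALLOCATE state, "was this region reallocated?" is a single comparison; with OFFLINE plus the RS name, this sketch has to compare against the previously assigned server, and it cannot tell a reallocation back to the same RS apart from an ordinary assignment.

```java
import java.util.Objects;

// Hypothetical znode states for illustration only (not HBase's real constants).
// RE_ALLOCATE is the new state proposed in this comment.
enum ZkRegionState { OFFLINE, RE_ALLOCATE, OPENING, OPENED }

// Hypothetical znode payload: the region's state plus the RS chosen to open it.
final class RegionZNodeData {
    final ZkRegionState state;
    final String targetServer;

    RegionZNodeData(ZkRegionState state, String targetServer) {
        this.state = Objects.requireNonNull(state);
        this.targetServer = Objects.requireNonNull(targetServer);
    }

    // Dedicated-state option: reallocation is visible from the state alone.
    boolean isReallocation() {
        return state == ZkRegionState.RE_ALLOCATE;
    }

    // OFFLINE + RS-name option (as sketched here): the caller must also know where
    // the region was assigned before, and a reallocation back to that same RS
    // looks identical to a fresh assignment.
    static boolean isReallocationOfflineOnly(RegionZNodeData data, String previousServer) {
        return data.state == ZkRegionState.OFFLINE
                && previousServer != null
                && !previousServer.equals(data.targetServer);
    }
}
```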

{noformat}
What happens if we assign the region back to RS1 (it can happen).
{noformat}
Yes, we have considered this scenario as well. If the region is reallocated to 
the same RS, there are two flows:

-> If the state is OPENING in ZK but the region has not yet been added to the 
RS's online regions list, then any subsequent call from the master to the RS 
with the RE_ALLOCATE state will succeed, but the earlier transition from 
OPENING to OPEN will fail.
-> If the region has already been added to the online regions list, the RS 
will respond with ALREADY_OPENED. Before removing the region from RIT, the 
master will check whether the znode has been deleted; if it has not, the region 
is not removed from RIT. The region therefore stays in PENDING_OPEN, and a 
subsequent TimeoutMonitor run will handle it.
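The two flows can be simulated in a few lines. This is a sketch under stated assumptions: `FakeZNode`, `FakeRegionServer`, `FakeMaster` and `OpenResponse` are hypothetical in-memory stand-ins, and we assume that the RS accepting the RE_ALLOCATE request rewrites the znode (bumping its version), so the earlier handler's version-checked OPENING to OPEN transition fails. The comment itself does not say exactly how that failure is detected.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-ins, not HBase classes; states are plain strings.
enum OpenResponse { ACCEPTED, ALREADY_OPENED }

final class FakeZNode {
    String state;   // null once the znode has been deleted
    int version;    // bumped on every write, like a ZooKeeper znode version

    FakeZNode(String state) { this.state = state; }

    // Unconditional write that bumps the version.
    void forceSet(String newState) { state = newState; version++; }

    // Conditional write: fails if the node was deleted or rewritten after
    // expectedVersion was read.
    boolean transition(int expectedVersion, String newState) {
        if (state == null || version != expectedVersion) return false;
        forceSet(newState);
        return true;
    }

    void delete() { state = null; }
}

final class FakeRegionServer {
    final Set<String> onlineRegions = new HashSet<>();

    // Handles an open request from the master carrying the RE_ALLOCATE state.
    OpenResponse openReallocated(String region, FakeZNode node) {
        if (onlineRegions.contains(region)) {
            return OpenResponse.ALREADY_OPENED;   // flow 2: znode left untouched
        }
        // Flow 1 (assumed mechanism): the new handler takes over the znode; the
        // version bump makes the earlier handler's OPENING -> OPEN write fail.
        node.forceSet("OPENING");
        return OpenResponse.ACCEPTED;
    }
}

final class FakeMaster {
    final Map<String, String> regionsInTransition = new HashMap<>();

    // Flow 2 on the master: drop the region from RIT only once its znode is gone.
    void onAlreadyOpened(String region, FakeZNode node) {
        if (node.state == null) {
            regionsInTransition.remove(region);
        } else {
            // Stays PENDING_OPEN; the next TimeoutMonitor run handles it.
            regionsInTransition.put(region, "PENDING_OPEN");
        }
    }
}
```

In the first flow the stale handler's conditional write is rejected while the new handler's succeeds; in the second the region stays PENDING_OPEN until its znode is gone, leaving the TimeoutMonitor to deal with it.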

Please provide your suggestions.


> Refactor the TimeoutMonitor to make it less racy
> ------------------------------------------------
>
>                 Key: HBASE-4015
>                 URL: https://issues.apache.org/jira/browse/HBASE-4015
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 0.90.3
>            Reporter: Jean-Daniel Cryans
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: Timeoutmonitor with state diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition 
> generator, mostly making things worse rather than better. It does its own 
> thing for a while without caring for what's happening in the rest of the 
> master.
> The first thing that needs to happen is that the regions should not be 
> processed in one big batch, because that can sometimes take minutes 
> (meanwhile, a region that timed out while opening might have opened; it will 
> then be reassigned by the TimeoutMonitor, generating the never-ending 
> PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure 
> how to do it in a scalable way in this case.
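One possible way to address both points of the description, offered as a sketch rather than the approach adopted for this issue: handle each timed-out region individually and guard the action with an atomic compare-and-replace on the regions-in-transition map, so a region that opened while the monitor was running is not reassigned. `RitState` and `PerRegionTimeoutMonitor` are hypothetical; the map stands in for the master's RIT bookkeeping and uses `ConcurrentHashMap.replace(key, oldValue, newValue)`, which compares values with `equals` (identity here, since `RitState` does not override it).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical RIT entry: a state and the time (ms) it was entered.
// equals() is not overridden, so map comparisons are by identity.
final class RitState {
    final String state;
    final long stamp;

    RitState(String state, long stamp) { this.state = state; this.stamp = stamp; }
}

final class PerRegionTimeoutMonitor {
    final ConcurrentHashMap<String, RitState> rit = new ConcurrentHashMap<>();
    final long timeoutMs;

    PerRegionTimeoutMonitor(long timeoutMs) { this.timeoutMs = timeoutMs; }

    // Acts on one region. The replace succeeds only if the entry still holds the
    // exact state that was examined; if the region has since opened (entry removed)
    // or moved on (entry replaced), nothing is reassigned.
    boolean handleTimeout(String region, RitState seen, long now) {
        if (now - seen.stamp < timeoutMs) return false;
        // A real monitor would trigger the reassignment only when this returns true.
        return rit.replace(region, seen, new RitState("PENDING_OPEN", now));
    }

    // One monitor pass: each region is re-checked at the moment it is acted on,
    // instead of acting on a batch collected at the start of the pass.
    int chore(long now) {
        int reassigned = 0;
        for (Map.Entry<String, RitState> e : rit.entrySet()) {
            if (handleTimeout(e.getKey(), e.getValue(), now)) reassigned++;
        }
        return reassigned;
    }
}
```

Because each decision is made per region at the moment of action, a slow pass no longer acts on stale information about regions that finished opening in the meantime.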

