[
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083965#comment-13083965
]
stack commented on HBASE-4015:
------------------------------
bq. Timeout monitor DOESNOT preempt an znode to OFFLINE if in PENDING_OPEN
state.
Ok.
I think I understand now. The addition of a new state breaks the move to
OPENING because the check for a previous OFFLINE state will fail... so the RS
will not proceed with the open.
But in fig (iii) in your doc, you check that the previous state is REALLOCATE?
How is this case different from fig (i), where you check for OFFLINE? Won't
your code have to check for both REALLOCATE and OFFLINE, with the presence of
either meaning it's ok to proceed to OPENING (and then aren't REALLOCATE and
OFFLINE the 'same' state, because the presence of either will mean proceed to
OPENING?).
I suppose the presence of the RS name will help. If it's the 'same' name, then
we can proceed to OPENING, and so what if OFFLINE was hijacked and became a
REALLOCATE. If they are not the same, then we'd abort the open.
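To make the check above concrete, here is a rough sketch of what I have in
mind. It is not code from the patch: the class, the ZNodeData holder and the
mayMoveToOpening method are names made up for illustration, and the state enum
only mirrors the states discussed here.

```java
import java.util.EnumSet;
import java.util.Set;

public class OpeningCheck {
    // Illustrative states only; REALLOCATE is the state proposed in the attached doc.
    enum State { OFFLINE, REALLOCATE, PENDING_OPEN, OPENING, OPENED }

    // Hypothetical view of the znode payload: a state plus the RS name the master wrote.
    static final class ZNodeData {
        final State state;
        final String serverName;

        ZNodeData(State state, String serverName) {
            this.state = state;
            this.serverName = serverName;
        }
    }

    // Either state means "go ahead", which is why they look like the 'same' state.
    private static final Set<State> OPENABLE = EnumSet.of(State.OFFLINE, State.REALLOCATE);

    // True if this RS may move the znode to OPENING: the previous state is one of
    // the openable ones AND the znode names this RS as the assignee.
    static boolean mayMoveToOpening(ZNodeData current, String thisServer) {
        return OPENABLE.contains(current.state) && thisServer.equals(current.serverName);
    }
}
```

With the server name in the payload, the state check alone no longer decides
the outcome; a mismatched name aborts the open whichever of the two states is
present.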
So, why not just add machine name to OFFLINE? Then we don't need REALLOCATE
state? (Ideally it would be best if master told the regionserver the version of
the znode to expect when it goes to move the znode to OPENING but that looks
hard to pass from the master over to the RS EventHandlers).
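The version idea could look roughly like the sketch below. It is an in-memory
stand-in, not HBase or ZooKeeper code: VersionedZNode, Snapshot, setState and
forceState are hypothetical names, and the AtomicReference only imitates the
conditional update that ZooKeeper's setData(path, data, version) gives us.

```java
import java.util.concurrent.atomic.AtomicReference;

public class VersionedZNode {
    // Immutable pair of state and version, standing in for znode data plus its Stat.
    static final class Snapshot {
        final String state;
        final int version;

        Snapshot(String state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> ref =
        new AtomicReference<>(new Snapshot("OFFLINE", 0));

    Snapshot read() {
        return ref.get();
    }

    // Conditional update: succeeds only if nobody has written since expectedVersion.
    boolean setState(String newState, int expectedVersion) {
        Snapshot cur = ref.get();
        if (cur.version != expectedVersion) {
            return false; // someone else touched the node; the caller should abort
        }
        return ref.compareAndSet(cur, new Snapshot(newState, cur.version + 1));
    }

    // Unconditional write (like a forced OFFLINE); it still bumps the version.
    int forceState(String newState) {
        while (true) {
            Snapshot cur = ref.get();
            Snapshot next = new Snapshot(newState, cur.version + 1);
            if (ref.compareAndSet(cur, next)) {
                return next.version;
            }
        }
    }
}
```

If the master hands version 0 to the RS and the node is then forced OFFLINE
again, the RS's setState("OPENING", 0) fails even though the state still reads
OFFLINE, which is exactly the hijack case that a state-only check misses.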
So, figuring out how to deal with the timeout of regions in PENDING_OPEN is one
aspect of this issue, right? The verification of state over in the timeout
monitor before acting is another aspect?
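For the second aspect, something like the following is what I mean by
verifying before acting. This is only a sketch under my own assumptions: the
TimeoutRecheck class, its RegionState holder and the findTimedOut method are
invented for illustration and do not mirror the actual TimeoutMonitor or
AssignmentManager code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TimeoutRecheck {
    // Hypothetical in-memory record of a region in transition.
    static final class RegionState {
        final String state;
        final long stamp; // time of the last state change, in ms

        RegionState(String state, long stamp) {
            this.state = state;
            this.stamp = stamp;
        }
    }

    final Map<String, RegionState> regionsInTransition = new ConcurrentHashMap<>();

    // Returns the regions that are still timed out at the moment each one is examined.
    List<String> findTimedOut(long now, long timeoutMs) {
        List<String> timedOut = new ArrayList<>();
        for (String region : new ArrayList<>(regionsInTransition.keySet())) {
            // Re-read right before acting instead of trusting an earlier batch snapshot.
            RegionState latest = regionsInTransition.get(region);
            if (latest == null) {
                continue; // the region left transition (e.g. it opened) in the meantime
            }
            if (!"PENDING_OPEN".equals(latest.state) || now - latest.stamp < timeoutMs) {
                continue;
            }
            // Refresh the stamp only if the entry is still the one we examined, so a
            // concurrent state change wins and a later pass does not act twice.
            RegionState refreshed = new RegionState(latest.state, now);
            if (regionsInTransition.replace(region, latest, refreshed)) {
                timedOut.add(region);
            }
        }
        return timedOut;
    }
}
```

The point is that each region is checked and claimed one at a time, so a region
that opened while the monitor was busy elsewhere is simply skipped.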
You are working on TRUNK, Ram? (I believe it acts a little differently from
0.90 because of recent work done in here).
Good stuff Ram. Thanks for digging into this.
> Refactor the TimeoutMonitor to make it less racy
> ------------------------------------------------
>
> Key: HBASE-4015
> URL: https://issues.apache.org/jira/browse/HBASE-4015
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 0.90.3
> Reporter: Jean-Daniel Cryans
> Assignee: ramkrishna.s.vasudevan
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-4015_1_trunk.patch, Timeoutmonitor with state
> diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition
> generator, mostly making things worse rather than better. It does its own
> thing for a while without caring for what's happening in the rest of the
> master.
> The first thing that needs to happen is that the regions should not be
> processed in one big batch, because that sometimes can take minutes to
> process (meanwhile a region that timed out opening might have opened, then
> what happens is it will be reassigned by the TimeoutMonitor generating the
> never ending PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure
> how to do it in a scalable way in this case.