[
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085543#comment-13085543
]
stack commented on HBASE-4015:
------------------------------
bq. So what's your idea, Stack? Can I start digging into how many changes we
need to make if we go with the OFFLINE state, what the interface changes are, etc.?
That sounds good to me. I took a look, and what I saw was that when setting the
OFFLINE state, it's currently not easy to get back the znode seqid; we might have
to add something here. Then the seqid would have to be passed over the RPC when
we do open region. I'd say add a new open region method, one that takes two
args, the region name and the seqid. Leave the old one in place and use -1 or
something to flag an open where no seqid has been passed (maybe the shell wants
to do an open region and it won't have the seqid). Then I'd pass the seqid down
into the open handler and use it to check the znode seqid when we check for
OFFLINE.
Something like that.
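A rough sketch of the flow described above, in case it helps the discussion.
Everything in it is hypothetical: the class OpenRegionSketch, the methods
forceOffline and openRegion, the NO_SEQID constant, and the in-memory map
standing in for ZooKeeper are illustration only, not existing HBase code. It
also assumes that "seqid" means something like the znode version, which bumps
on every write.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; none of these names are actual HBase APIs. */
final class OpenRegionSketch {
    /** Sentinel for callers (e.g. the shell) that have no seqid to pass. */
    static final int NO_SEQID = -1;

    enum State { OFFLINE, OPENING }

    /** Stand-in for a znode: a state plus a version that bumps on every write. */
    static final class Znode {
        final State state;
        final int version;
        Znode(State state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    /** Stand-in for ZooKeeper, keyed by region name (an assumption for the sketch). */
    private static final Map<String, Znode> ZK = new HashMap<>();

    /** Master side: set OFFLINE and hand back the resulting znode version. */
    static synchronized int forceOffline(String regionName) {
        Znode old = ZK.get(regionName);
        Znode fresh = new Znode(State.OFFLINE, old == null ? 0 : old.version + 1);
        ZK.put(regionName, fresh);
        return fresh.version;
    }

    /** Old-style open, left in place for callers that have no seqid. */
    static synchronized boolean openRegion(String regionName) {
        return openRegion(regionName, NO_SEQID);
    }

    /** New-style open: the seqid travels with the request down to the handler. */
    static synchronized boolean openRegion(String regionName, int expectedSeqid) {
        Znode z = ZK.get(regionName);
        if (z == null || z.state != State.OFFLINE) {
            return false; // nothing to open, or someone else already moved it on
        }
        if (expectedSeqid != NO_SEQID && z.version != expectedSeqid) {
            return false; // znode was rewritten since the master set OFFLINE: stale request
        }
        ZK.put(regionName, new Znode(State.OPENING, z.version + 1));
        return true;
    }
}
```

With this shape, an open request issued against an older OFFLINE write is
rejected, while a caller that passes no seqid keeps the existing behaviour. A
real version would presumably do the check and the transition as one
conditional write against ZooKeeper rather than under a local lock.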
Good on you Ram.
> Refactor the TimeoutMonitor to make it less racy
> ------------------------------------------------
>
> Key: HBASE-4015
> URL: https://issues.apache.org/jira/browse/HBASE-4015
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 0.90.3
> Reporter: Jean-Daniel Cryans
> Assignee: ramkrishna.s.vasudevan
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-4015_1_trunk.patch, Timeoutmonitor with state
> diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition
> generator, mostly making things worse rather than better. It does its own
> thing for a while without caring for what's happening in the rest of the
> master.
> The first thing that needs to happen is that the regions should not be
> processed in one big batch, because that can sometimes take minutes to
> process. Meanwhile, a region that timed out opening might have opened; the
> TimeoutMonitor will then reassign it anyway, generating the never-ending
> PENDING_OPEN situation.
> Those operations should also be done more atomically, although I'm not sure
> how to do it in a scalable way in this case.