[
https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097144#comment-13097144
]
ramkrishna.s.vasudevan commented on HBASE-4015:
-----------------------------------------------
I tried to write test cases that simulate the TimeoutMonitor scenario. The
operation to perform depends on the state of the region in transition (RIT).
So, before assigning some regions, I start a thread that gets a reference to
the AssignmentManager and fetches the list of RITs, checking the state of each
RIT in a loop. If the state is PENDING_OPEN or OPENING, the thread tries to
change the state in the znode to CLOSING.
Now, when we check the status, there is no guarantee that the RS has not
already moved the region to OPENED between the moment we read the status and
the moment we act on it.
I am trying to do something similar to TestMasterFailOver. But in
TestMasterFailOver the master is aborted first, and only then do we make the
changes to the znode.
When the master is running, however, region assignment is asynchronous, so we
cannot guarantee that the test behaves reliably.
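
To illustrate the gap between reading the status and acting on it, here is a
minimal, self-contained sketch in plain Java. It is a toy model, not HBase
code: the class RitRaceSketch, the State enum, and both forceClosing* methods
are hypothetical names, and an AtomicReference stands in for the znode. The
Runnable models the RS changing the state in the window between the check and
the write. The second method only writes if the state is still the one that
was observed, which is the same idea as a conditional write that fails when
the data has changed underneath it.

```java
import java.util.concurrent.atomic.AtomicReference;

public class RitRaceSketch {
    // Hypothetical stand-in for the region states stored in the znode.
    enum State { PENDING_OPEN, OPENING, OPENED, CLOSING }

    // Racy check-then-act: the state may change between the read and the write.
    static boolean forceClosingRacy(AtomicReference<State> znode, Runnable rsInTheGap) {
        State seen = znode.get();
        if (seen == State.PENDING_OPEN || seen == State.OPENING) {
            rsInTheGap.run();          // models the RS acting between check and act
            znode.set(State.CLOSING);  // blind write overwrites whatever is there now
            return true;
        }
        return false;
    }

    // Conditional write: succeeds only if the state is still the one we observed.
    static boolean forceClosingChecked(AtomicReference<State> znode, Runnable rsInTheGap) {
        State seen = znode.get();
        if (seen == State.PENDING_OPEN || seen == State.OPENING) {
            rsInTheGap.run();
            return znode.compareAndSet(seen, State.CLOSING);
        }
        return false;
    }

    public static void main(String[] args) {
        AtomicReference<State> a = new AtomicReference<>(State.OPENING);
        boolean racy = forceClosingRacy(a, () -> a.set(State.OPENED));
        System.out.println("racy: wrote=" + racy + " final=" + a.get());

        AtomicReference<State> b = new AtomicReference<>(State.OPENING);
        boolean checked = forceClosingChecked(b, () -> b.set(State.OPENED));
        System.out.println("checked: wrote=" + checked + " final=" + b.get());
    }
}
```

In the racy variant the region that had already reached OPENED ends up in
CLOSING. In the checked variant the write is refused and the state stays
OPENED. A real test would still have to decide what to do when the conditional
write fails, so this only shows the shape of the problem.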
> Refactor the TimeoutMonitor to make it less racy
> ------------------------------------------------
>
> Key: HBASE-4015
> URL: https://issues.apache.org/jira/browse/HBASE-4015
> Project: HBase
> Issue Type: Sub-task
> Affects Versions: 0.90.3
> Reporter: Jean-Daniel Cryans
> Assignee: ramkrishna.s.vasudevan
> Priority: Blocker
> Fix For: 0.92.0
>
> Attachments: HBASE-4015_1_trunk.patch, HBASE-4015_2_trunk.patch,
> HBASE-4015_reprepared_trunk_2.patch, Timeoutmonitor with state diagrams.pdf
>
>
> The current implementation of the TimeoutMonitor acts like a race condition
> generator, mostly making things worse rather than better. It does its own
> thing for a while without caring for what's happening in the rest of the
> master.
> The first thing that needs to happen is that the regions should not be
> processed in one big batch, because that sometimes can take minutes to
> process (meanwhile a region that timed out opening might have opened, then
> what happens is it will be reassigned by the TimeoutMonitor, generating the
> never-ending PENDING_OPEN situation).
> Those operations should also be done more atomically, although I'm not sure
> how to do it in a scalable way in this case.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira