[ https://issues.apache.org/jira/browse/HBASE-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042478#comment-13042478 ]
Jean-Daniel Cryans commented on HBASE-3937:
-------------------------------------------
I'm not sure that patch would be better, starting with the fact that it copies
a bunch of code from the next switch case.
Thinking more about this problem, I believe that in your original case you
almost had a double assignment (and the patch you propose would actually turn
it into one).
Let's say the region times out on PENDING_OPEN, but by the time the timeout is
processed RS1 has already opened it. What you had originally is that the region
keeps bouncing because RS2 can't open it; with your patch, the master would now
be able to assign it to RS2, since the ZK state is cleared.
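A toy Java model of the race just described may make it concrete. Everything in it (the DoubleAssignSketch class, the Znode type, the open() helper) is hypothetical and only mimics the OFFLINE -> OPENING -> OPENED transitions; it is not HBase's actual assignment code. It assumes RS1 completes the open after the TimeoutMonitor has already queued the region for reassignment, and compares the original handler with one that clears the znode to OFFLINE first.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical toy model (not HBase code) of a PENDING_OPEN timeout that is
 * processed after RS1 has already opened the region.
 */
public class DoubleAssignSketch {
    enum ZkState { OFFLINE, OPENING, OPENED }

    static final class Znode {
        ZkState state = ZkState.OFFLINE;

        /** Conditional transition: only succeeds if the node is in 'expected'. */
        boolean transition(ZkState expected, ZkState next) {
            if (state != expected) {
                return false;
            }
            state = next;
            return true;
        }
    }

    /** A region server opens the region; it must first move OFFLINE -> OPENING. */
    static void open(Znode node, String server, Set<String> hosts) {
        if (!node.transition(ZkState.OFFLINE, ZkState.OPENING)) {
            return; // RS cannot claim the region (the "bouncing" case)
        }
        node.transition(ZkState.OPENING, ZkState.OPENED);
        hosts.add(server);
    }

    /** Returns the servers that end up serving the region, in opening order. */
    static Set<String> run(boolean forceOfflineBeforeReassign) {
        Znode node = new Znode();
        Set<String> hosts = new LinkedHashSet<>();
        // TimeoutMonitor has already queued the region for reassignment,
        // but before that is processed RS1 finally opens it.
        open(node, "RS1", hosts);
        if (forceOfflineBeforeReassign) {
            node.state = ZkState.OFFLINE; // the proposed patch: clear the ZK state
        }
        open(node, "RS2", hosts); // the queued reassignment reaches RS2
        return hosts;
    }

    public static void main(String[] args) {
        System.out.println("original handler: " + run(false)); // [RS1]
        System.out.println("forced OFFLINE:   " + run(true));  // [RS1, RS2]
    }
}
```

With the original handler, RS2's OFFLINE -> OPENING transition fails and only RS1 serves the region; once the znode is forced back to OFFLINE, RS2's transition succeeds too, and both servers end up serving the region.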
It's still unclear to me why your RS1 didn't go through and finally open it
(it should be in your logs, though), but we have to consider both possibilities.
I'm starting to think that there won't be any easy solution; we need to rewrite
how TimeoutMonitor works. Anything else would just be band-aids that will
never fix all the problems.
The way it should work is the following:
- It should not create a list of unassigns and assigns, since by the time the
list is processed the situation probably changed (I witnessed that a lot).
- This means the action should be taken as we go through the first loop.
- One of the major issues is the lack of atomicity, so any action taken should
first check the current state, keep the version number, decide on the
corrective measure, and update the znode while expecting the version it first
got.
- If the znode update succeeds, we know for sure that the operation will be
seen by the region servers.
- If it doesn't succeed, the situation needs to be reassessed.
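A minimal sketch of that procedure, assuming a hypothetical in-memory VersionedStore standing in for the region's znode (the class, its method names, the paths and the state strings are illustrative, not HBase APIs). With a real ZooKeeper client, the read would be getData(path, watch, stat) and the conditional write setData(path, data, stat.getVersion()), which fails with a BadVersionException when the version has moved on.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of "read state + version, decide, write expecting that
 * version, reassess on failure". VersionedStore is an in-memory stand-in for znodes.
 */
public class VersionedTimeoutSketch {

    /** A value together with the version it had when it was read. */
    static final class Versioned {
        final String data;
        final int version;

        Versioned(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    static final class VersionedStore {
        private final Map<String, Versioned> nodes = new HashMap<>();

        synchronized void create(String path, String data) {
            nodes.put(path, new Versioned(data, 0));
        }

        synchronized Versioned get(String path) {
            return nodes.get(path);
        }

        /** Writes only if the node is still at expectedVersion, then bumps the version. */
        synchronized boolean setIfVersion(String path, String data, int expectedVersion) {
            Versioned current = nodes.get(path);
            if (current == null || current.version != expectedVersion) {
                return false;
            }
            nodes.put(path, new Versioned(data, current.version + 1));
            return true;
        }
    }

    /**
     * Handles one timed-out region right away instead of queuing it.
     * 'decide' maps the observed state to the corrective state, or null when
     * no action is needed. Returns the state written, or null if none.
     */
    static String handleTimeout(VersionedStore zk, String path,
                                UnaryOperator<String> decide, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Versioned seen = zk.get(path);           // check current state + version
            if (seen == null) {
                return null;                          // node is gone: nothing to fix
            }
            String target = decide.apply(seen.data);  // decide the corrective measure
            if (target == null) {
                return null;                          // situation resolved itself
            }
            if (zk.setIfVersion(path, target, seen.version)) {
                return target;                        // write applied to the state we saw
            }
            // Version changed under us: loop and reassess from a fresh read.
        }
        return null;
    }

    public static void main(String[] args) {
        VersionedStore zk = new VersionedStore();
        zk.create("/region-A", "OPENING"); // hypothetical path and state names
        String written = handleTimeout(zk, "/region-A",
                s -> s.equals("OPENING") ? "OFFLINE" : null, 3);
        System.out.println(written); // OFFLINE
    }
}
```

The decide function carries the policy (what to do for a given state); the loop only guarantees that the action is applied to the state that was actually observed. If RS1 moves the node to OPENED between the read and the write, the conditional write fails, and the next read sees OPENED, so no reassignment happens.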
This is clearly not something for 0.90; that's one of the reasons we set the
timeout much higher than 30 seconds in 0.90.3. That was my conclusion at the
end of HBASE-3669.
> Region PENDING-OPEN timeout with un-expected ZK node state leads to an
> endless loop
> -----------------------------------------------------------------------------------
>
> Key: HBASE-3937
> URL: https://issues.apache.org/jira/browse/HBASE-3937
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.3
> Reporter: Jieshan Bean
> Assignee: Jieshan Bean
> Fix For: 0.90.4
>
>
> Here is the scenario in which this problem happened:
> 1. HMaster assigned region A to RS1, so the RegionState was set to
> PENDING_OPEN.
> 2. Because there were too many open requests, the open process on RS1 was
> blocked.
> 3. Some time later, TimeoutMonitor found that the assignment of A had timed
> out. Since the RegionState was PENDING_OPEN, it went into the following
> handler (which just puts the region into the set of regions waiting to be
> assigned):
>   case PENDING_OPEN:
>     LOG.info("Region has been PENDING_OPEN for too " +
>       "long, reassigning region=" +
>       regionInfo.getRegionNameAsString());
>     assigns.put(regionState.getRegion(), Boolean.TRUE);
>     break;
> So in this case we assume that the ZK node state is OFFLINE. In normal
> processing, that is indeed OK.
> 4. But before the reassignment happened, RS1 processed its queued requests,
> and that affected the new assignment, because RS1 updated the ZK node state
> from OFFLINE to OPENING.
> 5. The new assignment then started and sent the region to RS2 to open. While
> opening, RS2 has to update the ZK node state from OFFLINE to OPENING. Since
> the current state was already OPENING, this operation failed.
> So this region could never be opened successfully.
> So I think, to avoid this problem, in the PENDING_OPEN case of
> TimeoutMonitor we should transition the ZK node state to OFFLINE first.
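The loop in steps 1-5 can be reproduced with a small Java model. It is hypothetical (the class and method names are not HBase code) and only tracks the region's znode state: once RS1 has moved the node to OPENING, every reassignment that expects OFFLINE fails, so the region never leaves PENDING_OPEN.

```java
/**
 * Hypothetical toy model (not HBase code) of the PENDING_OPEN timeout loop.
 * Only the region's znode state is modeled.
 */
public class PendingOpenLoopSketch {
    enum ZkState { OFFLINE, OPENING }

    static final class Znode {
        ZkState state;

        Znode(ZkState initial) {
            state = initial;
        }

        /** Conditional transition: only succeeds if the node is in 'expected'. */
        boolean transition(ZkState expected, ZkState next) {
            if (state != expected) {
                return false;
            }
            state = next;
            return true;
        }
    }

    /** Counts failed reassignment attempts over the given number of timeout rounds. */
    static int failedReassignments(int timeoutRounds) {
        // Step 1: the master assigns region A to RS1; the znode is OFFLINE.
        Znode node = new Znode(ZkState.OFFLINE);
        // Steps 2-3: RS1's open is blocked, so TimeoutMonitor queues A in 'assigns'.
        // Step 4: before the reassignment runs, RS1 gets to A: OFFLINE -> OPENING.
        node.transition(ZkState.OFFLINE, ZkState.OPENING);
        int failures = 0;
        for (int round = 0; round < timeoutRounds; round++) {
            // Step 5: the new target RS expects OFFLINE but finds OPENING.
            if (node.transition(ZkState.OFFLINE, ZkState.OPENING)) {
                break; // never reached in this scenario
            }
            failures++; // region stays PENDING_OPEN and times out again
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(failedReassignments(5)); // 5
    }
}
```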
--
This message is automatically generated by JIRA.