[
https://issues.apache.org/jira/browse/HBASE-921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jim Kellerman resolved HBASE-921.
---------------------------------
Resolution: Fixed
Committed to branch and trunk.
> region close and open processed out of order; makes for disagreement between
> master and regionserver on region state
> --------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-921
> URL: https://issues.apache.org/jira/browse/HBASE-921
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.18.0
> Reporter: stack
> Assignee: Jim Kellerman
> Priority: Blocker
> Fix For: 0.18.1, 0.19.0
>
> Attachments: 921-0.18.0.patch
>
>
> Master assigns region X successfully. It then decides to close it because it
> wants it opened elsewhere as part of region rebalancing. Both the open and
> close operations are reported back to the master. Both have operation
> processing components that are added to the todo list to be processed in
> another thread outside of the master's main loop.
> The close operation does the bulk of its work inline with the master main
> processing loop. Its todo component does some work if the region is offlined
> but otherwise nothing of consequence, whereas the open, in its todo, does the
> important meta catalog table update with the new location information.
> It's been fairly common here on our cluster for the master todo queue to be
> occupied processing the shutdown of a regionserver. It takes a long time to
> process the shutdown of a regionserver when thousands of regions are involved.
> This delays the processing of the open and close todos. In effect, the open
> is running after the close. The region goes into limbo. Only a restart of
> the 'hosting' regionserver 'fixes' this state.
> This is a particular case of the general HBASE-543 issue. It's happening a lot
> here on our cluster, so I will hack up a fix for this and get it into TRUNK and
> backport it to 0.18.1.
> Jim Firby here had a good idea for conditions like this. Clients should be
> able to say "I've asked for a region's location ten times now and, Mr. Master,
> you've given me the same response ten times in a row and each time the
> answer was wrong. Revisit any notion that said region is at said location".
> Mr. Master would then go off and do something drastic like close and reassign
> the region.
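
The client-side idea attributed to Jim Firby in the description above could be sketched roughly as follows. This is only an illustration, not code from the attached patch or from the HBase client: the class StaleLocationTracker, its methods, the region name, and the server address are all hypothetical. It counts how many times in a row the same location for a region turned out to be wrong, and signals once a threshold (ten in the description) is reached, at which point a client could ask the master to revisit the assignment.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of the HBase client API. */
class StaleLocationTracker {
    private final int threshold;
    // Region name -> the location most recently found to be wrong.
    private final Map<String, String> lastFailedLocation = new HashMap<>();
    // Region name -> consecutive failures seen for that same location.
    private final Map<String, Integer> failureCount = new HashMap<>();

    StaleLocationTracker(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /**
     * Records that the master returned 'location' for 'region' and that the
     * region was not actually there. Returns true once the same wrong answer
     * has been seen 'threshold' times in a row.
     */
    boolean recordFailure(String region, String location) {
        String previous = lastFailedLocation.put(region, location);
        // A different answer from the master restarts the count at 1.
        int count = location.equals(previous)
                ? failureCount.getOrDefault(region, 0) + 1
                : 1;
        failureCount.put(region, count);
        return count >= threshold;
    }

    /** Clears the history for a region once a lookup succeeds. */
    void recordSuccess(String region) {
        lastFailedLocation.remove(region);
        failureCount.remove(region);
    }

    public static void main(String[] args) {
        // Assumed values for illustration only.
        StaleLocationTracker tracker = new StaleLocationTracker(10);
        boolean escalate = false;
        for (int i = 0; i < 10; i++) {
            escalate = tracker.recordFailure("regionX", "host-a:60020");
        }
        System.out.println("ask master to revisit regionX: " + escalate);
    }
}
```

What the master does on such a report (the "something drastic like close and reassign" above) is left out here, since the description does not specify it.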