[ https://issues.apache.org/jira/browse/HBASE-19533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297657#comment-16297657 ]
stack commented on HBASE-19533:
-------------------------------
Let me schedule this one. There's a small problem in shutdown that we'll have
to fix. I noticed it over in TestRegionsOnMasterOptions, which fails
occasionally. All tests pass but for the last one in the list,
testRegionsOnAllServers: concurrent w/ the shutdown of the cluster, the
balancer happens to run and schedules a region move. In the unassign step of
the move, all is fine until the update of hbase:meta w/ the changed state.
hbase:meta has just closed, so we are stuck trying to reach a server that won't
come back. We'll retry up to the max. Meantime we block the shutdown of the
Master. The running thread in the hbase client is actually a daemon thread.
It's the retrying thread holding a lock that prevents the shutdown.
Will be back here.
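To illustrate the lock point above, here is a minimal, self-contained sketch.
It is not HBase code: RetryLockDemo and shutdownGetsLock are made-up names, and
the sleep loop only stands in for the client retrying against a closed
hbase:meta. It shows that marking the retrying thread as a daemon does not help
when another thread needs the lock the retrier is holding.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the situation described above; not HBase code. */
public class RetryLockDemo {
  /**
   * Starts a daemon thread that holds a lock while it "retries" (sleeps)
   * retries * pauseMs in total, then checks whether a caller playing the
   * role of the shutdown path can take the same lock within waitMs.
   */
  public static boolean shutdownGetsLock(int retries, long pauseMs, long waitMs)
      throws InterruptedException {
    ReentrantLock lock = new ReentrantLock();
    CountDownLatch held = new CountDownLatch(1);
    Thread retrier = new Thread(() -> {
      lock.lock();
      try {
        held.countDown();
        for (int i = 0; i < retries; i++) {
          try {
            Thread.sleep(pauseMs); // stands in for one failed attempt plus backoff
          } catch (InterruptedException e) {
            return; // give up retrying; the finally block releases the lock
          }
        }
      } finally {
        lock.unlock();
      }
    });
    retrier.setDaemon(true); // daemon status does not release the lock any sooner
    retrier.start();
    held.await(); // make sure the retrier owns the lock before we try

    boolean got = lock.tryLock(waitMs, TimeUnit.MILLISECONDS);
    if (got) {
      lock.unlock();
    }
    retrier.interrupt(); // clean up the demo thread
    retrier.join();
    return got;
  }
}
```

With a long retry budget the "shutdown" caller times out; with a short one it
gets the lock once the retrier finishes.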
To reliably reproduce, remove this bit I just added to
TestRegionsOnMasterOptions:
    // Disable balancer and wait till RIT done else cluster won't go down.
    TEST_UTIL.getAdmin().balancerSwitch(false, true);
    while (true) {
      if (!TEST_UTIL.getHBaseCluster().getMaster().getAssignmentManager().
          isMetaRegionInTransition()) {
        break;
      }
      Threads.sleep(10);
    }
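The loop above spins forever if hbase:meta never leaves transition. A bounded
variant might look like the sketch below. BoundedWait and waitUntil are
hypothetical names, not part of HBase or of the test, and the timeout is
something the caller would have to pick.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of HBase: poll a condition with a deadline. */
public class BoundedWait {
  /**
   * Polls done every pollMs until it returns true or timeoutMs elapses.
   * Returns true if the condition was met, false on timeout.
   */
  public static boolean waitUntil(BooleanSupplier done, long timeoutMs, long pollMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!done.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out; the caller decides whether to fail the test
      }
      Thread.sleep(pollMs);
    }
    return true;
  }
}
```

In the test, the condition would be the negated isMetaRegionInTransition()
call from the snippet above, with the same 10 ms poll interval.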
> How to do controlled shutdown in branch-2?
> ------------------------------------------
>
> Key: HBASE-19533
> URL: https://issues.apache.org/jira/browse/HBASE-19533
> Project: HBase
> Issue Type: Task
> Reporter: stack
> Fix For: 2.0.0-beta-2
>
>
> Before HBASE-18946, on setting shutdown of a cluster, the Master would exit
> immediately. RegionServers would run region closes, then try to notify the
> Master of the close, and would spew exceptions that the Master was
> unreachable.
> This is different from how branch-1 used to do it. It used to keep the Master
> up, and it would be like the captain of the ship, the last to go down. As of
> HBASE-18946, this is again the case, but there are still open issues.
> # Usually the Master does all open and close of regions. Cluster shutdown is
> the one time where the RegionServers run the region close. Currently, the
> RegionServers report the close to the Master, which disregards the message
> since it did not start the region closes. Should we do it differently? Try to
> update state in hbase:meta, setting it to CLOSE? We might not be able to write
> CLOSE for all regions since hbase:meta will be closing too (the RS that is
> hosting hbase:meta will close it last.... but that may not be enough).
> # Should the Master run the cluster shutdown, sending out a close for all
> regions? What if the cluster has 1M regions? Untenable? Send a message per
> server? That might be better.
> Anyways, this needs attention. Filing issue in meantime.
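As a rough illustration of the "message per server" idea in the description,
the sketch below groups regions by hosting server so that the number of close
messages is bounded by the server count rather than the region count. This is
purely hypothetical: CloseBatching and batchByServer are made-up names, plain
strings stand in for region and server identifiers, and nothing here reflects
how the Master actually does or will do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch, not HBase code: one close batch per server. */
public class CloseBatching {
  /**
   * Takes a region -> server mapping and returns server -> regions,
   * i.e. the payload of one close message per server.
   */
  public static Map<String, List<String>> batchByServer(Map<String, String> regionToServer) {
    Map<String, List<String>> batches = new TreeMap<>();
    for (Map.Entry<String, String> e : regionToServer.entrySet()) {
      batches.computeIfAbsent(e.getValue(), s -> new ArrayList<>()).add(e.getKey());
    }
    return batches;
  }
}
```

With 1M regions spread over a few hundred servers, this would mean a few
hundred messages instead of 1M, at the cost of larger payloads per message.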