[ 
https://issues.apache.org/jira/browse/HBASE-19533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297657#comment-16297657
 ] 

stack commented on HBASE-19533:
-------------------------------

Let me schedule this one. There is a small problem in shutdown that we'll have to fix.

I noticed it over in TestRegionsOnMasterOptions, which fails occasionally. All 
tests pass except the last one in the list, testRegionsOnAllServers. Concurrent 
with the shutdown of the cluster, the balancer happens to run and schedules a 
region move. In the unassign step of the move, all is fine until the update of 
hbase:meta with the changed state. hbase:meta has just closed, so we are stuck 
trying to reach a server that won't come back. We'll retry up to the max, and 
in the meantime we block the shutdown of the Master. The running thread in the 
hbase client is actually a daemon thread; it's the retrying thread holding a 
lock that prevents the shutdown.
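
To make the failure mode concrete, here is a minimal sketch of a retry loop 
that gives up as soon as a stop has been requested, instead of running out its 
full retry budget against a server that is gone. StopAwareRetry, 
StoppedException and the stopped supplier are hypothetical names for 
illustration only, not HBase classes; a real fix would have to live in the 
existing Master and client retry code.

```java
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of HBase: retries an action but honors a stop flag. */
final class StopAwareRetry {
  /** Signals that a stop was requested before the action succeeded. */
  static final class StoppedException extends Exception {
    StoppedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /**
   * Runs the action up to maxAttempts times, pausing pauseMs between failures.
   * The stop flag is checked before every attempt so a shutdown is not held up
   * by the remaining retries.
   */
  static <T> T call(Callable<T> action, int maxAttempts, long pauseMs,
      BooleanSupplier stopped) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (stopped.getAsBoolean()) {
        throw new StoppedException("stop requested before attempt " + attempt, last);
      }
      try {
        return action.call();
      } catch (InterruptedException e) {
        throw e; // never swallow an interrupt
      } catch (Exception e) {
        last = e; // remember the failure and try again
      }
      if (attempt < maxAttempts) {
        Thread.sleep(pauseMs);
      }
    }
    throw last; // retry budget exhausted; surface the last failure
  }
}
```

In the scenario above, the action would be the hbase:meta state update and the 
stop flag would be whatever the Master consults to know the cluster is going 
down. Any lock held around the call would still need to be released when the 
exception propagates.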

Will be back here.

To reliably reproduce, remove this bit I just added to 
TestRegionsOnMasterOptions:


      // Disable balancer and wait till RIT done else cluster won't go down.
      TEST_UTIL.getAdmin().balancerSwitch(false, true);
      while (true) {
        if (!TEST_UTIL.getHBaseCluster().getMaster().getAssignmentManager().
            isMetaRegionInTransition()) {
          break;
        }
        Threads.sleep(10);
      }
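
The loop above spins forever if hbase:meta never leaves transition. A bounded 
variant would let the test fail with a clear message instead of hanging. The 
sketch below is a stand-alone illustration; BoundedWait and its waitFor method 
are hypothetical names, not the HBase test utilities.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of HBase: polls a condition with a deadline. */
final class BoundedWait {
  /**
   * Polls the condition every intervalMs until it holds or timeoutMs elapses.
   * Returns true if the condition held, false if the wait timed out.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      // Compare via subtraction so nanoTime overflow cannot break the check.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(intervalMs);
    }
  }
  // In the test, the condition would be the negated isMetaRegionInTransition()
  // call from the snippet above; the timeout value is for the test author to pick.
}
```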

> How to do controlled shutdown in branch-2?
> ------------------------------------------
>
>                 Key: HBASE-19533
>                 URL: https://issues.apache.org/jira/browse/HBASE-19533
>             Project: HBase
>          Issue Type: Task
>            Reporter: stack
>             Fix For: 2.0.0-beta-2
>
>
> Before HBASE-18946, on cluster shutdown the Master would exit immediately. 
> RegionServers would run region closes, then try to notify the Master of the 
> close, and would spew exceptions that the Master was unreachable.
> This is different from how branch-1 used to do it. It used to keep the Master 
> up, like the captain of the ship, the last to go down. As of HBASE-18946, 
> this is again the case but there are still open issues.
>  # Usually the Master does all opens and closes of regions. Cluster shutdown 
> is the one time when the RegionServers run the region close. Currently, the 
> RegionServers report the close to the Master, which disregards the message 
> since it did not start the region closes. Should we do it differently? Try to 
> update state in hbase:meta, setting it to CLOSE? We might not be able to write 
> CLOSE for all regions since hbase:meta will be closing too (the RS that is 
> hosting hbase:meta will close it last.... but that may not be enough).
>  # Should the Master run the cluster shutdown, sending out a close for all 
> regions? What if the cluster has 1M regions? Untenable? Send a message per 
> server? That might be better.
> Anyways, this needs attention. Filing issue in meantime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
