[ https://issues.apache.org/jira/browse/HBASE-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3032.
--------------------------

    Resolution: Cannot Reproduce

I took a look at the stack trace and couldn't figure out how it resulted in a
cluster shutdown in 0.20.

Anyway, in 0.90, cluster shutdown is done by setting a new flag (the cluster-up
flag over in ZooKeeper), and the only way to set it is by explicitly invoking
the HMaster#shutdown method:

{code}
  @Override
  public void shutdown() {
    this.serverManager.shutdownCluster();
    try {
      this.clusterStatusTracker.setClusterDown();
    } catch (KeeperException e) {
      LOG.error("ZooKeeper exception trying to set cluster as down in ZK", e);
    }
  }
{code}
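To illustrate the semantics, here is a minimal in-memory sketch, assuming a single boolean stands in for the cluster-up flag that 0.90 keeps in ZooKeeper. The names `ClusterUpFlagSketch`, `Master`, `RegionServer`, and `shouldExit` are hypothetical and are not the HBase 0.90 API; the sketch only shows that region servers go down when the flag is cleared, and that the flag is cleared only by an explicit shutdown call.

{code}
import java.util.concurrent.atomic.AtomicBoolean;

public class ClusterUpFlagSketch {
    // Hypothetical stand-in for the cluster-up flag; real HBase 0.90 keeps it in ZooKeeper.
    private final AtomicBoolean clusterUp = new AtomicBoolean(false);

    public void setClusterUp() { clusterUp.set(true); }

    public void setClusterDown() { clusterUp.set(false); }

    public boolean isClusterUp() { return clusterUp.get(); }

    // Hypothetical master: the only path that clears the flag is an explicit shutdown().
    public static class Master {
        private final ClusterUpFlagSketch tracker;

        public Master(ClusterUpFlagSketch tracker) { this.tracker = tracker; }

        public void shutdown() { tracker.setClusterDown(); }
    }

    // Hypothetical region server check: exit only once the cluster is marked down.
    public static class RegionServer {
        private final ClusterUpFlagSketch tracker;

        public RegionServer(ClusterUpFlagSketch tracker) { this.tracker = tracker; }

        public boolean shouldExit() { return !tracker.isClusterUp(); }
    }

    public static void main(String[] args) {
        ClusterUpFlagSketch tracker = new ClusterUpFlagSketch();
        tracker.setClusterUp();
        Master master = new Master(tracker);
        RegionServer rs = new RegionServer(tracker);
        System.out.println("before shutdown, exit? " + rs.shouldExit()); // false
        master.shutdown();
        System.out.println("after shutdown, exit? " + rs.shouldExit());  // true
    }
}
{code}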

Grepping for shutdown, it's not called on an OOME or any other such unexpected
exception.

Shutdown is different from a server stop. The master can be stopped on various
faults, such as an OOME, or on unexpected states (the master is rigged to fail
fast while the new master code is still new).
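The stop-versus-shutdown distinction can be sketched the same way. In this hypothetical model (the names `MasterSketch`, `runChore`, and `stop` are illustrative, not the 0.90 code, and the shared `AtomicBoolean` is an assumed stand-in for the ZooKeeper flag), a chore that hits an unexpected `Throwable` such as an `OutOfMemoryError` only stops the master process. The cluster-up flag stays set, so region servers keep running and a backup master can take over.

{code}
import java.util.concurrent.atomic.AtomicBoolean;

public class MasterSketch {
    // Hypothetical stand-in for the cluster-up flag kept in ZooKeeper.
    private final AtomicBoolean clusterUp;
    // Whether this master process has stopped; independent of the cluster flag.
    private volatile boolean stopped = false;
    private volatile String stopReason = null;

    public MasterSketch(AtomicBoolean clusterUp) { this.clusterUp = clusterUp; }

    // Explicit, requested cluster shutdown: the only path that clears the flag.
    public void shutdown() {
        clusterUp.set(false);
        stop("Cluster shutdown requested");
    }

    // Stop only this master process; the cluster-up flag is left untouched.
    public void stop(String why) {
        stopped = true;
        stopReason = why;
    }

    public boolean isStopped() { return stopped; }

    public String getStopReason() { return stopReason; }

    // Run one chore iteration; an unexpected error fails fast via stop(), never shutdown().
    public void runChore(Runnable chore) {
        try {
            chore.run();
        } catch (Throwable t) {
            stop("Unexpected error in chore: " + t);
        }
    }

    public static void main(String[] args) {
        AtomicBoolean clusterUp = new AtomicBoolean(true);
        MasterSketch active = new MasterSketch(clusterUp);
        // Simulate the kind of OOME seen in the report.
        active.runChore(() -> { throw new OutOfMemoryError("unable to create new native thread"); });
        System.out.println("active master stopped: " + active.isStopped()); // true
        System.out.println("cluster still up: " + clusterUp.get());         // true
        // A backup master sharing the same flag sees the cluster is still up.
        MasterSketch backup = new MasterSketch(clusterUp);
        System.out.println("backup can take over: " + (clusterUp.get() && !backup.isStopped())); // true
    }
}
{code}

Keeping stop and shutdown as separate paths means a fault in the master fails fast without taking the region servers down with it.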

I'm closing this as 'Cannot Reproduce', at least not on 0.90.

> Master - dont shut down cluster if you run into a fatal error
> -------------------------------------------------------------
>
>                 Key: HBASE-3032
>                 URL: https://issues.apache.org/jira/browse/HBASE-3032
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.89.20100621
>            Reporter: ryan rawson
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>
> Saw this message:
> 2010-09-22 13:10:03,547 FATAL org.apache.hadoop.hbase.master.MetaScanner: Caught error. Starting shutdown.
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:597)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:857)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:725)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:252)
>         at $Proxy1.openScanner(Unknown Source)
>         at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:182)
>         at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>         at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>         at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:156)
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
> At this point the regionservers were instructed to exit, which caused more
> problems than if the master had just terminated itself.
> This would prevent a backup master from taking over, since the cluster is
> terminating!

-- 
This message is automatically generated by JIRA.