[jira] [Updated] (HBASE-4397) "-ROOT-", ".META." table stay offline for too long in the case of all RSs are shutdown at the same time

Ming Ma (Updated) (JIRA) Thu, 29 Dec 2011 23:32:07 -0800

     [ 
https://issues.apache.org/jira/browse/HBASE-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ming Ma updated HBASE-4397:
---------------------------

    Attachment: HBASE-4397-0.92.patch

There are two ways to address the issue.

1. Add special handling for the "-ROOT-" and ".META." tables.
2. Handle the "all RSs come back online while the master has stayed up 
the whole time" scenario for all regions.

The patch uses the second approach.
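
To illustrate the idea behind the second approach, here is a minimal, self-contained sketch. It is not the contents of the attached patch and does not use HBase classes; the class OnlineEventAssigner and all of its methods are hypothetical names made up for this example. It assumes that a region which finds no live server is parked, and that every "server online" event retries the parked regions right away, for all regions rather than only "-ROOT-" and ".META.".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical model (not HBase code) of retrying assignment on "server online". */
class OnlineEventAssigner {
    private final List<String> onlineServers = new ArrayList<>();
    private final Queue<String> pending = new ArrayDeque<>();
    private final Map<String, String> assignments = new LinkedHashMap<>();
    private int next = 0; // round-robin cursor

    /** Assign a region, or park it if no server is currently online. */
    void assign(String region) {
        if (onlineServers.isEmpty()) {
            pending.add(region);
            return;
        }
        String server = onlineServers.get(next++ % onlineServers.size());
        assignments.put(region, server);
    }

    /** A server went down: drop it and try to move its regions elsewhere. */
    void serverOffline(String server) {
        onlineServers.remove(server);
        List<String> orphaned = new ArrayList<>();
        assignments.entrySet().removeIf(e -> {
            if (e.getValue().equals(server)) {
                orphaned.add(e.getKey());
                return true;
            }
            return false;
        });
        for (String region : orphaned) {
            assign(region);
        }
    }

    /** A server came up: retry every parked region immediately. */
    void serverOnline(String server) {
        onlineServers.add(server);
        int parked = pending.size();
        for (int i = 0; i < parked; i++) {
            assign(pending.remove());
        }
    }

    String assignmentOf(String region) {
        return assignments.get(region);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In this model, shutting down the only server parks both catalog regions, and the next serverOnline call assigns them without waiting for any timeout.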
                
> "-ROOT-", ".META." table stay offline for too long in the case of all RSs are 
> shutdown at the same time
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-4397
>                 URL: https://issues.apache.org/jira/browse/HBASE-4397
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HBASE-4397-0.92.patch
>
>
> 1. Shut down all RSs.
> 2. Bring all RSs back online.
> The "-ROOT-" and ".META." tables stay offline until the timeout monitor forces 
> assignment 30 minutes later. This is because HMaster can't find an RS to 
> assign the tables to during the assign operation.
> 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to sea-lab-4,60020,1315870341387, trying to assign elsewhere instead; retry=0
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:345)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1002)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:854)
>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:148)
>         at $Proxy9.openRegion(Unknown Source)
>         at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:407)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1408)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1153)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1128)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1123)
>         at org.apache.hadoop.hbase.master.AssignmentManager.assignRoot(AssignmentManager.java:1788)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:100)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:118)
>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:181)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:167)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> 2011-09-13 13:25:52,743 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052
> Possible fixes:
> 1. Have serverManager handle a "server online" event, similar to how 
> RegionServerTracker.java calls servermanager.expireServer when a server 
> goes down.
> 2. Make timeoutMonitor handle this situation better. This is a special 
> situation in the cluster, so the 30-minute timeout can be skipped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
