[jira] [Commented] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Ted Yu (JIRA) Tue, 09 Aug 2011 11:01:54 -0700

    [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081796#comment-13081796 ]


Ted Yu commented on HBASE-4168:
-------------------------------

This happened in our staging cluster this morning.
System event log:
{code}
Tue Aug 09 2011 14:52:54                System Software event: OS Stop sensor, run-time critical stop was asserted      0.000010
{code}
The master went down after that. Here is a snippet of the master log:
{code}
2011-08-09 15:12:13,147 FATAL org.apache.hadoop.hbase.master.HMaster: verifyAndAssignRoot failed after10 times retries, aborting
java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy8.getRegionInfo(Unknown Source)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:473)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:91)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:110)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:163)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2011-08-09 15:12:13,147 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-08-09 15:12:13,147 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN
java.io.IOException: Aborting
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:119)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:163)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy8.getRegionInfo(Unknown Source)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:473)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:91)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:110)
        ... 5 more
2011-08-09 15:12:13,809 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
{code}
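The FATAL entry above shows the master retrying verifyAndAssignRoot a fixed number of times. It then aborts with the NoRouteToHostException from the powered-down host attached as the cause. A minimal Java sketch of that retry-then-abort shape follows.

The names RetryThenAbort and callWithRetries are assumptions made for illustration, as is the fixed sleep between attempts. This is not the ServerShutdownHandler implementation.

```java
import java.io.IOException;
import java.net.NoRouteToHostException;
import java.util.concurrent.Callable;

// Hypothetical sketch of a "retry N times, then abort" loop, loosely modelled on the
// log line "verifyAndAssignRoot failed after10 times retries, aborting". Not HBase code.
final class RetryThenAbort {

    // Runs `action` up to `maxRetries` times. If every attempt fails, throws
    // IOException("Aborting") with the last failure attached as its cause.
    static <T> T callWithRetries(Callable<T> action, int maxRetries, long sleepMillis)
            throws IOException {
        Exception last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // e.g. NoRouteToHostException when the target host is powered down
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(sleepMillis); // assumed fixed back-off between attempts
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw new IOException("Aborting", last);
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        try {
            // Simulates an RPC to a host that stays unreachable.
            callWithRetries(() -> {
                attempts[0]++;
                throw new NoRouteToHostException("No route to host");
            }, 10, 0L);
        } catch (IOException e) {
            System.out.println("Aborted after " + attempts[0] + " attempts, cause: "
                    + e.getCause().getMessage());
        }
    }
}
```

In this sketch, an action that always throws NoRouteToHostException is attempted exactly maxRetries times before IOException("Aborting") surfaces. Against a host that stays unreachable, the retries therefore only delay the abort.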

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168(2).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log
>
>
> Experiment-1
> Started a dev cluster; META is on the same regionserver as my key-value. I kill the regionserver process but do not power down the machine.
> META is able to migrate to a new regionserver, and the regions are also able to reopen elsewhere.
> The client is able to talk to META, find the new key-value location and get it.
> Experiment-2
> Started a dev cluster; META is on a different regionserver from my key-value. I kill the regionserver process but do not power down the machine.
> META stays where it is, and the regions are able to reopen elsewhere.
> The client is able to talk to META, find the new key-value location and get it.
> Experiment-3
> Started a dev cluster; META is on a different regionserver from my key-value. I power down the machine hosting this regionserver.
> META stays where it is, and the regions are able to reopen elsewhere.
> The client is able to talk to META, find the new key-value location and get it.
> Experiment-4 (this is the problematic one)
> Started a dev cluster; META is on the same regionserver as my key-value. I power down the machine hosting this regionserver.
> META is able to migrate to a new regionserver, but it takes a really long time (~30 minutes).
> The regions on that regionserver DO NOT reopen (I waited for 1 hour).
> The client is able to find the new location of META. However, META keeps pointing the client to the powered-down regionserver as the location of the key-value it is trying to get, so the client's get is unsuccessful.
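Experiment-4 describes a client that re-reads META and is still sent to the powered-down regionserver, because the regions that server hosted never reopened. The toy model below illustrates why client retries alone cannot recover in that situation.

StaleMetaModel, its fields and get() are hypothetical names. The model deliberately ignores caching, timeouts and ROOT/META relocation. It captures only one thing: every retry asks META, and META names a dead server.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of Experiment-4 (hypothetical names; not the HBase client API).
final class StaleMetaModel {
    final Map<String, String> meta = new HashMap<>();  // row -> server recorded in META
    final Set<String> liveServers = new HashSet<>();   // servers that accept connections

    // Returns the server that answered the get, or null if all attempts failed.
    String get(String row, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Fresh META lookup on every attempt, so no stale client-side cache is involved.
            String server = meta.get(row);
            if (server != null && liveServers.contains(server)) {
                return server;
            }
            // Connecting to `server` failed. The next iteration re-reads META, which
            // cannot help while META still names the powered-down server.
        }
        return null;
    }

    public static void main(String[] args) {
        StaleMetaModel m = new StaleMetaModel();
        m.liveServers.add("rs2");
        m.meta.put("row1", "rs1");              // rs1 is powered down; its region never reopened
        System.out.println(m.get("row1", 10));  // prints "null": every retry goes to rs1
        m.meta.put("row1", "rs2");              // region finally reassigned to rs2
        System.out.println(m.get("row1", 10));  // prints "rs2"
    }
}
```

In the model, the get succeeds only after the META entry is updated to a live server. This is consistent with the report that the client keeps failing while the regions remain unassigned.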

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira