[
https://issues.apache.org/jira/browse/HBASE-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774803#comment-13774803
]
Jean-Daniel Cryans commented on HBASE-9635:
-------------------------------------------
The problem is pretty clear from the log: "Too many open files"; your host is
hosed.
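Before a host gets into this state, descriptor usage can be watched from inside
the JVM itself. A minimal sketch, assuming a Sun/Oracle JDK on a Unix host (the
FdCheck class name is ours, not from the report):
{noformat}
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

// Prints this JVM's open and maximum file descriptor counts.
public class FdCheck {
  public static void main(String[] args) {
    OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
    if (os instanceof UnixOperatingSystemMXBean) {
      UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
      System.out.println("open FDs: " + unix.getOpenFileDescriptorCount()
          + " / max: " + unix.getMaxFileDescriptorCount());
    } else {
      System.out.println("FD counts not available on this platform");
    }
  }
}
{noformat}
With the FD limit forced down to 600 (step 6 of the procedure below), the
DFSClient runs out of sockets long before all regions of 't1' can open, which
is exactly what the stack traces show.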
> HBase Table regions are not getting re-assigned to the new region server when
> it comes up (when the existing region server is not able to handle the load)
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-9635
> URL: https://issues.apache.org/jira/browse/HBASE-9635
> Project: HBase
> Issue Type: Bug
> Components: master, regionserver
> Affects Versions: 0.94.11
> Environment: SuSE11
> Reporter: shankarlingayya
>
> {noformat}
> HBase table regions are not getting assigned to the new region server for a
> period of 30 minutes (when the existing region server is not able to handle
> the load)
> Procedure:
> 1. Set up a non-HA Hadoop cluster with two nodes (Node1-XX.XX.XX.XX,
> Node2-YY.YY.YY.YY)
> 2. Install ZooKeeper & HRegionServer on Node1
> 3. Install HMaster & HRegionServer on Node2
> 4. From Node2, create an HBase table (table name 't1' with one column family
> 'cf1')
> 5. Run addrecord to insert 99649 rows (a client-API sketch of steps 4-5
> follows the log excerpt below)
> 6. Kill the region servers on both nodes and limit the Node1 region server's
> FD limit to 600
> 7. Start only the Node1 region server ==> so that FD exhaustion occurs on the
> Node1 region server
> 8. After some 5-10 minutes, start the Node2 region server
> ===> A huge number of regions of table 't1' are stuck in OPENING state and
> are not getting re-assigned to the Node2 region server, which is free.
> ===> When the new region server comes up, the master should detect it and
> allocate the regions that failed to open to that region server (here they
> stay in OPENING state for 30 minutes, which has a huge impact on any user
> application that makes use of this table; see the note on the assignment
> timeout after the log excerpt below)
> 2013-09-23 18:46:12,160 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Instantiated t1,row507465,1379937224590.2d9fad2aee78103f928d8c7fe16ba6cd.
> 2013-09-23 18:46:12,160 ERROR
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open
> of region=t1,row507465,1379937224590.2d9fad2aee78103f928d8c7fe16ba6cd.,
> starting to roll back the global memstore size.
> 2013-09-23 18:50:55,284 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to
> renew lease for
> [DFSClient_hb_rs_HOST-XX.XX.XX.XX,61020,1379940823286_-641204614_48] for 309
> seconds. Will retry shortly ...
> java.io.IOException: Failed on local exception: java.net.SocketException: Too
> many open files; Host Details : local host is:
> "HOST-XX.XX.XX.XX/XX.XX.XX.XX"; destination host is: "HOST-XX.XX.XX.XX":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1351)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at $Proxy13.renewLease(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at $Proxy13.renewLease(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:522)
> at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:679)
> at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
> at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
> at
> org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
> at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.SocketException: Too many open files
> at sun.nio.ch.Net.socket0(Native Method)
> at sun.nio.ch.Net.socket(Net.java:97)
> at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
> at
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
> at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
> at
> org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
> at
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:523)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
> at
> org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
> at org.apache.hadoop.ipc.Client.call(Client.java:1318)
> ... 16 more
> 2013-09-23 18:50:56,285 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to
> renew lease for
> [DFSClient_hb_rs_HOST-XX.XX.XX.XX,61020,1379940823286_-641204614_48] for 310
> seconds. Will retry shortly ...
> java.io.IOException: Failed on local exception: java.net.SocketException: Too
> many open files; Host Details : local host is:
> "HOST-XX.XX.XX.XX/XX.XX.XX.XX"; destination host is: "HOST-XX.XX.XX.XX":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1351)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at $Proxy13.renewLease(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:188)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at $Proxy13.renewLease(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:522)
> at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:679)
> at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:417)
> at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:442)
> at
> org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
> at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.SocketException: Too many open files
> at sun.nio.ch.Net.socket0(Native Method)
> at sun.nio.ch.Net.socket(Net.java:97)
> at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
> at
> sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
> at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
> at
> org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
> at
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:523)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
> at
> org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
> at org.apache.hadoop.ipc.Client.call(Client.java:1318)
> ... 16 more
> 2013-09-23 18:50:57,287 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to
> renew lease for
> [DFSClient_hb_rs_HOST-XX.XX.XX.XX,61020,1379940823286_-641204614_48] for 311
> seconds. Will retry shortly ...
> java.io.IOException: Failed on local exception: java.net.SocketException: Too
> many open files; Host Details : local host is:
> "HOST-XX.XX.XX.XX/XX.XX.XX.XX"; destination host is: "HOST-XX.XX.XX.XX":8020;
> {noformat}
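For completeness, steps 4-5 of the procedure can be reproduced with the 0.94
client API. A minimal sketch; the LoadT1 class name, the 'q' qualifier, the
values, and the row-key format are our assumptions (the report used an
'addrecord' tool whose exact behavior we don't know):
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Creates table 't1' with family 'cf1', then loads 99649 rows (steps 4-5).
public class LoadT1 {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("t1");
    desc.addFamily(new HColumnDescriptor("cf1"));
    admin.createTable(desc);
    admin.close();

    HTable table = new HTable(conf, "t1");
    for (int i = 0; i < 99649; i++) {
      Put put = new Put(Bytes.toBytes("row" + i));
      put.add(Bytes.toBytes("cf1"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
      table.put(put);
    }
    table.close();
  }
}
{noformat}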
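On the 30-minute figure in the report: in 0.94 the master's TimeoutMonitor is
what eventually retries regions stuck in transition, and its
hbase.master.assignment.timeoutmonitor.timeout setting defaults, as far as we
recall, to 1800000 ms (30 minutes), which matches the observed delay. Shrinking
it in hbase-site.xml would confirm that this is the mechanism involved; the
180000 ms value below is only an illustration, not a tuning recommendation:
{noformat}
<!-- hbase-site.xml: how long a region may sit in transition (e.g. OPENING)
     before the master's TimeoutMonitor forces a re-assignment.
     The 0.94 default is believed to be 1800000 ms (30 minutes). -->
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <value>180000</value>
</property>
{noformat}
That would only shorten the window, of course; the underlying question in this
issue is whether the master should notice the newly started Node2 region server
and re-assign the failed-open regions sooner.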