Demai Ni created HBASE-10476:
--------------------------------
Summary: HBase Master log grows very fast after Hadoop is stopped (due to connection exception)
Key: HBASE-10476
URL: https://issues.apache.org/jira/browse/HBASE-10476
Project: HBase
Issue Type: Bug
Reporter: Demai Ni
HBase 0.96.0 (probably the same issue exists on 0.94.x) on a single-node cluster. At some
point we stopped Hadoop but kept HBase running. As expected, HBase began to
throw connection errors.
For the first hour the regionserver log grew by ~10MB and the master log didn't
grow much, which is fine.
{code:title=log size after one hour}
-rw-rw-r-- 1 biadmin biadmin    497959 Feb 5 10:36 hbase-biadmin-master-hdtest014.svl.ibm.com.log
...
-rw-rw-r-- 1 biadmin biadmin   8865371 Feb 5 10:37 hbase-biadmin-regionserver-hdtest014.svl.ibm.com.log
{code}
However, within 4 hours the master log grew to ~13GB, and it stopped growing only
because the disk ran out of space.
{code:title=log size after 4 hours}
-rw-rw-r-- 1 biadmin biadmin  3521880064 Feb 5 14:10 hbase-biadmin-master-hdtest014.svl.ibm.com.log
-rw-rw-r-- 1 biadmin biadmin 10737418582 Feb 5 11:25 hbase-biadmin-master-hdtest014.svl.ibm.com.log.1
...
-rw-rw-r-- 1 biadmin biadmin    11222365 Feb 5 10:49 hbase-biadmin-regionserver-hdtest014.svl.ibm.com.log
{code}
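For anyone hitting this before a fix lands: as a stop-gap, the master log can at least be
bounded on disk by capping the rolling file appender in conf/log4j.properties. The values
below are only illustrative (not what we ran with), but something along these lines would
have kept the retry storm from filling the disk:
{code:title=log4j.properties (illustrative stop-gap, not our actual settings)}
# Cap each log file at 256MB and keep at most 20 backups, so a runaway
# master log tops out around ~5GB instead of eating the whole disk.
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hbase.log.dir}/${hbase.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2}: %m%n
{code}
Of course this only bounds the damage; the retry behavior itself (below) is the real problem.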
The exception/error filling up the master log is:
{code:title=Error message filling up Master log}
2014-02-05 11:37:48,688 INFO org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler: Splitting hbase:meta logs for hdtest014.svl.ibm.com,60020,1391622549030
2014-02-05 11:37:48,689 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for hdtest014.svl.ibm.com,60020,1391622549030, will retry
    at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:70)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:906)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:929)
    at java.lang.Thread.run(Thread.java:738)
Caused by: java.net.ConnectException: Call From hdtest014.svl.ibm.com/9.30.194.23 to hdtest014.svl.ibm.com:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.GeneratedConstructorAccessor5.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:39)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:527)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1351)
    at org.apache.hadoop.ipc.Client.call(Client.java:1300)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy8.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
    at java.lang.reflect.Method.invoke(Method.java:611)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:266)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1406)
    at org.apache.hadoop.hbase.master.MasterFileSystem.getLogDirs(MasterFileSystem.java:317)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:405)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:301)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:292)
    at org.apache.hadoop.hbase.master.handler.MetaServerShutdownHandler.process(MetaServerShutdownHandler.java:63)
    ... 4 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:614)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
    at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
    at org.apache.hadoop.ipc.Client.call(Client.java:1318)
{code}
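From the message "failed log splitting for ..., will retry" it looks like
MetaServerShutdownHandler.process() fails the meta log split (HDFS is down), the event
executor logs the whole throwable, and the handler is resubmitted right away with no
delay, so the same multi-KB stack trace is written in a tight loop. I haven't traced the
0.96 source line by line, so treat the following as a simplified sketch of that suspected
pattern rather than the actual HBase code (class and method names here are mine, and the
iteration cap exists only so the sketch terminates):
{code:title=RetryStormSketch.java (simplified sketch of the suspected retry loop)}
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryStormSketch {

    // Stand-in for MasterFileSystem.splitMetaLog(); always fails while the NameNode is down.
    static void splitMetaLog() throws IOException {
        throw new IOException("Call to hdtest014.svl.ibm.com:9000 failed on connection exception");
    }

    public static void main(String[] args) {
        final ExecutorService executor = Executors.newSingleThreadExecutor();
        final AtomicInteger attempts = new AtomicInteger();

        Runnable metaServerShutdownHandler = new Runnable() {
            @Override
            public void run() {
                try {
                    splitMetaLog();
                    executor.shutdown();
                } catch (IOException e) {
                    // The event handler logs the throwable with its full stack trace ...
                    e.printStackTrace();
                    // ... and resubmits itself immediately, so the next identical stack
                    // trace follows with no delay or backoff while HDFS stays down.
                    if (attempts.incrementAndGet() < 5) {   // cap added for the sketch only
                        executor.submit(this);
                    } else {
                        executor.shutdown();
                    }
                }
            }
        };
        executor.submit(metaServerShutdownHandler);
    }
}
{code}
With no backoff between attempts, the growth rate is bounded only by how fast the connect
attempt fails, which matches the jump from ~10MB/hour to ~13GB in 4 hours once meta log
splitting started failing.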