[
https://issues.apache.org/jira/browse/HBASE-23186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954433#comment-16954433
]
Xiaolin Ha commented on HBASE-23186:
------------------------------------
The ZK session expired, and the master aborted.
{quote}2019-10-16,23:49:41,611 INFO
[main-SendThread(tj1-hadoop-staging-ct05.kscn:11000)]
org.apache.zookeeper.ClientCnxn: Session establishment complete on server
tj1-hadoop-staging-ct05.kscn/10.38.166.12:11000, sessionid = 0x46cfbd296b7e62b,
negotiated timeout = 20000
2019-10-17,00:15:26,253 INFO
[master/tj1-hadoop-staging-ct02:22500.splitLogManager..Chore.1]
org.apache.hadoop.hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor
missed its start time
2019-10-17,00:15:37,357 INFO
[master/tj1-hadoop-staging-ct02:22500.splitLogManager..Chore.1]
org.apache.hadoop.hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor
missed its start time
2019-10-17,00:15:48,168 INFO
[master/tj1-hadoop-staging-ct02:22500.splitLogManager..Chore.1]
org.apache.hadoop.hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor
missed its start time
2019-10-17,00:15:50,285 INFO
[master/tj1-hadoop-staging-ct02:22500.splitLogManager..Chore.1]
org.apache.hadoop.hbase.ScheduledChore: Chore: SplitLogManager Timeout Monitor
missed its start time
2019-10-17,00:15:57,972 INFO
[main-SendThread(tj1-hadoop-staging-ct05.kscn:11000)]
org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from
server in 24963ms for sessionid 0x46cfbd296b7e62b, closing socket connection
and attempting reconnect
2019-10-17,00:15:59,505 WARN [master/tj1-hadoop-staging-ct02:22500]
org.apache.hadoop.hbase.util.Sleeper: We slept 24551ms instead of 3000ms, this
is likely due to a long garbage collecting pause and it's usually bad, see
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2019-10-17,00:16:01,733 INFO
[master/tj1-hadoop-staging-ct02:22500:becomeActiveMaster-SendThread(tj1-hadoop-staging-ct02.kscn:11000)]
org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from
server in 25436ms for sessionid 0x26cfbd28d32ffc0, closing socket connection
and attempting reconnect
2019-10-17,00:16:21,558 ERROR [main-EventThread]
org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded
coprocessors are: [org.apache.hadoop.hbase.security.access.AccessController,
org.apache.hadoop.hbase.security.access.SnapshotScannerHDFSAclController,
org.apache.hadoop.hbase.quotas.MasterQuotasObserver,
org.apache.hadoop.hbase.master.ThemisMasterObserver]
2019-10-17,00:16:21,595 INFO
[ReadOnlyZKClient-tjwq02tst.zk.hadoop.srv:11000@0x62a10a8c-SendThread(tj1-hadoop-staging-ct04.kscn:11000)]
org.apache.zookeeper.ClientCnxn: Session establishment complete on server
tj1-hadoop-staging-ct04.kscn/10.38.162.36:11000, sessionid = 0x36cfbd28d810e26,
negotiated timeout = 20000
2019-10-17,00:16:21,632 ERROR [main-EventThread]
org.apache.hadoop.hbase.master.HMaster: ***** ABORTING master
tj1-hadoop-staging-ct02.kscn,22500,1571009509049:
master:22500-0x46cfbd296b7e62b, quorum=tjwq02tst.zk.hadoop.srv:11000,
baseZNode=/hbase/tjwq02tst-staging master:22500-0x46cfbd296b7e62b received
expired from ZooKeeper, aborting *****
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired
at
org.apache.hadoop.hbase.zookeeper.ZKWatcher.connectionEvent(ZKWatcher.java:563)
at
org.apache.hadoop.hbase.zookeeper.ZKWatcher.process(ZKWatcher.java:493)
at
org.apache.hadoop.hbase.zookeeper.PendingWatcher.process(PendingWatcher.java:40)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
at
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498){quote}
But the master process stayed up and did not exit.
When we sent 'kill -9' to it, two threads exited and threw the exceptions below.
The "hbase-hbck.lock" path in the error log shows they are fsck threads.
{quote}{color:#de350b}2019-10-17,10:14:00,332 WARN [Thread-7099]
org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception{color}
{color:#de350b}java.io.FileNotFoundException: File does not exist:
/hbase/tjwq02tst-staging/.tmp/hbase-hbck.lock (inode 34898756) Holder
DFSClient_NONMAPREDUCE_405679236_1 does not have any open files.{color}
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2955)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:598)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:173)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2834)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:979)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:581)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1628)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1423)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:598)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does
not exist: /hbase/tjwq02tst-staging/.tmp/hbase-hbck.lock (inode 34898756)
Holder DFSClient_NONMAPREDUCE_405679236_1 does not have any open files.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2955)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:598)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:173)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2834)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:979)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:581)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
at org.apache.hadoop.ipc.Client.call(Client.java:1504)
at org.apache.hadoop.ipc.Client.call(Client.java:1435)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:426)
at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:249)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:107)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy21.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy21.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1624)
... 2 more
{color:#de350b}2019-10-17,10:14:00,333 ERROR [Thread-9]
org.apache.hadoop.hdfs.DFSClient: Failed to close inode 34898756{color}
{color:#de350b}java.io.FileNotFoundException: File does not exist:
/hbase/tjwq02tst-staging/.tmp/hbase-hbck.lock (inode 34898756) Holder
DFSClient_NONMAPREDUCE_405679236_1 does not have any open files.{color}
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2955)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:598)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:173)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2834)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:979)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:581)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1628)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1423)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:598)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does
not exist: /hbase/tjwq02tst-staging/.tmp/hbase-hbck.lock (inode 34898756)
Holder DFSClient_NONMAPREDUCE_405679236_1 does not have any open files.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2955)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:598)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:173)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2834)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:979)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:581)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1628)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1423)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:598)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does
not exist: /hbase/tjwq02tst-staging/.tmp/hbase-hbck.lock (inode 34898756)
Holder DFSClient_NONMAPREDUCE_405679236_1 does not have any open files.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2955)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:598)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:173)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2834)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:979)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:581)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:916)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:862)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1716)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2742)
at org.apache.hadoop.ipc.Client.call(Client.java:1504)
at org.apache.hadoop.ipc.Client.call(Client.java:1435)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:426)
at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:249)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:107)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy21.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy21.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1624)
... 2 more{quote}
> Set Fsck thread be daemon and close its OutputStream when master abort
> ----------------------------------------------------------------------
>
> Key: HBASE-23186
> URL: https://issues.apache.org/jira/browse/HBASE-23186
> Project: HBase
> Issue Type: Bug
> Components: master
> Reporter: Xiaolin Ha
> Assignee: Xiaolin Ha
> Priority: Major
>
> HBASE-21072 introduced running HBaseFsck by default in HBase 2.
> {code:java}
> if (this.conf.getBoolean("hbase.write.hbck1.lock.file", true)) {
> HBaseFsck.checkAndMarkRunningHbck(this.conf,
> HBaseFsck.createLockRetryCounterFactory(this.conf).create());
> }{code}
> But the fsck thread is not a daemon thread,
> {code:java}
> public static Pair<Path, FSDataOutputStream>
> checkAndMarkRunningHbck(Configuration conf,
> RetryCounter retryCounter) throws IOException {
> FileLockCallable callable = new FileLockCallable(conf, retryCounter);
> ExecutorService executor = Executors.newFixedThreadPool(1);
> ...{code}
> This prevents the JVM from exiting.
> We should make it a daemon thread and close the DFS output stream when the
> master aborts or stops.
>
>
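The fix described above can be sketched as follows. This is only an illustration of the daemon-thread idea, not the actual HBASE-23186 patch; the class and helper names here are invented for the example. Passing a `ThreadFactory` that marks workers as daemon means a leftover fsck lock thread cannot keep the JVM alive after the master aborts.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DaemonExecutorSketch {

    // A single-thread executor whose worker is a daemon thread.
    // Non-daemon threads (the default from Executors.newFixedThreadPool)
    // block JVM shutdown, which is exactly the hang described above.
    public static ExecutorService newDaemonSingleThreadExecutor(String name) {
        return Executors.newFixedThreadPool(1, r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true); // daemon workers do not prevent JVM exit
            return t;
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = newDaemonSingleThreadExecutor("hbck-lock-sketch");
        // Verify the worker thread really is a daemon.
        boolean daemon = executor.submit(() -> Thread.currentThread().isDaemon()).get();
        System.out.println("worker is daemon: " + daemon); // prints "worker is daemon: true"
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Note the daemon flag only addresses the JVM-exit half of the issue; the open lock-file `FSDataOutputStream` still needs an explicit close on abort/stop, otherwise the lease-expiry `FileNotFoundException` seen in the logs can occur.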
--
This message was sent by Atlassian Jira
(v8.3.4#803005)