[ https://issues.apache.org/jira/browse/HIVE-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998961#comment-15998961 ]
Yongzhi Chen commented on HIVE-15997:
-------------------------------------
The interrupt() call in our code can cause an InterruptedException when the code
is doing (or is scheduled to do) HDFS file operations or ZooKeeper lock
operations. Because we do not have many long single file operations or long
single lock operations, but do have many fast ones, the performance of the
cancel operation will not be affected by adding checkpoints. I found that
thread interrupt() has no effect on some running operations: for example, I
tried to interrupt an HMS client that was waiting for the response of a
long-running API call (for example ListPartitions), and the interrupt could not
stop the wait at all. The interrupt also has a "delay effect": it causes an
InterruptedException later (for example, when the folder cleanup operations
happen). So we should not put Thread.currentThread().interrupt() in the heavily
used method isInterrupted(). If, in the future, we find a place where
interrupt() is really needed, we can just add the code there.
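
To illustrate the "delay effect" and the checkpoint alternative, here is a minimal sketch. It is not the Hive code: the class CancelCheckpointDemo, its methods, and the use of Thread.sleep() as a stand-in for a blocking cleanup call (an HDFS delete or a ZooKeeper unlock) are all assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; none of these names come from Hive.
final class CancelCheckpointDemo {
    // Stand-in for the query's cancelled/aborted state.
    private final AtomicBoolean cancelled = new AtomicBoolean(false);

    public void cancel() {
        cancelled.set(true);
    }

    // Checkpoint: only reads the flag and never touches the
    // thread's interrupt status.
    public boolean isCancelled() {
        return cancelled.get();
    }

    // Stand-in for a blocking cleanup operation. Returns true if the
    // cleanup completed, false if it was cut short by an interrupt.
    public static boolean cleanup() {
        try {
            Thread.sleep(10);
            return true;
        } catch (InterruptedException e) {
            return false;
        }
    }

    // Cancel by interrupting the current thread. Nothing happens at the
    // interrupt() call itself; the next blocking call fails instead.
    public static boolean cleanupAfterInterrupt() {
        Thread.currentThread().interrupt();
        boolean completed = cleanup();
        Thread.interrupted(); // make sure the demo leaves no status behind
        return completed;
    }

    // Cancel through the flag. The work loop stops at a checkpoint and the
    // cleanup runs on a thread whose interrupt status was never set.
    public static boolean cleanupAfterFlag() {
        CancelCheckpointDemo query = new CancelCheckpointDemo();
        query.cancel();
        for (int step = 0; step < 3; step++) {
            if (query.isCancelled()) {
                break; // checkpoint between fast operations
            }
        }
        return cleanup();
    }

    public static void main(String[] args) {
        System.out.println("cleanup completed after interrupt(): " + cleanupAfterInterrupt());
        System.out.println("cleanup completed after flag check:  " + cleanupAfterFlag());
    }
}
```

In this sketch the interrupt surfaces only inside cleanup(), which is the same shape as the failures in the stacks quoted below, while the flag-based path lets the cleanup finish.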
> Resource leaks when query is cancelled
> ---------------------------------------
>
> Key: HIVE-15997
> URL: https://issues.apache.org/jira/browse/HIVE-15997
> Project: Hive
> Issue Type: Bug
> Reporter: Yongzhi Chen
> Assignee: Yongzhi Chen
> Fix For: 2.2.0
>
> Attachments: HIVE-15997.1.patch
>
>
> There may be some resource leaks when a query is cancelled.
> We see the following stacks in the log:
> Possible file and folder leak:
> {noformat}
> 2017-02-02 06:23:25,410 WARN hive.ql.Context: [HiveServer2-Background-Pool: Thread-61]: Error Removing Scratch: java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "ychencdh511t-1.vpc.cloudera.com/172.26.11.50"; destination host is: "ychencdh511t-1.vpc.cloudera.com":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
> at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> at org.apache.hadoop.ipc.Client.call(Client.java:1409)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at com.sun.proxy.$Proxy25.delete(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
> at com.sun.proxy.$Proxy26.delete(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
> at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
> at org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
> at org.apache.hadoop.hive.ql.Context.removeScratchDir(Context.java:405)
> at org.apache.hadoop.hive.ql.Context.clear(Context.java:541)
> at org.apache.hadoop.hive.ql.Driver.releaseContext(Driver.java:2109)
> at org.apache.hadoop.hive.ql.Driver.closeInProcess(Driver.java:2150)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1472)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
> at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
> at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
> at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedByInterruptException
> at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681)
> at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:714)
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1525)
> at org.apache.hadoop.ipc.Client.call(Client.java:1448)
> ... 35 more
> 2017-02-02 12:26:52,706 INFO org.apache.hive.service.cli.operation.OperationManager: [HiveServer2-Background-Pool: Thread-23]: Operation is timed out,operation=OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=2af82100-94cf-4f26-abaa-c4b57c57b23c],state=CANCELED
> {noformat}
> Possible lock leak:
> {noformat}
> 2017-02-02 06:21:05,054 ERROR ZooKeeperHiveLockManager: [HiveServer2-Background-Pool: Thread-61]: Failed to release ZooKeeper lock: java.lang.InterruptedException
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:871)
> at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:238)
> at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:233)
> at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
> at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:230)
> at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:214)
> at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:41)
> at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlockPrimitive(ZooKeeperHiveLockManager.java:488)
> at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlockWithRetry(ZooKeeperHiveLockManager.java:466)
> at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlock(ZooKeeperHiveLockManager.java:454)
> at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.releaseLocks(ZooKeeperHiveLockManager.java:236)
> at org.apache.hadoop.hive.ql.Driver.releaseLocksAndCommitOrRollback(Driver.java:1175)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1432)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
> at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
> at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
> at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}