[
https://issues.apache.org/jira/browse/HIVE-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111178#comment-15111178
]
Sergey Shelukhin commented on HIVE-12904:
-----------------------------------------
[~sseth] any insight?
> LLAP: deadlock in task scheduling
> ---------------------------------
>
> Key: HIVE-12904
> URL: https://issues.apache.org/jira/browse/HIVE-12904
> Project: Hive
> Issue Type: Bug
> Reporter: Hui Zheng
>
> {noformat}
> Thread 34107: (state = BLOCKED)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.isInWaitQueue() @bci=0, line=690 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.finishableStateUpdated(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=8, line=485 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.access$1500(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService, org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=3, line=78 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.finishableStateUpdated(boolean) @bci=27, line=733 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.sourceStateUpdated(java.lang.String) @bci=76, line=210 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.sourceStateUpdated(java.lang.String) @bci=5, line=164 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerSourceStateChange(java.lang.String, java.lang.String, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateProto) @bci=34, line=228 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=47, line=255 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=328 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.sourceStateUpdated(com.google.protobuf.RpcController, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=105 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=80, line=13067 (Compiled frame)
> - org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(org.apache.hadoop.ipc.RPC$Server, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=246, line=616 (Compiled frame)
> - org.apache.hadoop.ipc.RPC$Server.call(org.apache.hadoop.ipc.RPC$RpcKind, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=9, line=969 (Compiled frame)
> - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=38, line=2151 (Compiled frame)
> - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=2147 (Compiled frame)
> - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
> - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
> - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1657 (Compiled frame)
> - org.apache.hadoop.ipc.Server$Handler.run() @bci=315, line=2145 (Interpreted frame)
> and
> Thread 34500: (state = BLOCKED)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.unregisterForUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=0, line=195 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.unregisterFinishableStateUpdate(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=160 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryFragmentInfo.unregisterForFinishableStateUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=143 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.maybeUnregisterForFinishedStateNotifications() @bci=20, line=681 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(org.apache.tez.runtime.task.TaskRunner2Result) @bci=32, line=548 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(java.lang.Object) @bci=5, line=535 (Compiled frame)
> - com.google.common.util.concurrent.Futures$4.run() @bci=55, line=1149 (Compiled frame)
> - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
> - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> "IPC Server handler 0 on 15001":
>   waiting to lock Monitor@0x00007f5d322ecb08 (Object@0x00007f67032cd2c0, a org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService$TaskWrapper),
>   which is held by "ExecutionCompletionThread #0"
> "ExecutionCompletionThread #0":
>   waiting to lock Monitor@0x00007f6066b9e8c8 (Object@0x00007f66b6570200, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
>   which is held by "IPC Server handler 0 on 15001"
> Found a total of 1 deadlock.
> {noformat}
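The cycle in this report can be reproduced outside LLAP. A minimal sketch, assuming two plain Object monitors are a fair stand-in for TaskWrapper and FinishableStateTracker (the class name, lock names, and the latch that forces the interleaving are illustrative, not Hive code): each thread takes one monitor and then asks for the other, and the standard ThreadMXBean.findMonitorDeadlockedThreads() call then reports the same two-thread deadlock the dump shows.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockDemo {

    /**
     * Starts two daemon threads that take two monitors in opposite order and
     * returns the names of the threads the JVM reports as monitor-deadlocked.
     */
    public static String[] reproduce() {
        Object taskWrapper = new Object();   // stand-in for the TaskWrapper monitor
        Object stateTracker = new Object();  // stand-in for the FinishableStateTracker monitor
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread completion = new Thread(() -> {
            synchronized (taskWrapper) {          // like maybeUnregisterForFinishedStateNotifications()
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (stateTracker) { }   // like unregisterForUpdates(): blocks forever
            }
        }, "ExecutionCompletionThread #0");
        Thread ipc = new Thread(() -> {
            synchronized (stateTracker) {         // like sourceStateUpdated()
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (taskWrapper) { }    // like isInWaitQueue(): blocks forever
            }
        }, "IPC Server handler 0 on 15001");
        completion.setDaemon(true);  // daemons, so the stuck pair does not keep the JVM alive
        ipc.setDaemon(true);
        completion.start();
        ipc.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {    // poll for up to ~5 seconds
            long[] ids = mx.findMonitorDeadlockedThreads();  // null while no cycle exists
            if (ids != null) {
                ThreadInfo[] infos = mx.getThreadInfo(ids);
                String[] names = new String[infos.length];
                for (int i = 0; i < infos.length; i++) {
                    names[i] = infos[i].getThreadName();
                }
                return names;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new String[0];
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        for (String name : reproduce()) {
            System.out.println("deadlocked: " + name);
        }
    }
}
```

The latch only makes the bad interleaving deterministic; in the daemon it happens when the completion callback and the source-state RPC race on the same fragment.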
> It looks like it's caused by synchronized methods that take the two monitors in opposite order:
> {noformat}
> TaskWrapper:
> public synchronized void maybeUnregisterForFinishedStateNotifications() {
> {noformat}
> Eventually calls
> {noformat}
> FinishableStateTracker:
> synchronized void unregisterForUpdates(FinishableStateUpdateHandler handler) {
> {noformat}
> and
> {noformat}
> FinishableStateTracker:
> synchronized void sourceStateUpdated(String sourceName) {
> {noformat}
> eventually calls
> {noformat}
> TaskWrapper:
> public synchronized boolean isInWaitQueue() {
> {noformat}
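> The two threads take the same pair of monitors in opposite order: "ExecutionCompletionThread #0" holds the TaskWrapper monitor and waits for the FinishableStateTracker monitor, while "IPC Server handler 0 on 15001" holds the FinishableStateTracker monitor and waits for the TaskWrapper monitor.
> The isInWaitQueue() method only returns a boolean, so it doesn't need to hold the monitor at all (making the underlying field volatile gives a single read the same visibility guarantee). However, I don't know whether there are other similar issues or what actually needs to be inside the synchronized blocks; perhaps there's a better fix.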
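Dropping synchronized from isInWaitQueue() is enough to break this particular cycle, because the IPC path then never asks for the TaskWrapper monitor while it holds the tracker's monitor. A minimal sketch, assuming the flag behind isInWaitQueue() can simply be made volatile (Wrapper, inWaitQueue, and runInterleaving() are hypothetical simplifications, not the Hive classes): the same interleaving as in the dump now runs to completion.

```java
import java.util.concurrent.CountDownLatch;

public class VolatileFlagFixDemo {

    /** Hypothetical, simplified stand-in for TaskWrapper. */
    static final class Wrapper {
        // volatile: a reader sees the latest write without taking the monitor
        private volatile boolean inWaitQueue = true;

        boolean isInWaitQueue() {            // no 'synchronized': a single volatile read
            return inWaitQueue;
        }

        void setInWaitQueue(boolean value) { // a single volatile write
            inWaitQueue = value;
        }
    }

    /** Replays the interleaving from the dump; true if both threads finish within 5 s. */
    public static boolean runInterleaving() {
        Wrapper wrapper = new Wrapper();
        Object tracker = new Object();       // stand-in for the FinishableStateTracker monitor
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread completion = new Thread(() -> {
            synchronized (wrapper) {         // completion path holds the wrapper monitor...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (tracker) { }   // ...then needs the tracker monitor
            }
        });
        Thread ipc = new Thread(() -> {
            synchronized (tracker) {         // IPC path holds the tracker monitor...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                wrapper.isInWaitQueue();     // ...and no longer needs the wrapper monitor
            }
        });
        completion.setDaemon(true);
        ipc.setDaemon(true);
        completion.start();
        ipc.start();
        try {
            completion.join(5_000);
            ipc.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !completion.isAlive() && !ipc.isAlive();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + runInterleaving());
    }
}
```

This only removes one edge of the cycle. If some caller needs a consistent view of several TaskWrapper fields at once, a lone volatile read may not be sufficient, which is why the other synchronized methods still need a look.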
> Overall, I'd say synchronized methods should not be used on objects that call into other non-trivial objects. For now, it might be good to replace all synchronized methods with synchronized blocks that cover the entire method, and to remove the unnecessary ones like the isWait... one. The scope of each block can then be narrowed later, based on what it actually needs to protect.
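One way to apply that rule is to move each synchronized method onto a synchronized block over a private lock, then narrow the block so the tracker never calls into another object while holding it (often called an "open call"). A minimal sketch, assuming handlers can be modeled as Consumer<String> callbacks (OpenCallTracker and its methods are hypothetical, only loosely named after FinishableStateTracker): the handler list is copied under the lock, and the callbacks run after the lock is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical tracker sketch: private lock, callbacks invoked outside it. */
public class OpenCallTracker {
    private final Object lock = new Object();   // replaces 'synchronized' on the methods
    private final List<Consumer<String>> handlers = new ArrayList<>();

    public void registerForUpdates(Consumer<String> handler) {
        synchronized (lock) {
            handlers.add(handler);
        }
    }

    public void unregisterForUpdates(Consumer<String> handler) {
        synchronized (lock) {
            handlers.remove(handler);
        }
    }

    public void sourceStateUpdated(String sourceName) {
        List<Consumer<String>> snapshot;
        synchronized (lock) {                   // guard only the tracker's own state
            snapshot = new ArrayList<>(handlers);
        }
        for (Consumer<String> handler : snapshot) {
            handler.accept(sourceName);         // no tracker lock held while calling out
        }
    }

    boolean lockHeldByCurrentThread() {
        return Thread.holdsLock(lock);
    }

    /** Registers a handler that unregisters itself on the first update; reports what it saw. */
    public static String demo() {
        OpenCallTracker tracker = new OpenCallTracker();
        List<String> seen = new ArrayList<>();
        boolean[] lockHeld = {false};
        Consumer<String> handler = new Consumer<String>() {
            @Override
            public void accept(String source) {
                seen.add(source);
                lockHeld[0] |= tracker.lockHeldByCurrentThread();
                tracker.unregisterForUpdates(this);   // re-entering the tracker from a callback
            }
        };
        tracker.registerForUpdates(handler);
        tracker.sourceStateUpdated("Map 1");
        tracker.sourceStateUpdated("Map 2");          // handler is already unregistered
        return seen + " lockHeld=" + lockHeld[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "[Map 1] lockHeld=false"
    }
}
```

The trade-off is that a handler can still receive one notification after unregisterForUpdates() returns, if another thread took its snapshot just before, so handlers would need to tolerate a late callback.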
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)