[https://issues.apache.org/jira/browse/HIVE-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111392#comment-15111392]
Sergey Shelukhin commented on HIVE-12904:
-----------------------------------------
What visibility? Boolean reads and writes are both atomic. Synchronized doesn't do
anything here unless there's some really non-obvious logic that is supposed to
serialize all operations that read this field externally with all the
operations on this object. Not that it would change much: the same thing on the
other object would just happen later, or it would deadlock.
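The atomicity point holds: the Java Language Specification (§17.7) makes reads and writes of a boolean field atomic. Visibility of a write to other threads is a separate guarantee, and a `volatile` field provides it without taking the object's monitor, so a getter like `isInWaitQueue()` can drop `synchronized` without risking a stale read. The sketch below assumes a hypothetical `WaitQueueFlag` class standing in for the flag inside `TaskWrapper`; it is not the actual HIVE-12904 patch.

```java
/**
 * Hypothetical stand-in for the wait-queue flag inside TaskWrapper (not Hive code).
 * Neither accessor takes a monitor, so neither can join a lock-order cycle,
 * while the volatile field still makes every write visible to later reads.
 */
class WaitQueueFlag {
    // volatile: each write happens-before every subsequent read of the field
    private volatile boolean inWaitQueue;

    boolean isInWaitQueue() {
        return inWaitQueue;           // volatile read, no lock taken
    }

    void setInWaitQueue(boolean value) {
        inWaitQueue = value;          // volatile write, no lock taken
    }

    public static void main(String[] args) throws InterruptedException {
        WaitQueueFlag flag = new WaitQueueFlag();
        flag.setInWaitQueue(true);

        // Another thread clears the flag, e.g. when the task leaves the wait queue.
        Thread scheduler = new Thread(() -> flag.setInWaitQueue(false));
        scheduler.start();

        // Spin until the other thread's write is seen. With a plain (non-volatile)
        // field the JIT would be permitted to hoist the read out of the loop.
        while (flag.isInWaitQueue()) {
            Thread.yield();
        }
        scheduler.join();
        System.out.println("inWaitQueue = " + flag.isInWaitQueue()); // prints "inWaitQueue = false"
    }
}
```

Because the getter never acquires a monitor, it cannot take part in a cycle like the one in the trace. What it does not give is "serialize everything" semantics: the value can change right after it is read, which is the point made above.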
> LLAP: deadlock in task scheduling
> ---------------------------------
>
> Key: HIVE-12904
> URL: https://issues.apache.org/jira/browse/HIVE-12904
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Hui Zheng
> Assignee: Sergey Shelukhin
> Attachments: HIVE-12904.patch
>
>
> {noformat}
> Thread 34107: (state = BLOCKED)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.isInWaitQueue() @bci=0, line=690 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.finishableStateUpdated(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=8, line=485 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.access$1500(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService, org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=3, line=78 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.finishableStateUpdated(boolean) @bci=27, line=733 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.sourceStateUpdated(java.lang.String) @bci=76, line=210 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.sourceStateUpdated(java.lang.String) @bci=5, line=164 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerSourceStateChange(java.lang.String, java.lang.String, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateProto) @bci=34, line=228 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=47, line=255 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=328 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.sourceStateUpdated(com.google.protobuf.RpcController, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=105 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=80, line=13067 (Compiled frame)
> - org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(org.apache.hadoop.ipc.RPC$Server, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=246, line=616 (Compiled frame)
> - org.apache.hadoop.ipc.RPC$Server.call(org.apache.hadoop.ipc.RPC$RpcKind, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=9, line=969 (Compiled frame)
> - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=38, line=2151 (Compiled frame)
> - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=2147 (Compiled frame)
> - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
> - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
> - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1657 (Compiled frame)
> - org.apache.hadoop.ipc.Server$Handler.run() @bci=315, line=2145 (Interpreted frame)
> and
> Thread 34500: (state = BLOCKED)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.unregisterForUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=0, line=195 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.unregisterFinishableStateUpdate(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=160 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.QueryFragmentInfo.unregisterForFinishableStateUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=143 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.maybeUnregisterForFinishedStateNotifications() @bci=20, line=681 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(org.apache.tez.runtime.task.TaskRunner2Result) @bci=32, line=548 (Compiled frame)
> - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(java.lang.Object) @bci=5, line=535 (Compiled frame)
> - com.google.common.util.concurrent.Futures$4.run() @bci=55, line=1149 (Compiled frame)
> - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
> - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
>
> "IPC Server handler 0 on 15001":
>   waiting to lock Monitor@0x00007f5d322ecb08 (Object@0x00007f67032cd2c0, a org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService$TaskWrapper),
>   which is held by "ExecutionCompletionThread #0"
> "ExecutionCompletionThread #0":
>   waiting to lock Monitor@0x00007f6066b9e8c8 (Object@0x00007f66b6570200, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
>   which is held by "IPC Server handler 0 on 15001"
> Found a total of 1 deadlock.
> {noformat}
> It looks like this is caused by synchronized methods:
> {noformat}
> TaskWrapper:
> public synchronized void maybeUnregisterForFinishedStateNotifications() {
> {noformat}
> eventually calls
> {noformat}
> FinishableStateTracker:
> synchronized void unregisterForUpdates(FinishableStateUpdateHandler handler) {
> {noformat}
> and, in the opposite direction,
> {noformat}
> FinishableStateTracker:
> synchronized void sourceStateUpdated(String sourceName) {
> {noformat}
> eventually calls
> {noformat}
> TaskWrapper:
> public synchronized boolean isInWaitQueue() {
> {noformat}
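The two call chains form a lock-order inversion: the IPC handler holds the FinishableStateTracker monitor and then asks for the TaskWrapper monitor, while the completion thread does the opposite. The sketch below reproduces that cycle with two hypothetical, heavily simplified classes (`Tracker` and `Wrapper`, not the real Hive classes). A `CyclicBarrier` forces the bad interleaving so the deadlock is deterministic, and `ThreadMXBean.findMonitorDeadlockedThreads()` detects it much as jstack did in the dump above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

/**
 * Hypothetical, heavily simplified model of the two classes in the trace:
 * Tracker plays FinishableStateTracker and Wrapper plays TaskWrapper.
 * The barrier exists only to force the bad interleaving deterministically.
 */
class LockOrderInversionDemo {

    static void awaitBoth(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    static class Wrapper {
        final CyclicBarrier barrier;
        Tracker tracker;

        Wrapper(CyclicBarrier barrier) { this.barrier = barrier; }

        synchronized boolean isInWaitQueue() {      // needs Wrapper's monitor
            return false;
        }

        synchronized void maybeUnregister() {       // holds Wrapper's monitor...
            awaitBoth(barrier);
            tracker.unregisterForUpdates(this);     // ...then needs Tracker's
        }
    }

    static class Tracker {
        final CyclicBarrier barrier;
        Wrapper wrapper;

        Tracker(CyclicBarrier barrier) { this.barrier = barrier; }

        synchronized void unregisterForUpdates(Wrapper w) {
            // would remove w from the handler list
        }

        synchronized void sourceStateUpdated() {    // holds Tracker's monitor...
            awaitBoth(barrier);
            wrapper.isInWaitQueue();                // ...then needs Wrapper's
        }
    }

    /** Starts both call paths; returns true once the JVM reports both threads deadlocked. */
    static boolean reproduce() {
        CyclicBarrier barrier = new CyclicBarrier(2);
        Wrapper wrapper = new Wrapper(barrier);
        Tracker tracker = new Tracker(barrier);
        wrapper.tracker = tracker;
        tracker.wrapper = wrapper;

        Thread ipc = new Thread(tracker::sourceStateUpdated, "IPC Server handler");
        Thread completion = new Thread(wrapper::maybeUnregister, "ExecutionCompletionThread");
        ipc.setDaemon(true);          // daemons, so the JVM can exit while they stay stuck
        completion.setDaemon(true);
        ipc.start();
        completion.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findMonitorDeadlockedThreads();   // null while there is no cycle
            if (ids != null
                    && Arrays.stream(ids).anyMatch(id -> id == ipc.getId())
                    && Arrays.stream(ids).anyMatch(id -> id == completion.getId())) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock reproduced: " + reproduce());
    }
}
```

Neither thread can make progress once both have passed the barrier, which is the same shape as "Found a total of 1 deadlock" in the dump.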
> The latter just returns a boolean, so it definitely doesn't need to be
> synchronized. However, I don't know whether there are other similar issues or
> what actually needs to happen inside the synchronized blocks; perhaps there's a
> better fix.
> Overall, I'd say synchronized methods should not be used on objects that call
> into other non-trivial objects. For now, it may be good to replace all
> synchronized methods with synchronized blocks that cover the entire method, and
> to remove the unnecessary ones such as the isWait... one. The scope of the
> blocks can then be narrowed later based on the logic.
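A minimal sketch of the proposed direction, assuming hypothetical `Tracker`, `Wrapper` and `Handler` types rather than the real FinishableStateTracker and TaskWrapper: each method uses an explicit `synchronized (this)` block, and the block is narrowed so it only copies or updates the object's own state. Calls into the other object (the callback in `sourceStateUpdated`, the unregister call in `maybeUnregister`) happen after the monitor is released, so no thread ever holds one monitor while waiting for the other. This is one way to narrow the scope, not the change that was committed for HIVE-12904.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the HIVE-12904 patch): each object guards its own state
 * with a synchronized (this) block but calls into the other object only after
 * releasing its monitor, so no thread holds one monitor while waiting for another.
 */
class OpenCallSketch {

    interface Handler {
        void finishableStateUpdated(boolean finishable);
    }

    static class Tracker {
        private final List<Handler> handlers = new ArrayList<>();

        void registerForUpdates(Handler h) {
            synchronized (this) {
                handlers.add(h);
            }
        }

        void unregisterForUpdates(Handler h) {
            synchronized (this) {
                handlers.remove(h);
            }
        }

        void sourceStateUpdated(boolean finishable) {
            List<Handler> snapshot;
            synchronized (this) {              // monitor held only while copying
                snapshot = new ArrayList<>(handlers);
            }
            for (Handler h : snapshot) {       // callbacks run without Tracker's monitor
                h.finishableStateUpdated(finishable);
            }
        }
    }

    static class Wrapper implements Handler {
        private final Tracker tracker;
        private boolean needsUnregister = true;   // guarded by this
        private volatile boolean lastFinishable;

        Wrapper(Tracker tracker) {
            this.tracker = tracker;
        }

        @Override
        public void finishableStateUpdated(boolean finishable) {
            lastFinishable = finishable;       // single volatile write, no monitor
        }

        boolean lastFinishable() {
            return lastFinishable;
        }

        void maybeUnregister() {
            boolean doUnregister;
            synchronized (this) {              // decide under Wrapper's monitor...
                doUnregister = needsUnregister;
                needsUnregister = false;
            }
            if (doUnregister) {
                tracker.unregisterForUpdates(this);   // ...then call out without it
            }
        }
    }

    /** Runs both call paths concurrently; true if both finish within the timeout. */
    static boolean runBothPaths() {
        Tracker tracker = new Tracker();
        Wrapper wrapper = new Wrapper(tracker);
        tracker.registerForUpdates(wrapper);

        Thread ipc = new Thread(() -> tracker.sourceStateUpdated(true));
        Thread completion = new Thread(wrapper::maybeUnregister);
        ipc.start();
        completion.start();
        try {
            ipc.join(5_000);
            completion.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !ipc.isAlive() && !completion.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("both paths finished: " + runBothPaths());
    }
}
```

The trade-off of calling out without the lock is that a handler can still receive one callback after it has unregistered (if `sourceStateUpdated` took its snapshot first), so the callback has to tolerate that.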