[ https://issues.apache.org/jira/browse/HIVE-12904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115789#comment-15115789 ]

Sergey Shelukhin commented on HIVE-12904:
-----------------------------------------

+1

> LLAP: deadlock in task scheduling
> ---------------------------------
>
>                 Key: HIVE-12904
>                 URL: https://issues.apache.org/jira/browse/HIVE-12904
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Hui Zheng
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>         Attachments: HIVE-12904.2.patch, HIVE-12904.3.patch, HIVE-12904.patch
>
>
> {noformat}
> Thread 34107: (state = BLOCKED)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.isInWaitQueue() @bci=0, line=690 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.finishableStateUpdated(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=8, line=485 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService.access$1500(org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService, org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper, boolean) @bci=3, line=78 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.finishableStateUpdated(boolean) @bci=27, line=733 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.sourceStateUpdated(java.lang.String) @bci=76, line=210 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.sourceStateUpdated(java.lang.String) @bci=5, line=164 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerSourceStateChange(java.lang.String, java.lang.String, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateProto) @bci=34, line=228 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=47, line=255 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.sourceStateUpdated(org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=328 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.LlapDaemonProtocolServerImpl.sourceStateUpdated(com.google.protobuf.RpcController, org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$SourceStateUpdatedRequestProto) @bci=5, line=105 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=80, line=13067 (Compiled frame)
>  - org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(org.apache.hadoop.ipc.RPC$Server, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=246, line=616 (Compiled frame)
>  - org.apache.hadoop.ipc.RPC$Server.call(org.apache.hadoop.ipc.RPC$RpcKind, java.lang.String, org.apache.hadoop.io.Writable, long) @bci=9, line=969 (Compiled frame)
>  - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=38, line=2151 (Compiled frame)
>  - org.apache.hadoop.ipc.Server$Handler$1.run() @bci=1, line=2147 (Compiled frame)
>  - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Compiled frame)
>  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=422 (Compiled frame)
>  - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1657 (Compiled frame)
>  - org.apache.hadoop.ipc.Server$Handler.run() @bci=315, line=2145 (Interpreted frame)
> and 
> Thread 34500: (state = BLOCKED)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo$FinishableStateTracker.unregisterForUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=0, line=195 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryInfo.unregisterFinishableStateUpdate(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=160 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.QueryFragmentInfo.unregisterForFinishableStateUpdates(org.apache.hadoop.hive.llap.daemon.FinishableStateUpdateHandler) @bci=5, line=143 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$TaskWrapper.maybeUnregisterForFinishedStateNotifications() @bci=20, line=681 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(org.apache.tez.runtime.task.TaskRunner2Result) @bci=32, line=548 (Compiled frame)
>  - org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService$InternalCompletionListener.onSuccess(java.lang.Object) @bci=5, line=535 (Compiled frame)
>  - com.google.common.util.concurrent.Futures$4.run() @bci=55, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
>  - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> "IPC Server handler 0 on 15001":
>   waiting to lock Monitor@0x00007f5d322ecb08 (Object@0x00007f67032cd2c0, a org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService$TaskWrapper),
>   which is held by "ExecutionCompletionThread #0"
> "ExecutionCompletionThread #0":
>   waiting to lock Monitor@0x00007f6066b9e8c8 (Object@0x00007f66b6570200, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
>   which is held by "IPC Server handler 0 on 15001"
> Found a total of 1 deadlock.
> {noformat}
> Looks like it's caused by synchronized methods:
> {noformat}
> TaskWrapper:
> public synchronized void maybeUnregisterForFinishedStateNotifications() {
> {noformat}
> eventually calls
> {noformat}
> FinishableStateTracker:
> synchronized void unregisterForUpdates(FinishableStateUpdateHandler handler) {
> {noformat}
> and
> {noformat}
> FinishableStateTracker:
> synchronized void sourceStateUpdated(String sourceName) {
> {noformat}
> eventually calls
> {noformat}
> TaskWrapper:
> public synchronized boolean isInWaitQueue() {
> {noformat}
> The latter only returns a boolean, so it definitely doesn't need to be
> synchronized. However, I don't know whether there are other similar issues
> or what actually has to stay inside the sync blocks; perhaps there's a
> better fix.
> Overall, I'd say synchronized methods should not be used on objects that
> call into any other non-trivial objects. For now, it would probably be good
> to replace all synchronized methods with synchronized blocks that cover the
> entire method, and to remove the unnecessary ones, such as the one on
> isInWaitQueue(). The scope of the blocks can then be adjusted based on the
> logic in the future.
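
For illustration, here is a minimal sketch of the direction suggested in the description. This is not the attached patch and not the real Hive code: `DeadlockFreeSketch`, `Tracker`, `Task`, and `runConcurrently` are hypothetical stand-ins for `FinishableStateTracker` and `TaskWrapper`. It assumes two changes: the boolean getter reads a volatile field instead of taking the monitor, and neither object calls into the other while holding its own lock.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-ins for the LLAP classes; not the actual Hive code or patch. */
final class DeadlockFreeSketch {

    /** Plays the role of FinishableStateTracker. */
    static final class Tracker {
        private final Object lock = new Object();
        private final Set<Task> handlers = new HashSet<>();

        void register(Task t) { synchronized (lock) { handlers.add(t); } }
        void unregister(Task t) { synchronized (lock) { handlers.remove(t); } }
        int handlerCount() { synchronized (lock) { return handlers.size(); } }

        void sourceStateUpdated() {
            List<Task> snapshot;
            // Hold the lock only long enough to copy the handler set.
            synchronized (lock) { snapshot = new ArrayList<>(handlers); }
            // Callbacks run with no Tracker lock held.
            for (Task t : snapshot) { t.finishableStateUpdated(true); }
        }
    }

    /** Plays the role of TaskWrapper. */
    static final class Task {
        private final Tracker tracker;
        private final AtomicInteger updatesSeen = new AtomicInteger();
        private volatile boolean inWaitQueue = true;
        private boolean registered; // guarded by "this"

        Task(Tracker tracker) { this.tracker = tracker; }

        // A volatile read is enough for a plain boolean getter.
        boolean isInWaitQueue() { return inWaitQueue; }
        int updatesSeen() { return updatesSeen.get(); }

        void register() {
            synchronized (this) { registered = true; }
            tracker.register(this);
        }

        // Invoked by the Tracker; takes no Task monitor.
        void finishableStateUpdated(boolean finishable) {
            if (finishable && isInWaitQueue()) { updatesSeen.incrementAndGet(); }
        }

        void maybeUnregister() {
            boolean wasRegistered;
            synchronized (this) { wasRegistered = registered; registered = false; }
            // Call into the Tracker only after releasing our own monitor.
            if (wasRegistered) { tracker.unregister(this); }
        }
    }

    /** Runs both call paths concurrently; returns true if both threads finish in time. */
    static boolean runConcurrently(int iterations) {
        Tracker tracker = new Tracker();
        Thread ipc = new Thread(() -> {
            for (int i = 0; i < iterations; i++) { tracker.sourceStateUpdated(); }
        });
        Thread completion = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                Task t = new Task(tracker);
                t.register();
                t.maybeUnregister();
            }
        });
        ipc.setDaemon(true);
        completion.setDaemon(true);
        ipc.start();
        completion.start();
        try {
            ipc.join(10_000);
            completion.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !ipc.isAlive() && !completion.isAlive();
    }
}
```

One trade-off of iterating over a snapshot: a task may still receive a callback shortly after it has unregistered, so the handler has to tolerate a late notification. Whether that is acceptable for the real `TaskWrapper` depends on logic not shown in this thread.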



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
