[ https://issues.apache.org/jira/browse/HIVE-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838349#comment-15838349 ]

Sergey Shelukhin commented on HIVE-15722:
-----------------------------------------

1) What is LLL?
2) {noformat}
+  List<QueryFragmentInfo> getRegisteredFragments(QueryIdentifier queryIdentifier) {
+    ReadWriteLock dagLock = getDagLock(queryIdentifier);
+    dagLock.writeLock().lock();
+    try {
+      LOG.info("Processing queryFailed for queryIdentifier={}", queryIdentifier);
{noformat}
should it be renamed, based on the log line and the write lock?
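
For illustration, a minimal standalone sketch of the distinction (class and method names here are hypothetical, not QueryTracker or the patch): a pure read accessor named getRegisteredFragments would only need the read lock, while a failure path that mutates state would take the write lock under a name that matches the log line.
{code}
// Hypothetical sketch only; not the LLAP QueryTracker code.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class QueryFragmentRegistry<Q, F> {
  private final ConcurrentMap<Q, ReadWriteLock> dagLocks = new ConcurrentHashMap<>();
  private final ConcurrentMap<Q, List<F>> registeredFragments = new ConcurrentHashMap<>();

  private ReadWriteLock getDagLock(Q queryId) {
    return dagLocks.computeIfAbsent(queryId, k -> new ReentrantReadWriteLock());
  }

  // Pure read: the read lock is sufficient, and the name says what the method does.
  List<F> getRegisteredFragments(Q queryId) {
    ReadWriteLock dagLock = getDagLock(queryId);
    dagLock.readLock().lock();
    try {
      List<F> frags = registeredFragments.get(queryId);
      return frags == null ? Collections.<F>emptyList() : new ArrayList<>(frags);
    } finally {
      dagLock.readLock().unlock();
    }
  }

  // Failure path that mutates state: takes the write lock, under a name that
  // matches the "Processing queryFailed ..." log line.
  List<F> handleQueryFailed(Q queryId) {
    ReadWriteLock dagLock = getDagLock(queryId);
    dagLock.writeLock().lock();
    try {
      List<F> frags = registeredFragments.remove(queryId);
      return frags == null ? Collections.<F>emptyList() : frags;
    } finally {
      dagLock.writeLock().unlock();
    }
  }
}
{code}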

3) Why was initializing the query ID at the beginning of the patch moved?

> LLAP: Avoid marking a query as complete if the AMReporter runs into an error
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15722
>                 URL: https://issues.apache.org/jira/browse/HIVE-15722
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HIVE-15722.01.patch
>
>
> When the AMReporter runs into an error (typically intermittent), we end up 
> killing all fragments on the daemon. This is done by marking the query as 
> complete.
> The AM would continue to try scheduling on this node - which would lead to 
> task failures if the daemon structures are updated.
> Instead of clearing the structures, it's better to kill the fragments, and 
> let a queryComplete call come in from the AM.
> Later, we could make enhancements in the AM to avoid such nodes. That's not 
> simple though, since the AM will not find out what happened due to the 
> communication failure from the daemon.
> Leads to 
> {code}
> org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag query16 already complete. Rejecting fragment [Map 7, 29, 0]
>       at org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
>       at org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
>       at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
>       at org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
>       at org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
> {code}
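
Not part of the patch, but to make the approach in the description concrete, a rough standalone sketch (all names hypothetical, not the actual daemon/AMReporter API): on an AMReporter error the daemon kills its running fragments but keeps the query registered, and only an explicit queryComplete from the AM removes the per-query structures, so subsequent submitWork calls are not rejected with the "Dag ... already complete" error above.
{code}
// Hypothetical sketch only; not the LLAP daemon code.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class DaemonQueryState {
  interface Fragment { void kill(); }

  // Per-query fragment lists; a query stays registered until queryComplete.
  private final Map<String, List<Fragment>> queries = new ConcurrentHashMap<>();

  void registerQuery(String queryId) {
    queries.putIfAbsent(queryId, new CopyOnWriteArrayList<Fragment>());
  }

  void registerFragment(String queryId, Fragment fragment) {
    List<Fragment> frags = queries.get(queryId);
    if (frags == null) {
      // The rejection the stack trace above shows when the query was
      // prematurely marked complete.
      throw new IllegalStateException("Dag " + queryId + " already complete. Rejecting fragment");
    }
    frags.add(fragment);
  }

  // AMReporter error: kill the running work, but keep the query entry so the
  // AM can continue to schedule fragments on this daemon.
  void onAmReporterError(String queryId) {
    List<Fragment> frags = queries.get(queryId);
    if (frags != null) {
      for (Fragment f : frags) {
        f.kill();
      }
      frags.clear();
    }
  }

  // Only an explicit queryComplete from the AM tears down the per-query structures.
  void queryComplete(String queryId) {
    queries.remove(queryId);
  }
}
{code}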



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
