[ 
https://issues.apache.org/jira/browse/HIVE-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433495#comment-16433495
 ] 

Zoltan Haindrich commented on HIVE-19169:
-----------------------------------------

this could be known issue ; but I wasn't able to locate a relevant jira - I'm 
using an at least 2-3 weeks old version of master

> llap: Timed out after 90 secs
> -----------------------------
>
>                 Key: HIVE-19169
>                 URL: https://issues.apache.org/jira/browse/HIVE-19169
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Zoltan Haindrich
>            Priority: Major
>
> a more or less recent hive/master with tpcds1000; while running only 1 query 
> at a time on the cluster; sometimes a "timeout" happens - this could be even 
> a misconfiguration problem...I'm not sure I've set it up correctly.
> what I see from: it seems like the attempt have entered the queue and 
> accepted - after almost 90 seconds there are messages that it will be 
> pre-empted - and the 90 sec timeout happens.
> hive.log; example: attempt_1522319554594_0065_19_05_000119_14 failed
> {code}
> 2018-04-10T13:20:25,178 ERROR [HiveServer2-Background-Pool: Thread-4194]: 
> SessionState (SessionState.java:printError(1214)) - Vertex failed, 
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
> diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> 2018-04-10T13:20:25,260 ERROR [HiveServer2-Background-Pool: Thread-4194]: 
> ql.Driver (SessionState.java:printError(1214)) - FAILED: Execution Error, 
> return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, 
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
> diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 
> 3, vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed, 
> taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0 
> killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05, 
> diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119, 
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90 
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed, 
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed, 
> TaskAttempt 12 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90 
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed, 
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90 
> secs], TaskAttempt 15 failed, 
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): 
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be 
> removed
> {code}
> from llap0 logs; filtered for attempt_1522319554594_0065_19_05_000119_14 ; 
> ordered by time:
> {code}
> java.lang.RuntimeException: attempt_1522319554594_0065_19_05_000119_14 was 
> not registered and couldn't be removed
> 2018-04-10T13:18:45,455 INFO  [IPC Server handler 3 on 50155 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl: Queueing 
> container for execution: 
> fragemendId=attempt_1522319554594_0065_19_05_000119_14, 
> am_details=lemon-8.openstacklocal:35408, taskInfo= fragment 119 attempt 14, 
> user=hive, queryId=hive_20180410113951_2fc85570-df7a-4cb1-8499-74d11f227e0d, 
> appIdString=application_1522319554594_0065, appAttemptNum=1, 
> containerIdString=container_222212222_0065_01_067763, dagName=with ws as
> 2018-04-10T13:18:45,457  INFO [IPC Server handler 3 on 50155 
> (1522319554594_0065_19_05_000119_14)] impl.ContainerRunnerImpl: 
> SubmissionState for attempt_1522319554594_0065_19_05_000119_14 : ACCEPTED 
> 2018-04-10T13:19:13,533 INFO  [Wait-Queue-Scheduler-0 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Adding 
> attempt_1522319554594_0065_19_05_000119_14 to pre-emption queue
> 2018-04-10T13:19:13,533 INFO  [Wait-Queue-Scheduler-0 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Attempting to 
> execute TaskWrapper{task=attempt_1522319554594_0065_19_05_000119_14, 
> inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true, 
> canFinish=true, canFinish(in queue)=true, isGuaranteed=false, 
> firstAttemptStartTime=1523364757720, dagStartTime=1523363710305, 
> withinDagPriority=194, vertexParallelism= 297, selfAndUpstreamParallelism= 
> 1286, selfAndUpstreamComplete= 1243}
> 2018-04-10T13:19:13,534 WARN  [ExecutionCompletionThread #0 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Received 
> onSuccess/onFailure for a fragment for which a completing message was not 
> received: attempt_1522319554594_0065_19_05_000119_14
> 2018-04-10T13:19:13,536 ERROR [ExecutionCompletionThread #0 ()] 
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Failed 
> notification received: Stacktrace: java.lang.RuntimeException: 
> attempt_1522319554594_0065_19_05_000119_14 was not registered and couldn't be 
> removed
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to