Zoltan Haindrich created HIVE-19169:
---------------------------------------
Summary: llap: Timed out after 90 secs
Key: HIVE-19169
URL: https://issues.apache.org/jira/browse/HIVE-19169
Project: Hive
Issue Type: Bug
Components: llap
Reporter: Zoltan Haindrich
a more or less recent hive/master with tpcds1000; while running only 1 query at
a time on the cluster; sometimes a "timeout" happens - this could be even a
misconfiguration problem...I'm not sure I've set it up correctly.
what I see from: it seems like the attempt have entered the queue and accepted
- after almost 90 seconds there are messages that it will be pre-empted - and
the 90 sec timeout happens.
hive.log; example: attempt_1522319554594_0065_19_05_000119_14 failed
{code}
2018-04-10T13:20:25,178 ERROR [HiveServer2-Background-Pool: Thread-4194]:
SessionState (SessionState.java:printError(1214)) - Vertex failed,
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
2018-04-10T13:20:25,260 ERROR [HiveServer2-Background-Pool: Thread-4194]:
ql.Driver (SessionState.java:printError(1214)) - FAILED: Execution Error,
return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
org.apache.hive.service.cli.HiveSQLException: Error while processing statement:
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer
3, vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed,
taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0 killed,
TaskAttempt 1 failed, info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1
Timed out after 90 secs], TaskAttempt 2 killed, TaskAttempt 3 killed,
TaskAttempt 4 killed, TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7
killed, TaskAttempt 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed,
TaskAttempt 11 killed, TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed,
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
{code}
from llap0 logs; filtered for attempt_1522319554594_0065_19_05_000119_14 ;
ordered by time:
{code}
java.lang.RuntimeException: attempt_1522319554594_0065_19_05_000119_14 was not
registered and couldn't be removed
2018-04-10T13:18:45,455 INFO [IPC Server handler 3 on 50155 ()]
org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl: Queueing container
for execution: fragemendId=attempt_1522319554594_0065_19_05_000119_14,
am_details=lemon-8.openstacklocal:35408, taskInfo= fragment 119 attempt 14,
user=hive, queryId=hive_20180410113951_2fc85570-df7a-4cb1-8499-74d11f227e0d,
appIdString=application_1522319554594_0065, appAttemptNum=1,
containerIdString=container_222212222_0065_01_067763, dagName=with ws as
2018-04-10T13:18:45,457 INFO [IPC Server handler 3 on 50155
(1522319554594_0065_19_05_000119_14)] impl.ContainerRunnerImpl: SubmissionState
for attempt_1522319554594_0065_19_05_000119_14 : ACCEPTED
2018-04-10T13:19:13,533 INFO [Wait-Queue-Scheduler-0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Adding
attempt_1522319554594_0065_19_05_000119_14 to pre-emption queue
2018-04-10T13:19:13,533 INFO [Wait-Queue-Scheduler-0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Attempting to
execute TaskWrapper{task=attempt_1522319554594_0065_19_05_000119_14,
inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true,
canFinish=true, canFinish(in queue)=true, isGuaranteed=false,
firstAttemptStartTime=1523364757720, dagStartTime=1523363710305,
withinDagPriority=194, vertexParallelism= 297, selfAndUpstreamParallelism=
1286, selfAndUpstreamComplete= 1243}
2018-04-10T13:19:13,534 WARN [ExecutionCompletionThread #0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Received
onSuccess/onFailure for a fragment for which a completing message was not
received: attempt_1522319554594_0065_19_05_000119_14
2018-04-10T13:19:13,536 ERROR [ExecutionCompletionThread #0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Failed
notification received: Stacktrace: java.lang.RuntimeException:
attempt_1522319554594_0065_19_05_000119_14 was not registered and couldn't be
removed
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)