[
https://issues.apache.org/jira/browse/HIVE-19169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zoltan Haindrich updated HIVE-19169:
------------------------------------
Description:
a more or less recent hive/master with tpcds1000; while running only 1 query at
a time on the cluster; sometimes a "timeout" happens - this could be even a
misconfiguration problem...I'm not sure I've set it up correctly.
what I see from: it seems like the attempt have entered the queue and accepted
- after almost 90 seconds there are messages that it will be pre-empted - and
the 90 sec timeout happens.
hive.log; example: attempt_1522319554594_0065_19_05_000119_14 failed
{code}
2018-04-10T13:20:25,178 ERROR [HiveServer2-Background-Pool: Thread-4194]:
SessionState (SessionState.java:printError(1214)) - Vertex failed,
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
2018-04-10T13:20:25,260 ERROR [HiveServer2-Background-Pool: Thread-4194]:
ql.Driver (SessionState.java:printError(1214)) - FAILED: Execution Error,
return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
org.apache.hive.service.cli.HiveSQLException: Error while processing statement:
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer
3, vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed,
taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0 killed,
TaskAttempt 1 failed, info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1
Timed out after 90 secs], TaskAttempt 2 killed, TaskAttempt 3 killed,
TaskAttempt 4 killed, TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7
killed, TaskAttempt 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed,
TaskAttempt 11 killed, TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed,
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
{code}
tezam logs for the attempt
{code}
2018-04-10 13:18:45,430 [INFO] [TaskSchedulerEventHandlerThread]
|tezplugins.LlapTaskSchedulerService|: Received allocateRequest.
task=attempt_1522319554594_0065_19_05_000119_14, priority=194,
capability=<memory:3072, vCores:1>, hosts=null
2018-04-10 13:18:45,440 [INFO] [LlapScheduler]
|tezplugins.LlapTaskSchedulerService|: Assigned #67763,
task=TaskInfo{task=attempt_1522319554594_0065_19_05_000119_14, priority=194,
startTime=0, containerId=null, uniqueId=69780, localityDelayTimeout=1127144039}
on node={lemon-9.openstacklocal:50155, id=2a3562c1-1094-499f-9d83-5c3b5cf34663,
canAcceptTask=true, st=10, ac=1, commF=false, disabled=false}, to
container=container_222212222_0065_01_067763
2018-04-10 13:18:45,440 [INFO] [LlapScheduler] |GuaranteedTasks|: Registering
attempt_1522319554594_0065_19_05_000119_14; false
2018-04-10 13:18:45,446 [INFO] [Dispatcher thread {Central}]
|impl.TaskAttemptImpl|: TaskAttempt:
[attempt_1522319554594_0065_19_05_000119_14] submitted. Is using containerId:
[container_222212222_0065_01_067763] on NM: [lemon-9.openstacklocal:50155]
2018-04-10 13:18:45,446 [INFO] [Dispatcher thread {Central}]
|HistoryEventHandler.criticalEvents|:
[HISTORY][DAG:dag_1522319554594_0065_19][Event:TASK_ATTEMPT_STARTED]:
vertexName=Reducer 3, taskAttemptId=attempt_1522319554594_0065_19_05_000119_14,
startTime=1523366325446, containerId=container_222212222_0065_01_067763,
nodeId=lemon-9.openstacklocal:50155
2018-04-10 13:18:45,464 [INFO] [TaskCommunicator # 9]
|tezplugins.LlapTaskCommunicator|: Successfully launched task:
attempt_1522319554594_0065_19_05_000119_14
2018-04-10 13:18:45,464 [INFO] [TaskCommunicator # 9] |GuaranteedTasks|:
Received response for attempt_1522319554594_0065_19_05_000119_14, true
2018-04-10 13:19:08,743 [INFO] [IPC Server handler 15 on 35408]
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are
not there: attempt_1522319554594_0065_19_05_000130_14,
attempt_1522319554594_0065_19_05_000191_14,
attempt_1522319554594_0065_19_05_000128_14,
attempt_1522319554594_0065_19_08_000018_6,
attempt_1522319554594_0065_19_08_000022_6,
attempt_1522319554594_0065_19_05_000181_14,
attempt_1522319554594_0065_19_05_000131_15,
attempt_1522319554594_0065_19_05_000214_15,
attempt_1522319554594_0065_19_05_000093_11,
attempt_1522319554594_0065_19_05_000119_14,
2018-04-10 13:19:18,728 [INFO] [IPC Server handler 11 on 35408]
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are
not there: attempt_1522319554594_0065_19_05_000130_14,
attempt_1522319554594_0065_19_05_000191_14,
attempt_1522319554594_0065_19_08_000018_6,
attempt_1522319554594_0065_19_08_000022_6,
attempt_1522319554594_0065_19_05_000181_14,
attempt_1522319554594_0065_19_05_000131_15,
attempt_1522319554594_0065_19_05_000214_15,
attempt_1522319554594_0065_19_05_000093_11,
attempt_1522319554594_0065_19_05_000119_14,
attempt_1522319554594_0065_19_08_000025_7,
2018-04-10 13:19:28,726 [INFO] [IPC Server handler 28 on 35408]
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are
not there: attempt_1522319554594_0065_19_05_000130_14,
attempt_1522319554594_0065_19_05_000191_14,
attempt_1522319554594_0065_19_08_000018_6,
attempt_1522319554594_0065_19_08_000022_6,
attempt_1522319554594_0065_19_05_000181_14,
attempt_1522319554594_0065_19_05_000131_15,
attempt_1522319554594_0065_19_05_000214_15,
attempt_1522319554594_0065_19_05_000093_11,
attempt_1522319554594_0065_19_05_000119_14,
attempt_1522319554594_0065_19_08_000025_7,
2018-04-10 13:19:38,726 [INFO] [IPC Server handler 15 on 35408]
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are
not there: attempt_1522319554594_0065_19_05_000130_14,
attempt_1522319554594_0065_19_05_000191_14,
attempt_1522319554594_0065_19_08_000018_6,
attempt_1522319554594_0065_19_08_000022_6,
attempt_1522319554594_0065_19_05_000181_14,
attempt_1522319554594_0065_19_05_000131_15,
attempt_1522319554594_0065_19_05_000214_15,
attempt_1522319554594_0065_19_05_000093_11,
attempt_1522319554594_0065_19_05_000119_14,
attempt_1522319554594_0065_19_08_000025_7,
2018-04-10 13:19:48,726 [INFO] [IPC Server handler 17 on 35408]
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are
not there: attempt_1522319554594_0065_19_05_000130_14,
attempt_1522319554594_0065_19_05_000191_14,
attempt_1522319554594_0065_19_08_000018_6,
attempt_1522319554594_0065_19_08_000022_6,
attempt_1522319554594_0065_19_05_000181_14,
attempt_1522319554594_0065_19_05_000131_15,
attempt_1522319554594_0065_19_05_000214_15,
attempt_1522319554594_0065_19_05_000093_11,
attempt_1522319554594_0065_19_05_000119_14,
attempt_1522319554594_0065_19_08_000025_7,
2018-04-10 13:19:58,726 [INFO] [IPC Server handler 5 on 35408]
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are
not there: attempt_1522319554594_0065_19_05_000130_14,
attempt_1522319554594_0065_19_05_000191_14,
attempt_1522319554594_0065_19_08_000018_6,
attempt_1522319554594_0065_19_08_000022_6,
attempt_1522319554594_0065_19_05_000181_14,
attempt_1522319554594_0065_19_05_000131_15,
attempt_1522319554594_0065_19_05_000214_15,
attempt_1522319554594_0065_19_05_000093_11,
attempt_1522319554594_0065_19_05_000119_14,
attempt_1522319554594_0065_19_08_000025_7,
2018-04-10 13:20:08,726 [INFO] [IPC Server handler 2 on 35408]
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are
not there: attempt_1522319554594_0065_19_05_000130_14,
attempt_1522319554594_0065_19_05_000191_14,
attempt_1522319554594_0065_19_08_000018_6,
attempt_1522319554594_0065_19_08_000022_6,
attempt_1522319554594_0065_19_05_000181_14,
attempt_1522319554594_0065_19_05_000131_15,
attempt_1522319554594_0065_19_05_000214_15,
attempt_1522319554594_0065_19_05_000093_11,
attempt_1522319554594_0065_19_05_000119_14,
attempt_1522319554594_0065_19_08_000025_7,
2018-04-10 13:20:18,727 [INFO] [IPC Server handler 17 on 35408]
|tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node are
not there: attempt_1522319554594_0065_19_05_000130_14,
attempt_1522319554594_0065_19_05_000191_14,
attempt_1522319554594_0065_19_08_000018_6,
attempt_1522319554594_0065_19_08_000022_6,
attempt_1522319554594_0065_19_05_000181_14,
attempt_1522319554594_0065_19_05_000131_15,
attempt_1522319554594_0065_19_05_000214_15,
attempt_1522319554594_0065_19_05_000093_11,
attempt_1522319554594_0065_19_05_000119_14,
attempt_1522319554594_0065_19_08_000025_7,
2018-04-10 13:20:24,594 [INFO] [Dispatcher thread {Central}]
|HistoryEventHandler.criticalEvents|:
[HISTORY][DAG:dag_1522319554594_0065_19][Event:TASK_ATTEMPT_FINISHED]:
vertexName=Reducer 3, taskAttemptId=attempt_1522319554594_0065_19_05_000119_14,
creationTime=1523366325392, allocationTime=1523366325443,
startTime=1523366325446, finishTime=1523366424594, timeTaken=99148,
status=FAILED, taskFailureType=NON_FATAL, errorEnum=TASK_HEARTBEAT_ERROR,
diagnostics=AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out
after 90 secs, nodeHttpAddress=http://lemon-9.openstacklocal:15002,
counters=Counters: 0
2018-04-10 13:20:24,600 [INFO] [TaskSchedulerEventHandlerThread]
|GuaranteedTasks|: Unregistering attempt_1522319554594_0065_19_05_000119_14;
false
2018-04-10 13:20:24,600 [INFO] [TaskSchedulerEventHandlerThread]
|tezplugins.LlapTaskSchedulerService|: Processing de-allocate request for
task=attempt_1522319554594_0065_19_05_000119_14, state=ASSIGNED, endReason=OTHER
2018-04-10 13:20:24,603 [INFO] [Dispatcher thread {Central}] |node.AMNodeImpl|:
Attempt failedon node: lemon-9.openstacklocal:50155 TA:
attempt_1522319554594_0065_19_05_000119_14, container:
container_222212222_0065_01_067763, numFailedTAs: 57
2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|:
Source task attempt completed for vertex: vertex_1522319554594_0065_19_07
[Reducer 10] attempt: attempt_1522319554594_0065_19_05_000119_14 with state:
FAILED vertexState: RUNNING
2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|:
Source task attempt completed for vertex: vertex_1522319554594_0065_19_11
[Reducer 4] attempt: attempt_1522319554594_0065_19_05_000119_14 with state:
FAILED vertexState: RUNNING
2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}] |impl.VertexImpl|:
Source task attempt completed for vertex: vertex_1522319554594_0065_19_06
[Reducer 9] attempt: attempt_1522319554594_0065_19_05_000119_14 with state:
FAILED vertexState: RUNNING
TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs]
2018-04-10 13:20:25,020 [INFO] [Dispatcher thread {Central}]
|HistoryEventHandler.criticalEvents|:
[HISTORY][DAG:dag_1522319554594_0065_19][Event:VERTEX_FINISHED]:
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
initRequestedTime=1523363710631, initedTime=1523363710645,
startRequestedTime=1523363713645, startedTime=1523363713646,
finishTime=1523366425017, timeTaken=2711371, status=FAILED, diagnostics=Task
failed, taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0
killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
2018-04-10 13:20:25,157 [INFO] [Dispatcher thread {Central}]
|HistoryEventHandler.criticalEvents|:
[HISTORY][DAG:dag_1522319554594_0065_19][Event:DAG_FINISHED]:
dagId=dag_1522319554594_0065_19, startTime=1523363710305,
finishTime=1523366425121, timeTaken=2714816, status=FAILED, diagnostics=Vertex
failed, vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
{code}
from llap0 logs; filtered for attempt_1522319554594_0065_19_05_000119_14 ;
ordered by time:
{code}
java.lang.RuntimeException: attempt_1522319554594_0065_19_05_000119_14 was not
registered and couldn't be removed
2018-04-10T13:18:45,455 INFO [IPC Server handler 3 on 50155 ()]
org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl: Queueing container
for execution: fragemendId=attempt_1522319554594_0065_19_05_000119_14,
am_details=lemon-8.openstacklocal:35408, taskInfo= fragment 119 attempt 14,
user=hive, queryId=hive_20180410113951_2fc85570-df7a-4cb1-8499-74d11f227e0d,
appIdString=application_1522319554594_0065, appAttemptNum=1,
containerIdString=container_222212222_0065_01_067763, dagName=with ws as
2018-04-10T13:18:45,457 INFO [IPC Server handler 3 on 50155
(1522319554594_0065_19_05_000119_14)] impl.ContainerRunnerImpl: SubmissionState
for attempt_1522319554594_0065_19_05_000119_14 : ACCEPTED
2018-04-10T13:19:13,533 INFO [Wait-Queue-Scheduler-0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Adding
attempt_1522319554594_0065_19_05_000119_14 to pre-emption queue
2018-04-10T13:19:13,533 INFO [Wait-Queue-Scheduler-0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Attempting to
execute TaskWrapper{task=attempt_1522319554594_0065_19_05_000119_14,
inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true,
canFinish=true, canFinish(in queue)=true, isGuaranteed=false,
firstAttemptStartTime=1523364757720, dagStartTime=1523363710305,
withinDagPriority=194, vertexParallelism= 297, selfAndUpstreamParallelism=
1286, selfAndUpstreamComplete= 1243}
2018-04-10T13:19:13,534 WARN [ExecutionCompletionThread #0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Received
onSuccess/onFailure for a fragment for which a completing message was not
received: attempt_1522319554594_0065_19_05_000119_14
2018-04-10T13:19:13,536 ERROR [ExecutionCompletionThread #0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Failed
notification received: Stacktrace: java.lang.RuntimeException:
attempt_1522319554594_0065_19_05_000119_14 was not registered and couldn't be
removed
{code}
was:
a more or less recent hive/master with tpcds1000; while running only 1 query at
a time on the cluster; sometimes a "timeout" happens - this could be even a
misconfiguration problem...I'm not sure I've set it up correctly.
what I see from: it seems like the attempt have entered the queue and accepted
- after almost 90 seconds there are messages that it will be pre-empted - and
the 90 sec timeout happens.
hive.log; example: attempt_1522319554594_0065_19_05_000119_14 failed
{code}
2018-04-10T13:20:25,178 ERROR [HiveServer2-Background-Pool: Thread-4194]:
SessionState (SessionState.java:printError(1214)) - Vertex failed,
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
2018-04-10T13:20:25,260 ERROR [HiveServer2-Background-Pool: Thread-4194]:
ql.Driver (SessionState.java:printError(1214)) - FAILED: Execution Error,
return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
org.apache.hive.service.cli.HiveSQLException: Error while processing statement:
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer
3, vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed,
taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0 killed,
TaskAttempt 1 failed, info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1
Timed out after 90 secs], TaskAttempt 2 killed, TaskAttempt 3 killed,
TaskAttempt 4 killed, TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7
killed, TaskAttempt 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed,
TaskAttempt 11 killed, TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed,
vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt 8
killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
TaskAttempt 12 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
secs], TaskAttempt 15 failed,
info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
removed
{code}
from llap0 logs; filtered for attempt_1522319554594_0065_19_05_000119_14 ;
ordered by time:
{code}
java.lang.RuntimeException: attempt_1522319554594_0065_19_05_000119_14 was not
registered and couldn't be removed
2018-04-10T13:18:45,455 INFO [IPC Server handler 3 on 50155 ()]
org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl: Queueing container
for execution: fragemendId=attempt_1522319554594_0065_19_05_000119_14,
am_details=lemon-8.openstacklocal:35408, taskInfo= fragment 119 attempt 14,
user=hive, queryId=hive_20180410113951_2fc85570-df7a-4cb1-8499-74d11f227e0d,
appIdString=application_1522319554594_0065, appAttemptNum=1,
containerIdString=container_222212222_0065_01_067763, dagName=with ws as
2018-04-10T13:18:45,457 INFO [IPC Server handler 3 on 50155
(1522319554594_0065_19_05_000119_14)] impl.ContainerRunnerImpl: SubmissionState
for attempt_1522319554594_0065_19_05_000119_14 : ACCEPTED
2018-04-10T13:19:13,533 INFO [Wait-Queue-Scheduler-0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Adding
attempt_1522319554594_0065_19_05_000119_14 to pre-emption queue
2018-04-10T13:19:13,533 INFO [Wait-Queue-Scheduler-0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Attempting to
execute TaskWrapper{task=attempt_1522319554594_0065_19_05_000119_14,
inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true,
canFinish=true, canFinish(in queue)=true, isGuaranteed=false,
firstAttemptStartTime=1523364757720, dagStartTime=1523363710305,
withinDagPriority=194, vertexParallelism= 297, selfAndUpstreamParallelism=
1286, selfAndUpstreamComplete= 1243}
2018-04-10T13:19:13,534 WARN [ExecutionCompletionThread #0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Received
onSuccess/onFailure for a fragment for which a completing message was not
received: attempt_1522319554594_0065_19_05_000119_14
2018-04-10T13:19:13,536 ERROR [ExecutionCompletionThread #0 ()]
org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Failed
notification received: Stacktrace: java.lang.RuntimeException:
attempt_1522319554594_0065_19_05_000119_14 was not registered and couldn't be
removed
{code}
> llap: Timed out after 90 secs
> -----------------------------
>
> Key: HIVE-19169
> URL: https://issues.apache.org/jira/browse/HIVE-19169
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Zoltan Haindrich
> Priority: Major
>
> a more or less recent hive/master with tpcds1000; while running only 1 query
> at a time on the cluster; sometimes a "timeout" happens - this could be even
> a misconfiguration problem...I'm not sure I've set it up correctly.
> what I see from: it seems like the attempt have entered the queue and
> accepted - after almost 90 seconds there are messages that it will be
> pre-empted - and the 90 sec timeout happens.
> hive.log; example: attempt_1522319554594_0065_19_05_000119_14 failed
> {code}
> 2018-04-10T13:20:25,178 ERROR [HiveServer2-Background-Pool: Thread-4194]:
> SessionState (SessionState.java:printError(1214)) - Vertex failed,
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
> diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
> TaskAttempt 12 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
> secs], TaskAttempt 15 failed,
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
> removed
> 2018-04-10T13:20:25,260 ERROR [HiveServer2-Background-Pool: Thread-4194]:
> ql.Driver (SessionState.java:printError(1214)) - FAILED: Execution Error,
> return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed,
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
> diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
> TaskAttempt 12 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
> secs], TaskAttempt 15 failed,
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
> removed
> org.apache.hive.service.cli.HiveSQLException: Error while processing
> statement: FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer
> 3, vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed,
> taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0
> killed, TaskAttempt 1 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
> TaskAttempt 12 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
> secs], TaskAttempt 15 failed,
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
> removed
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed,
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
> diagnostics=[Task failed, taskId=task_1522319554594_0065_19_05_000119,
> diagnostics=[TaskAttempt 0 killed, TaskAttempt 1 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
> TaskAttempt 12 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
> secs], TaskAttempt 15 failed,
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
> removed
> {code}
> tezam logs for the attempt
> {code}
> 2018-04-10 13:18:45,430 [INFO] [TaskSchedulerEventHandlerThread]
> |tezplugins.LlapTaskSchedulerService|: Received allocateRequest.
> task=attempt_1522319554594_0065_19_05_000119_14, priority=194,
> capability=<memory:3072, vCores:1>, hosts=null
> 2018-04-10 13:18:45,440 [INFO] [LlapScheduler]
> |tezplugins.LlapTaskSchedulerService|: Assigned #67763,
> task=TaskInfo{task=attempt_1522319554594_0065_19_05_000119_14, priority=194,
> startTime=0, containerId=null, uniqueId=69780,
> localityDelayTimeout=1127144039} on node={lemon-9.openstacklocal:50155,
> id=2a3562c1-1094-499f-9d83-5c3b5cf34663, canAcceptTask=true, st=10, ac=1,
> commF=false, disabled=false}, to container=container_222212222_0065_01_067763
> 2018-04-10 13:18:45,440 [INFO] [LlapScheduler] |GuaranteedTasks|: Registering
> attempt_1522319554594_0065_19_05_000119_14; false
> 2018-04-10 13:18:45,446 [INFO] [Dispatcher thread {Central}]
> |impl.TaskAttemptImpl|: TaskAttempt:
> [attempt_1522319554594_0065_19_05_000119_14] submitted. Is using containerId:
> [container_222212222_0065_01_067763] on NM: [lemon-9.openstacklocal:50155]
> 2018-04-10 13:18:45,446 [INFO] [Dispatcher thread {Central}]
> |HistoryEventHandler.criticalEvents|:
> [HISTORY][DAG:dag_1522319554594_0065_19][Event:TASK_ATTEMPT_STARTED]:
> vertexName=Reducer 3,
> taskAttemptId=attempt_1522319554594_0065_19_05_000119_14,
> startTime=1523366325446, containerId=container_222212222_0065_01_067763,
> nodeId=lemon-9.openstacklocal:50155
> 2018-04-10 13:18:45,464 [INFO] [TaskCommunicator # 9]
> |tezplugins.LlapTaskCommunicator|: Successfully launched task:
> attempt_1522319554594_0065_19_05_000119_14
> 2018-04-10 13:18:45,464 [INFO] [TaskCommunicator # 9] |GuaranteedTasks|:
> Received response for attempt_1522319554594_0065_19_05_000119_14, true
> 2018-04-10 13:19:08,743 [INFO] [IPC Server handler 15 on 35408]
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node
> are not there: attempt_1522319554594_0065_19_05_000130_14,
> attempt_1522319554594_0065_19_05_000191_14,
> attempt_1522319554594_0065_19_05_000128_14,
> attempt_1522319554594_0065_19_08_000018_6,
> attempt_1522319554594_0065_19_08_000022_6,
> attempt_1522319554594_0065_19_05_000181_14,
> attempt_1522319554594_0065_19_05_000131_15,
> attempt_1522319554594_0065_19_05_000214_15,
> attempt_1522319554594_0065_19_05_000093_11,
> attempt_1522319554594_0065_19_05_000119_14,
> 2018-04-10 13:19:18,728 [INFO] [IPC Server handler 11 on 35408]
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node
> are not there: attempt_1522319554594_0065_19_05_000130_14,
> attempt_1522319554594_0065_19_05_000191_14,
> attempt_1522319554594_0065_19_08_000018_6,
> attempt_1522319554594_0065_19_08_000022_6,
> attempt_1522319554594_0065_19_05_000181_14,
> attempt_1522319554594_0065_19_05_000131_15,
> attempt_1522319554594_0065_19_05_000214_15,
> attempt_1522319554594_0065_19_05_000093_11,
> attempt_1522319554594_0065_19_05_000119_14,
> attempt_1522319554594_0065_19_08_000025_7,
> 2018-04-10 13:19:28,726 [INFO] [IPC Server handler 28 on 35408]
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node
> are not there: attempt_1522319554594_0065_19_05_000130_14,
> attempt_1522319554594_0065_19_05_000191_14,
> attempt_1522319554594_0065_19_08_000018_6,
> attempt_1522319554594_0065_19_08_000022_6,
> attempt_1522319554594_0065_19_05_000181_14,
> attempt_1522319554594_0065_19_05_000131_15,
> attempt_1522319554594_0065_19_05_000214_15,
> attempt_1522319554594_0065_19_05_000093_11,
> attempt_1522319554594_0065_19_05_000119_14,
> attempt_1522319554594_0065_19_08_000025_7,
> 2018-04-10 13:19:38,726 [INFO] [IPC Server handler 15 on 35408]
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node
> are not there: attempt_1522319554594_0065_19_05_000130_14,
> attempt_1522319554594_0065_19_05_000191_14,
> attempt_1522319554594_0065_19_08_000018_6,
> attempt_1522319554594_0065_19_08_000022_6,
> attempt_1522319554594_0065_19_05_000181_14,
> attempt_1522319554594_0065_19_05_000131_15,
> attempt_1522319554594_0065_19_05_000214_15,
> attempt_1522319554594_0065_19_05_000093_11,
> attempt_1522319554594_0065_19_05_000119_14,
> attempt_1522319554594_0065_19_08_000025_7,
> 2018-04-10 13:19:48,726 [INFO] [IPC Server handler 17 on 35408]
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node
> are not there: attempt_1522319554594_0065_19_05_000130_14,
> attempt_1522319554594_0065_19_05_000191_14,
> attempt_1522319554594_0065_19_08_000018_6,
> attempt_1522319554594_0065_19_08_000022_6,
> attempt_1522319554594_0065_19_05_000181_14,
> attempt_1522319554594_0065_19_05_000131_15,
> attempt_1522319554594_0065_19_05_000214_15,
> attempt_1522319554594_0065_19_05_000093_11,
> attempt_1522319554594_0065_19_05_000119_14,
> attempt_1522319554594_0065_19_08_000025_7,
> 2018-04-10 13:19:58,726 [INFO] [IPC Server handler 5 on 35408]
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node
> are not there: attempt_1522319554594_0065_19_05_000130_14,
> attempt_1522319554594_0065_19_05_000191_14,
> attempt_1522319554594_0065_19_08_000018_6,
> attempt_1522319554594_0065_19_08_000022_6,
> attempt_1522319554594_0065_19_05_000181_14,
> attempt_1522319554594_0065_19_05_000131_15,
> attempt_1522319554594_0065_19_05_000214_15,
> attempt_1522319554594_0065_19_05_000093_11,
> attempt_1522319554594_0065_19_05_000119_14,
> attempt_1522319554594_0065_19_08_000025_7,
> 2018-04-10 13:20:08,726 [INFO] [IPC Server handler 2 on 35408]
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node
> are not there: attempt_1522319554594_0065_19_05_000130_14,
> attempt_1522319554594_0065_19_05_000191_14,
> attempt_1522319554594_0065_19_08_000018_6,
> attempt_1522319554594_0065_19_08_000022_6,
> attempt_1522319554594_0065_19_05_000181_14,
> attempt_1522319554594_0065_19_05_000131_15,
> attempt_1522319554594_0065_19_05_000214_15,
> attempt_1522319554594_0065_19_05_000093_11,
> attempt_1522319554594_0065_19_05_000119_14,
> attempt_1522319554594_0065_19_08_000025_7,
> 2018-04-10 13:20:18,727 [INFO] [IPC Server handler 17 on 35408]
> |tezplugins.LlapTaskCommunicator|: The tasks we expected to be on the node
> are not there: attempt_1522319554594_0065_19_05_000130_14,
> attempt_1522319554594_0065_19_05_000191_14,
> attempt_1522319554594_0065_19_08_000018_6,
> attempt_1522319554594_0065_19_08_000022_6,
> attempt_1522319554594_0065_19_05_000181_14,
> attempt_1522319554594_0065_19_05_000131_15,
> attempt_1522319554594_0065_19_05_000214_15,
> attempt_1522319554594_0065_19_05_000093_11,
> attempt_1522319554594_0065_19_05_000119_14,
> attempt_1522319554594_0065_19_08_000025_7,
> 2018-04-10 13:20:24,594 [INFO] [Dispatcher thread {Central}]
> |HistoryEventHandler.criticalEvents|:
> [HISTORY][DAG:dag_1522319554594_0065_19][Event:TASK_ATTEMPT_FINISHED]:
> vertexName=Reducer 3,
> taskAttemptId=attempt_1522319554594_0065_19_05_000119_14,
> creationTime=1523366325392, allocationTime=1523366325443,
> startTime=1523366325446, finishTime=1523366424594, timeTaken=99148,
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=TASK_HEARTBEAT_ERROR,
> diagnostics=AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out
> after 90 secs, nodeHttpAddress=http://lemon-9.openstacklocal:15002,
> counters=Counters: 0
> 2018-04-10 13:20:24,600 [INFO] [TaskSchedulerEventHandlerThread]
> |GuaranteedTasks|: Unregistering attempt_1522319554594_0065_19_05_000119_14;
> false
> 2018-04-10 13:20:24,600 [INFO] [TaskSchedulerEventHandlerThread]
> |tezplugins.LlapTaskSchedulerService|: Processing de-allocate request for
> task=attempt_1522319554594_0065_19_05_000119_14, state=ASSIGNED,
> endReason=OTHER
> 2018-04-10 13:20:24,603 [INFO] [Dispatcher thread {Central}]
> |node.AMNodeImpl|: Attempt failedon node: lemon-9.openstacklocal:50155 TA:
> attempt_1522319554594_0065_19_05_000119_14, container:
> container_222212222_0065_01_067763, numFailedTAs: 57
> 2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}]
> |impl.VertexImpl|: Source task attempt completed for vertex:
> vertex_1522319554594_0065_19_07 [Reducer 10] attempt:
> attempt_1522319554594_0065_19_05_000119_14 with state: FAILED vertexState:
> RUNNING
> 2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}]
> |impl.VertexImpl|: Source task attempt completed for vertex:
> vertex_1522319554594_0065_19_11 [Reducer 4] attempt:
> attempt_1522319554594_0065_19_05_000119_14 with state: FAILED vertexState:
> RUNNING
> 2018-04-10 13:20:24,605 [INFO] [Dispatcher thread {Central}]
> |impl.VertexImpl|: Source task attempt completed for vertex:
> vertex_1522319554594_0065_19_06 [Reducer 9] attempt:
> attempt_1522319554594_0065_19_05_000119_14 with state: FAILED vertexState:
> RUNNING
> TaskAttempt 14 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
> secs]
> 2018-04-10 13:20:25,020 [INFO] [Dispatcher thread {Central}]
> |HistoryEventHandler.criticalEvents|:
> [HISTORY][DAG:dag_1522319554594_0065_19][Event:VERTEX_FINISHED]:
> vertexName=Reducer 3, vertexId=vertex_1522319554594_0065_19_05,
> initRequestedTime=1523363710631, initedTime=1523363710645,
> startRequestedTime=1523363713645, startedTime=1523363713646,
> finishTime=1523366425017, timeTaken=2711371, status=FAILED, diagnostics=Task
> failed, taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt
> 0 killed, TaskAttempt 1 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
> TaskAttempt 12 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
> secs], TaskAttempt 15 failed,
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
> removed
> 2018-04-10 13:20:25,157 [INFO] [Dispatcher thread {Central}]
> |HistoryEventHandler.criticalEvents|:
> [HISTORY][DAG:dag_1522319554594_0065_19][Event:DAG_FINISHED]:
> dagId=dag_1522319554594_0065_19, startTime=1523363710305,
> finishTime=1523366425121, timeTaken=2714816, status=FAILED,
> diagnostics=Vertex failed, vertexName=Reducer 3,
> vertexId=vertex_1522319554594_0065_19_05, diagnostics=[Task failed,
> taskId=task_1522319554594_0065_19_05_000119, diagnostics=[TaskAttempt 0
> killed, TaskAttempt 1 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_1 Timed out after 90
> secs], TaskAttempt 2 killed, TaskAttempt 3 killed, TaskAttempt 4 killed,
> TaskAttempt 5 killed, TaskAttempt 6 killed, TaskAttempt 7 killed, TaskAttempt
> 8 killed, TaskAttempt 9 killed, TaskAttempt 10 killed, TaskAttempt 11 killed,
> TaskAttempt 12 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_12 Timed out after 90
> secs], TaskAttempt 13 killed, TaskAttempt 14 failed,
> info=[AttemptID:attempt_1522319554594_0065_19_05_000119_14 Timed out after 90
> secs], TaskAttempt 15 failed,
> info=[org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException):
> attempt_1522319554594_0065_19_08_000018_6 was not registered and couldn't be
> removed
> {code}
> from llap0 logs; filtered for attempt_1522319554594_0065_19_05_000119_14 ;
> ordered by time:
> {code}
> java.lang.RuntimeException: attempt_1522319554594_0065_19_05_000119_14 was
> not registered and couldn't be removed
> 2018-04-10T13:18:45,455 INFO [IPC Server handler 3 on 50155 ()]
> org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl: Queueing
> container for execution:
> fragemendId=attempt_1522319554594_0065_19_05_000119_14,
> am_details=lemon-8.openstacklocal:35408, taskInfo= fragment 119 attempt 14,
> user=hive, queryId=hive_20180410113951_2fc85570-df7a-4cb1-8499-74d11f227e0d,
> appIdString=application_1522319554594_0065, appAttemptNum=1,
> containerIdString=container_222212222_0065_01_067763, dagName=with ws as
> 2018-04-10T13:18:45,457 INFO [IPC Server handler 3 on 50155
> (1522319554594_0065_19_05_000119_14)] impl.ContainerRunnerImpl:
> SubmissionState for attempt_1522319554594_0065_19_05_000119_14 : ACCEPTED
> 2018-04-10T13:19:13,533 INFO [Wait-Queue-Scheduler-0 ()]
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Adding
> attempt_1522319554594_0065_19_05_000119_14 to pre-emption queue
> 2018-04-10T13:19:13,533 INFO [Wait-Queue-Scheduler-0 ()]
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Attempting to
> execute TaskWrapper{task=attempt_1522319554594_0065_19_05_000119_14,
> inWaitQueue=true, inPreemptionQueue=false, registeredForNotifications=true,
> canFinish=true, canFinish(in queue)=true, isGuaranteed=false,
> firstAttemptStartTime=1523364757720, dagStartTime=1523363710305,
> withinDagPriority=194, vertexParallelism= 297, selfAndUpstreamParallelism=
> 1286, selfAndUpstreamComplete= 1243}
> 2018-04-10T13:19:13,534 WARN [ExecutionCompletionThread #0 ()]
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Received
> onSuccess/onFailure for a fragment for which a completing message was not
> received: attempt_1522319554594_0065_19_05_000119_14
> 2018-04-10T13:19:13,536 ERROR [ExecutionCompletionThread #0 ()]
> org.apache.hadoop.hive.llap.daemon.impl.TaskExecutorService: Failed
> notification received: Stacktrace: java.lang.RuntimeException:
> attempt_1522319554594_0065_19_05_000119_14 was not registered and couldn't be
> removed
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)