[
https://issues.apache.org/jira/browse/TEZ-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hitesh Shah updated TEZ-1122:
-----------------------------
Target Version/s: 0.8.0 (was: 0.6.0)
> Race between canCommit and Task moving into RUNNING state
> ---------------------------------------------------------
>
> Key: TEZ-1122
> URL: https://issues.apache.org/jira/browse/TEZ-1122
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: Tez-1122.patch
>
>
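> A Task moves into the RUNNING state via async events that are generated after
> a TaskAttempt moves into the RUNNING state, which is triggered by getTask().
> canCommit() is a synchronous call on the umbilical, so for short-running
> tasks a canCommit() can arrive before those async events have been handled.
> In the log below, the Task is still SCHEDULED when canCommit() arrives, so
> the attempt is killed as a bad commit attempt: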
> {code}
> 2014-05-15 13:21:15,531 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: TaskAttempt:
> [attempt_1400183444139_0007_1_00_000000_0] started. Is using containerId:
> [container_1400183444139_0007_01_000002] on NM: []
> 2014-05-15 13:21:15,533 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.history.HistoryEventHandler:
> [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_STARTED]:
> vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0,
> startTime=1400185273335, containerId=container_1400183444139_0007_01_000002,
> nodeId=,
> inProgressLogs=/node/containerlogs/container_1400183444139_0007_01_000002/,
> completedLogs=localhost:19888/jobhistory/logs///container_1400183444139_0007_01_000002/v_datagen_attempt_1400183444139_0007_1_00_000000_0/
> 2014-05-15 13:21:15,534 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl:
> attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from
> START_WAIT to RUNNING due to event TA_STARTED_REMOTELY
> 2014-05-15 13:21:15,534 INFO [IPC Server handler 6 on 61779]
> org.apache.tez.dag.app.dag.impl.TaskImpl: Task not running. Issuing kill to
> bad commit attempt attempt_1400183444139_0007_1_00_000000_0
> 2014-05-15 13:21:15,534 INFO [AMRM Callback Handler Thread]
> org.apache.tez.dag.app.rm.TaskScheduler: App total resource memory: 0 cpu: -1
> taskAllocations: 1
> 2014-05-15 13:21:15,537 INFO [AsyncDispatcher event handler]
> org.apache.tez.common.counters.Limits: Counter limits initialized with
> parameters: GROUP_NAME_MAX=128, MAX_GROUPS=500, COUNTER_NAME_MAX=64,
> MAX_COUNTERS=1200
> 2014-05-15 13:21:15,541 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskImpl: task_1400183444139_0007_1_00_000000
> Task Transitioned from SCHEDULED to RUNNING
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.history.HistoryEventHandler:
> [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_FINISHED]:
> vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0,
> startTime=1400185273335, finishTime=1400185275542, timeTaken=2207,
> status=KILLED, diagnostics=, counters=Counters: 0
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl:
> attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from
> RUNNING to KILL_IN_PROGRESS due to event TA_KILL_REQUEST
> 2014-05-15 13:21:15,546 INFO [TaskSchedulerEventHandlerThread]
> org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: Processing the event
> EventType: S_TA_ENDED
> {code}
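
The ordering in the log above can be reproduced with a small single-threaded model. This is only a sketch under stated assumptions, not Tez code: the class CanCommitRaceSketch, its fields, and its handler methods are hypothetical stand-ins, and a plain queue takes the place of the AsyncDispatcher. It shows that a synchronous canCommit() which reads the task state directly can observe SCHEDULED while the event that would move the task to RUNNING is still queued.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical single-threaded model of the race; none of these names are Tez APIs.
class CanCommitRaceSketch {
    enum TaskState { SCHEDULED, RUNNING }
    enum AttemptState { START_WAIT, RUNNING, KILL_IN_PROGRESS }

    // Stand-in for the queue drained by the "AsyncDispatcher event handler" thread.
    private final Queue<Runnable> dispatcherQueue = new ArrayDeque<>();
    TaskState taskState = TaskState.SCHEDULED;
    AttemptState attemptState = AttemptState.START_WAIT;

    // Umbilical call from the container: only enqueues the "started remotely" event.
    void getTask() {
        dispatcherQueue.add(this::onAttemptStartedRemotely);
    }

    // Handling the attempt event enqueues a second event for the task.
    private void onAttemptStartedRemotely() {
        attemptState = AttemptState.RUNNING;
        dispatcherQueue.add(this::onTaskLaunched);
    }

    private void onTaskLaunched() {
        taskState = TaskState.RUNNING;
    }

    // Synchronous umbilical call: reads the task state directly, bypassing the queue.
    boolean canCommit() {
        if (taskState != TaskState.RUNNING) {
            // Mirrors "Task not running. Issuing kill to bad commit attempt".
            dispatcherQueue.add(() -> attemptState = AttemptState.KILL_IN_PROGRESS);
            return false;
        }
        return true;
    }

    // Handle a single queued event; returns false when the queue is empty.
    boolean dispatchOne() {
        Runnable event = dispatcherQueue.poll();
        if (event == null) {
            return false;
        }
        event.run();
        return true;
    }

    void drain() {
        while (dispatchOne()) {
            // keep handling events until the queue is empty
        }
    }

    // Returns the canCommit() answer for a given interleaving.
    static boolean simulate(boolean drainBeforeCommit) {
        CanCommitRaceSketch am = new CanCommitRaceSketch();
        am.getTask();
        if (drainBeforeCommit) {
            am.drain();        // task reaches RUNNING before canCommit()
        } else {
            am.dispatchOne();  // only the attempt reaches RUNNING, as in the log
        }
        boolean allowed = am.canCommit();
        am.drain();
        return allowed;
    }

    public static void main(String[] args) {
        System.out.println("canCommit before task event handled: " + simulate(false));
        System.out.println("canCommit after task event handled:  " + simulate(true));
    }
}
```

In the losing interleaving, draining the queue afterwards leaves the task in RUNNING and the attempt in KILL_IN_PROGRESS, which matches the order of the last transitions in the log. The model only demonstrates the ordering problem; it does not suggest how the attached patch resolves it.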