[ https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501977#comment-16501977 ]
Kuhu Shukla commented on TEZ-3938:
----------------------------------
bq. Consider a MockClock instead of a SystemClock, and then call incrementTime
instead of doing an actual sleep.
Done.
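The point of the MockClock suggestion is that the test controls time directly, so a timeout can be crossed deterministically without a real sleep. The sketch below assumes a hypothetical `Clock` interface and `ManualClock` class standing in for the YARN clock types, plus a hypothetical `ProgressTracker`; none of these are the actual Tez test classes.

```java
// Hypothetical stand-in for a clock abstraction such as YARN's Clock.
interface Clock {
    long getTime();
}

// Manually advanced clock in the spirit of a MockClock: time only moves
// when the test calls incrementTime, so no Thread.sleep is needed.
class ManualClock implements Clock {
    private long now;

    ManualClock(long start) { this.now = start; }

    @Override
    public long getTime() { return now; }

    void incrementTime(long deltaMs) { now += deltaMs; }
}

// Hypothetical tracker that reports a timeout once no progress has been
// recorded for longer than timeoutMs.
class ProgressTracker {
    private final Clock clock;
    private final long timeoutMs;
    private long lastProgressTime;

    ProgressTracker(Clock clock, long timeoutMs) {
        this.clock = clock;
        this.timeoutMs = timeoutMs;
        this.lastProgressTime = clock.getTime();
    }

    void onProgress() { lastProgressTime = clock.getTime(); }

    boolean isTimedOut() {
        return clock.getTime() - lastProgressTime > timeoutMs;
    }
}

public class ClockDemo {
    public static void main(String[] args) {
        ManualClock clock = new ManualClock(0L);
        ProgressTracker tracker = new ProgressTracker(clock, 1000L);
        clock.incrementTime(999L);                // just under the timeout
        System.out.println(tracker.isTimedOut()); // false
        clock.incrementTime(2L);                  // 1001 ms without progress
        System.out.println(tracker.isTimedOut()); // true
    }
}
```

The test runs in microseconds regardless of the timeout value, and it cannot flake because of scheduler delays the way a sleep-based test can.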
bq. Remove the unnecessary failed-event check. With this change, my understanding
is that the task attempt will always enter the submitted state.
Made changes to handle the failed-progress event (since it is unexpected) and to
check only the final state.
bq. The status update check now checks whether the attempt is initialized before
failing it for lack of progress, but there is no test proving that a status
update before the submitted transition works.
Based on the state machine, task init followed by a status update is not
possible. For this reason, I have not added a test for it.
Thank you for the review comments, [~jeagles]. Further comments after the
pre-commit run are appreciated.
The test failures from the earlier pre-commit run are not related to this fix.
> Task attempts failing due to not making progress
> ------------------------------------------------
>
> Key: TEZ-3938
> URL: https://issues.apache.org/jira/browse/TEZ-3938
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch
>
>
> The last progress time is initialized when the TaskAttemptImpl object is
> created. Heartbeats can be sent over the umbilical as soon as the container is
> assigned an attempt. If the container assignment takes longer than the task
> progress timeout, we can time out the task on its first heartbeat.
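The timing problem in the description can be reproduced with a small model. This is a hypothetical sketch, not the real TaskAttemptImpl code: the `Attempt` class, its `withFix` flag, and the choice to reset the progress time on submission are illustrative assumptions that mirror the comment's description (the attempt always reaches the submitted state, and the check requires initialization before failing for lack of progress).

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified model of the TEZ-3938 timing problem.
class Attempt {
    private final LongSupplier clock;
    private final long timeoutMs;
    private final boolean withFix;  // true = sketch of the fixed behavior
    private long lastProgressTime;
    private boolean submitted;

    Attempt(LongSupplier clock, long timeoutMs, boolean withFix) {
        this.clock = clock;
        this.timeoutMs = timeoutMs;
        this.withFix = withFix;
        // As the issue describes: set at object creation.
        this.lastProgressTime = clock.getAsLong();
    }

    // Called when a container is assigned this attempt.
    void onSubmitted() {
        submitted = true;
        if (withFix) {
            // Assumed fix: start the progress window at submission.
            lastProgressTime = clock.getAsLong();
        }
    }

    // Called on each heartbeat; true means the attempt would be failed
    // for not making progress.
    boolean onHeartbeat() {
        if (withFix && !submitted) {
            return false;  // never fail an attempt that has not started
        }
        return clock.getAsLong() - lastProgressTime > timeoutMs;
    }
}

public class Tez3938Demo {
    public static void main(String[] args) {
        long[] now = {0L};
        LongSupplier clock = () -> now[0];

        Attempt buggy = new Attempt(clock, 1000L, false);
        Attempt fixed = new Attempt(clock, 1000L, true);

        now[0] += 5000L;  // container assignment exceeds the timeout
        buggy.onSubmitted();
        fixed.onSubmitted();

        now[0] += 10L;    // first heartbeat shortly after assignment
        System.out.println("buggy fails: " + buggy.onHeartbeat()); // true
        System.out.println("fixed fails: " + fixed.onHeartbeat()); // false
    }
}
```

With creation-time initialization, the 5 seconds spent waiting for a container count against the 1-second progress timeout, so the first heartbeat fails the attempt even though it has only been running for 10 ms.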
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)