[ 
https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496716#comment-16496716
 ] 

Jonathan Eagles commented on TEZ-3938:
--------------------------------------

Fix looks good in general. By updating the progress timestamp at the submitted 
transition, we reset the timeout clock and no longer rely on progress being 
set on the first status update.

A couple of things:
- Consider using a MockClock instead of a SystemClock, and then calling 
incrementTime instead of doing an actual sleep.
- Remove the unnecessary if check for the failed event. With this change, my 
understanding is that the task attempt will always enter the submitted state.
- The status update check now verifies that the progress timestamp is 
initialized before failing the attempt for lack of progress, but there is no 
test proving that a status update arriving before the submitted transition 
works.
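
To illustrate the first and third points, here is a minimal sketch of the kind 
of test I have in mind. It is not the actual Tez code: ManualClock stands in 
for MockClock, AttemptProgressTracker stands in for the progress bookkeeping in 
TaskAttemptImpl, and the method names and the 1000 ms timeout are assumptions 
made up for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a mock clock: time only moves when told to. */
final class ManualClock {
    private final AtomicLong now;

    ManualClock(long startMs) { this.now = new AtomicLong(startMs); }

    long getTime() { return now.get(); }

    void incrementTime(long deltaMs) { now.addAndGet(deltaMs); }
}

/** Hypothetical model of the progress-timeout bookkeeping (not TaskAttemptImpl). */
final class AttemptProgressTracker {
    private final ManualClock clock;
    private final long timeoutMs;
    private long lastProgressTime = -1L; // stays unset until the submitted transition
    private float lastProgress = 0f;

    AttemptProgressTracker(ManualClock clock, long timeoutMs) {
        this.clock = clock;
        this.timeoutMs = timeoutMs;
    }

    /** Submitted transition: start the timeout clock here, not at construction. */
    void onSubmitted() { lastProgressTime = clock.getTime(); }

    /** Returns true if the attempt should be failed for lack of progress. */
    boolean onStatusUpdate(float progress) {
        if (lastProgressTime < 0) {
            return false; // update arrived before submitted: never time out
        }
        long now = clock.getTime();
        if (progress > lastProgress) {
            lastProgress = progress;
            lastProgressTime = now;
            return false;
        }
        return now - lastProgressTime > timeoutMs;
    }
}

final class ProgressTimeoutSketch {
    public static void main(String[] args) {
        ManualClock clock = new ManualClock(0L);
        AttemptProgressTracker tracker = new AttemptProgressTracker(clock, 1000L);

        clock.incrementTime(5000L); // slow container assignment, no real sleep
        System.out.println("before submitted: " + tracker.onStatusUpdate(0f));

        tracker.onSubmitted();
        clock.incrementTime(500L);
        System.out.println("within timeout: " + tracker.onStatusUpdate(0f));

        clock.incrementTime(1001L);
        System.out.println("past timeout: " + tracker.onStatusUpdate(0f));
    }
}
```

Advancing the clock by hand keeps the test fast and deterministic, and the 
first call covers the case of a status update arriving before the submitted 
transition.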

> Task attempts failing due to not making progress
> ------------------------------------------------
>
>                 Key: TEZ-3938
>                 URL: https://issues.apache.org/jira/browse/TEZ-3938
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: TEZ-3938.001.patch
>
>
> Last progress time is initialized at TaskAttemptImpl object creation. 
> Heartbeats can be sent over the umbilical as soon as the container is 
> assigned an attempt. If the container assignment takes longer than the task 
> progress timeout, we can time out the task on the first heartbeat.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
