[ 
https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957529#comment-14957529
 ] 

Rohini Palaniswamy commented on TEZ-808:
----------------------------------------

bq. Fixing IOs vs Fixing processor callback - which one of these would benefit 
the case for the stack trace shown above? 
   This issue was purely in the PigProcessor, which got stuck while 
processing. It was not waiting on IO.

bq. 2) since a processor could read all the input, spend a lot of time 
crunching through it before writing anything out. And during that pure 
processing time we could flag the task as hung because the IO's are not making 
progress.
   It is the responsibility of the processor to report progress while it is 
crunching through data. In MapReduce, Pig, Hive, and even anyone writing a 
simple MapReduce job reported progress for every record processed. The default 
task timeout is 10 minutes, and the only time a job timed out and got killed 
was when processing a single record took more than 10 minutes. Most of the 
time that was an indicator for users to go fix their script or UDF, because 
they were doing something inefficient. In the rare cases where a record 
actually needed more than 10 minutes, they increased the task timeout for 
their jobs.
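
To illustrate the per-record progress contract described above, here is a 
minimal, self-contained sketch. It is not Tez or MapReduce code: the class 
ProgressWatchdog and the method firstTimedOutRecord are hypothetical names 
made up for this example, and time is simulated rather than read from a 
clock. The only point it tries to show is that a timeout measured from the 
last progress report fires when a single record is too slow, not when the 
task as a whole runs long.

```java
import java.util.List;

/** Hypothetical sketch of a progress-based hang detector (not a Tez/MapReduce API). */
final class ProgressWatchdog {
    private final long timeoutMillis;
    private long lastProgressMillis;

    ProgressWatchdog(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastProgressMillis = startMillis;
    }

    /** Called by the processor, for example once per record processed. */
    void reportProgress(long nowMillis) {
        lastProgressMillis = nowMillis;
    }

    /** Called by the monitor: true if no progress was reported within the timeout. */
    boolean isHung(long nowMillis) {
        return nowMillis - lastProgressMillis > timeoutMillis;
    }

    /**
     * Simulates a processor that reports progress after every record.
     * recordCostsMillis holds the assumed processing time of each record.
     * Returns the index of the first record during which the watchdog would
     * flag the attempt as hung, or -1 if it never would.
     */
    static int firstTimedOutRecord(List<Long> recordCostsMillis, long timeoutMillis) {
        long now = 0L;
        ProgressWatchdog watchdog = new ProgressWatchdog(timeoutMillis, now);
        for (int i = 0; i < recordCostsMillis.size(); i++) {
            now += recordCostsMillis.get(i); // simulated time spent on record i
            if (watchdog.isHung(now)) {
                return i; // no progress report arrived within the timeout
            }
            watchdog.reportProgress(now); // progress reported once the record is done
        }
        return -1;
    }
}
```

With a 10 minute (600000 ms) timeout, records costing 1 s, 5 min and just 
under 10 min never trip the watchdog even though the total runtime is about 
15 minutes, while a single record costing more than 10 minutes does.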

> Handle task attempts that are not making progress
> -------------------------------------------------
>
>                 Key: TEZ-808
>                 URL: https://issues.apache.org/jira/browse/TEZ-808
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>
> If a task attempt is not making progress then it may cause the job to hang. 
> We may want to kill and restart the attempt. With speculation support and 
> free resources we may want to run another version in parallel.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
