[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957529#comment-14957529 ]
Rohini Palaniswamy commented on TEZ-808:
----------------------------------------
bq. Fixing IOs vs Fixing processor callback - which one of these would benefit
the case for the stack trace shown above?
This issue was purely in the PigProcessor, which got stuck while processing.
It was not waiting on IO.
bq. 2) since a processor could read all the input, spend a lot of time
crunching through it before writing anything out. And during that pure
processing time we could flag the task as hung because the IO's are not making
progress.
It is the responsibility of the processor to report progress while it is
crunching through data. In MapReduce, Pig, Hive, and even anyone writing a
simple MapReduce job reported progress for every record processed. The
default task timeout is 10 minutes, and the only time a job timed out and got
killed was when processing a single record took more than 10 minutes. Most of
the time that was an indicator for users to go fix their script or UDF, as
they were doing something inefficient. In the rare cases where a record
actually required more than 10 minutes, they increased the task timeout for
their jobs.
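
To illustrate the per-record reporting pattern described above, here is a
minimal, framework-free sketch. ProgressReporter, Heartbeat, processAll and
isHung are hypothetical names invented for this example; they are not Tez,
Pig or MapReduce APIs. The sketch only assumes that the framework exposes
some progress callback and compares the time of the last call against a
timeout (10 minutes is 600000 ms).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class ProgressSketch {
    /** Hypothetical stand-in for a framework-provided progress callback. */
    interface ProgressReporter {
        void progress();
    }

    /** Hypothetical tracker: remembers when progress was last reported. */
    static final class Heartbeat implements ProgressReporter {
        private final AtomicLong lastProgressMillis =
                new AtomicLong(System.currentTimeMillis());
        private final AtomicLong calls = new AtomicLong();

        @Override
        public void progress() {
            lastProgressMillis.set(System.currentTimeMillis());
            calls.incrementAndGet();
        }

        long calls() {
            return calls.get();
        }

        /** True if no progress was reported within timeoutMillis before nowMillis. */
        boolean isHung(long nowMillis, long timeoutMillis) {
            return nowMillis - lastProgressMillis.get() > timeoutMillis;
        }
    }

    /** Processes every record and reports progress once per record. */
    static long processAll(List<String> records, ProgressReporter reporter) {
        long totalLength = 0;
        for (String record : records) {
            totalLength += record.length(); // placeholder for real per-record work
            reporter.progress();            // tell the framework we are still alive
        }
        return totalLength;
    }

    public static void main(String[] args) {
        Heartbeat heartbeat = new Heartbeat();
        long total = processAll(List.of("a", "bb", "ccc"), heartbeat);
        System.out.println(total + " chars, " + heartbeat.calls() + " progress calls");
    }
}
```

With this shape, a task is flagged only when a single record takes longer
than the timeout, which matches the MapReduce behaviour described above: the
time spent in pure processing does not matter as long as each record
finishes and reports within the window.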
> Handle task attempts that are not making progress
> -------------------------------------------------
>
> Key: TEZ-808
> URL: https://issues.apache.org/jira/browse/TEZ-808
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Bikas Saha
>
> If a task attempt is not making progress then it may cause the job to hang.
> We may want to kill and restart the attempt. With speculation support and
> free resources we may want to run another version in parallel.