[
https://issues.apache.org/jira/browse/TEZ-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438516#comment-15438516
]
Rajesh Balamohan edited comment on TEZ-3317 at 8/26/16 9:06 AM:
----------------------------------------------------------------
Thanks for sharing the patch. This can't be checked directly with an app like
Hive, as Hive uses its own processor.
A couple of minor comments:
1. Can you please remove the {{e.printStackTrace()}} statements?
2. In OrderedGroupedKVInput, is it possible to move {{shuffledInputs,
shuffledBytes}} so that the lookup call is avoided, since getProgress could be
in the hot path?
3. In OrderedGroupedKVInput, progress is split into two halves (0.5f). Is this
added to account for skew?
4. SimpleProcessor - wouldn't a 50ms refresh time for scheduling be too
aggressive? Is it possible to increase it? Same in SleepProcessor (but there it
might be OK, as it is in examples).
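To illustrate comments 2 and 3, here is a rough sketch of what I have in mind.
All names in it ({{ProgressSketch}}, {{inputShuffled}}, {{recordsRead}}) are
hypothetical and not taken from the patch or from Tez. It also assumes that the
two 0.5f halves correspond to the shuffle phase and the reading of records,
which may not be what the patch intends.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch only; not the actual OrderedGroupedKVInput code.
final class ProgressSketch {
    // Counters are held in fields, so getProgress() does no per-call lookup.
    private final AtomicInteger shuffledInputs = new AtomicInteger();
    private final AtomicLong recordsRead = new AtomicLong();
    private final int totalInputs;
    private final long totalRecords;

    ProgressSketch(int totalInputs, long totalRecords) {
        this.totalInputs = totalInputs;
        this.totalRecords = totalRecords;
    }

    // Called once for each input whose shuffle has completed.
    void inputShuffled() {
        shuffledInputs.incrementAndGet();
    }

    // Called as records are consumed by the processor.
    void recordsRead(long count) {
        recordsRead.addAndGet(count);
    }

    // Assumed split: half of the progress for shuffle, half for reading.
    float getProgress() {
        float shuffle = totalInputs == 0
            ? 1.0f
            : Math.min(1.0f, (float) shuffledInputs.get() / totalInputs);
        float read = totalRecords == 0
            ? 0.0f
            : Math.min(1.0f, (float) recordsRead.get() / totalRecords);
        return 0.5f * shuffle + 0.5f * read;
    }
}
```

With this shape getProgress only reads two fields, so calling it frequently
should stay cheap.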
was (Author: rajesh.balamohan):
Thanks for sharing the patch. I tried with a couple of Hive jobs and did not
find much increase in CPU usage or overall response time (overhead). I will try
a job with a complex DAG soon.
A couple of minor comments:
1. Can you please remove the {{e.printStackTrace()}} statements?
2. In OrderedGroupedKVInput, is it possible to move {{shuffledInputs,
shuffledBytes}} so that the lookup call is avoided, since getProgress could be
in the hot path?
3. In OrderedGroupedKVInput, progress is split into two halves (0.5f). Is this
added to account for skew?
4. SimpleProcessor - wouldn't a 50ms refresh time for scheduling be too
aggressive? Is it possible to increase it? Same in SleepProcessor (but there it
might be OK, as it is in examples).
> Speculative execution starts too early due to 0 progress
> --------------------------------------------------------
>
> Key: TEZ-3317
> URL: https://issues.apache.org/jira/browse/TEZ-3317
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Jonathan Eagles
> Assignee: Kuhu Shukla
> Attachments: TEZ-3317.001.patch, TEZ-3317.002.patch,
> TEZ-3317.003.patch, TEZ-3317.004.patch, TEZ-3317.005.patch, TEZ-3317.006.patch
>
>
> Don't know at this point if this is a Tez or a PigProcessor issue. There is
> some setProgress chain that is keeping task progress from being correctly
> reported. Task status is always zero, so as soon as the first task finishes,
> tasks up to the speculation limit are always launched.