[
https://issues.apache.org/jira/browse/TEZ-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205562#comment-14205562
]
Hitesh Shah commented on TEZ-14:
--------------------------------
Comments:
- VertexEventTaskAttemptStatusUpdate: not sure how this is really helping.
It has a justStarted boolean field, which is a bit of a concern. I am also
worried about how this event is created, and about the time lag between when it
is created and when it is actually seen by a consumer. Plus, there is the
additional "// TODO send copy of status to prevent race???".
- VertexEventType.V_TASK_ATTEMPT_STATUS_UPDATE may need to be handled in other
transitions too. For example, TaskCompletedAfterVertexSuccessTransition.
{code}
if (vertex.conf.getBoolean(TezConfiguration.TEZ_AM_SPECULATION_ENABLED,
+ TezConfiguration.TEZ_AM_SPECULATION_ENABLED_DEFAULT)) {
{code}
- change the flag to be a field of VertexImpl?
- no docs and tests added for DataStatistics
- MAX_WAITTING_TIME_FOR_HEARTBEAT is hardcoded instead of relying on a multiple
of the configured heartbeat value; likewise for some other properties.
- Commented-out code in LegacySpeculator should be removed.
{code}
+ public long estimatedNewAttemptRuntime(TezTaskID id) {
+ return (long)mapperStatistics.mean();
+ }
{code}
- why does this need a taskId param?
{code}
+ float progress = taskAttempt.getProgress();
{code}
- today, the inputs and outputs are not wired to report progress. This may
create a problem where the reducer fetch will trigger speculations.
{code}
+ // If we are here, there's at most one task attempt.
+ if (numberRunningAttempts == 0) {
+ return NOT_RUNNING;
+ }
+
{code}
- couldn't this just rely on the task state, which goes into running only after
the first attempt heartbeats back?
- is an interface for TaskRuntimeEstimator needed?
- most new classes could use some overall docs
- additional docs in the speculator to explain logic would help
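To illustrate the MAX_WAITTING_TIME_FOR_HEARTBEAT comment, a rough sketch of deriving the wait time from a configured heartbeat interval rather than hardcoding it. Everything here is hypothetical: the config key, the default interval, and the multiplier are made up for illustration, and a plain Map stands in for the real configuration object. It is not taken from the patch.

```java
import java.util.Map;

// Hypothetical sketch: none of these names come from the patch or from TezConfiguration.
final class HeartbeatTimeoutSketch {
  // Assumed config key and default interval, for illustration only.
  static final String HEARTBEAT_INTERVAL_MS_KEY = "example.heartbeat.interval-ms";
  static final long HEARTBEAT_INTERVAL_MS_DEFAULT = 1000L;
  // Assumed number of missed heartbeats tolerated before the attempt is treated as silent.
  static final int MISSED_HEARTBEATS_BEFORE_TIMEOUT = 9;

  private HeartbeatTimeoutSketch() {}

  // Returns the maximum wait as a multiple of the configured heartbeat interval.
  static long maxWaitForHeartbeatMs(Map<String, String> conf) {
    String raw = conf.get(HEARTBEAT_INTERVAL_MS_KEY);
    long intervalMs = (raw == null) ? HEARTBEAT_INTERVAL_MS_DEFAULT : Long.parseLong(raw.trim());
    return intervalMs * MISSED_HEARTBEATS_BEFORE_TIMEOUT;
  }
}
```

With something like this, changing the heartbeat interval scales the wait time along with it, so the two values cannot drift apart.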
Other general concerns:
- how to disable sending attempt status events if speculation is disabled
- The status update may be read very late by the speculator if there is an
event backlog in the AM - should the speculator just query the task for its
state instead of relying on the event?
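A rough sketch of the "query the task instead of relying on the event" idea, to make the question concrete. All types here are hypothetical stand-ins, not the Tez AM classes, and the slowness rule (progress trailing the mean of running tasks by more than a fixed lag) is only a placeholder, not the logic in LegacySpeculator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative and are not the Tez AM classes.
final class PollingSpeculatorSketch {
  enum State { SCHEDULED, RUNNING, SUCCEEDED }

  // Minimal read-only view that the speculator would query on its own timer.
  interface TaskView {
    String id();
    State state();
    float progress();
  }

  // Trivial TaskView implementation, used only to exercise the sketch.
  static final class SimpleTask implements TaskView {
    private final String id;
    private final State state;
    private final float progress;

    SimpleTask(String id, State state, float progress) {
      this.id = id;
      this.state = state;
      this.progress = progress;
    }

    @Override public String id() { return id; }
    @Override public State state() { return state; }
    @Override public float progress() { return progress; }
  }

  private PollingSpeculatorSketch() {}

  // Reads current task state directly, so an event backlog cannot make the data stale.
  // Returns ids of running tasks whose progress trails the running mean by more than lag.
  static List<String> findSlowTasks(List<? extends TaskView> tasks, float lag) {
    float sum = 0f;
    int running = 0;
    for (TaskView t : tasks) {
      if (t.state() == State.RUNNING) {
        sum += t.progress();
        running++;
      }
    }
    List<String> slow = new ArrayList<>();
    if (running < 2) {
      return slow; // nothing to compare against
    }
    float mean = sum / running;
    for (TaskView t : tasks) {
      if (t.state() == State.RUNNING && mean - t.progress() > lag) {
        slow.add(t.id());
      }
    }
    return slow;
  }
}
```

Since the scan only looks at tasks already in the running state, it would also not need a separate "just started" signal.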
> Support for speculation of slow tasks
> -------------------------------------
>
> Key: TEZ-14
> URL: https://issues.apache.org/jira/browse/TEZ-14
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Attachments: TEZ-14.1.patch, TEZ-14.2.patch
>
>