[
https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971719#comment-16971719
]
Jonathan Turner Eagles commented on TEZ-4067:
---------------------------------------------
[~ahussein],
Overall this will be a great feature for speculative execution. Thanks for the
patch.
Overall the code looks good. As to object design, I would like to suggest a
change and see if you agree with it. Before the patch, the DAGAppMaster knew
about services and the Vertex class. After the patch, the DAGAppMaster adds
knowledge about the VertexImpl and LegacySpeculator classes. Could we abstract
that knowledge away to improve the design? For example, would it be better if
Vertex (or perhaps VertexImpl if needed) added a "getDependentServices" API?
This would allow the DAGAppMaster to add the dependent services while keeping
the knowledge that the service is a LegacySpeculator out of the DAGAppMaster.
This would also allow for other dependent services in the future. Let me know
if this is possible or what prevents this from being possible.
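To make the suggestion concrete, here is a minimal, self-contained sketch of
the shape I have in mind. It is not taken from the patch or from the Tez code
base: apart from the "getDependentServices" name, every type below (Service,
Vertex, SpeculatorService, VertexImplSketch, AppMasterSketch) is a hypothetical
stand-in, and the real change would presumably use the existing Hadoop service
types instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DependentServicesSketch {

  /** Hypothetical stand-in for a service with a start/stop lifecycle. */
  interface Service {
    String getName();
    void start();
    void stop();
  }

  /** Hypothetical slice of the Vertex interface with the proposed accessor. */
  interface Vertex {
    List<Service> getDependentServices();
  }

  /** Stand-in for LegacySpeculator; only the vertex knows this concrete type. */
  static class SpeculatorService implements Service {
    private boolean running;

    public String getName() { return "speculator"; }
    public void start() { running = true; }
    public void stop() { running = false; }
    boolean isRunning() { return running; }
  }

  /** Stand-in for VertexImpl: owns the speculator and exposes it as a Service. */
  static class VertexImplSketch implements Vertex {
    private final SpeculatorService speculator = new SpeculatorService();

    public List<Service> getDependentServices() {
      return Collections.<Service>singletonList(speculator);
    }
  }

  /** Stand-in for DAGAppMaster: depends only on Vertex and Service. */
  static class AppMasterSketch {
    private final List<Service> services = new ArrayList<>();

    /** Adds whatever services the vertex reports, without inspecting their types. */
    void register(Vertex vertex) {
      services.addAll(vertex.getDependentServices());
    }

    void startAll() {
      for (Service s : services) {
        s.start();
      }
    }

    void stopAll() {
      for (Service s : services) {
        s.stop();
      }
    }

    List<Service> getServices() { return services; }
  }

  public static void main(String[] args) {
    AppMasterSketch appMaster = new AppMasterSketch();
    appMaster.register(new VertexImplSketch());
    appMaster.startAll();
    System.out.println(appMaster.getServices().size() + " dependent service(s) started");
    appMaster.stopAll();
  }
}
```

With this shape, a vertex that later needs another background service only has
to return it from getDependentServices; the app master code stays unchanged.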
> Tez Speculation decision is calculated on each update by the dispatcher
> -----------------------------------------------------------------------
>
> Key: TEZ-4067
> URL: https://issues.apache.org/jira/browse/TEZ-4067
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: TEZ-4067.001.patch, TEZ-4067.002.patch,
> TEZ-4067.003.patch, TEZ-4067.004.patch
>
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are
> handled synchronously by the caller (dispatcher). This implies the following:
> # the dispatcher spends a long time executing updateStatus as it needs to
> check the runtime estimation of the tezAttempts within the vertex.
> # the speculator is per stage: launching a speculation may not be the optimal
> decision. Ideally, based on resources, speculated tasks should be the ones
> with the slowest progress.
> # the time between speculation is skewed because there is a big delay for
> the dispatcher to complete a full cycle. Also, speculation will be more
> aggressive compared to MR because MR waits for
> "soonest.retry.after.speculate" whenever a task is speculated. On the other
> hand, Tez speculates more tasks as it processes stages in parallel.
>