[
https://issues.apache.org/jira/browse/TEZ-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977859#comment-16977859
]
Jonathan Turner Eagles commented on TEZ-4067:
---------------------------------------------
Closer, as the DAGAppMaster no longer has knowledge about the LegacySpeculator.
There are still a few things to fix to get full encapsulation.
* All references to speculators need to be abstracted away.
{code}
// Stop speculators if any
stopSpeculators(currentDAG);
{code}
Should be something like this
{code}
// Stop dependent services
stopDependentServices(currentDAG);
{code}
Similarly, the following code should refer to dependent services instead of
speculators:
{code}
+ // If we reach here, then we have recoverable DAG and we need to reinitialize the speculators.
+ // start speculators of the recovered DAG
+ startSpeculators(currentDAG);
{code}
We need to avoid calling isSpeculationEnabled(), getSpeculator(), and
startSpeculator(). Instead, the vertex could expose something like
List<AbstractService> getDependentServices(). The vertex can then include the
speculator in the dependent services if speculation is enabled.
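A minimal sketch of the shape this could take. Every name below (SketchService, SketchVertex, SketchAppMaster, startDependentServices, stopDependentServices) is hypothetical and not taken from the patch or from Tez; SketchService is only a stand-in for AbstractService so that the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for AbstractService, reduced to start/stop.
interface SketchService {
  void start();
  void stop() throws Exception;
}

// Hypothetical vertex: the speculator is visible only as a dependent service.
class SketchVertex {
  private final SketchService speculator; // null when speculation is disabled

  SketchVertex(SketchService speculator) {
    this.speculator = speculator;
  }

  // Callers never ask whether speculation is enabled; they just get a list.
  List<SketchService> getDependentServices() {
    if (speculator == null) {
      return Collections.emptyList();
    }
    return Collections.singletonList(speculator);
  }
}

// Hypothetical app master that has no knowledge of speculators.
class SketchAppMaster {
  final List<SketchService> services = new ArrayList<>();

  void startDependentServices(List<SketchVertex> vertices) {
    for (SketchVertex v : vertices) {
      for (SketchService s : v.getDependentServices()) {
        services.add(s);
        s.start();
      }
    }
  }

  // Stops every dependent service and returns the first failure, if any.
  Exception stopDependentServices(List<SketchVertex> vertices) {
    Exception firstException = null;
    for (SketchVertex v : vertices) {
      for (SketchService s : v.getDependentServices()) {
        try {
          s.stop();
        } catch (Exception e) {
          if (firstException == null) {
            firstException = e;
          }
        }
        // Assumption of this sketch: the service is dropped even if stop failed.
        services.remove(s);
      }
    }
    return firstException;
  }
}
```

With this shape the app master only iterates over whatever services each vertex reports, so adding or removing speculation would not change the app master code.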
Do we need to call startSpeculator at all? As a dependent service, startService
will be called automatically. Similarly, do we need a launch function at all?
I'm a little worried that launch will start a thread, and then startService
will be called and launch another thread. Perhaps the state of the service will
prevent this. Could you explain the reasoning for calling launch manually
instead of relying on startServices to be called automatically?
{code}
+  private void startSpeculators(DAG dag) {
+    for (Vertex v : dag.getVertices().values()) {
+      if (!v.isSpeculationEnabled()) {
+        continue;
+      }
+      if (v.startSpeculator()) {
+        addIfService(v.getSpeculator(), false);
+      }
+    }
+  }
+
+  private Exception stopSpeculators(DAG dag) {
+    Exception firstException = null;
+    for (Vertex v : dag.getVertices().values()) {
+      if (!v.isSpeculationEnabled()) {
+        continue;
+      }
+
+      Exception ex = v.stopSpeculator();
+      if (ex != null && firstException == null) {
+        firstException = ex;
+        continue;
+      }
+      // remove the speculator service from the list of services
+      services.remove(v.getSpeculator());
+    }
+    return firstException;
+  }
{code}
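To illustrate the double-start concern, here is a minimal sketch of a state guard that makes a second start call a no-op. GuardedSpeculatorSketch and its methods are hypothetical names; this is not the LegacySpeculator code, and it does not claim to show how the patch or the service framework behaves.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical speculator-like service whose start is safe to call twice.
class GuardedSpeculatorSketch {
  private final AtomicBoolean started = new AtomicBoolean(false);
  private final AtomicInteger threadsLaunched = new AtomicInteger();

  // Returns true only for the call that actually launched the thread.
  boolean start() {
    // Only the first caller flips the flag; later callers return immediately.
    if (!started.compareAndSet(false, true)) {
      return false;
    }
    Thread worker = new Thread(() -> {
      // Placeholder for the speculation loop.
    }, "speculator-sketch");
    worker.setDaemon(true);
    threadsLaunched.incrementAndGet();
    worker.start();
    return true;
  }

  int getThreadsLaunched() {
    return threadsLaunched.get();
  }
}
```

If a manual launch and the automatic service start both ended up in a method guarded like this, only one thread would be created. Without such a guard, each call would start its own thread.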
> Tez Speculation decision is calculated on each update by the dispatcher
> -----------------------------------------------------------------------
>
> Key: TEZ-4067
> URL: https://issues.apache.org/jira/browse/TEZ-4067
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: TEZ-4067.001.patch, TEZ-4067.002.patch,
> TEZ-4067.003.patch, TEZ-4067.004.patch, TEZ-4067.005.patch
>
>
> LegacySpeculator is an object field in VertexImpl. Therefore, all events are
> handled synchronously by the caller (dispatcher). This implies the following:
> # the dispatcher spends a long time executing updateStatus as it needs to
> check the runtime estimation of the tezAttempts within the vertex.
> # the speculator is per stage: launching a speculation may not be the optimum
> decision. Ideally, based on resources, speculated tasks should be the ones
> with slowest progress.
> # the time between speculation is skewed because there is a big delay for
> the dispatcher to complete a full cycle. Also, speculation will be more
> aggressive compared to MR because MR waits for
> "soonest.retry.after.speculate" whenever a task is speculated. On the other
> hand, Tez speculates more tasks as it processes stages in parallel.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)