Zhen Fan created SPARK-27082:
--------------------------------
Summary: Dynamic Allocation: we should consider the scenario that
speculative task being killed and never resubmit
Key: SPARK-27082
URL: https://issues.apache.org/jira/browse/SPARK-27082
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.4.0
Reporter: Zhen Fan
Issue background:
When we enable dynamic allocation, we expect that the executors can be removed
appropriately, especially in some stages with data skew. With speculation
enabled, the copying task can be killed by the original task and vice versa.
In TaskSetManager, we set successful(index)=true, and never resubmit the killed
tasks. However, in ExecutorAllocationManager which is very related to the
dynamic allocation function, doesn't handle this scenario.
See SPARK-8366. However, (SPARK-8366) ignores one scenario that copying task is
being killed. When this happens, the TaskSetManager will mark the task index of
the stage as success and never resubmit the killed task, so here we shouldn't
treat it as pending task.
This can do harm to the computing of maxNumExecutorsNeeded, as a result, we
always retain unnecessary executors and waste the computing resources of
clusters.
Solution:
When the task index is marked as speculative and the mirror task is successful,
we won't treat it as pending task.
Code:
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]