Thomas Graves created SPARK-30831:
-------------------------------------
Summary: Executors UI shows more active tasks then possible
Key: SPARK-30831
URL: https://issues.apache.org/jira/browse/SPARK-30831
Project: Spark
Issue Type: Bug
Components: Web UI
Affects Versions: 3.0.0
Reporter: Thomas Graves
I regularly see the executors web ui showing more active tasks then it has
cores. Looking at the code it seems that we track those separately and the
message that is sent for task end is asynchronous and thus ends up showing up
at the UI much later then the start event.
CoarseGrainedSchedulerBackend on statusUpdate increases the freeCores which
then allow scheduler to assign another task, but the taskEndEvent is
asynchronous.
We definitely don't want to slow down the scheduling part so not sure how
easily it will be to improve.
To reproduce I just ran:
val df = sc.makeRDD(1 to 10000000, 6).toDF
val df2 = sc.makeRDD(1 to 10000000, 6).toDF
spark.time(df.select( $"value" as "a").join(df2.select($"value" as "b"), $"a"
=== $"b").write.mode("overwrite").csv("somefile"))
And view the executors ui page. I started spark-shell with just 1 core per
executor and you see 2 active tasks
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]