Aleksandar Ivanovic created SPARK-49652:
-------------------------------------------
Summary: WholeStageCodegen lasts up to 47min
Key: SPARK-49652
URL: https://issues.apache.org/jira/browse/SPARK-49652
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.4.1
Environment: EMR 6.15.0 Spark 3.4.1
Reporter: Aleksandar Ivanovic
I am using Spark 3.4.1 on EMR 6.15.0 and noticed that a job runs for a long
time even though all of its jobs and stages show short durations (<1 min).
Under Completed Queries on the SQL/DataFrame tab, however, one query runs
long, and the cause is that WholeStageCodegen can last up to 47 min. The
affected step is a Project step: min time is 146 ms, med is 232 ms, and max is
47.2 min.
While I can rewrite the logic (it is a group by with 6 min/max aggregations
and 1 collect_list), I was surprised to learn that there is no timeout that
takes out an executor that spends 47 min in the WholeStageCodegen stage.
One ask is a threshold for how long WholeStageCodegen can last, say 500 ms.
Another ask is to automatically detect long-lasting WholeStageCodegen steps.
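
To illustrate the detection ask, here is a minimal sketch in plain Python,
assuming the per-operator min/med/max durations shown in the SQL/DataFrame tab
are available as numbers. OperatorTiming, is_long_running and both thresholds
are hypothetical names and values chosen for illustration; none of them is an
existing Spark API or setting.

```python
from dataclasses import dataclass


@dataclass
class OperatorTiming:
    """Hypothetical container for one operator's task-duration summary."""
    name: str
    min_ms: float
    med_ms: float
    max_ms: float


def is_long_running(t: OperatorTiming,
                    max_ms_limit: float = 60_000.0,
                    ratio_limit: float = 100.0) -> bool:
    """Flag an operator whose slowest task is both slow in absolute terms
    and far above the median task (illustrative thresholds only)."""
    if t.max_ms <= max_ms_limit:
        return False
    if t.med_ms <= 0:
        # No usable median: fall back to the absolute limit alone.
        return True
    return t.max_ms / t.med_ms > ratio_limit


# Numbers reported above: min 146 ms, med 232 ms, max 47.2 min.
project = OperatorTiming("WholeStageCodegen / Project",
                         min_ms=146.0, med_ms=232.0,
                         max_ms=47.2 * 60 * 1000)
uniform = OperatorTiming("uniform operator", 100.0, 200.0, 300.0)

flagged = [t.name for t in (project, uniform) if is_long_running(t)]
print(flagged)
```

Combining an absolute limit with a max-to-median ratio is one possible design:
it avoids flagging operators whose tasks are uniformly slow, and it targets the
skewed pattern described here, where the median task is fast and a single task
takes tens of minutes.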