Aleksandar Ivanovic created SPARK-49652:
-------------------------------------------

             Summary: WholeStageCodegen lasts up to 47min
                 Key: SPARK-49652
                 URL: https://issues.apache.org/jira/browse/SPARK-49652
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.1
         Environment: EMR 6.15.0 Spark 3.4.1
            Reporter: Aleksandar Ivanovic


I am running Spark 3.4.1 on EMR 6.15.0 and noticed a job that runs for a long 
time even though all of its jobs and stages show short durations (<1 min). The 
SQL/DataFrame tab, under Completed Queries, shows one long-running query, and 
the cause is a WholeStageCodegen step that can last up to 47 min. The step is a 
Project step whose reported times are min 146 ms, median 232 ms, and max 
47.2 min. 

While I can rewrite the logic (it is a group by with 6 min/max aggregations and 
1 collect_list), I was surprised to learn that there is no timeout that would 
take out an executor that spends 47 min in a WholeStageCodegen stage. 

One ask is a threshold for how long WholeStageCodegen can last, say 500 ms. 
Another ask is automatic detection of long-lasting WholeStageCodegen steps.
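
To make the two asks concrete, here is a minimal sketch of the kind of check 
I have in mind. It is plain Python, not a Spark API: the function name 
flag_slow_steps, the skew_ratio parameter and its default of 100 are all 
hypothetical, and it assumes the per-task durations of a WholeStageCodegen 
node have already been collected somehow. Only the 500 ms threshold and the 
three sample durations come from this report.

```python
from statistics import median


def flag_slow_steps(durations_ms, threshold_ms=500, skew_ratio=100.0):
    """Summarize per-task durations and flag the suspicious ones.

    Hypothetical helper: durations_ms is a list of per-task durations
    (in milliseconds) for one WholeStageCodegen node.
    """
    if not durations_ms:
        raise ValueError("durations_ms must not be empty")
    med = median(durations_ms)
    longest = max(durations_ms)
    return {
        "min_ms": min(durations_ms),
        "median_ms": med,
        "max_ms": longest,
        # Tasks that exceed the absolute threshold (first ask).
        "over_threshold": [d for d in durations_ms if d > threshold_ms],
        # True when the slowest task dwarfs the typical one (second ask).
        "skewed": med > 0 and longest / med >= skew_ratio,
    }


# The three values from this report: 146 ms, 232 ms, and 47.2 min
# (47.2 min = 2,832,000 ms).
report = flag_slow_steps([146, 232, 2_832_000])
print(report)
```

An absolute threshold alone would flag any legitimately heavy step, so the 
sketch also compares the maximum against the median. In my case the maximum is 
more than 10,000 times the median, which is the pattern I would like to see 
detected automatically.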



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
