[ https://issues.apache.org/jira/browse/SPARK-54000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18032413#comment-18032413 ]
lifulong edited comment on SPARK-54000 at 10/23/25 8:51 AM:
------------------------------------------------------------
!https://wiki.in.zhihu.com/download/attachments/640447372/image2025-10-2_13-32-26.png?version=1&modificationDate=1759383146774&api=v2!
From the flame graph we can see that most of the time is spent in setNullAt calls when whole-stage codegen is enabled
and the -XX:-TieredCompilation JVM parameter is not added.
was (Author: lifulong):
!https://wiki.in.zhihu.com/download/attachments/640447372/image2025-10-2_11-52-55.png?version=1&modificationDate=1759377176124&api=v2!
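The setNullAt hotspot is consistent with how Spark generally plans a query with several count(distinct ...) aggregates, like the one in this issue: the RewriteDistinctAggregates optimizer rule introduces an Expand operator that turns each input row into one copy per distinct-aggregate group plus one copy for the plain aggregates, with every column a given copy does not need set to null. The following is a simplified pure-Python model of that idea, not Spark's code. The function names expand and aggregate are hypothetical; it assumes each count(distinct) has its own input column, uses None for SQL null, and folds Spark's two aggregation steps into a single pass.

```python
from collections import defaultdict


def expand(rows, group_keys, distinct_cols, sum_cols):
    """Simplified model of an Expand step: each input row becomes one row for
    the plain aggregates (gid 0) plus one row per distinct column (gid 1..d).
    Slots a given copy does not need are set to None, the analogue of setNullAt."""
    d = len(distinct_cols)
    width = d + len(sum_cols)
    out = []
    for row in rows:
        keys = tuple(row[k] for k in group_keys)
        # gid 0: keep only the inputs of the non-distinct aggregates (e.g. sum).
        slots = [None] * width
        for j, col in enumerate(sum_cols):
            slots[d + j] = row[col]
        out.append(keys + tuple(slots) + (0,))
        # gid i: keep only the input of the i-th distinct aggregate.
        for i, col in enumerate(distinct_cols):
            slots = [None] * width
            slots[i] = row[col]
            out.append(keys + tuple(slots) + (i + 1,))
    return out


def aggregate(expanded, n_keys, n_distinct, n_sum):
    """Compute count(distinct ...) per distinct slot and sum(...) per plain slot,
    grouped by the key columns. Nulls are ignored, as in SQL (but a sum over
    only nulls returns 0 here, whereas SQL would return null)."""
    distinct_sets = defaultdict(lambda: [set() for _ in range(n_distinct)])
    sums = defaultdict(lambda: [0] * n_sum)
    for r in expanded:
        keys, slots, gid = r[:n_keys], r[n_keys:-1], r[-1]
        if gid == 0:
            for j in range(n_sum):
                v = slots[n_distinct + j]
                if v is not None:
                    sums[keys][j] += v
        else:
            v = slots[gid - 1]
            if v is not None:
                distinct_sets[keys][gid - 1].add(v)
    result = {}
    for keys in set(distinct_sets) | set(sums):
        counts = tuple(len(s) for s in distinct_sets[keys])
        result[keys] = counts + tuple(sums[keys])
    return result


if __name__ == "__main__":
    # Hypothetical data: u and v stand in for "case when ..." columns.
    rows = [
        {"keya": "x", "keyb": 1, "u": 10, "v": None, "a": 1},
        {"keya": "x", "keyb": 1, "u": 10, "v": 7, "a": 2},
        {"keya": "x", "keyb": 1, "u": 11, "v": 7, "a": 3},
    ]
    expanded = expand(rows, ["keya", "keyb"], ["u", "v"], ["a"])
    print(len(expanded))  # 3 input rows * (2 distinct + 1) = 9
    print(aggregate(expanded, 2, 2, 1))  # {('x', 1): (2, 1, 6)}
```

In this model, with d distinct aggregates and s plain-aggregate inputs, every input row becomes d + 1 rows and d * (d + s) slots are written as null, so null writes grow roughly quadratically in d. That only illustrates why a wide Expand can spend much of its time setting nulls; it is not a measurement of Spark's generated code.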
> Complex sql with expand operator and code gen enabled, very slow
> ----------------------------------------------------------------
>
> Key: SPARK-54000
> URL: https://issues.apache.org/jira/browse/SPARK-54000
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.2
> Environment: Spark SQL 3.5.2
> Reporter: lifulong
> Priority: Major
>
> Complex SQL with an Expand operator and codegen enabled runs very slowly.
> The SQL has a shape like: select keya, keyb, count(distinct case when ...), ..., count(distinct
> case when ...), sum(a), sum(b) from x group by keya, keyb
> When whole-stage codegen is disabled, the query runs about 20x faster.
> When the executor JVM parameter -XX:-TieredCompilation is added, the query also runs about 20x
> faster.
> Reducing the number of selected columns, e.g. from 28 to 27, makes it about 10x faster.
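The two workarounds in the description map onto standard Spark settings: spark.sql.codegen.wholeStage turns whole-stage code generation on or off, and spark.executor.extraJavaOptions passes JVM flags such as -XX:-TieredCompilation to the executors. A minimal sketch for A/B timing runs, assuming jobs are launched with spark-submit; the helper spark_submit_args and the script name my_job.py are hypothetical.

```python
import shlex


def spark_submit_args(app, disable_wholestage_codegen=False,
                      disable_tiered_compilation=False):
    """Build a spark-submit command line for one configuration of the test."""
    args = ["spark-submit"]
    if disable_wholestage_codegen:
        # Turns off whole-stage code generation for Spark SQL.
        args += ["--conf", "spark.sql.codegen.wholeStage=false"]
    if disable_tiered_compilation:
        # Executor JVMs then skip tiered compilation. A value given here
        # overrides, rather than merges with, one from spark-defaults.conf.
        args += ["--conf",
                 "spark.executor.extraJavaOptions=-XX:-TieredCompilation"]
    args.append(app)
    return args


if __name__ == "__main__":
    # Baseline, codegen off, tiered compilation off.
    for codegen_off, tiered_off in [(False, False), (True, False), (False, True)]:
        print(shlex.join(spark_submit_args("my_job.py", codegen_off, tiered_off)))
```

Running the three printed commands on the same input and comparing wall-clock time is one way to repeat the comparison described in the issue.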
--
This message was sent by Atlassian Jira
(v8.20.10#820010)