[
https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fei Wang updated SPARK-20184:
-----------------------------
Summary: performance regression for complex/long sql when enable codegen
(was: performance regression for complex sql when enable codegen)
> performance regression for complex/long sql when enable codegen
> ---------------------------------------------------------------
>
> Key: SPARK-20184
> URL: https://issues.apache.org/jira/browse/SPARK-20184
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.6.0, 2.1.0
> Reporter: Fei Wang
>
> Executing the following SQL with Spark 2.x with codegen enabled is much
> slower than running it with codegen turned off.
> SELECT
> sum(COUNTER_57)
> ,sum(COUNTER_71)
> ,sum(COUNTER_3)
> ,sum(COUNTER_70)
> ,sum(COUNTER_66)
> ,sum(COUNTER_75)
> ,sum(COUNTER_69)
> ,sum(COUNTER_55)
> ,sum(COUNTER_63)
> ,sum(COUNTER_68)
> ,sum(COUNTER_56)
> ,sum(COUNTER_37)
> ,sum(COUNTER_51)
> ,sum(COUNTER_42)
> ,sum(COUNTER_43)
> ,sum(COUNTER_1)
> ,sum(COUNTER_76)
> ,sum(COUNTER_54)
> ,sum(COUNTER_44)
> ,sum(COUNTER_46)
> ,DIM_1
> ,DIM_2
> ,DIM_3
> FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;
> codegen on: 40s
> codegen off: 6s
> After some analysis, I think this is related to the huge Java method (a
> single method thousands of lines long) that is generated when codegen is
> on. If I set -XX:-DontCompileHugeMethods, the performance gets much better.
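
For anyone trying to reproduce the comparison above, here is a minimal sketch of how the two runs might be toggled from a Spark 2.x SQL session. It assumes the reporter's "codegen on/off" corresponds to the `spark.sql.codegen.wholeStage` setting, which the report does not state explicitly. By default HotSpot does not JIT-compile methods whose bytecode exceeds 8000 bytes, which would explain why a very large generated method stays interpreted. The JVM flag itself cannot be changed from SQL; it would have to be passed at launch, for example through `spark.executor.extraJavaOptions` (and `spark.driver.extraJavaOptions` in local mode). How the reporter actually applied it is not stated.

```sql
-- Run 1: whole-stage codegen enabled (the Spark 2.x default).
SET spark.sql.codegen.wholeStage=true;
-- ... run the full query from the report here and note the elapsed time ...

-- Optional: print the generated Java code to see how large the method is.
-- A shortened query is shown only for illustration; use the full one.
EXPLAIN CODEGEN
SELECT sum(COUNTER_57), sum(COUNTER_71), DIM_1, DIM_2, DIM_3
FROM aggtable GROUP BY DIM_1, DIM_2, DIM_3 LIMIT 100;

-- Run 2: whole-stage codegen disabled (assumed to match "codegen off").
SET spark.sql.codegen.wholeStage=false;
-- ... run the same full query again and compare the elapsed time ...
```

If run 1 only approaches run 2 once the JVM is started with -XX:-DontCompileHugeMethods, that would be consistent with the huge-method explanation given in the report.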