Fei Wang created SPARK-20184:
--------------------------------
Summary: performance regression for complex sql when enable codegen
Key: SPARK-20184
URL: https://issues.apache.org/jira/browse/SPARK-20184
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.1.0, 1.6.0
Reporter: Fei Wang
When the following SQL is executed with Spark 2.x and codegen enabled, performance
is much worse than with codegen turned off:
SELECT
sum(COUNTER_57)
,sum(COUNTER_71)
,sum(COUNTER_3)
,sum(COUNTER_70)
,sum(COUNTER_66)
,sum(COUNTER_75)
,sum(COUNTER_69)
,sum(COUNTER_55)
,sum(COUNTER_63)
,sum(COUNTER_68)
,sum(COUNTER_56)
,sum(COUNTER_37)
,sum(COUNTER_51)
,sum(COUNTER_42)
,sum(COUNTER_43)
,sum(COUNTER_1)
,sum(COUNTER_76)
,sum(COUNTER_54)
,sum(COUNTER_44)
,sum(COUNTER_46)
,DIM_1
,DIM_2
,DIM_3
FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;
codegen on: 40s
codegen off: 6s
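A minimal sketch of how the two runs above could be configured. It assumes Spark 2.x, where whole-stage codegen is toggled by spark.sql.codegen.wholeStage (default "true"). Whole-stage codegen arrived in 2.0, so this key does not apply as-is to the 1.6.0 case. The helper names codegen_conf, to_submit_args and slowdown are hypothetical. The last one only turns the reported timings into a ratio.

```python
# Sketch: spark-submit --conf arguments for the codegen on/off comparison.
# Assumption: Spark 2.x, whole-stage codegen controlled by
# spark.sql.codegen.wholeStage. Helper names are hypothetical.

def codegen_conf(enabled: bool) -> dict:
    """Return the Spark SQL setting for one side of the comparison."""
    return {"spark.sql.codegen.wholeStage": "true" if enabled else "false"}

def to_submit_args(conf: dict) -> list:
    """Flatten a config dict into spark-submit style ['--conf', 'key=value', ...]."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args

def slowdown(seconds_on: float, seconds_off: float) -> float:
    """How many times slower the codegen-on run is than the codegen-off run."""
    return seconds_on / seconds_off

if __name__ == "__main__":
    print(to_submit_args(codegen_conf(True)))
    print(to_submit_args(codegen_conf(False)))
    # Reported timings: 40s with codegen on, 6s with codegen off.
    print(f"slowdown: {slowdown(40, 6):.1f}x")
```

Inside a spark-sql session the same toggle can be applied per session with SET spark.sql.codegen.wholeStage=false; before running the query.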
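After some analysis, I think this is related to the huge Java method that is
generated when codegen is on: by default the HotSpot JIT does not compile
methods larger than 8000 bytes of bytecode, so such a method keeps running in
the interpreter. If I configure -XX:-DontCompileHugeMethods, the performance
with codegen on gets much better.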
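A minimal model of the HotSpot rule behind this observation, plus the Spark settings that would pass the flag to the JVMs. It assumes a HotSpot JVM, where DontCompileHugeMethods defaults to true and HugeMethodLimit is 8000 bytes of bytecode, with a method skipped when its size is strictly greater than the limit. The names jit_eligible and huge_method_java_options are hypothetical. spark.driver.extraJavaOptions and spark.executor.extraJavaOptions are standard Spark properties.

```python
# Sketch of HotSpot's huge-method check and of the Spark settings that
# disable it on driver and executors.
# Assumptions: HotSpot JVM, DontCompileHugeMethods=true by default,
# HugeMethodLimit=8000 bytes of bytecode. Helper names are hypothetical.

HUGE_METHOD_LIMIT = 8000  # bytes of bytecode (HotSpot default)

def jit_eligible(bytecode_size: int, dont_compile_huge_methods: bool = True) -> bool:
    """True if the huge-method check would still allow JIT compilation."""
    return not (dont_compile_huge_methods and bytecode_size > HUGE_METHOD_LIMIT)

def huge_method_java_options(existing: str = "") -> dict:
    """Append -XX:-DontCompileHugeMethods to existing JVM options for driver and executors."""
    opts = (existing + " -XX:-DontCompileHugeMethods").strip()
    return {
        "spark.driver.extraJavaOptions": opts,
        "spark.executor.extraJavaOptions": opts,
    }

if __name__ == "__main__":
    for size in (7999, 8000, 8001, 20000):
        # (size, eligible with default flags, eligible with -XX:-DontCompileHugeMethods)
        print(size, jit_eligible(size), jit_eligible(size, dont_compile_huge_methods=False))
    print(huge_method_java_options())
```

In client mode the driver JVM is already running when the configuration is read, so the driver-side option has to be supplied at launch (for example via spark-submit --driver-java-options) rather than set programmatically.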
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)