[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fei Wang updated SPARK-20184:
-----------------------------
Description:
When the following SQL is executed with Spark 2.x and codegen enabled, performance is
much worse than with codegen turned off:
SELECT
sum(COUNTER_57)
,sum(COUNTER_71)
,sum(COUNTER_3)
,sum(COUNTER_70)
,sum(COUNTER_66)
,sum(COUNTER_75)
,sum(COUNTER_69)
,sum(COUNTER_55)
,sum(COUNTER_63)
,sum(COUNTER_68)
,sum(COUNTER_56)
,sum(COUNTER_37)
,sum(COUNTER_51)
,sum(COUNTER_42)
,sum(COUNTER_43)
,sum(COUNTER_1)
,sum(COUNTER_76)
,sum(COUNTER_54)
,sum(COUNTER_44)
,sum(COUNTER_46)
,DIM_1
,DIM_2
,DIM_3
FROM aggtable group by DIM_1, DIM_2, DIM_3 limit 100;
codegen on: 40s
codegen off: 6s
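A minimal sketch of how the on/off comparison above could be reproduced in one spark-sql session. It assumes that "codegen" here means whole-stage code generation, toggled by the Spark 2.x setting spark.sql.codegen.wholeStage. The report does not say which switch was used, and that setting does not exist in 1.6.0. The view name agg_report is hypothetical.

```sql
-- Sketch only: assumes "codegen" means whole-stage codegen
-- (spark.sql.codegen.wholeStage, Spark 2.x).
-- agg_report is a hypothetical temporary view wrapping the reported query.
CREATE OR REPLACE TEMPORARY VIEW agg_report AS
SELECT
sum(COUNTER_57)
,sum(COUNTER_71)
,sum(COUNTER_3)
,sum(COUNTER_70)
,sum(COUNTER_66)
,sum(COUNTER_75)
,sum(COUNTER_69)
,sum(COUNTER_55)
,sum(COUNTER_63)
,sum(COUNTER_68)
,sum(COUNTER_56)
,sum(COUNTER_37)
,sum(COUNTER_51)
,sum(COUNTER_42)
,sum(COUNTER_43)
,sum(COUNTER_1)
,sum(COUNTER_76)
,sum(COUNTER_54)
,sum(COUNTER_44)
,sum(COUNTER_46)
,DIM_1
,DIM_2
,DIM_3
FROM aggtable GROUP BY DIM_1, DIM_2, DIM_3 LIMIT 100;

-- Run 1: whole-stage codegen on (the Spark 2.x default).
SET spark.sql.codegen.wholeStage=true;
-- Print the generated Java source to see how large the fused method gets.
EXPLAIN CODEGEN SELECT * FROM agg_report;
SELECT * FROM agg_report;

-- Run 2: whole-stage codegen off; operators are no longer fused
-- into a single generated method.
SET spark.sql.codegen.wholeStage=false;
SELECT * FROM agg_report;
```

Comparing the elapsed time of the two SELECT runs gives the on/off numbers. JVM options such as -XX:-DontCompileHugeMethods cannot be changed with SET; they have to be supplied when the driver and executors are launched (for example through spark.executor.extraJavaOptions).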
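After some analysis, I think this is related to the huge Java method that is
generated when codegen is on. By default, HotSpot does not JIT-compile methods
larger than 8000 bytes of bytecode, so such a method keeps running in the
interpreter. If I set the JVM option -XX:-DontCompileHugeMethods, performance
with codegen on gets much better.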
> performance regression for complex sql when enable codegen
> ----------------------------------------------------------
>
> Key: SPARK-20184
> URL: https://issues.apache.org/jira/browse/SPARK-20184
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.6.0, 2.1.0
> Reporter: Fei Wang
>
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)