[ https://issues.apache.org/jira/browse/SPARK-32690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yang Jie updated SPARK-32690:
-----------------------------
Description:
I found that [SPARK-32550|https://github.com/apache/spark/pull/29366] degraded performance in some cases. A typical case is the "deterministic cardinality estimation" test in HyperLogLogPlusPlusSuite with rsd = 0.001. The code that became significantly slower is here:
[https://github.com/apache/spark/blob/08b951b1cb58cea2c34703e43202fe7c84725c8a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala#L41]
!image-2020-08-24-19-30-55-380.png!
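A rough way to see why rsd = 0.001 stands out is to estimate how many fields the HLL++ aggregation buffer has at that precision. The sketch below is a hypothetical helper, not Spark code. It assumes the sizing rules of Spark's HyperLogLogPlusPlusHelper (precision p = ceil(2 * log2(1.106 / rsd)), m = 2^p six-bit registers packed 10 per 64-bit word, plus one extra word); these constants are recalled from the Spark source rather than stated in this issue, so check them against the commit linked above.

```python
import math

# ASSUMED constants, mirroring (not imported from) Spark's HyperLogLogPlusPlusHelper.
REGISTER_SIZE = 6                                # bits per HLL++ register
WORD_SIZE = 64                                   # bits per Long buffer field
REGISTERS_PER_WORD = WORD_SIZE // REGISTER_SIZE  # 10 registers packed per word


def hllpp_buffer_words(relative_sd: float) -> int:
    """Hypothetical helper: estimated number of Long fields in an HLL++ buffer."""
    # Precision p (number of index bits) derived from the requested relative SD.
    p = math.ceil(2.0 * math.log(1.106 / relative_sd) / math.log(2.0))
    m = 1 << p  # number of registers
    # Registers are packed into 64-bit words; one extra word is assumed, as in Spark.
    return m // REGISTERS_PER_WORD + 1


for rsd in (0.05, 0.01, 0.001):
    print(f"rsd={rsd}: {hllpp_buffer_words(rsd)} Long fields")
```

Under these assumptions the buffer at rsd = 0.001 has 209,716 Long fields, roughly 128 times as many as at rsd = 0.01, so any per-field cost when building the buffer row is magnified far more in this case. That is consistent with, but does not by itself prove, the regression appearing here.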
A comparison of timings before and after SPARK-32550 was merged:
|Case|After SPARK-32550: createBuffer|After SPARK-32550: end to end|Before SPARK-32550: create input|Before SPARK-32550: end to end|
|rsd 0.001, n 1000|52715513243|53004810687|195807999|773977677|
|rsd 0.001, n 5000|51881246165|52519358215|13689949|249974855|
|rsd 0.001, n 10000|52234282788|52374639172|14199071|183452846|
|rsd 0.001, n 50000|55503517122|55664035449|15219394|584477125|
|rsd 0.001, n 100000|51862662845|52116774177|19662834|166483678|
|rsd 0.001, n 500000|51619226715|52183189526|178048012|16681330|
|rsd 0.001, n 1000000|54861366981|54976399142|226178708|18826340|
|rsd 0.001, n 5000000|52023602143|52354615149|388173579|15446409|
|rsd 0.001, n 10000000|53008591660|53601392304|533454460|16033032|
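The size of the regression can be read off the table as an after/before ratio. The sketch below copies the two end-to-end columns verbatim (the issue does not state the time unit, so the ratios are unitless) and computes the slowdown for each n; the dictionary name and helper function are illustrative only.

```python
# End-to-end timings copied from the table; the unit is not stated in the issue.
END_TO_END = {
    # n: (after SPARK-32550, before SPARK-32550)
    1_000: (53004810687, 773977677),
    5_000: (52519358215, 249974855),
    10_000: (52374639172, 183452846),
    50_000: (55664035449, 584477125),
    100_000: (52116774177, 166483678),
    500_000: (52183189526, 16681330),
    1_000_000: (54976399142, 18826340),
    5_000_000: (52354615149, 15446409),
    10_000_000: (53601392304, 16033032),
}


def slowdowns(table):
    """Return {n: after / before} for each row of the timing table."""
    return {n: after / before for n, (after, before) in table.items()}


for n, ratio in slowdowns(END_TO_END).items():
    print(f"n={n:>10,}: {ratio:,.0f}x slower end to end")
```

The ratios range from about 68x (n = 1000) to about 3,400x (n = 5000000). Note also that the "after" end-to-end time stays close to the "after" createBuffer time for every n, so nearly all of the extra time falls in buffer creation rather than in processing the n inputs.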
> Spark-32550 affects the performance of some cases
> -------------------------------------------------
>
> Key: SPARK-32690
> URL: https://issues.apache.org/jira/browse/SPARK-32690
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Yang Jie
> Priority: Major
> Attachments: image-2020-08-24-19-30-17-712.png, image-2020-08-24-19-30-55-380.png
--
This message was sent by Atlassian Jira
(v8.3.4#803005)