[
https://issues.apache.org/jira/browse/SPARK-39766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566591#comment-17566591
]
Apache Spark commented on SPARK-39766:
--------------------------------------
User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/37185
> For the `arrayOfAnyAsSeq` scenario in `GenericArrayDataBenchmark`, using
> Scala 2.13 is slower than Scala 2.12
> -------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-39766
> URL: https://issues.apache.org/jira/browse/SPARK-39766
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Priority: Minor
>
> Running `GenericArrayDataBenchmark` with Scala 2.12 and Scala 2.13 shows that
> the `arrayOfAnyAsSeq` scenario is slower with Scala 2.13 than with Scala
> 2.12:
> *Scala 2.12*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.13.0-1021-azure
> Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
> constructor:       Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> arrayOfAnyAsSeq               25             29           2       395.1           2.5       0.1X
> {code}
> *Scala 2.13*
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
> Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
> constructor:       Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> arrayOfAnyAsSeq              241            243           1        41.4          24.1       0.0X
> {code}
> The test code is as follows:
> {code:java}
> benchmark.addCase("arrayOfAnyAsSeq") { _ =>
>   val arr: Seq[Any] = new Array[Any](arraySize)
>   var n = 0
>   while (n < valuesPerIteration) {
>     new GenericArrayData(arr)
>     n += 1
>   }
> }
> {code}
> The constructor of `GenericArrayData` is as follows:
> {code:java}
> def this(seq: scala.collection.Seq[Any]) = this(seq.toArray) {code}
>
> The performance difference is due to the following reasons:
> *When using Scala 2.12:*
> The class type of `arr` is `s.c.mutable.WrappedArray$ofRef`, and `toArray`
> returns `array.asInstanceOf[Array[U]]`, so there is no memory copy.
> *When using Scala 2.13:*
> The class type of `arr` is `s.c.immutable.ArraySeq$ofRef`, and `toArray`
> calls `IterableOnceOps#toArray`, whose implementation copies the elements
> into a newly allocated array.
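
The Scala 2.13 behaviour described above can be reproduced outside Spark. The following is a minimal sketch for Scala 2.13 only, not the change made in the pull request: `ToArrayCopyDemo` and `toArrayNoCopy` are hypothetical names introduced here for illustration. It shows that `toArray` on an `immutable.ArraySeq` returns a fresh array, and one possible way to reach the backing array without a copy.

```scala
import scala.collection.immutable.ArraySeq

// Hypothetical demo object; not part of Spark.
object ToArrayCopyDemo {

  // Illustrative helper (an assumption, not Spark's API): reuse the backing
  // array when the Seq is an immutable.ArraySeq of references, otherwise
  // fall back to the copying toArray.
  def toArrayNoCopy(seq: scala.collection.Seq[Any]): Array[Any] = seq match {
    case a: ArraySeq.ofRef[_] => a.unsafeArray.asInstanceOf[Array[Any]]
    case other                => other.toArray
  }

  def main(args: Array[String]): Unit = {
    val backing: Array[Any] = Array[Any](1, "two", 3.0)

    // Wrap without copying so the backing array stays observable.
    val seq: scala.collection.Seq[Any] = ArraySeq.unsafeWrapArray(backing)

    // toArray allocates a new array and copies the elements into it.
    val copied: Array[Any] = seq.toArray
    println(copied eq backing)             // false

    // The helper hands back the very same array instance.
    println(toArrayNoCopy(seq) eq backing) // true
  }
}
```

Sharing the backing array means later writes to it are visible through the sequence, which is why the accessor is named `unsafeArray`; a caller would only do this when the array is known not to be mutated afterwards.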