[ https://issues.apache.org/jira/browse/SPARK-39766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-39766:
------------------------------------

    Assignee:     (was: Apache Spark)

For the `arrayOfAnyAsSeq` scenario in `GenericArrayDataBenchmark`, using Scala 2.13 is slower than Scala 2.12
-------------------------------------------------------------------------------------------------------------

                Key: SPARK-39766
                URL: https://issues.apache.org/jira/browse/SPARK-39766
            Project: Spark
         Issue Type: Improvement
         Components: SQL
   Affects Versions: 3.4.0
           Reporter: Yang Jie
           Priority: Minor

Running `GenericArrayDataBenchmark` with both Scala 2.12 and Scala 2.13 shows that the `arrayOfAnyAsSeq` scenario is roughly 10x slower on Scala 2.13 (best time 241 ms vs. 25 ms):

*Scala 2.12*
{noformat}
OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Linux 5.13.0-1021-azure
Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
constructor:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAnyAsSeq                                      25             29           2        395.1           2.5       0.1X
{noformat}
*Scala 2.13*
{noformat}
OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
constructor:                              Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAnyAsSeq                                     241            243           1         41.4          24.1       0.0X
{noformat}
The benchmark case is:
{code:scala}
benchmark.addCase("arrayOfAnyAsSeq") { _ =>
  // Scala 2.12: `arr` is a WrappedArray$ofRef; Scala 2.13: an immutable.ArraySeq$ofRef
  val arr: Seq[Any] = new Array[Any](arraySize)
  var n = 0
  while (n < valuesPerIteration) {
    // Each construction calls `seq.toArray` on `arr`
    new GenericArrayData(arr)
    n += 1
  }
}
{code}
and the `GenericArrayData` constructor it exercises is:
{code:scala}
def this(seq: scala.collection.Seq[Any]) = this(seq.toArray)
{code}

The performance difference comes from what `toArray` does for the runtime class of `arr`:

*When using Scala 2.12:* `arr` is a `scala.collection.mutable.WrappedArray$ofRef`, whose `toArray` returns `array.asInstanceOf[Array[U]]`, so no memory is copied.

*When using Scala 2.13:* `arr` is a `scala.collection.immutable.ArraySeq$ofRef`, whose `toArray` goes through `IterableOnceOps#toArray`; that implementation allocates a new array and copies the elements into it, on every `GenericArrayData` construction.
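The copy can be reproduced without Spark. What follows is a minimal sketch, assuming Scala 2.13 (it uses `immutable.ArraySeq`, which does not exist in 2.12) and running as a script, for example in the REPL or with scala-cli. `NoCopyToArray` and `toAnyArray` are hypothetical names introduced for illustration, not Spark APIs, and this is not necessarily how SPARK-39766 was resolved. It shows that `toArray` on an `ArraySeq$ofRef` returns a fresh array, and how a constructor could instead reuse the backing array when its runtime class is exactly `Object[]`.

{code:scala}
import scala.collection.{immutable, mutable}

// Hypothetical helper, not a Spark API: reuse the backing array of an ArraySeq
// when its runtime class is exactly Object[] (what Array[Any] erases to),
// otherwise fall back to the generic, copying toArray.
object NoCopyToArray {
  def toAnyArray(seq: scala.collection.Seq[Any]): Array[Any] = seq match {
    case s: immutable.ArraySeq[_] if s.unsafeArray.getClass == classOf[Array[AnyRef]] =>
      // No copy: the result aliases the storage of `s`.
      s.unsafeArray.asInstanceOf[Array[Any]]
    case s: mutable.ArraySeq[_] if s.array.getClass == classOf[Array[AnyRef]] =>
      // No copy: the result aliases the storage of `s`.
      s.array.asInstanceOf[Array[Any]]
    case other =>
      // Generic path: allocates a new Array[Any] and copies the elements.
      other.toArray
  }
}

val backing: Array[Any] = Array[Any](1, "two", 3.0)
// Same runtime class as `arr` in the Scala 2.13 benchmark: immutable.ArraySeq$ofRef
val wrapped: Seq[Any] = immutable.ArraySeq.unsafeWrapArray(backing)

println(wrapped.getClass.getName)                     // scala.collection.immutable.ArraySeq$ofRef
println(wrapped.toArray eq backing)                   // false: toArray copied
println(NoCopyToArray.toAnyArray(wrapped) eq backing) // true: backing array reused
{code}

The exact class check matters: a `String[]` wrapped as a `Seq[Any]` also passes `isInstanceOf[Array[AnyRef]]`, but handing it out as an `Array[Any]` would make a later write of a non-`String` value fail with `ArrayStoreException`, so the sketch copies in that case. The larger trade-off is aliasing: the fast path returns the same storage as the `Seq`, so anything that later writes to the returned array also changes a collection that callers may assume is immutable.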