This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 70ec696bce7 [SPARK-39798][SQL] Replace `toSeq.toArray` with `.toArray[Any]` in constructor of `GenericArrayData`
70ec696bce7 is described below

commit 70ec696bce7012b25ed6d8acec5e2f3b3e127f11
Author: yangjie01 <yangji...@baidu.com>
AuthorDate: Tue Jul 19 08:41:50 2022 -0500

    [SPARK-39798][SQL] Replace `toSeq.toArray` with `.toArray[Any]` in constructor of `GenericArrayData`

    ### What changes were proposed in this pull request?
    There are many `Array.toSeq.toArray` calls in the constructors; this PR simplifies them to `Array.toArray[Any]`.

    ### Why are the changes needed?
    - For Scala 2.12 this is just a code simplification: `toSeq` returns `thisCollection`, so it can be omitted directly.
    - For Scala 2.13, removing `toSeq` saves an unnecessary memory copy.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    - Pass GitHub Actions
    - Manual test:

    ```
    mvn clean install -DskipTests -pl sql/core -am -Pscala-2.13
    mvn clean test -pl sql/catalyst -Pscala-2.13
    mvn clean test -pl sql/core -Pscala-2.13 -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
    ```

    ```
    Run completed in 8 minutes, 59 seconds.
    Total number of tests run: 6594
    Suites: completed 286, aborted 0
    Tests: succeeded 6594, failed 0, canceled 0, ignored 5, pending 0
    All tests passed.

    Run completed in 1 hour, 51 minutes, 29 seconds.
    Total number of tests run: 11895
    Suites: completed 526, aborted 0
    Tests: succeeded 11895, failed 0, canceled 11, ignored 33, pending 0
    All tests passed.
    ```

    - Run `GenericArrayDataBenchmark` using GA with Scala 2.13:

    **Before**

    ```
    OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
    Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz
    constructor:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ----------------------------------------------------------------------------------------------------------------
    arrayOfAny                                  3               3           0       2992.7           0.3       1.0X
    arrayOfAnyAsObject                        215             215           0         46.6          21.5       0.0X
    arrayOfAnyAsSeq                           232             235           2         43.1          23.2       0.0X
    arrayOfInt                                514             515           1         19.5          51.4       0.0X
    arrayOfIntAsObject                        724             725           1         13.8          72.4       0.0X
    ```

    **After**

    ```
    OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
    Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz
    constructor:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ----------------------------------------------------------------------------------------------------------------
    arrayOfAny                                  3               3           0       2992.8           0.3       1.0X
    arrayOfAnyAsObject                        215             215           0         46.6          21.5       0.0X
    arrayOfAnyAsSeq                           233             237           2         42.8          23.3       0.0X
    arrayOfInt                                416             416           1         24.1          41.6       0.0X
    arrayOfIntAsObject                        737             737           0         13.6          73.7       0.0X
    ```

    The `arrayOfInt` case shows an improvement when using Scala 2.13.

    Closes #37208 from LuciferYang/GenericArrayData-toArray.
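[Editor's note] The 2.12-vs-2.13 behavior described above can be sketched in a few standalone lines (not Spark code; the object name `BoxingDemo` is hypothetical). Both forms produce the same boxed `Array[Any]`; on Scala 2.13 the old form materializes an extra immutable copy via `toSeq` first, while `toArray[Any]` boxes in a single pass:

```scala
// Standalone sketch, assuming a plain Scala 2.13 build.
object BoxingDemo {
  def main(args: Array[String]): Unit = {
    val primitiveArray: Array[Int] = Array(1, 2, 3)

    // New form: one pass, each Int boxed to java.lang.Integer.
    val direct: Array[Any] = primitiveArray.toArray[Any]

    // Old form: on 2.13, toSeq first copies into an immutable seq,
    // then toArray[Any] copies (and boxes) again.
    val viaSeq: Array[Any] = primitiveArray.toSeq.toArray[Any]

    // Same contents either way; elements are boxed objects.
    assert(direct(0).isInstanceOf[java.lang.Integer])
    assert(direct.sameElements(viaSeq))
  }
}
```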
Authored-by: yangjie01 <yangji...@baidu.com>
Signed-off-by: Sean Owen <sro...@gmail.com>
---
 .../spark/sql/catalyst/util/GenericArrayData.scala | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
index e46d730afb4..e566e659db2 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
@@ -28,23 +28,23 @@ class GenericArrayData(val array: Array[Any]) extends ArrayData {
   // Specified this as`scala.collection.Seq` because seqOrArray can be
   // `mutable.ArraySeq` in Scala 2.13
   def this(seq: scala.collection.Seq[Any]) = this(seq.toArray)
-  def this(list: java.util.List[Any]) = this(list.asScala.toSeq)
+  def this(list: java.util.List[Any]) = this(list.asScala.toArray)
 
   // TODO: This is boxing. We should specialize.
-  def this(primitiveArray: Array[Int]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Long]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Float]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Double]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Short]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Byte]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Boolean]) = this(primitiveArray.toSeq)
+  def this(primitiveArray: Array[Int]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Long]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Float]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Double]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Short]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Byte]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Boolean]) = this(primitiveArray.toArray[Any])
 
   def this(seqOrArray: Any) = this(seqOrArray match {
     // Specified this as`scala.collection.Seq` because seqOrArray can be
     // `mutable.ArraySeq` in Scala 2.13
     case seq: scala.collection.Seq[Any] => seq.toArray
     case array: Array[Any] => array // array of objects, so no need to convert
-    case array: Array[_] => array.toSeq.toArray[Any] // array of primitives, so box them
+    case array: Array[_] => array.toArray[Any] // array of primitives, so box them
   })
 
   override def copy(): ArrayData = {

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org