This is an automated email from the ASF dual-hosted git repository.
srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 70ec696bce7 [SPARK-39798][SQL] Replcace `toSeq.toArray` with
`.toArray[Any]` in constructor of `GenericArrayData`
70ec696bce7 is described below
commit 70ec696bce7012b25ed6d8acec5e2f3b3e127f11
Author: yangjie01 <[email protected]>
AuthorDate: Tue Jul 19 08:41:50 2022 -0500
[SPARK-39798][SQL] Replcace `toSeq.toArray` with `.toArray[Any]` in
constructor of `GenericArrayData`
### What changes were proposed in this pull request?
There are many `Array.toSeq.toArray` calls in the constructor, this pr
simplifies them to `Array.toArray[Any]`
### Why are the changes needed?
- For Scala 2.12, just a code simplification, `toSeq` return
`thisCollection`, can be omitted directly
- For Scala 2.13, removing `toSeq` can save an unnecessary memory copy
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass GitHub Actions
- Manual test:
```
mvn clean install -DskipTests -pl sql/core -am -Pscala-2.13
mvn clean test -pl sql/catalyst -Pscala-2.13
mvn clean test -pl sql/core -Pscala-2.13
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
```
```
Run completed in 8 minutes, 59 seconds.
Total number of tests run: 6594
Suites: completed 286, aborted 0
Tests: succeeded 6594, failed 0, canceled 0, ignored 5, pending 0
All tests passed.
Run completed in 1 hour, 51 minutes, 29 seconds.
Total number of tests run: 11895
Suites: completed 526, aborted 0
Tests: succeeded 11895, failed 0, canceled 11, ignored 33, pending 0
All tests passed.
```
- Run `GenericArrayDataBenchmark` using GA with Scala 2.13:
**Before**
```
OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz
constructor: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAny 3 3
0 2992.7 0.3 1.0X
arrayOfAnyAsObject 215 215
0 46.6 21.5 0.0X
arrayOfAnyAsSeq 232 235
2 43.1 23.2 0.0X
arrayOfInt 514 515
1 19.5 51.4 0.0X
arrayOfIntAsObject 724 725
1 13.8 72.4 0.0X
```
**After**
```
OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz
constructor: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
arrayOfAny 3 3
0 2992.8 0.3 1.0X
arrayOfAnyAsObject 215 215
0 46.6 21.5 0.0X
arrayOfAnyAsSeq 233 237
2 42.8 23.3 0.0X
arrayOfInt 416 416
1 24.1 41.6 0.0X
arrayOfIntAsObject 737 737
0 13.6 73.7 0.0X
```
Seems `arrayOfInt` scene has improvement when using Scala 2.13
Closes #37208 from LuciferYang/GenericArrayData-toArray.
Authored-by: yangjie01 <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
---
.../spark/sql/catalyst/util/GenericArrayData.scala | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
index e46d730afb4..e566e659db2 100644
---
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
+++
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
@@ -28,23 +28,23 @@ class GenericArrayData(val array: Array[Any]) extends
ArrayData {
// Specified this as`scala.collection.Seq` because seqOrArray can be
// `mutable.ArraySeq` in Scala 2.13
def this(seq: scala.collection.Seq[Any]) = this(seq.toArray)
- def this(list: java.util.List[Any]) = this(list.asScala.toSeq)
+ def this(list: java.util.List[Any]) = this(list.asScala.toArray)
// TODO: This is boxing. We should specialize.
- def this(primitiveArray: Array[Int]) = this(primitiveArray.toSeq)
- def this(primitiveArray: Array[Long]) = this(primitiveArray.toSeq)
- def this(primitiveArray: Array[Float]) = this(primitiveArray.toSeq)
- def this(primitiveArray: Array[Double]) = this(primitiveArray.toSeq)
- def this(primitiveArray: Array[Short]) = this(primitiveArray.toSeq)
- def this(primitiveArray: Array[Byte]) = this(primitiveArray.toSeq)
- def this(primitiveArray: Array[Boolean]) = this(primitiveArray.toSeq)
+ def this(primitiveArray: Array[Int]) = this(primitiveArray.toArray[Any])
+ def this(primitiveArray: Array[Long]) = this(primitiveArray.toArray[Any])
+ def this(primitiveArray: Array[Float]) = this(primitiveArray.toArray[Any])
+ def this(primitiveArray: Array[Double]) = this(primitiveArray.toArray[Any])
+ def this(primitiveArray: Array[Short]) = this(primitiveArray.toArray[Any])
+ def this(primitiveArray: Array[Byte]) = this(primitiveArray.toArray[Any])
+ def this(primitiveArray: Array[Boolean]) = this(primitiveArray.toArray[Any])
def this(seqOrArray: Any) = this(seqOrArray match {
// Specified this as`scala.collection.Seq` because seqOrArray can be
// `mutable.ArraySeq` in Scala 2.13
case seq: scala.collection.Seq[Any] => seq.toArray
case array: Array[Any] => array // array of objects, so no need to convert
- case array: Array[_] => array.toSeq.toArray[Any] // array of primitives,
so box them
+ case array: Array[_] => array.toArray[Any] // array of primitives, so box
them
})
override def copy(): ArrayData = {
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]