This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 70ec696bce7 [SPARK-39798][SQL] Replace `toSeq.toArray` with `.toArray[Any]` in constructor of `GenericArrayData`
70ec696bce7 is described below

commit 70ec696bce7012b25ed6d8acec5e2f3b3e127f11
Author: yangjie01 <yangji...@baidu.com>
AuthorDate: Tue Jul 19 08:41:50 2022 -0500

    [SPARK-39798][SQL] Replace `toSeq.toArray` with `.toArray[Any]` in constructor of `GenericArrayData`

    ### What changes were proposed in this pull request?
    There are many `Array.toSeq.toArray` calls in the constructors; this PR simplifies them to `Array.toArray[Any]`.

    ### Why are the changes needed?
    - For Scala 2.12 this is just a code simplification: `toSeq` returns `thisCollection`, so it can be omitted directly.
    - For Scala 2.13, removing `toSeq` saves an unnecessary memory copy.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    - Pass GitHub Actions
    - Manual test:

    ```
    mvn clean install -DskipTests -pl sql/core -am -Pscala-2.13
    mvn clean test -pl sql/catalyst -Pscala-2.13
    mvn clean test -pl sql/core -Pscala-2.13 -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
    ```

    ```
    Run completed in 8 minutes, 59 seconds.
    Total number of tests run: 6594
    Suites: completed 286, aborted 0
    Tests: succeeded 6594, failed 0, canceled 0, ignored 5, pending 0
    All tests passed.

    Run completed in 1 hour, 51 minutes, 29 seconds.
    Total number of tests run: 11895
    Suites: completed 526, aborted 0
    Tests: succeeded 11895, failed 0, canceled 11, ignored 33, pending 0
    All tests passed.
    ```

    - Run `GenericArrayDataBenchmark` using GA with Scala 2.13:

    **Before**

    ```
    OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
    Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz
    constructor:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ----------------------------------------------------------------------------------------------------------------
    arrayOfAny                                  3               3           0       2992.7           0.3       1.0X
    arrayOfAnyAsObject                        215             215           0         46.6          21.5       0.0X
    arrayOfAnyAsSeq                           232             235           2         43.1          23.2       0.0X
    arrayOfInt                                514             515           1         19.5          51.4       0.0X
    arrayOfIntAsObject                        724             725           1         13.8          72.4       0.0X
    ```

    **After**

    ```
    OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1031-azure
    Intel(R) Xeon(R) Platinum 8272CL CPU 2.60GHz
    constructor:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
    ----------------------------------------------------------------------------------------------------------------
    arrayOfAny                                  3               3           0       2992.8           0.3       1.0X
    arrayOfAnyAsObject                        215             215           0         46.6          21.5       0.0X
    arrayOfAnyAsSeq                           233             237           2         42.8          23.3       0.0X
    arrayOfInt                                416             416           1         24.1          41.6       0.0X
    arrayOfIntAsObject                        737             737           0         13.6          73.7       0.0X
    ```

    The `arrayOfInt` case shows an improvement when using Scala 2.13.

    Closes #37208 from LuciferYang/GenericArrayData-toArray.
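[Editor's note] The 2.12-vs-2.13 behavior described above can be sketched in a few standalone lines (not Spark code; the object name `BoxingDemo` is hypothetical). Both forms produce the same boxed `Array[Any]`; on Scala 2.13 the old form materializes an extra immutable copy via `toSeq` first, while `toArray[Any]` boxes in a single pass:

```scala
// Standalone sketch, assuming a plain Scala 2.13 build.
object BoxingDemo {
  def main(args: Array[String]): Unit = {
    val primitiveArray: Array[Int] = Array(1, 2, 3)

    // New form: one pass, each Int boxed to java.lang.Integer.
    val direct: Array[Any] = primitiveArray.toArray[Any]

    // Old form: on 2.13, toSeq first copies into an immutable seq,
    // then toArray[Any] copies (and boxes) again.
    val viaSeq: Array[Any] = primitiveArray.toSeq.toArray[Any]

    // Same contents either way; elements are boxed objects.
    assert(direct(0).isInstanceOf[java.lang.Integer])
    assert(direct.sameElements(viaSeq))
  }
}
```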
Authored-by: yangjie01 <yangji...@baidu.com>
Signed-off-by: Sean Owen <sro...@gmail.com>
---
 .../spark/sql/catalyst/util/GenericArrayData.scala | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
index e46d730afb4..e566e659db2 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
@@ -28,23 +28,23 @@ class GenericArrayData(val array: Array[Any]) extends ArrayData {
   // Specified this as`scala.collection.Seq` because seqOrArray can be
   // `mutable.ArraySeq` in Scala 2.13
   def this(seq: scala.collection.Seq[Any]) = this(seq.toArray)
-  def this(list: java.util.List[Any]) = this(list.asScala.toSeq)
+  def this(list: java.util.List[Any]) = this(list.asScala.toArray)
 
   // TODO: This is boxing. We should specialize.
-  def this(primitiveArray: Array[Int]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Long]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Float]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Double]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Short]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Byte]) = this(primitiveArray.toSeq)
-  def this(primitiveArray: Array[Boolean]) = this(primitiveArray.toSeq)
+  def this(primitiveArray: Array[Int]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Long]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Float]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Double]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Short]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Byte]) = this(primitiveArray.toArray[Any])
+  def this(primitiveArray: Array[Boolean]) = this(primitiveArray.toArray[Any])
 
   def this(seqOrArray: Any) = this(seqOrArray match {
     // Specified this as`scala.collection.Seq` because seqOrArray can be
     // `mutable.ArraySeq` in Scala 2.13
     case seq: scala.collection.Seq[Any] => seq.toArray
     case array: Array[Any] => array // array of objects, so no need to convert
-    case array: Array[_] => array.toSeq.toArray[Any] // array of primitives, so box them
+    case array: Array[_] => array.toArray[Any] // array of primitives, so box them
   })
 
   override def copy(): ArrayData = {

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org