[ 
https://issues.apache.org/jira/browse/SPARK-49178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-49178:
-----------------------------------
    Labels: pull-request-available  (was: )

> `Row#getSeq` exhibits a performance regression between master and 3.5.
> ----------------------------------------------------------------------
>
>                 Key: SPARK-49178
>                 URL: https://issues.apache.org/jira/browse/SPARK-49178
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Priority: Major
>              Labels: pull-request-available
>
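> {code:java}
> // Assumed location: Spark's sql/core test sources, next to SqlBasedBenchmark.
> package org.apache.spark.sql.execution.benchmark
>
> import org.apache.spark.benchmark.Benchmark
>
> object GetSeqBenchmark extends SqlBasedBenchmark {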
>   import spark.implicits._
>   def testRowGetSeq(valuesPerIteration: Int, arraySize: Int): Unit = {
>     val data = (0 until arraySize).toArray
>     val row = Seq(data).toDF().collect().head
>     val benchmark = new Benchmark(
>       s"Test get seq with $arraySize from row",
>       valuesPerIteration,
>       output = output)
>     benchmark.addCase("Get Seq") { _: Int =>
>       for (_ <- 0L until valuesPerIteration) {
>         val ret = row.getSeq(0)
>       }
>     }
>     benchmark.run()
>   }
>   override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
>     val valuesPerIteration = 100000
>     testRowGetSeq(valuesPerIteration, 10)
>     testRowGetSeq(valuesPerIteration, 100)
>     testRowGetSeq(valuesPerIteration, 1000)
>     testRowGetSeq(valuesPerIteration, 10000)
>     testRowGetSeq(valuesPerIteration, 100000)
>   }
> } {code}
>  
> branch-3.5
> {code:java}
> OpenJDK 64-Bit Server VM 1.8.0_422-b05 on Linux 5.15.0-1068-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 10 from row:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                               1              1           0        194.8           5.1       1.0X
>
> OpenJDK 64-Bit Server VM 1.8.0_422-b05 on Linux 5.15.0-1068-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 100 from row:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                               1              1           0         96.8          10.3       1.0X
>
> OpenJDK 64-Bit Server VM 1.8.0_422-b05 on Linux 5.15.0-1068-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 1000 from row:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                               1              1           0         97.0          10.3       1.0X
>
> OpenJDK 64-Bit Server VM 1.8.0_422-b05 on Linux 5.15.0-1068-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 10000 from row:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                               1              1           0         96.8          10.3       1.0X
>
> OpenJDK 64-Bit Server VM 1.8.0_422-b05 on Linux 5.15.0-1068-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 100000 from row:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                               1              1           0         96.9          10.3       1.0X
> {code}
> master
> {code:java}
> OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 10 from row:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                               9             10           0         10.5          94.8       1.0X
>
> OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 100 from row:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                              65             65           1          1.5         646.4       1.0X
>
> OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 1000 from row:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                             614            615           1          0.2        6140.2       1.0X
>
> OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 10000 from row:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                            6122           6128           8          0.0       61223.1       1.0X
>
> OpenJDK 64-Bit Server VM 17.0.12+7-LTS on Linux 6.5.0-1025-azure
> AMD EPYC 7763 64-Core Processor
> Test get seq with 100000 from row:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Get Seq                                           61247          61268          30          0.0      612468.1       1.0X
> {code}
> In branch-3.5, the latency of `Row#getSeq` stays roughly constant (about 5 to 
> 10 ns per call) regardless of the array length, whereas in master it grows 
> linearly with the array length, from 94.8 ns per call at 10 elements to 
> 612468.1 ns per call at 100000 elements.
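
The report above does not state the cause of the regression. As a minimal sketch only, assuming Scala 2.13 and using hypothetical names (`WrapVsCopy`, `wrap`, `copy`) that are not Spark code, the following shows how a sequence accessor can be either constant-time (wrapping the backing array) or linear-time (copying it), which is the kind of difference the two benchmark runs exhibit:

```scala
import scala.collection.immutable.ArraySeq
import scala.collection.mutable

// Hypothetical illustration; not taken from Spark's Row implementation.
object WrapVsCopy {
  // O(1): wraps the array without copying, so the Seq shares the array's storage.
  def wrap(data: Array[Int]): Seq[Int] = ArraySeq.unsafeWrapArray(data)

  // O(n): in Scala 2.13, toSeq on a mutable wrapper builds a new immutable
  // collection, copying every element.
  def copy(data: Array[Int]): Seq[Int] = mutable.ArraySeq.make(data).toSeq

  def main(args: Array[String]): Unit = {
    val data = (0 until 5).toArray
    val wrapped = wrap(data)
    val copied = copy(data)
    data(0) = 42
    println(wrapped.head) // 42: the wrapper sees the write to the shared array
    println(copied.head)  // 0: the copy was taken before the write
  }
}
```

The wrapped sequence is only safe when the caller never mutates the underlying array afterwards, which is the trade-off for avoiding the copy.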



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
