Frank Wong created SPARK-49408:
----------------------------------

             Summary: Low performance in ProjectingInternalRow
                 Key: SPARK-49408
                 URL: https://issues.apache.org/jira/browse/SPARK-49408
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.2
            Reporter: Frank Wong


In {*}ProjectingInternalRow{*}, *colOrdinals* is passed as a {_}List{_}. 
According to the Scala documentation, the _{{apply}}_ method of _{{List}}_ runs 
in linear time because it walks the list from the head to the requested index. 
ProjectingInternalRow calls it in its accessor methods (e.g. {{isNullAt}}, 
{{getInt}}), so it runs for every field of every row. This can significantly 
hurt performance.
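A small Scala sketch of why this matters. It is plain Scala rather than Spark code, and the helper {{linearApply}} is hypothetical. It fetches an element the way {{List.apply}} does, by skipping cells from the head, and counts the skips. Looking up the last of 1000 ordinals therefore costs 999 steps on every call. Converting once with {{toIndexedSeq}} yields an {{IndexedSeq}} whose {{apply}} is effectively constant time.

```scala
// Hypothetical helper: returns xs(i) by skipping cells from the head, the way
// List.apply does, together with the number of cells it had to skip.
def linearApply(xs: List[Int], i: Int): (Int, Int) = {
  var cur = xs
  var steps = 0
  while (steps < i && cur.nonEmpty) {
    cur = cur.tail
    steps += 1
  }
  if (i < 0 || cur.isEmpty) throw new IndexOutOfBoundsException(i.toString)
  (cur.head, steps)
}

// 1000 column ordinals held in a List, as in the reported code path.
val colOrdinalsList: List[Int] = (0 until 1000).toList

// Reaching the last ordinal skips 999 cells, and this cost is paid again on
// every accessor call for every row.
val (lastValue, listSteps) = linearApply(colOrdinalsList, 999)
println(s"value=$lastValue, cells skipped=$listSteps")

// Converting once to an IndexedSeq gives effectively constant-time apply.
val colOrdinalsIndexed: IndexedSeq[Int] = colOrdinalsList.toIndexedSeq
println(colOrdinalsIndexed(999))
```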

The following flame graph was captured while running a {*}MERGE INTO{*} SQL 
statement. A considerable amount of time was spent in {{List.apply}}. Changing 
*colOrdinals* to an _{{IndexedSeq}}_, whose {{apply}} is effectively constant 
time, should improve performance.

!image-2024-08-27-19-25-50-462.png|width=549,height=86!
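A minimal sketch of the suggested change, assuming a heavily simplified row model. {{SimpleRow}}, {{ArrayRow}} and {{ProjectingRow}} are hypothetical stand-ins, not Spark's {{InternalRow}} API. The wrapper still accepts any {{Seq}}, but it converts the sequence once to an {{IndexedSeq}} at construction. The per-row lookup {{ordinals(ordinal)}} therefore no longer walks a list.

```scala
// Hypothetical, heavily simplified stand-in for Spark's InternalRow.
trait SimpleRow {
  def isNullAt(ordinal: Int): Boolean
  def getInt(ordinal: Int): Int
}

// Hypothetical array-backed row used as the projection's input.
final class ArrayRow(values: Array[Any]) extends SimpleRow {
  def isNullAt(ordinal: Int): Boolean = values(ordinal) == null
  def getInt(ordinal: Int): Int = values(ordinal).asInstanceOf[Int]
}

// Sketch of the proposed fix: accept any Seq, but convert it once to an
// IndexedSeq so each per-row lookup ordinals(ordinal) is effectively O(1).
final class ProjectingRow(colOrdinals: Seq[Int]) extends SimpleRow {
  private val ordinals: IndexedSeq[Int] = colOrdinals.toIndexedSeq
  private var row: SimpleRow = null

  // Point the same wrapper at a new input row, as a projection typically does.
  def project(r: SimpleRow): this.type = { row = r; this }

  def isNullAt(ordinal: Int): Boolean = row.isNullAt(ordinals(ordinal))
  def getInt(ordinal: Int): Int = row.getInt(ordinals(ordinal))
}

// Project columns 2 and 0 of the input row, even when they arrive as a List.
val projection = new ProjectingRow(List(2, 0))
projection.project(new ArrayRow(Array(1, null, 3)))
println(projection.getInt(0)) // prints 3
```

Doing the conversion once in the constructor leaves call sites unchanged. It also moves the cost out of the per-row path.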


Scala collections performance characteristics: https://docs.scala-lang.org/overviews/collections-2.13/performance-characteristics.html



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
