Takuya Ueshin created SPARK-21781:
-------------------------------------

             Summary: Modify DataSourceScanExec to use concrete ColumnVector type.
                 Key: SPARK-21781
                 URL: https://issues.apache.org/jira/browse/SPARK-21781
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Takuya Ueshin


As mentioned at 
https://github.com/apache/spark/pull/18680#issuecomment-316820409, adding more 
{{ColumnVector}} implementations may have significant performance implications, 
because call sites that see several implementations can lose inlining and be 
forced into virtual dispatch.
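
To make the concern concrete, here is a minimal Java sketch. All class names in it ({{SketchVector}}, {{SketchOnHeapVector}}, {{SketchConstantVector}}, {{DispatchSketch}}) are hypothetical stand-ins for illustration, not Spark's actual classes, and the JIT behavior described in the comments is the usual HotSpot tendency rather than a guarantee.

```java
// Hypothetical stand-ins for illustration; these are not Spark's classes.
abstract class SketchVector {
    abstract int getInt(int rowId);
}

final class SketchOnHeapVector extends SketchVector {
    private final int[] data;
    SketchOnHeapVector(int[] data) { this.data = data; }
    @Override int getInt(int rowId) { return data[rowId]; }
}

final class SketchConstantVector extends SketchVector {
    private final int value;
    SketchConstantVector(int value) { this.value = value; }
    @Override int getInt(int rowId) { return value; }
}

final class DispatchSketch {
    // The receiver has the abstract type. Once several subclasses reach this
    // call site, the JIT may stop inlining getInt and emit a virtual call.
    static long sumAbstract(SketchVector col, int numRows) {
        long sum = 0;
        for (int i = 0; i < numRows; i++) {
            sum += col.getInt(i);
        }
        return sum;
    }

    // The receiver is a final class, so the call target is known statically
    // regardless of how many other SketchVector subclasses exist.
    static long sumConcrete(SketchOnHeapVector col, int numRows) {
        long sum = 0;
        for (int i = 0; i < numRows; i++) {
            sum += col.getInt(i);
        }
        return sum;
    }
}
```

Both methods compute the same result; the only difference is the static type of the receiver, which is what determines how much freedom the JIT has to inline.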

On the read path, one of the major code paths is the one generated by 
{{ColumnarBatchScan}}. The generated code currently refers to the abstract 
{{ColumnVector}} type, so the penalty grows as more classes are added. However, 
the concrete type is known from the usage; for example, the vectorized Parquet 
reader uses {{OnHeapColumnVector}}. We can use the concrete type directly in 
the generated code to avoid the penalty.
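
As a rough sketch of the idea, the code generator could take the concrete vector class name as a parameter and emit it in the declaration and cast. The helper {{ScanCodegenSketch.columnAssign}} and the variable name {{colInstance}} are assumptions made for this example and are not taken from Spark's source; the emitted {{column(ordinal)}} call is likewise only assumed to match the batch API.

```java
// Hypothetical helper for illustration; not part of Spark.
final class ScanCodegenSketch {
    // Builds one line of generated Java source that binds a batch column to a
    // local variable declared with the given (concrete) vector class name.
    static String columnAssign(String vectorClass, String batchVar, int ordinal) {
        String name = "colInstance" + ordinal;
        return vectorClass + " " + name
            + " = (" + vectorClass + ") " + batchVar + ".column(" + ordinal + ");";
    }
}
```

For example, passing {{"OnHeapColumnVector"}} would make every later access in the generated code go through the concrete type, while passing {{"ColumnVector"}} would reproduce the current behavior.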
