Takuya Ueshin created SPARK-21781:
-------------------------------------
Summary: Modify DataSourceScanExec to use concrete ColumnVector type.
Key: SPARK-21781
URL: https://issues.apache.org/jira/browse/SPARK-21781
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.3.0
Reporter: Takuya Ueshin
As mentioned at
https://github.com/apache/spark/pull/18680#issuecomment-316820409, adding more
{{ColumnVector}} implementations might (or might not) have a large performance
impact, because a call site that sees several implementations can lose inlining
and be forced into virtual dispatch.
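
To make the concern concrete, here is a minimal sketch of a call site typed
against a base class. The classes {{SketchVector}}, {{HeapSketchVector}} and
{{ConstantSketchVector}} are hypothetical stand-ins, not Spark's
{{ColumnVector}} hierarchy. The assumption is that the JIT can usually inline
{{getInt}} while the loop only ever sees one implementation, and may fall back
to a virtual call once it sees several; the actual behavior depends on the JVM
and its runtime profile.

```java
// Hypothetical stand-ins for illustration; these are not Spark's classes.
abstract class SketchVector {
    abstract int getInt(int rowId);
}

final class HeapSketchVector extends SketchVector {
    private final int[] data;
    HeapSketchVector(int[] data) { this.data = data; }
    @Override int getInt(int rowId) { return data[rowId]; }
}

final class ConstantSketchVector extends SketchVector {
    private final int value;
    ConstantSketchVector(int value) { this.value = value; }
    @Override int getInt(int rowId) { return value; }
}

final class DispatchSketch {
    // The receiver is typed as the base class, so col.getInt(i) is a virtual
    // call whose target depends on which subclass is passed in at runtime.
    static long sum(SketchVector col, int numRows) {
        long total = 0;
        for (int i = 0; i < numRows; i++) {
            total += col.getInt(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // The same call site now observes two receiver types.
        System.out.println(sum(new HeapSketchVector(new int[] {1, 2, 3}), 3)); // prints 6
        System.out.println(sum(new ConstantSketchVector(7), 3));               // prints 21
    }
}
```

The results are the same either way; only the cost of each {{getInt}} call is
at stake.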
On the read path, one of the major code paths is the one generated by
{{ColumnarBatchScan}}. It currently refers to the base {{ColumnVector}} type, so
the penalty will grow as more classes are added. However, the concrete type is
known from the usage; e.g. the vectorized Parquet reader uses
{{OnHeapColumnVector}}. We can use the concrete type directly in the generated
code to avoid the penalty.
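
As a rough sketch of the idea, a code generator could emit the column
declaration with the concrete class name and a cast when the scan knows which
implementation it will receive. The helper {{declareColumn}} and the variable
names below are hypothetical and are not Spark's codegen API; falling back to
{{ColumnVector}} when the type is unknown is also an assumption.

```java
import java.util.Optional;

final class ColumnDeclSketch {
    // Hypothetical helper: builds one line of generated Java source.
    // If a concrete vector class is known, declare the variable with that
    // type and cast; otherwise keep the base ColumnVector type.
    static String declareColumn(String var, String batchVar, int ordinal,
                                Optional<String> concreteType) {
        String type = concreteType.orElse("ColumnVector");
        String cast = concreteType.map(t -> "(" + t + ") ").orElse("");
        return type + " " + var + " = " + cast + batchVar + ".column(" + ordinal + ");";
    }

    public static void main(String[] args) {
        // prints: OnHeapColumnVector col0 = (OnHeapColumnVector) batch.column(0);
        System.out.println(declareColumn("col0", "batch", 0, Optional.of("OnHeapColumnVector")));
        // prints: ColumnVector col1 = batch.column(1);
        System.out.println(declareColumn("col1", "batch", 1, Optional.empty()));
    }
}
```

With the variable declared as a concrete class, every accessor call on it in
the generated code has a statically known target.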