[ https://issues.apache.org/jira/browse/SPARK-21781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-21781:
------------------------------------
Assignee: Apache Spark
> Modify DataSourceScanExec to use concrete ColumnVector type.
> ------------------------------------------------------------
>
> Key: SPARK-21781
> URL: https://issues.apache.org/jira/browse/SPARK-21781
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Takuya Ueshin
> Assignee: Apache Spark
>
> As mentioned at
> https://github.com/apache/spark/pull/18680#issuecomment-316820409, adding
> more {{ColumnVector}} implementations might (or might not) have huge
> performance implications, because it might disable inlining or force virtual
> dispatch.
> As for the read path, one of the major paths is the code generated by
> {{ColumnBatchScan}}. Currently it refers to {{ColumnVector}}, so the penalty
> will grow as we add more classes, but we can determine the concrete type from
> its usage, e.g. the vectorized Parquet reader uses {{OnHeapColumnVector}}. We
> can use the concrete type in the generated code directly to avoid the penalty.
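
The idea in the description can be sketched in plain Java. This is only an illustration under stated assumptions: SketchVector, SketchArrayVector, SketchBufferVector and ConcreteTypeSketch are hypothetical stand-ins, not Spark's ColumnVector classes or its generated code. Whether the second loop is actually faster depends on the JVM and on how many implementations have been loaded and observed at the call site.

```java
import java.nio.IntBuffer;

// Hypothetical sketch; none of these classes exist in Spark.
class ConcreteTypeSketch {
    // Reads through the abstract type: once several subclasses are in use,
    // getInt here may stay a virtual call that the JIT cannot inline.
    static long sumViaAbstractType(SketchVector col, int numRows) {
        long sum = 0;
        for (int i = 0; i < numRows; i++) {
            sum += col.getInt(i);
        }
        return sum;
    }

    // Reads through a concrete final type: the call has exactly one possible
    // target, so it can be bound statically and is a candidate for inlining.
    static long sumViaConcreteType(SketchArrayVector col, int numRows) {
        long sum = 0;
        for (int i = 0; i < numRows; i++) {
            sum += col.getInt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4};
        System.out.println(sumViaAbstractType(new SketchBufferVector(values), values.length));
        System.out.println(sumViaConcreteType(new SketchArrayVector(values), values.length));
    }
}

// Abstract base type, playing the role of a general column vector.
abstract class SketchVector {
    abstract int getInt(int rowId);
}

// Array-backed implementation, declared final so it has no subclasses.
final class SketchArrayVector extends SketchVector {
    private final int[] data;

    SketchArrayVector(int[] data) {
        this.data = data;
    }

    @Override
    int getInt(int rowId) {
        return data[rowId];
    }
}

// A second implementation, present only so that the abstract call site
// has more than one possible receiver type.
final class SketchBufferVector extends SketchVector {
    private final IntBuffer data;

    SketchBufferVector(int[] values) {
        this.data = IntBuffer.wrap(values);
    }

    @Override
    int getInt(int rowId) {
        return data.get(rowId);
    }
}
```

Both methods compute the same sum; the only difference is the static type of the parameter, which is what the proposal would change in the generated code.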