[ https://issues.apache.org/jira/browse/SPARK-20960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034118#comment-16034118 ]
Wenchen Fan commented on SPARK-20960:
-------------------------------------
cc [~wesmckinn]
> make ColumnVector public
> ------------------------
>
> Key: SPARK-20960
> URL: https://issues.apache.org/jira/browse/SPARK-20960
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
>
> ColumnVector is an internal interface in Spark SQL that is used only by the
> vectorized Parquet reader to represent the in-memory columnar format.
> In Spark 2.3 we want to make ColumnVector public so that we can provide a
> more efficient way to exchange data between Spark and external systems. For
> example, we can use ColumnVector to build a columnar read API in the data
> source framework, or to build a more efficient UDF API.
> We also want to introduce a new ColumnVector implementation based on Apache
> Arrow (basically just a wrapper over Arrow), so that external systems (such as
> a Python pandas DataFrame) can build a ColumnVector very easily.
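
To make the idea of a columnar read interface concrete, here is a minimal
sketch in Java. It is illustrative only: IntColumn, OnHeapIntColumn, and
ColumnVectorSketch are hypothetical names invented for this example and are
not Spark's ColumnVector API. The sketch assumes a single int column with a
null bitmap, and shows how a consumer could scan values by row id without
materializing row objects.

```java
// Hypothetical sketch: these types are NOT Spark's actual ColumnVector API.
import java.util.BitSet;

/** Read-only view of one column of ints, addressed by row id. */
interface IntColumn {
    int numRows();
    boolean isNullAt(int rowId);
    int getInt(int rowId);
}

/** On-heap implementation backed by a primitive array plus a null bitmap. */
final class OnHeapIntColumn implements IntColumn {
    private final int[] values;
    private final BitSet nulls;

    OnHeapIntColumn(Integer[] boxed) {
        values = new int[boxed.length];
        nulls = new BitSet(boxed.length);
        for (int i = 0; i < boxed.length; i++) {
            if (boxed[i] == null) {
                nulls.set(i);          // remember the null, leave the slot at 0
            } else {
                values[i] = boxed[i];  // unbox once, at build time
            }
        }
    }

    @Override public int numRows() { return values.length; }
    @Override public boolean isNullAt(int rowId) { return nulls.get(rowId); }
    @Override public int getInt(int rowId) { return values[rowId]; }
}

final class ColumnVectorSketch {
    /** Sums the non-null values by scanning the column in a simple loop. */
    static long sum(IntColumn col) {
        long total = 0;
        for (int i = 0; i < col.numRows(); i++) {
            if (!col.isNullAt(i)) {
                total += col.getInt(i);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        IntColumn col = new OnHeapIntColumn(new Integer[] {1, null, 3, 4});
        System.out.println(sum(col));
    }
}
```

An Arrow-backed implementation, as proposed in the description, would
presumably implement the same kind of interface by delegating to an Arrow
vector instead of an on-heap array, so consumers such as sum() would not need
to change.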