[jira] [Commented] (SPARK-20960) make ColumnVector public

Wenchen Fan (JIRA) Thu, 01 Jun 2017 21:38:54 -0700

    [ https://issues.apache.org/jira/browse/SPARK-20960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034118#comment-16034118 ]


Wenchen Fan commented on SPARK-20960:
-------------------------------------

cc [~wesmckinn]

> make ColumnVector public
> ------------------------
>
>                 Key: SPARK-20960
>                 URL: https://issues.apache.org/jira/browse/SPARK-20960
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>
> ColumnVector is an internal interface in Spark SQL that is used only by the
> vectorized Parquet reader to represent the in-memory columnar format.
> In Spark 2.3 we want to make ColumnVector public so that we can provide a
> more efficient way to exchange data between Spark and external systems. For
> example, we can use ColumnVector to build the columnar read API in the data
> source framework, or to build a more efficient UDF API.
> We also want to introduce a new ColumnVector implementation based on Apache
> Arrow (basically just a wrapper over Arrow), so that external systems (such as
> Python pandas DataFrames) can build ColumnVectors very easily.
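
To make the idea in the description concrete, here is a rough sketch of one
read-only columnar interface with two interchangeable backings. It is not
Spark's ColumnVector API: the names IntColumn, OnHeapIntColumn,
BufferIntColumn and sumNonNull are hypothetical, and a plain
java.nio.IntBuffer plus a BitSet validity mask only stands in for the
Arrow-backed wrapper mentioned above.

```java
import java.nio.IntBuffer;
import java.util.BitSet;

public class ColumnVectorSketch {

    /** Hypothetical read-only view of one int column (not Spark's actual API). */
    interface IntColumn {
        int numRows();
        boolean isNullAt(int rowId);
        int getInt(int rowId);
    }

    /** Column that owns its data in on-heap primitive arrays. */
    static final class OnHeapIntColumn implements IntColumn {
        private final int[] values;
        private final boolean[] nulls;

        OnHeapIntColumn(int[] values, boolean[] nulls) {
            if (values.length != nulls.length) {
                throw new IllegalArgumentException("values and nulls must have the same length");
            }
            this.values = values.clone();
            this.nulls = nulls.clone();
        }

        @Override public int numRows() { return values.length; }
        @Override public boolean isNullAt(int rowId) { return nulls[rowId]; }
        @Override public int getInt(int rowId) { return values[rowId]; }
    }

    /**
     * Thin wrapper over a buffer owned by someone else. This is an assumed
     * stand-in for an Arrow-backed column: a set bit in the mask means the
     * row holds a valid (non-null) value.
     */
    static final class BufferIntColumn implements IntColumn {
        private final IntBuffer data;
        private final BitSet validity;
        private final int numRows;

        BufferIntColumn(IntBuffer data, BitSet validity, int numRows) {
            if (numRows < 0 || numRows > data.limit()) {
                throw new IllegalArgumentException("numRows out of range for buffer");
            }
            this.data = data;
            this.validity = validity;
            this.numRows = numRows;
        }

        @Override public int numRows() { return numRows; }
        @Override public boolean isNullAt(int rowId) { return !validity.get(rowId); }
        // Absolute get: reads by index without moving the buffer position.
        @Override public int getInt(int rowId) { return data.get(rowId); }
    }

    /** A consumer written once against the interface works with either backing. */
    static long sumNonNull(IntColumn col) {
        long sum = 0;
        for (int i = 0; i < col.numRows(); i++) {
            if (!col.isNullAt(i)) {
                sum += col.getInt(i);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        IntColumn onHeap = new OnHeapIntColumn(
            new int[] {1, 2, 3, 4}, new boolean[] {false, false, true, false});

        BitSet validity = new BitSet(4);
        validity.set(0);
        validity.set(1);
        validity.set(3);
        IntColumn wrapped = new BufferIntColumn(
            IntBuffer.wrap(new int[] {1, 2, 3, 4}), validity, 4);

        System.out.println(sumNonNull(onHeap));
        System.out.println(sumNonNull(wrapped));
    }
}
```

The point of the sketch is only that a reader or UDF coded against the
interface does not need to know whether the data lives in Spark-owned arrays
or in memory produced by an external system.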



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

