[ https://issues.apache.org/jira/browse/SPARK-15689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-15689:
--------------------------------
Labels: releasenotes (was: )
> Data source API v2
> ------------------
>
> Key: SPARK-15689
> URL: https://issues.apache.org/jira/browse/SPARK-15689
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Reynold Xin
> Labels: releasenotes
>
> This ticket tracks progress in creating v2 of the data source API. This new
> API should focus on:
> 1. Have a small surface so it is easy to freeze and maintain compatibility
> for a long time. Ideally, this API should survive architectural rewrites and
> user-facing API revamps of Spark.
> 2. Have a well-defined column batch interface for high performance.
> Convenience methods should exist to convert row-oriented formats into column
> batches for data source developers.
> 3. Still support filter push down, similar to the existing API.
> 4. Nice-to-have: support additional common operators, including limit and
> sampling.
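A minimal Java sketch of what such a small reader surface could look like, covering items 1 to 4. All type and method names here (`DataSourceReader`, `pushFilters`, `pushLimit`, `readBatches`, `ColumnBatch`, `GreaterThan`, `InMemoryReader`) are hypothetical and are not Spark's API, and columns are restricted to `int` for brevity. Filters the source cannot evaluate are handed back to the caller, similar in spirit to v1's `unhandledFilters`. A limit is accepted only when no filter is left over, because applying a limit before a residual filter would drop rows that should have been returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are Spark's actual API.

/** A batch of rows stored column by column (int columns only, for brevity). */
final class ColumnBatch {
    final int[][] columns; // columns[c][r] = value of column c in row r
    final int numRows;

    ColumnBatch(int[][] columns, int numRows) {
        this.columns = columns;
        this.numRows = numRows;
    }
}

/** A simple pushable predicate: column > value. */
record GreaterThan(int column, int value) {}

/** Deliberately small surface: no dependency on DataFrame or SQLContext. */
interface DataSourceReader {
    /** Keeps the filters the source can evaluate; returns those the engine must still apply. */
    List<GreaterThan> pushFilters(List<GreaterThan> filters);

    /** Optional operator pushdown; returns true only if the source will honour the limit. */
    default boolean pushLimit(int limit) { return false; }

    /** Produces the data as column batches. */
    List<ColumnBatch> readBatches();
}

/** Toy in-memory source: evaluates filters on column 0 only, and supports limits. */
final class InMemoryReader implements DataSourceReader {
    private final int[][] rows; // row-oriented input, rows[r][c]
    private final List<GreaterThan> pushed = new ArrayList<>();
    private boolean hasResidualFilters = false;
    private int limit = Integer.MAX_VALUE;

    InMemoryReader(int[][] rows) { this.rows = rows; }

    @Override
    public List<GreaterThan> pushFilters(List<GreaterThan> filters) {
        List<GreaterThan> remaining = new ArrayList<>();
        for (GreaterThan f : filters) {
            if (f.column() == 0) pushed.add(f); else remaining.add(f);
        }
        hasResidualFilters |= !remaining.isEmpty();
        return remaining;
    }

    @Override
    public boolean pushLimit(int limit) {
        // Unsafe if the engine still has to filter rows after the source returns them.
        if (hasResidualFilters) return false;
        this.limit = limit;
        return true;
    }

    @Override
    public List<ColumnBatch> readBatches() {
        List<int[]> kept = new ArrayList<>();
        for (int[] row : rows) {
            if (kept.size() >= limit) break;
            boolean matches = true;
            for (GreaterThan f : pushed) matches &= row[f.column()] > f.value();
            if (matches) kept.add(row);
        }
        return List.of(toBatch(kept));
    }

    /** Convenience conversion (item 2): transposes row-oriented data into one column batch. */
    static ColumnBatch toBatch(List<int[]> rows) {
        int numCols = rows.isEmpty() ? 0 : rows.get(0).length;
        int[][] cols = new int[numCols][rows.size()];
        for (int r = 0; r < rows.size(); r++)
            for (int c = 0; c < numCols; c++)
                cols[c][r] = rows.get(r)[c];
        return new ColumnBatch(cols, rows.size());
    }
}

public class DataSourceV2Sketch {
    public static void main(String[] args) {
        DataSourceReader reader = new InMemoryReader(new int[][] {{1, 10}, {5, 20}, {7, 30}, {9, 40}});
        List<GreaterThan> residual = reader.pushFilters(List.of(new GreaterThan(0, 4)));
        boolean limited = reader.pushLimit(2);
        ColumnBatch batch = reader.readBatches().get(0);
        System.out.println(residual + " " + limited + " " + Arrays.toString(batch.columns[1]));
    }
}
```

Returning residual filters instead of rejecting the whole query keeps the contract small: a source can start by pushing nothing and add filter or limit support incrementally.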
> Note that both 1 and 2 are problems that the current data source API (v1)
> suffers from. The current data source API has a wide surface with
> dependencies on DataFrame/SQLContext, which makes its compatibility depend on
> the upper-level API. It is also row-oriented only, and data has to go through
> an expensive conversion from external data types to internal data types.
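A small Java sketch of the per-value conversion cost described above. It is loosely modelled on Spark SQL's internal representation, which stores dates as a day count since 1970-01-01 and strings as UTF-8 bytes; the types `ExternalRow`, `InternalColumns` and `ExternalToInternal` are hypothetical. A row-oriented source hands over user-facing objects, so every field of every row needs its own conversion call before the engine can use it, whereas a source that fills internal column arrays directly would skip this step.

```java
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.List;

// Hypothetical illustration only; these are not Spark classes.

/** A row as a row-oriented source hands it over: user-facing (external) Java objects. */
record ExternalRow(String name, LocalDate date) {}

/** Internal columnar layout: UTF-8 bytes and primitive day counts, one array per column. */
final class InternalColumns {
    final byte[][] names;
    final int[] epochDays;

    InternalColumns(int n) {
        names = new byte[n][];
        epochDays = new int[n];
    }
}

public class ExternalToInternal {
    /** Converts every field of every row; this per-value work is the conversion cost. */
    public static InternalColumns convert(List<ExternalRow> rows) {
        InternalColumns out = new InternalColumns(rows.size());
        for (int i = 0; i < rows.size(); i++) {
            ExternalRow row = rows.get(i);
            out.names[i] = row.name().getBytes(StandardCharsets.UTF_8);   // String -> UTF-8 bytes
            out.epochDays[i] = Math.toIntExact(row.date().toEpochDay()); // LocalDate -> days since 1970-01-01
        }
        return out;
    }

    public static void main(String[] args) {
        InternalColumns cols = convert(List.of(
            new ExternalRow("a", LocalDate.of(1970, 1, 2)),
            new ExternalRow("b", LocalDate.of(2016, 6, 1))));
        System.out.println(cols.epochDays[0] + " " + cols.epochDays[1]);
    }
}
```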
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)