[ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070027#comment-15070027 ]
Reynold Xin commented on SPARK-12449:
-------------------------------------
This is great, but our internal logical/physical plan structure changes all the
time, and as a result I don't think we can provide a stable interface based on
that. The cost of stabilizing those interfaces is way too high. We need the
flexibility in order to improve Spark.
> Pushing down arbitrary logical plans to data sources
> ----------------------------------------------------
>
> Key: SPARK-12449
> URL: https://issues.apache.org/jira/browse/SPARK-12449
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Stephan Kessler
> Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} lets
> us push down filters and projections, pruning unnecessary fields and rows
> directly in the data source.
> However, data sources such as SQL engines are capable of doing even more
> preprocessing, e.g., evaluating aggregates. This is beneficial because it
> would reduce the amount of data transferred from the source to Spark. The
> existing interfaces do not allow this kind of processing in the source.
> We propose adding a new interface, {{CatalystSource}}, that allows deferring
> the processing of arbitrary logical plans to the data source. We have
> already presented the details at Spark Summit Europe 2015:
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining the details.
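
To make the difference described in the issue concrete, here is a toy sketch in plain Python. It does not use Spark or its DataSource API: `ToySource`, `scan`, `scan_sum`, and `engine_side_sum` are hypothetical names invented for illustration, and the data is made up. It only contrasts a pruned, filtered scan (the engine still aggregates) with an aggregate evaluated inside the source, counting how many rows cross the boundary in each case.

```python
class ToySource:
    """Hypothetical in-memory stand-in for an external source (not a Spark class)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.rows_transferred = 0  # rows shipped from the source to the engine

    def scan(self, columns, predicate):
        """Pruned, filtered scan: drop unneeded rows and columns in the source."""
        out = [{c: r[c] for c in columns} for r in self.rows if predicate(r)]
        self.rows_transferred += len(out)
        return out

    def scan_sum(self, group_col, value_col, predicate):
        """Aggregate evaluated in the source: one result row per group."""
        totals = {}
        for r in self.rows:
            if predicate(r):
                totals[r[group_col]] = totals.get(r[group_col], 0) + r[value_col]
        out = [{group_col: k, "sum": v} for k, v in totals.items()]
        self.rows_transferred += len(out)
        return out


def engine_side_sum(rows, group_col, value_col):
    """Aggregation done by the engine after the rows have been transferred."""
    totals = {}
    for r in rows:
        totals[r[group_col]] = totals.get(r[group_col], 0) + r[value_col]
    return totals


def is_2015(row):
    return row["year"] == 2015


rows = [
    {"region": "eu", "year": 2015, "amount": 10},
    {"region": "eu", "year": 2015, "amount": 5},
    {"region": "us", "year": 2015, "amount": 7},
    {"region": "us", "year": 2014, "amount": 3},
]

# Case A: only filter and projection are pushed down; the engine aggregates.
src_a = ToySource(rows)
sums_a = engine_side_sum(src_a.scan(["region", "amount"], is_2015), "region", "amount")

# Case B: the aggregate itself is evaluated inside the source.
src_b = ToySource(rows)
sums_b = {r["region"]: r["sum"] for r in src_b.scan_sum("region", "amount", is_2015)}

print(sums_a, src_a.rows_transferred)
print(sums_b, src_b.rows_transferred)
```

Both cases produce the same sums, but the second transfers one row per group rather than one row per matching record. Pushing down an arbitrary logical plan, as the issue proposes, would generalize this beyond a single hard-coded aggregate.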