Stephan Kessler created SPARK-12449:
---------------------------------------
Summary: Pushing down arbitrary logical plans to data sources
Key: SPARK-12449
URL: https://issues.apache.org/jira/browse/SPARK-12449
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Stephan Kessler
With the help of the DataSource API we can pull data from external sources for
processing. Implementing interfaces such as {{PrunedFilteredScan}} allows us to
push down filters and projections, pruning unnecessary rows and columns directly
in the data source.
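In Spark itself, {{PrunedFilteredScan}} is a Scala trait whose method is {{buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row]}}. The dependency-free Python model below is only a sketch of the idea, not Spark's API. The names {{InMemoryPrunedFilteredSource}}, {{build_scan}}, {{EqualTo}} and {{GreaterThan}} are hypothetical stand-ins. The source receives the columns the query needs and the pushed-down predicates, and returns only matching rows restricted to those columns.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


# Hypothetical filter classes, loosely named after Spark's
# org.apache.spark.sql.sources filters; not the real Spark classes.
@dataclass(frozen=True)
class EqualTo:
    attribute: str
    value: Any

    def evaluate(self, row: Row) -> bool:
        return row[self.attribute] == self.value


@dataclass(frozen=True)
class GreaterThan:
    attribute: str
    value: Any

    def evaluate(self, row: Row) -> bool:
        return row[self.attribute] > self.value


class InMemoryPrunedFilteredSource:
    """Toy source that prunes rows and columns before handing data over."""

    def __init__(self, rows: List[Row]):
        self.rows = rows

    def build_scan(self, required_columns: Sequence[str], filters: Sequence) -> List[Row]:
        # Row pruning: keep only rows satisfying every pushed-down filter.
        kept = [r for r in self.rows if all(f.evaluate(r) for f in filters)]
        # Column pruning: return only the columns the query asked for.
        return [{c: r[c] for c in required_columns} for r in kept]


source = InMemoryPrunedFilteredSource([
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "FR", "amount": 25},
    {"id": 3, "country": "DE", "amount": 40},
])
rows = source.build_scan(["id", "amount"], [EqualTo("country", "DE"), GreaterThan("amount", 15)])
print(rows)  # [{'id': 3, 'amount': 40}]
```

Only one of the three rows, and two of its three columns, leave the source. Filtering happens before projection, so predicates may reference columns that are not returned.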
However, data sources such as SQL engines are capable of much more
preprocessing, e.g., evaluating aggregates. Doing this work in the source is
beneficial because it reduces the amount of data transferred from the source to
Spark. The existing interfaces do not allow this kind of processing in the
source.
We propose adding a new interface, {{CatalystSource}}, that allows the
processing of arbitrary logical plans to be deferred to the data source. We have
already presented the details at Spark Summit Europe 2015:
[https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
I will add a design document explaining the details.
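The actual shape of {{CatalystSource}} is left to the design document. The toy Python model below only illustrates the general idea, under stated assumptions. Every class and method name in it ({{Relation}}, {{Filter}}, {{SumAggregate}}, {{Sort}}, {{ToyCatalystSource}}, {{supports}}, {{execute}}) is hypothetical and is not Catalyst's API. The source inspects a small logical plan tree. If it understands every node, it evaluates the whole fragment, including an aggregate, and returns only the result. Otherwise it returns {{None}}, and the engine would evaluate the plan itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


# Hypothetical, heavily simplified logical plan nodes (not Catalyst classes).
@dataclass(frozen=True)
class Relation:
    table: str


@dataclass(frozen=True)
class Filter:
    column: str
    value: Any  # equality predicate: column == value
    child: Any


@dataclass(frozen=True)
class SumAggregate:
    group_by: str
    column: str
    child: Any


@dataclass(frozen=True)
class Sort:
    column: str
    child: Any


class ToyCatalystSource:
    """Hypothetical source that executes whole plan fragments it understands."""

    SUPPORTED = (Relation, Filter, SumAggregate)  # note: Sort is not supported

    def __init__(self, tables: Dict[str, List[Row]]):
        self.tables = tables

    def supports(self, plan) -> bool:
        # A plan is pushable only if every node in the tree is supported.
        if not isinstance(plan, self.SUPPORTED):
            return False
        child = getattr(plan, "child", None)
        return child is None or self.supports(child)

    def execute(self, plan) -> Optional[List[Row]]:
        # None signals "cannot push down"; the engine would fall back.
        if not self.supports(plan):
            return None
        return self._run(plan)

    def _run(self, plan) -> List[Row]:
        if isinstance(plan, Relation):
            return list(self.tables[plan.table])
        if isinstance(plan, Filter):
            return [r for r in self._run(plan.child) if r[plan.column] == plan.value]
        if isinstance(plan, SumAggregate):
            totals: Dict[Any, Any] = defaultdict(int)
            for r in self._run(plan.child):
                totals[r[plan.group_by]] += r[plan.column]
            return [{plan.group_by: k, "sum": v} for k, v in totals.items()]
        raise ValueError(f"unsupported node: {plan!r}")


sales = [
    {"country": "DE", "year": 2015, "amount": 10},
    {"country": "DE", "year": 2015, "amount": 20},
    {"country": "FR", "year": 2015, "amount": 5},
    {"country": "FR", "year": 2014, "amount": 7},
]
source = ToyCatalystSource({"sales": sales})
plan = SumAggregate("country", "amount", Filter("year", 2015, Relation("sales")))
result = source.execute(plan)
print(result)  # [{'country': 'DE', 'sum': 30}, {'country': 'FR', 'sum': 5}]
print(source.execute(Sort("country", plan)))  # None
```

With the aggregate evaluated in the source, two result rows are transferred instead of the three filtered input rows. On real tables the gap grows with the number of rows per group. The all-or-nothing {{supports}} check is a simplifying assumption. A real design could instead push down the largest supported subtree.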
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)