[jira] [Commented] (SPARK-14820) Reduce shuffle data by pushing filter toward storage

Takeshi Yamamuro (JIRA) Fri, 22 Apr 2016 02:25:34 -0700

    [ https://issues.apache.org/jira/browse/SPARK-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253608#comment-15253608 ]


Takeshi Yamamuro commented on SPARK-14820:
------------------------------------------

It seems `Optimizer#PushPredicateThroughJoin` already handles this kind of 
push-down optimization.
Why can't the current implementation push the filters down in the query 
described in your PDF?
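
To make the question concrete, here is a toy model in plain Python of what pushing a predicate through a join buys. It is only a sketch and not Spark's implementation: the `Order`/`Customer` records, the `hash_join` helper, and the "rows shuffled" counter (simply the number of input rows fed to the join) are all assumptions made up for illustration.

```python
from collections import namedtuple

# Hypothetical record types, invented for this illustration only.
Order = namedtuple("Order", ["order_id", "customer_id", "amount"])
Customer = namedtuple("Customer", ["customer_id", "country"])


def hash_join(left, right):
    """Inner join on customer_id.

    Returns (joined_pairs, rows_shuffled), where rows_shuffled is a crude
    stand-in for shuffle volume: every input row is assumed to be moved once.
    """
    index = {}
    for c in right:
        index.setdefault(c.customer_id, []).append(c)
    joined = [(o, c) for o in left for c in index.get(o.customer_id, [])]
    return joined, len(left) + len(right)


def filter_after_join(orders, customers, min_amount):
    # Plan A: join everything, then apply the filter to the joined rows.
    joined, shuffled = hash_join(orders, customers)
    return [(o, c) for o, c in joined if o.amount >= min_amount], shuffled


def filter_before_join(orders, customers, min_amount):
    # Plan B: the predicate only touches `orders`, so apply it below the join.
    kept = [o for o in orders if o.amount >= min_amount]
    return hash_join(kept, customers)


orders = [Order(i, i % 3, 10 * i) for i in range(10)]
customers = [Customer(0, "JP"), Customer(1, "US"), Customer(2, "DE")]

late, late_shuffled = filter_after_join(orders, customers, 70)
early, early_shuffled = filter_before_join(orders, customers, 70)

print(late == early)                  # both plans return the same rows
print(late_shuffled, early_shuffled)  # rows fed to the join in each plan
```

Both plans return the same rows, but the second feeds fewer rows into the join. If the optimizer already performs this rewrite for the query in the PDF, the remaining gain would have to come from somewhere else.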

> Reduce shuffle data by pushing filter toward storage
> ----------------------------------------------------
>
>                 Key: SPARK-14820
>                 URL: https://issues.apache.org/jira/browse/SPARK-14820
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Ali Tootoonchian
>            Priority: Trivial
>         Attachments: Reduce Shuffle Data by pushing filter toward storage.pdf
>
>
> The SQL query planner could be made to push filter operations down toward 
> the storage layer. If the planner reduces IO to storage at the cost of 
> running multiple filters (i.e., more compute), this should be desirable 
> when the system is IO bound.
> A supporting analysis and example are attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

