[ https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Girish updated DRILL-7558:
-----------------------------------
    Target Version/s: 1.19.0

> Generalize filter push-down planner phase
> -----------------------------------------
>
>                 Key: DRILL-7558
>                 URL: https://issues.apache.org/jira/browse/DRILL-7558
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.18.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>             Fix For: 1.18.0
>
>
> DRILL-7458 provides a base framework for storage plugins, including a 
> simplified filter push-down mechanism. [~volodymyr] notes that it may be 
> *too* simple:
> {quote}
> What about the case when this rule was applied for one filter, but the planner
> at some point pushed another filter above the scan? For example, if we have
> this case:
> {code}
> Filter(a=2)
>   Join(t1.b=t2.b, type=inner)
>     Filter(b=3)
>       Scan(t1)
>     Scan(t2)
> {code}
> Filter b=3 will be pushed into the scan, and then the planner will push
> Filter(a=2) from above the join down to just above the scan:
> {code}
> Join(t1.b=t2.b, type=inner)
>   Filter(a=2)
>     Scan(t1, b=3)
>   Scan(t2)
> {code}
> In this case, checking whether a filter was already pushed is not enough.
> {quote}
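A toy model can make this concern concrete. The Python sketch below is not Drill or Calcite code: `Scan`, `Filter`, and the two guard functions are hypothetical stand-ins. It assumes a push-down rule guarded by "has any filter already been pushed into this scan?" and shows that, once the planner moves `Filter(a=2)` down onto `Scan(t1, b=3)`, that guard refuses to push the new predicate, whereas a guard that checks the specific predicate still accepts it. The per-predicate check is only one possible mitigation; it is not what this ticket proposes.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified plan nodes (not Drill/Calcite classes).
@dataclass
class Scan:
    table: str
    pushed: frozenset = frozenset()  # predicates already absorbed by the scan

@dataclass
class Filter:
    predicate: str
    child: object

def naive_can_push(f: Filter) -> bool:
    # Guard: "has ANY filter already been pushed into this scan?"
    return isinstance(f.child, Scan) and not f.child.pushed

def tracking_can_push(f: Filter) -> bool:
    # Guard: "is THIS predicate already part of the scan?"
    return isinstance(f.child, Scan) and f.predicate not in f.child.pushed

def push(f: Filter) -> Scan:
    # Absorbs the filter's predicate into the scan, dropping the Filter node.
    return Scan(f.child.table, f.child.pushed | {f.predicate})

# State after b=3 was pushed and the planner moved Filter(a=2) below the
# join, directly above Scan(t1, b=3).
late_filter = Filter("a=2", Scan("t1", frozenset({"b=3"})))

print(naive_can_push(late_filter))       # False: a=2 would never be pushed
print(tracking_can_push(late_filter))    # True
print(sorted(push(late_filter).pushed))  # ['a=2', 'b=3']
```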
> Drill divides planning into a number of *phases*, each defined by a set of 
> *rules*. Most storage plugins perform filter push-down during the physical 
> planning stage. However, by this point, Drill has already decided on the 
> degree of parallelism: it is too late to use filter push-down to set the 
> degree of parallelism. Yet, for a source such as a REST API, we want to use
> filters to help us shard the query (that is, to set the degree of
> parallelism).
>  
> DRILL-7458 performs filter push-down at *logical* planning time to work 
> around the above limitation. (In Drill, there are three different phases that 
> could be considered the logical phase, depending on which planning options 
> are set to control Calcite.)
> [~volodymyr] points out that the logical planning phase may be the wrong
> place, because that phase itself performs rewrites of the type he cited.
> Thus, we need to research where to insert filter push-down. It must come:
> * After rewrites of the kind described above.
> * After join equivalence computations. (See DRILL-7556.)
> * Before the decision is made about the number of minor fragments.
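The three ordering constraints above can be expressed as a simple check. This sketch assumes a linear sequence of phases with hypothetical names (`logical_rewrites`, `join_equivalence`, `parallelism`; these are not Drill's actual phase names) and lists the positions at which a new push-down phase could be inserted. An empty result means no insertion point works without reordering the existing phases.

```python
# Hypothetical, simplified phase sequence; names are illustrative only.
PHASES = ["logical_rewrites", "join_equivalence", "parallelism"]

def valid_positions(phases):
    """Return the indices at which a push-down phase could be inserted so
    that it runs after rewrites and join equivalence, and before the
    parallelism decision."""
    must_follow = {"logical_rewrites", "join_equivalence"}
    must_precede = {"parallelism"}
    result = []
    for i in range(len(phases) + 1):
        before, after = set(phases[:i]), set(phases[i:])
        if must_follow <= before and must_precede <= after:
            result.append(i)
    return result

print(valid_positions(PHASES))  # [2]
# If parallelism were decided before join equivalence, nothing would fit:
print(valid_positions(["logical_rewrites", "parallelism", "join_equivalence"]))  # []
```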
> The goal of this ticket is to either:
> * Research and identify an existing phase that satisfies these requirements,
> or
> * Create a new phase.
> Due to the way Calcite works, it is not a good idea to have a single phase
> handle two tasks that depend on one another. That is, we cannot perform
> filter push-down in the same phase that defines the filters, nor in the
> phase that chooses parallelism.
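A toy illustration of why two dependent tasks should not share a phase, assuming (as a simplification of how a planner schedules rule matches) that the rules within one phase may fire in either order. The state, both rules, and the one-fragment-per-region width policy are hypothetical. The point is only that the chosen width depends on rule order when both rules live in the same phase, and would not if push-down ran in an earlier phase.

```python
from itertools import permutations

def initial_state():
    # Hypothetical plan state: IN-list values not yet pushed, values pushed
    # into the scan, and the chosen width (None = not chosen yet).
    return {"pending": ["us", "eu", "ap"], "pushed": [], "width": None}

def push_filter_rule(state):
    # Moves the pending "region IN (...)" values into the scan.
    state["pushed"] += state["pending"]
    state["pending"] = []

def choose_width_rule(state):
    # Chooses the width once: one minor fragment per pushed region,
    # or a single fragment if no filter has reached the scan yet.
    if state["width"] is None:
        state["width"] = len(state["pushed"]) or 1

def run(rules):
    # Applies the rules in the given order to a fresh state.
    state = initial_state()
    for rule in rules:
        rule(state)
    return state["width"]

for order in permutations([push_filter_rule, choose_width_rule]):
    print([rule.__name__ for rule in order], run(order))
# ['push_filter_rule', 'choose_width_rule'] 3
# ['choose_width_rule', 'push_filter_rule'] 1
```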
> Background: Calcite is a rule-based query planner inspired by 
> [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf].
> The above issue is a flaw in rule-based planners and was identified as early
> as the [Cascades query framework paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf],
> the follow-up to Volcano.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
