[
https://issues.apache.org/jira/browse/DRILL-7558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Abhishek Girish updated DRILL-7558:
-----------------------------------
Target Version/s: 1.19.0
> Generalize filter push-down planner phase
> -----------------------------------------
>
> Key: DRILL-7558
> URL: https://issues.apache.org/jira/browse/DRILL-7558
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.18.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.18.0
>
>
> DRILL-7458 provides a base framework for storage plugins, including a
> simplified filter push-down mechanism. [~volodymyr] notes that it may be
> *too* simple:
> {quote}
> What about the case when this rule was applied for one filter, but planner at
> some point pushed another filter above the scan, for example, if we have such
> case:
> {code}
> Filter(a=2)
>   Join(t1.b=t2.b, type=inner)
>     Filter(b=3)
>       Scan(t1)
>     Scan(t2)
> {code}
> Filter b=3 will be pushed into the scan, and the planner will then push the
> filter a=2 from above the join down onto the t1 scan:
> {code}
> Join(t1.b=t2.b, type=inner)
>   Filter(a=2)
>     Scan(t1, b=3)
>   Scan(t2)
> {code}
> In this case, checking whether a filter was already pushed is not enough.
> {quote}
> Drill divides planning into a number of *phases*, each defined by a set of
> *rules*. Most storage plugins perform filter push-down during the physical
> planning stage. However, by this point, Drill has already decided on the
> degree of parallelism: it is too late to use filter push-down to set the
> degree of parallelism. Yet, if using something like a REST API, we want to
> use filters to help us shard the query (that is, to set the degree of
> parallelism).
>
> DRILL-7458 performs filter push-down at *logical* planning time to work
> around the above limitation. (In Drill, there are three different phases that
> could be considered the logical phase, depending on which planning options
> are set to control Calcite.)
> [~volodymyr] points out that the logical planning phase may be the wrong
> place, because the planner can still perform rewrites of the type he cited
> after the push-down rule has run.
> Thus, we need to research where to insert filter push-down. It must come:
> * After rewrites of the kind described above.
> * After join equivalence computations. (See DRILL-7556.)
> * Before the decision is made about the number of minor fragments.
> The goal of this ticket is to either:
> * Research to identify an existing phase which satisfies these requirements,
> or
> * Create a new phase.
> Due to the way Calcite works, it is not a good idea to have a single phase
> handle two tasks that depend on one another. That is, we cannot combine
> filter push-down with the phase that defines the filters, nor can we add
> filter push-down to the phase that chooses parallelism.
> Background: Calcite is a rule-based query planner inspired by
> [Volcano|https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf].
> The above issue is a flaw with rule-based planners and was identified as
> early as the [Cascades query framework
> paper|https://www.csd.uoc.gr/~hy460/pdf/CascadesFrameworkForQueryOptimization.pdf]
> which was the follow-up to Volcano.
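
To make the quoted scenario concrete, here is a minimal toy sketch in Java. It is not Drill or Calcite code: `Scan`, `Filter`, `pushWithFlag`, and `pushWithSet` are hypothetical names invented for illustration. It contrasts a rule that only checks whether some filter was already pushed with one that tracks which predicates the scan has absorbed, so a filter arriving later (a=2) can still be pushed.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model only: every class and method here is hypothetical,
// not a Drill or Calcite API.
class FilterPushDownSketch {

  // A scan that remembers every predicate it has absorbed so far.
  static final class Scan {
    final String table;
    final Set<String> pushed;

    Scan(String table, Set<String> pushed) {
      this.table = table;
      this.pushed = Collections.unmodifiableSet(new LinkedHashSet<>(pushed));
    }

    // Returns a copy of this scan with one more predicate absorbed.
    Scan with(String predicate) {
      Set<String> next = new LinkedHashSet<>(pushed);
      next.add(predicate);
      return new Scan(table, next);
    }
  }

  // A filter sitting directly above a scan.
  static final class Filter {
    final String predicate;
    final Scan input;

    Filter(String predicate, Scan input) {
      this.predicate = predicate;
      this.input = input;
    }
  }

  // Flag-style rule: refuses to fire once any filter has been pushed.
  // Returns null when the rule does not match (the filter stays in the plan).
  static Scan pushWithFlag(Filter f) {
    if (!f.input.pushed.isEmpty()) {
      return null;
    }
    return f.input.with(f.predicate);
  }

  // Set-style rule: fires for any predicate the scan has not yet absorbed.
  static Scan pushWithSet(Filter f) {
    if (f.input.pushed.contains(f.predicate)) {
      return null;
    }
    return f.input.with(f.predicate);
  }

  public static void main(String[] args) {
    // State after the first rewrite: Scan(t1, b=3).
    Scan t1 = new Scan("t1", Collections.<String>emptySet()).with("b=3");
    // The planner later moves Filter(a=2) on top of that scan.
    Filter late = new Filter("a=2", t1);

    System.out.println("flag rule fires: " + (pushWithFlag(late) != null));
    System.out.println("set rule pushed: " + pushWithSet(late).pushed);
  }
}
```

Tracking the pushed predicates does not by itself settle which planner phase should host the rule. It only shows why a simple "already pushed" check misses filters that arrive after a later rewrite.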