Re: Improvements to storage plugin planning integration support

Maryann Xue Thu, 22 Oct 2015 11:32:44 -0700

Thank you JinFeng for the education on Drill planning! That probably
justifies putting the secondary index into physical planning.
What I was trying to say was that a secondary index is not "a faster physical
access mechanism"; it is just a Phoenix table. And it makes a big difference
in planning related to Sort, Join and Aggregate, as you said. In the pure
Calcite world, this is more of a logical thing.



Thanks,
Maryann

On Thu, Oct 22, 2015 at 2:26 PM, Jinfeng Ni <[email protected]> wrote:

> I do not know how Phoenix's planning works. For Drill, my
> understanding is that during logical planning, the "collation" trait
> is only used in SortRemoveRule, to remove redundant sort operators.
> (Those "sort" operators are the ones created by Calcite for
> user-explicit "ORDER BY" / "LIMIT", not the "enforcers" created in
> physical planning.)
>
> The "collation" trait would not have an impact in logical planning for
> join / aggregation. The decision between sort-based vs hash-based
> join / aggregation is made in physical planning. At that stage,
> "collation" matters a lot, as it determines whether Drill has to
> add an "enforcer" to get a certain trait in order to produce a plan
> with sort-based join / aggregation.
>
> The "collation" trait acts like a physical property, so it is more
> natural to expose "collation" in physical planning instead of logical
> planning, which focuses more on properties inherent in the relational
> expression. Aman's view that secondary index is part of physical
> planning makes sense to me.
>
> On Thu, Oct 22, 2015 at 10:54 AM, Maryann Xue <[email protected]>
> wrote:
> > Hi Aman Sinha,
> >
> > Yes, Phoenix uses materialization in Calcite to model its secondary index
> > querying. But it's not right to say "In that sense, it would seem to fit
> > into physical planning phase rather than logical, since indexes are a
> > faster physical access mechanism for a scan.  The logical properties of a
> > table don't change due to presence of an index."
> >
> > A secondary index in Phoenix is a projection of part or all of the columns
> > of the original table, and is usually indexed (and sorted) on a key
> > other than the primary key of the original table. The key in a Phoenix
> > table (HBase table) is crucial in two ways:
> > 1. Filtering: the use of skip-scan or range-scan vs. full scan.
> > 2. Ordering
> >
> > The second aspect is represented in Calcite by the "collation" trait, which
> > can make a radical difference in logical planning. Replacing the original
> > table with one of its indices might end up changing the whole plan.
> >
> > I am not sure yet which stage the Phoenix materialization will eventually
> > go into, but one thing is certain: it should be available for all the
> > general optimizations to take effect.
> >
> >
> > Thanks,
> > Maryann
> >
> > On Wed, Oct 14, 2015 at 12:55 PM, Aman Sinha <[email protected]> wrote:
> >
> >> Catching up on this thread.  Jacques, if I understand correctly, you are
> >> proposing that instead of the single point of initialization of rules when
> >> we instantiate FrameworkConfig (in DrillSqlWorker), we would have more
> >> entry points to plug into different phases of planning, and storage plugins
> >> would register different sets of rules in these separate phases. It seems
> >> fine to me (assuming that there are no side effects where we somehow end up
> >> increasing the search space for the existing plans).
> >>
> >> When talking about the Phoenix integration or the JDBC storage plugin, I
> >> am curious about which phase(s) they would register the rules for. I
> >> believe Phoenix's materialized view usage in Calcite is actually for
> >> secondary indexing, not for materialized views per se.  In that sense, it
> >> would seem to fit into physical planning phase rather than logical, since
> >> indexes are a faster physical access mechanism for a scan.  The logical
> >> properties of a table don't change due to presence of an index.
> >>
> >> On the other hand, I think the JDBC plugin might register rules for the
> >> logical phase, since it would have filter and projection pushdowns that do
> >> change logical properties.
> >>
> >> Aman
> >>
> >>
> >> On Mon, Oct 12, 2015 at 5:36 PM, Hanifi Gunes <[email protected]> wrote:
> >>
> >>> I would +1 (1-3) for sure. I do not have much understanding of programs;
> >>> however, additional flexibility for storage plugin devs sounds cool in
> >>> general when used responsibly =) so +0 for (4)
> >>>
> >>>
> >>> -H+
> >>>
> >>> On Mon, Oct 12, 2015 at 4:12 PM, Jacques Nadeau <[email protected]>
> >>> wrote:
> >>>
> >>> > The dead air must mean that everyone is onboard with my recommendation
> >>> >
> >>> > PlannerIntegration StoragePlugin.getPlannerIntegrations()
> >>> >
> >>> > interface PlannerIntegration{
> >>> >   void initialize(Planner, Phase)
> >>> > }
> >>> >
> >>> > Right :D
> >>> >
> >>> > --
> >>> > Jacques Nadeau
> >>> > CTO and Co-Founder, Dremio
> >>> >
> >>> > On Fri, Oct 9, 2015 at 7:03 AM, Jacques Nadeau <[email protected]> wrote:
> >>> >
> >>> > > A number of us were meeting last week to work through integrating the
> >>> > > Phoenix storage plugin. This plugin is interesting because it also uses
> >>> > > Calcite for planning. In some ways, this should make integration easy.
> >>> > > However, it also allowed us to see certain constraints in how we expose
> >>> > > planner integration between storage plugins and Drill internals.
> >>> > > Currently, Drill asks the plugin to provide a set of optimizer rules, which
> >>> > > it incorporates into one of the many stages of planning. This is too
> >>> > > constraining in three ways:
> >>> > >
> >>> > > 1. It doesn't allow a plugin to decide which phase of planning to
> >>> > > integrate with. (This was definitely a problem in the Phoenix case. Our
> >>> > > hack solution for now is to incorporate storage plugin rules in phases
> >>> > > instead of just one [1].)
> >>> > > 2. It doesn't allow arbitrary transformations. Calcite provides a program
> >>> > > concept. It may be that a plugin needs to do some of its own work using the
> >>> > > Hep planner. Currently there isn't an elegant way to do this in the context
> >>> > > of the rule.
> >>> > > 3. There is no easy way to incorporate additional planner initialization
> >>> > > options. This was almost a problem in the case of the JDBC plugin. It
> >>> > > turned out that a hidden integration using register() here [2] allowed us
> >>> > > to continue throughout the planning phases. However, we have to register
> >>> > > all the rules for all the phases of planning, which is a bit unclean. We're
> >>> > > hitting the same problem in the case of Phoenix, where we need to register
> >>> > > materialized views as part of planner initialization, but the hack from the
> >>> > > JDBC case won't really work.
> >>> > >
> >>> > > I suggest we update the interface to allow better support for these types
> >>> > > of integrations.
> >>> > >
> >>> > > These seem to be the main requirements:
> >>> > > 1. Expose concrete planning phases to storage plugins.
> >>> > > 2. Allow a storage plugin to provide additional planner initialization
> >>> > > behavior.
> >>> > > 3. Allow a storage plugin to provide rules to include in a particular
> >>> > > planning phase (merged with other rules during that phase).
> >>> > > 4. (Possibly) allow a storage plugin to provide transformation programs
> >>> > > that are to be executed in between the concrete planning phases.
> >>> > >
> >>> > > Item (4) above is the most questionable to me, as I wonder whether or not
> >>> > > this could simply be solved by creating a transformation rule (or program
> >>> > > rule in Calcite's terminology) that creates an alternative tree, and thus be
> >>> > > solved by (3).
> >>> > >
> >>> > > A simple solution might be (if we ignore #4):
> >>> > >
> >>> > > PlannerIntegration StoragePlugin.getPlannerIntegrations()
> >>> > >
> >>> > > interface PlannerIntegration{
> >>> > >   void initialize(Planner, Phase)
> >>> > > }
> >>> > >
> >>> > > This way, a storage plugin could register rules (or materialized views) at
> >>> > > setup time.
> >>> > >
> >>> > > What do others think?
> >>> > >
> >>> > > [1] https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L145
> >>> > > [2] https://github.com/jacques-n/drill/commit/d463f9098ef63b9a2844206950334cb16fc00327#diff-e67ba82ec2fbb8bc15eed30ec6a5379cR119
> >>> > >
> >>> > > --
> >>> > > Jacques Nadeau
> >>> > > CTO and Co-Founder, Dremio
> >>> > >
> >>> >
> >>>
> >>
> >>
>
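
For readers following the thread, here is a minimal Java sketch of how the
PlannerIntegration interface proposed in the quoted message might be fleshed
out. It is only an illustration under stated assumptions: the Phase values,
the Planner stand-in class, the rule and index names, and ExampleIndexedPlugin
are all hypothetical and do not come from Drill or Calcite. A real version
would presumably hand the plugin a Calcite planner rather than lists of
strings. The phase in which the index materialization is registered is picked
arbitrarily, since that is exactly the open question in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class PlannerIntegrationSketch {

  // Hypothetical planning phases; the thread does not enumerate them.
  enum Phase { LOGICAL, PHYSICAL }

  // Stand-in for a planner handle. Rules and materializations are recorded
  // as plain names so the sketch has no Calcite dependency.
  static final class Planner {
    private final Map<Phase, List<String>> rules = new EnumMap<>(Phase.class);
    private final List<String> materializations = new ArrayList<>();

    void addRule(Phase phase, String ruleName) {
      rules.computeIfAbsent(phase, p -> new ArrayList<>()).add(ruleName);
    }

    void addMaterialization(String name) {
      materializations.add(name);
    }

    List<String> rulesFor(Phase phase) {
      return rules.getOrDefault(phase, Collections.emptyList());
    }

    List<String> materializations() {
      return materializations;
    }
  }

  // The interface from the proposal, with parameter names added.
  interface PlannerIntegration {
    void initialize(Planner planner, Phase phase);
  }

  // Only the method relevant to the proposal is shown.
  interface StoragePlugin {
    PlannerIntegration getPlannerIntegrations();
  }

  // Hypothetical plugin that contributes different things per phase.
  static final class ExampleIndexedPlugin implements StoragePlugin {
    @Override
    public PlannerIntegration getPlannerIntegrations() {
      return (planner, phase) -> {
        switch (phase) {
          case LOGICAL:
            planner.addRule(phase, "ExampleFilterPushdownRule");
            break;
          case PHYSICAL:
            // Arbitrary choice: the thread debates which phase fits best.
            planner.addMaterialization("example_secondary_index");
            planner.addRule(phase, "ExampleIndexScanRule");
            break;
          default:
            break;
        }
      };
    }
  }

  // Calls every plugin once per phase, so each plugin decides for itself
  // which phases it takes part in.
  static Planner plan(List<StoragePlugin> plugins) {
    Planner planner = new Planner();
    for (Phase phase : Phase.values()) {
      for (StoragePlugin plugin : plugins) {
        plugin.getPlannerIntegrations().initialize(planner, phase);
      }
    }
    return planner;
  }

  public static void main(String[] args) {
    StoragePlugin plugin = new ExampleIndexedPlugin();
    Planner planner = plan(Collections.singletonList(plugin));
    System.out.println("logical rules:    " + planner.rulesFor(Phase.LOGICAL));
    System.out.println("physical rules:   " + planner.rulesFor(Phase.PHYSICAL));
    System.out.println("materializations: " + planner.materializations());
  }
}
```

Passing the phase into initialize() keeps the plugin-facing surface to a
single method while still covering requirements (1) through (3) from the
proposal; requirement (4), transformation programs, is left out here just as
it is in the original suggestion.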
