Re: Improvements to storage plugin planning integration support

Aman Sinha Thu, 22 Oct 2015 13:30:25 -0700

Thanks Maryann.

On Thu, Oct 22, 2015 at 1:04 PM, Maryann Xue <[email protected]> wrote:


> Hi Aman,
>
> Thought these entries might be related to our discussion about
> non-covering indices just now:
>
> https://issues.apache.org/jira/browse/CALCITE-772
> https://issues.apache.org/jira/browse/CALCITE-773
>
>
> Thanks,
> Maryann
>
> On Thu, Oct 22, 2015 at 2:48 PM, Aman Sinha <[email protected]> wrote:
>
>> Thanks Maryann and Jinfeng for your comments.   I understand the Phoenix
>> approach better now that Maryann clarified that the index is actually a
>> projection of some or all columns (non primary key columns) of the table.
>> In the relational world, this is similar to what systems such as Vertica
>> have done.
>>
>> Aman
>>
>> On Thu, Oct 22, 2015 at 11:32 AM, Maryann Xue <[email protected]>
>> wrote:
>>
>>> Thank you, Jinfeng, for the education on Drill planning! That probably
>>> justifies putting secondary indexes into physical planning.
>>> What I was trying to say was that a secondary index is not "a faster
>>> physical access mechanism"; it is just a Phoenix table. And it makes a big
>>> difference in planning related to Sort, Join and Aggregate, as you said. In
>>> the pure Calcite world, this is more of a logical thing.
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Thu, Oct 22, 2015 at 2:26 PM, Jinfeng Ni <[email protected]>
>>> wrote:
>>>
>>>> I do not know how Phoenix's planning works. For Drill, my
>>>> understanding is that during logical planning, the "collation" trait is
>>>> only used in SortRemoveRule, to remove redundant sort operators. (Those
>>>> "sort" operators are the ones created by Calcite for an explicit user
>>>> "ORDER BY" / "LIMIT", not the "enforcers" created in physical
>>>> planning.)
>>>>
>>>> The "collation" trait has no impact on logical planning for
>>>> join / aggregation. The decision between sort-based and hash-based
>>>> join / aggregation is made in physical planning. At that stage,
>>>> "collation" matters a lot, as it determines whether Drill has to
>>>> add an "enforcer" to get a certain trait in order to produce a plan
>>>> with a sort-based join / aggregation.
>>>>
>>>> The "collation" trait acts like a physical property, so it's more
>>>> natural to expose "collation" in physical planning instead of logical
>>>> planning, which focuses more on properties inherent in the relational
>>>> expression. Aman's view that secondary indexes are part of physical
>>>> planning makes sense to me.
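To make the enforcer point concrete, here is a minimal, self-contained Java sketch. It does not use Calcite's real trait classes (`RelCollation` and friends); the `Collation` record and `needsSortEnforcer` method are hypothetical stand-ins that model a collation as an ordered list of field indexes, and the prefix rule is a simplification. It shows the check a physical planner conceptually performs before choosing a sort-based join or aggregate: if the input's order already satisfies the required one, no enforcer Sort has to be added.

```java
import java.util.List;

// Hypothetical, simplified model of a "collation" trait: the list of field
// indexes an operator's output is sorted on (ascending only, for brevity).
// This is not Calcite's API, only an illustration of the idea.
public class CollationEnforcerSketch {

    record Collation(List<Integer> fields) {
        // Output sorted on (a, b) also satisfies a requirement for (a):
        // the required fields must be a prefix of the provided ones.
        boolean satisfies(Collation required) {
            List<Integer> req = required.fields();
            return req.size() <= fields.size()
                && fields.subList(0, req.size()).equals(req);
        }
    }

    // A sort-based join/aggregate needs its input sorted on its keys. If the
    // input already provides a compatible collation, no enforcer Sort is
    // needed; otherwise the planner must add one, making that plan costlier.
    static boolean needsSortEnforcer(Collation provided, Collation required) {
        return !provided.satisfies(required);
    }

    public static void main(String[] args) {
        Collation scanSortedOnAB = new Collation(List.of(0, 1));
        Collation groupByA = new Collation(List.of(0));
        Collation groupByB = new Collation(List.of(1));

        System.out.println(needsSortEnforcer(scanSortedOnAB, groupByA)); // false
        System.out.println(needsSortEnforcer(scanSortedOnAB, groupByB)); // true
    }
}
```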
>>>>
>>>> On Thu, Oct 22, 2015 at 10:54 AM, Maryann Xue <[email protected]>
>>>> wrote:
>>>> > Hi Aman Sinha,
>>>> >
>>>> > Yes, Phoenix uses materialization in Calcite to model its secondary
>>>> > index querying. But it's not right to say "In that sense, it would seem
>>>> > to fit into physical planning phase rather than logical, since indexes
>>>> > are a faster physical access mechanism for a scan. The logical
>>>> > properties of a table don't change due to presence of an index."
>>>> >
>>>> > A secondary index in Phoenix is a projection of part or all of the
>>>> > columns of the original table, and is usually indexed (and sorted) on a
>>>> > different key than the primary key of the original table. The key of a
>>>> > Phoenix table (HBase table) is crucial in two ways:
>>>> > 1. Filtering: the use of skip-scan or range-scan vs. full scan.
>>>> > 2. Ordering.
>>>> >
>>>> > The second aspect is represented in Calcite by the "collation" trait,
>>>> > which can make a radical difference in logical planning. Replacing the
>>>> > original table with one of its indices might end up changing the whole
>>>> > plan.
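As an illustration of why replacing a table with one of its indices can change the plan, the following Java sketch models the table and an index as physical tables with different row keys, so each offers different filtering (range scan vs. full scan) and ordering. All class names, the covering-only restriction and the cost numbers are hypothetical and chosen only for illustration; they are not Phoenix's or Calcite's actual cost model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Hypothetical model: a table and its secondary indexes are all "physical
// tables", each with a row key (which fixes sort order) and stored columns.
public class IndexChoiceSketch {

    record PhysicalTable(String name, List<String> rowKey, Set<String> columns) {
        // A filter on the leading row-key column allows a range/skip scan.
        boolean supportsRangeScanOn(String filterColumn) {
            return !rowKey.isEmpty() && rowKey.get(0).equals(filterColumn);
        }

        // Rows come back in row-key order, so ORDER BY on a key prefix is free.
        boolean providesOrder(List<String> orderBy) {
            return orderBy.size() <= rowKey.size()
                && rowKey.subList(0, orderBy.size()).equals(orderBy);
        }

        boolean covers(Set<String> needed) {
            return columns.containsAll(needed);
        }
    }

    // Made-up cost: full scan = 100, range scan = 10, extra Sort = +50.
    static int cost(PhysicalTable t, String filterColumn, List<String> orderBy) {
        int c = t.supportsRangeScanOn(filterColumn) ? 10 : 100;
        if (!t.providesOrder(orderBy)) {
            c += 50;
        }
        return c;
    }

    static PhysicalTable choose(List<PhysicalTable> candidates, Set<String> needed,
                                String filterColumn, List<String> orderBy) {
        List<PhysicalTable> usable = new ArrayList<>();
        for (PhysicalTable t : candidates) {
            if (t.covers(needed)) { // only covering indexes, for simplicity
                usable.add(t);
            }
        }
        return usable.stream()
            .min(Comparator.comparingInt(t -> cost(t, filterColumn, orderBy)))
            .orElseThrow();
    }

    public static void main(String[] args) {
        Set<String> cols = Set.of("ORDER_ID", "CUSTOMER_ID", "AMOUNT");
        PhysicalTable orders = new PhysicalTable("ORDERS", List.of("ORDER_ID"), cols);
        PhysicalTable byCustomer = new PhysicalTable("IDX_ORDERS_CUSTOMER",
            List.of("CUSTOMER_ID", "ORDER_ID"), cols);
        List<PhysicalTable> all = List.of(orders, byCustomer);

        // WHERE CUSTOMER_ID = ? ORDER BY CUSTOMER_ID
        System.out.println(choose(all, cols, "CUSTOMER_ID", List.of("CUSTOMER_ID")).name());
        // WHERE ORDER_ID = ? ORDER BY ORDER_ID
        System.out.println(choose(all, cols, "ORDER_ID", List.of("ORDER_ID")).name());
    }
}
```

With these assumptions the first query picks `IDX_ORDERS_CUSTOMER` and the second picks `ORDERS`: the same logical query shape lands on a different physical table, and with it a different sort order for everything above the scan.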
>>>> >
>>>> > I am not sure yet which stage the Phoenix materialization will
>>>> > eventually go into, but one thing is certain: it should be available
>>>> > to all the general optimizations so that they can take effect.
>>>> >
>>>> >
>>>> > Thanks,
>>>> > Maryann
>>>> >
>>>> > On Wed, Oct 14, 2015 at 12:55 PM, Aman Sinha <[email protected]>
>>>> > wrote:
>>>> >
>>>> >> Catching up on this thread. Jacques, if I understand correctly, you
>>>> >> are proposing that instead of the single point of initialization of
>>>> >> rules when we instantiate FrameworkConfig (in DrillSqlWorker), we would
>>>> >> have more entry points to plug into different phases of planning, and
>>>> >> storage plugins would register different sets of rules in these
>>>> >> separate phases. It seems fine to me (assuming that there are no side
>>>> >> effects where we somehow end up increasing the search space for the
>>>> >> existing plans).
>>>> >>
>>>> >> When talking about the Phoenix integration or the JDBC storage plugin,
>>>> >> I am curious about which phase(s) they would register their rules for.
>>>> >> I believe Phoenix's materialized view usage in Calcite is actually for
>>>> >> secondary indexing, not for materialized views per se. In that sense,
>>>> >> it would seem to fit into physical planning phase rather than logical,
>>>> >> since indexes are a faster physical access mechanism for a scan. The
>>>> >> logical properties of a table don't change due to presence of an index.
>>>> >>
>>>> >> On the other hand, I think the JDBC plugin might register rules for the
>>>> >> logical phase, since it would have filter and projection pushdowns that
>>>> >> do change logical properties.
>>>> >>
>>>> >> Aman
>>>> >>
>>>> >>
>>>> >> On Mon, Oct 12, 2015 at 5:36 PM, Hanifi Gunes <[email protected]>
>>>> >> wrote:
>>>> >>
>>>> >>> I would +1 (1-3) for sure. I do not have much understanding of
>>>> >>> programs; however, additional flexibility for storage plugin devs
>>>> >>> sounds cool in general when used responsibly =) so +0 for (4).
>>>> >>>
>>>> >>>
>>>> >>> -H+
>>>> >>>
>>>> >>> On Mon, Oct 12, 2015 at 4:12 PM, Jacques Nadeau <[email protected]>
>>>> >>> wrote:
>>>> >>>
>>>> >>> > The dead air must mean that everyone is on board with my
>>>> >>> > recommendation:
>>>> >>> >
>>>> >>> > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>>> >>> >
>>>> >>> > interface PlannerIntegration{
>>>> >>> >   void initialize(Planner, Phase)
>>>> >>> > }
>>>> >>> >
>>>> >>> > Right :D
>>>> >>> >
>>>> >>> > --
>>>> >>> > Jacques Nadeau
>>>> >>> > CTO and Co-Founder, Dremio
>>>> >>> >
>>>> >>> > On Fri, Oct 9, 2015 at 7:03 AM, Jacques Nadeau <[email protected]>
>>>> >>> > wrote:
>>>> >>> >
>>>> >>> > > A number of us met last week to work through integrating the
>>>> >>> > > Phoenix storage plugin. This plugin is interesting because it also
>>>> >>> > > uses Calcite for planning. In some ways, this should make
>>>> >>> > > integration easy. However, it also allowed us to see certain
>>>> >>> > > constraints on how we expose planner integration between storage
>>>> >>> > > plugins and Drill internals.
>>>> >>> > > Currently, Drill asks the plugin to provide a set of optimizer
>>>> >>> > > rules, which it incorporates into one of the many stages of
>>>> >>> > > planning. This is too constraining in three ways:
>>>> >>> > >
>>>> >>> > > 1. It doesn't allow a plugin to decide which phase of planning to
>>>> >>> > > integrate with. (This was definitely a problem in the Phoenix
>>>> >>> > > case. Our hack solution for now is to incorporate storage plugin
>>>> >>> > > rules in multiple phases instead of just one [1].)
>>>> >>> > > 2. It doesn't allow arbitrary transformations. Calcite provides a
>>>> >>> > > program concept. It may be that a plugin needs to do some of its
>>>> >>> > > own work using the Hep planner. Currently there isn't an elegant
>>>> >>> > > way to do this in the context of a rule.
>>>> >>> > > 3. There is no easy way to incorporate additional planner
>>>> >>> > > initialization options. This was almost a problem in the case of
>>>> >>> > > the JDBC plugin. It turned out that a hidden integration using
>>>> >>> > > register() here [2] allowed us to continue throughout the planning
>>>> >>> > > phases. However, we have to register all the rules for all the
>>>> >>> > > phases of planning, which is a bit unclean. We're hitting the same
>>>> >>> > > problem in the case of Phoenix, where we need to register
>>>> >>> > > materialized views as part of planner initialization, but the hack
>>>> >>> > > from the JDBC case won't really work.
>>>> >>> > >
>>>> >>> > > I suggest we update the interface to allow better support for
>>>> >>> > > these types of integrations.
>>>> >>> > >
>>>> >>> > > These seem to be the main requirements:
>>>> >>> > > 1. Expose concrete planning phases to storage plugins.
>>>> >>> > > 2. Allow a storage plugin to provide additional planner
>>>> >>> > > initialization behavior.
>>>> >>> > > 3. Allow a storage plugin to provide rules to include in a
>>>> >>> > > particular planning phase (merged with other rules during that
>>>> >>> > > phase).
>>>> >>> > > 4. (Possibly) allow a storage plugin to provide transformation
>>>> >>> > > programs that are to be executed in between the concrete planning
>>>> >>> > > phases.
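Requirements (1) and (3) can be sketched in a few lines of Java. This is a minimal sketch under stated assumptions: the `Phase` values, the `PluginRules` interface and all rule names are hypothetical, and rules are represented only by name rather than as real Calcite rule objects. The point it shows is that plugin rules are appended to Drill's own rule set for a given phase instead of replacing it.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Drill exposes a fixed set of planning phases, and each
// storage plugin may contribute rules to any of them. Names are illustrative.
public class PhaseRuleMergeSketch {

    enum Phase { LOGICAL, JOIN_PLANNING, PHYSICAL }

    interface PluginRules {
        // Rules this plugin wants in the given phase (empty if none).
        List<String> rulesFor(Phase phase);
    }

    // For every phase, start from the core rules and append each plugin's
    // contribution, so plugin rules are merged with the core set.
    static Map<Phase, List<String>> mergeRules(Map<Phase, List<String>> coreRules,
                                               List<PluginRules> plugins) {
        Map<Phase, List<String>> merged = new EnumMap<>(Phase.class);
        for (Phase phase : Phase.values()) {
            List<String> rules = new ArrayList<>(coreRules.getOrDefault(phase, List.of()));
            for (PluginRules plugin : plugins) {
                rules.addAll(plugin.rulesFor(phase));
            }
            merged.put(phase, rules);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<Phase, List<String>> core = Map.of(
            Phase.LOGICAL, List.of("CoreLogicalRule"),
            Phase.PHYSICAL, List.of("CorePhysicalRule"));

        // A JDBC-like plugin pushes filters down during logical planning;
        // another plugin only contributes to physical planning.
        PluginRules jdbcLike = phase ->
            phase == Phase.LOGICAL ? List.of("JdbcFilterPushdownRule") : List.of();
        PluginRules physicalOnly = phase ->
            phase == Phase.PHYSICAL ? List.of("CustomScanRule") : List.of();

        Map<Phase, List<String>> merged = mergeRules(core, List.of(jdbcLike, physicalOnly));
        System.out.println(merged.get(Phase.LOGICAL));       // [CoreLogicalRule, JdbcFilterPushdownRule]
        System.out.println(merged.get(Phase.JOIN_PLANNING)); // []
        System.out.println(merged.get(Phase.PHYSICAL));      // [CorePhysicalRule, CustomScanRule]
    }
}
```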
>>>> >>> > >
>>>> >>> > > Item (4) above is the most questionable to me, as I wonder whether
>>>> >>> > > it could simply be solved by creating a transformation rule (or
>>>> >>> > > program rule in Calcite's terminology) that creates an alternative
>>>> >>> > > tree, and thus be covered by (3).
>>>> >>> > >
>>>> >>> > > A simple solution might be (if we ignore #4):
>>>> >>> > >
>>>> >>> > > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>>> >>> > >
>>>> >>> > > interface PlannerIntegration{
>>>> >>> > >   void initialize(Planner, Phase)
>>>> >>> > > }
>>>> >>> > >
>>>> >>> > > This way, a storage plugin could register rules (or materialized
>>>> >>> > > views) at setup time.
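A minimal Java sketch of how the proposed hook might be wired up is shown below. `Planner`, `Phase`, `StoragePlugin` and `IndexedStoragePlugin` are simplified, hypothetical stand-ins defined only so the example compiles on its own; a real integration would pass an actual Calcite planner. The sketch lets the plugin be configured with its target phase, since which phase suits Phoenix-style indexes is exactly what this thread is discussing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed PlannerIntegration hook. All types here
// are simplified stand-ins, not Drill's or Calcite's classes.
public class PlannerIntegrationSketch {

    enum Phase { LOGICAL, PHYSICAL }

    // Stand-in for a planner that accepts rules and materializations by name.
    static class Planner {
        final List<String> rules = new ArrayList<>();
        final List<String> materializations = new ArrayList<>();

        void addRule(String rule) { rules.add(rule); }
        void addMaterialization(String name) { materializations.add(name); }
    }

    // The proposed per-phase hook.
    interface PlannerIntegration {
        void initialize(Planner planner, Phase phase);
    }

    interface StoragePlugin {
        List<PlannerIntegration> getPlannerIntegrations();
    }

    // Registers an index (as a materialization) plus a rule, but only in the
    // phase this plugin instance was configured for.
    static class IndexedStoragePlugin implements StoragePlugin {
        private final Phase targetPhase;

        IndexedStoragePlugin(Phase targetPhase) { this.targetPhase = targetPhase; }

        @Override
        public List<PlannerIntegration> getPlannerIntegrations() {
            PlannerIntegration integration = (planner, phase) -> {
                if (phase == targetPhase) {
                    planner.addMaterialization("IDX_ORDERS_CUSTOMER");
                    planner.addRule("IndexScanRule");
                }
            };
            return List.of(integration);
        }
    }

    // Drill side: set up one planner per phase and give every plugin a chance
    // to initialize it before planning starts.
    static Planner setUpPlanner(Phase phase, List<StoragePlugin> plugins) {
        Planner planner = new Planner();
        for (StoragePlugin plugin : plugins) {
            for (PlannerIntegration integration : plugin.getPlannerIntegrations()) {
                integration.initialize(planner, phase);
            }
        }
        return planner;
    }

    public static void main(String[] args) {
        List<StoragePlugin> plugins = List.of(new IndexedStoragePlugin(Phase.LOGICAL));
        Planner logical = setUpPlanner(Phase.LOGICAL, plugins);
        Planner physical = setUpPlanner(Phase.PHYSICAL, plugins);
        System.out.println(logical.materializations); // [IDX_ORDERS_CUSTOMER]
        System.out.println(physical.rules);           // []
    }
}
```

The sketch returns a `List` because the method name is plural; the signature in the email returns a single `PlannerIntegration`, and the pattern works the same either way.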
>>>> >>> > >
>>>> >>> > > What do others think?
>>>> >>> > >
>>>> >>> > > [1] https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L145
>>>> >>> > > [2] https://github.com/jacques-n/drill/commit/d463f9098ef63b9a2844206950334cb16fc00327#diff-e67ba82ec2fbb8bc15eed30ec6a5379cR119
>>>> >>> > >
>>>> >>> > > --
>>>> >>> > > Jacques Nadeau
>>>> >>> > > CTO and Co-Founder, Dremio
>>>> >>> > >
>>>> >>> >
>>>> >>>
>>>> >>
>>>> >>
>>>>
>>>
>>>
>>
>
