Thanks Maryann.

On Thu, Oct 22, 2015 at 1:04 PM, Maryann Xue <[email protected]> wrote:
> Hi Aman,
>
> Thought these entries might be related to our discussion about
> non-covering indices just now:
>
> https://issues.apache.org/jira/browse/CALCITE-772
> https://issues.apache.org/jira/browse/CALCITE-773
>
>
> Thanks,
> Maryann
>
> On Thu, Oct 22, 2015 at 2:48 PM, Aman Sinha <[email protected]> wrote:
>
>> Thanks Maryann and Jinfeng for your comments. I understand the Phoenix
>> approach better now that Maryann clarified that the index is actually a
>> projection of some or all columns (non primary key columns) of the table.
>> In the relational world, this is similar to what systems such as Vertica
>> have done.
>>
>> Aman
>>
>> On Thu, Oct 22, 2015 at 11:32 AM, Maryann Xue <[email protected]>
>> wrote:
>>
>>> Thank you Jinfeng for the education on Drill planning! That probably
>>> justifies putting secondary index into physical planning.
>>> What I was trying to say was that a secondary index is not "a faster
>>> physical access mechanism", it is just a Phoenix table. And it makes a
>>> big difference in planning related to Sort, Join and Aggregate as you
>>> said. In the pure Calcite world, this is more of a Logical thing.
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Thu, Oct 22, 2015 at 2:26 PM, Jinfeng Ni <[email protected]>
>>> wrote:
>>>
>>>> I do not know how Phoenix's planning works. For Drill, my
>>>> understanding is that during logical planning, the "collation" trait
>>>> is only used in SortRemoveRule, to remove the redundant sort operator.
>>>> (Those "sort" operators are the ones created by Calcite for
>>>> user-explicit "ORDER BY" / "LIMIT", not the "enforcer" created in
>>>> physical planning).
>>>>
>>>> The "collation" trait would not have an impact in logical planning for
>>>> join / aggregation. The decision between sort-based vs hash-based
>>>> join / aggregation is made in physical planning.
>>>> At that stage, the "collation" would matter a lot, as it determines
>>>> whether Drill has to add an "enforcer" to get a certain trait, in
>>>> order to get a plan with sort-based join / aggregation.
>>>>
>>>> The "collation" trait acts like a physical property; it's more natural
>>>> to expose "collation" in physical planning instead of logical
>>>> planning, which focuses more on properties inherent in the relational
>>>> expression. Aman's view that secondary index is part of physical
>>>> planning makes sense to me.
>>>>
>>>> On Thu, Oct 22, 2015 at 10:54 AM, Maryann Xue <[email protected]>
>>>> wrote:
>>>> > Hi Aman Sinha,
>>>> >
>>>> > Yes, Phoenix uses materialization in Calcite to model its secondary
>>>> > index querying. But it's not right to say "In that sense, it would
>>>> > seem to fit into physical planning phase rather than logical, since
>>>> > indexes are a faster physical access mechanism for a scan. The
>>>> > logical properties of a table don't change due to presence of an
>>>> > index."
>>>> >
>>>> > A secondary index in Phoenix is a projection of part or all of the
>>>> > columns of the original table, and is usually indexed (and sorted)
>>>> > on a key other than the primary key of the original table. The key
>>>> > in a Phoenix table (HBase table) is crucial in two ways:
>>>> > 1. Filtering: the use of skip-scan or range-scan vs. full scan.
>>>> > 2. Ordering
>>>> >
>>>> > The second aspect is represented in Calcite by the "collation"
>>>> > trait, which can make a radical difference in logical planning.
>>>> > Replacing the original table with one of its indices might end up
>>>> > changing the whole plan completely.
>>>> >
>>>> > I am not sure yet which stage the Phoenix materialization will
>>>> > eventually go into, but one certain thing is that it should be
>>>> > available for all the general optimizations to take effect.
>>>> >
>>>> >
>>>> > Thanks,
>>>> > Maryann
>>>> >
>>>> > On Wed, Oct 14, 2015 at 12:55 PM, Aman Sinha <[email protected]>
>>>> > wrote:
>>>> >
>>>> >> Catching up on this thread. Jacques, if I understand correctly,
>>>> >> you are proposing that instead of the single point of
>>>> >> initialization of rules when we instantiate FrameworkConfig (in
>>>> >> DrillSqlWorker), we would have more entry points to plug into
>>>> >> different phases of planning, and storage plugins would register
>>>> >> different sets of rules in these separate phases. It seems fine to
>>>> >> me (assuming that there are no side effects where we somehow end up
>>>> >> increasing the search space for the existing plans).
>>>> >>
>>>> >> When talking about the Phoenix integration or the JDBC storage
>>>> >> plugin, I am curious about which phase(s) they would register the
>>>> >> rules for. I believe Phoenix's materialized view usage in Calcite
>>>> >> is actually for secondary indexing, not for materialized views per
>>>> >> se. In that sense, it would seem to fit into physical planning
>>>> >> phase rather than logical, since indexes are a faster physical
>>>> >> access mechanism for a scan. The logical properties of a table
>>>> >> don't change due to presence of an index.
>>>> >>
>>>> >> On the other hand, I think the JDBC plugin might register rules for
>>>> >> the logical phase since it would have filter and projection
>>>> >> pushdowns that do change logical properties.
>>>> >>
>>>> >> Aman
>>>> >>
>>>> >>
>>>> >> On Mon, Oct 12, 2015 at 5:36 PM, Hanifi Gunes <[email protected]>
>>>> >> wrote:
>>>> >>
>>>> >>> I would +1 (1-3) for sure.
>>>> >>> I do not have much understanding of programs; however, additional
>>>> >>> flexibility for storage plugin devs sounds cool in general when
>>>> >>> used responsibly =) so +0 for (4)
>>>> >>>
>>>> >>>
>>>> >>> -H+
>>>> >>>
>>>> >>> On Mon, Oct 12, 2015 at 4:12 PM, Jacques Nadeau <[email protected]>
>>>> >>> wrote:
>>>> >>>
>>>> >>> > The dead air must mean that everyone is onboard with my
>>>> >>> > recommendation
>>>> >>> >
>>>> >>> > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>>> >>> >
>>>> >>> > interface PlannerIntegration{
>>>> >>> >   void initialize(Planner, Phase)
>>>> >>> > }
>>>> >>> >
>>>> >>> > Right :D
>>>> >>> >
>>>> >>> > --
>>>> >>> > Jacques Nadeau
>>>> >>> > CTO and Co-Founder, Dremio
>>>> >>> >
>>>> >>> > On Fri, Oct 9, 2015 at 7:03 AM, Jacques Nadeau <[email protected]>
>>>> >>> > wrote:
>>>> >>> >
>>>> >>> > > A number of us were meeting last week to work through
>>>> >>> > > integrating the Phoenix storage plugin. This plugin is
>>>> >>> > > interesting because it also uses Calcite for planning. In some
>>>> >>> > > ways, this should make integration easy. However, it also
>>>> >>> > > allowed us to see certain constraints in how we expose planner
>>>> >>> > > integration between storage plugins and Drill internals.
>>>> >>> > > Currently, Drill asks the plugin to provide a set of optimizer
>>>> >>> > > rules which it incorporates into one of the many stages of
>>>> >>> > > planning. This is too constraining in three ways:
>>>> >>> > >
>>>> >>> > > 1. it doesn't allow a plugin to decide which phase of planning
>>>> >>> > > to integrate with. (This was definitely a problem in the
>>>> >>> > > Phoenix case. Our hack solution for now is to incorporate
>>>> >>> > > storage plugin rules in multiple phases instead of just one
>>>> >>> > > [1].)
>>>> >>> > > 2. it doesn't allow arbitrary transformations. Calcite provides
>>>> >>> > > a program concept.
>>>> >>> > > It may be that a plugin needs to do some of its own work using
>>>> >>> > > the Hep planner. Currently there isn't an elegant way to do
>>>> >>> > > this in the context of the rule.
>>>> >>> > > 3. There is no easy way to incorporate additional planner
>>>> >>> > > initialization options. This was almost a problem in the case
>>>> >>> > > of the JDBC plugin. It turned out that a hidden integration
>>>> >>> > > using register() here [2] allowed us to continue throughout
>>>> >>> > > the planning phases. However, we have to register all the
>>>> >>> > > rules for all the phases of planning, which is a bit unclean.
>>>> >>> > > We're hitting the same problem in the case of Phoenix, where
>>>> >>> > > we need to register materialized views as part of planner
>>>> >>> > > initialization but the hack from the JDBC case won't really
>>>> >>> > > work.
>>>> >>> > >
>>>> >>> > > I suggest we update the interface to allow better support for
>>>> >>> > > these types of integrations.
>>>> >>> > >
>>>> >>> > > These seem to be the main requirements:
>>>> >>> > > 1. Expose concrete planning phases to storage plugins
>>>> >>> > > 2. Allow a storage plugin to provide additional planner
>>>> >>> > > initialization behavior
>>>> >>> > > 3. Allow a storage plugin to provide rules to include in a
>>>> >>> > > particular planning phase (merged with other rules during that
>>>> >>> > > phase).
>>>> >>> > > 4. (possibly) allow a storage plugin to provide transformation
>>>> >>> > > programs that are to be executed in between the concrete
>>>> >>> > > planning phases.
>>>> >>> > >
>>>> >>> > > Item (4) above is the most questionable to me as I wonder
>>>> >>> > > whether or not this could simply be solved by creating a
>>>> >>> > > transformation rule (or program rule in Calcite's terminology)
>>>> >>> > > that creates an alternative tree and thus be solved by (3).
>>>> >>> > >
>>>> >>> > > A simple solution might be (if we ignore #4):
>>>> >>> > >
>>>> >>> > > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>>> >>> > >
>>>> >>> > > interface PlannerIntegration{
>>>> >>> > >   void initialize(Planner, Phase)
>>>> >>> > > }
>>>> >>> > >
>>>> >>> > > This way, a storage plugin could register rules (or
>>>> >>> > > materialized views) at setup time.
>>>> >>> > >
>>>> >>> > > What do others think?
>>>> >>> > >
>>>> >>> > > [1]
>>>> >>> > > https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L145
>>>> >>> > > [2]
>>>> >>> > > https://github.com/jacques-n/drill/commit/d463f9098ef63b9a2844206950334cb16fc00327#diff-e67ba82ec2fbb8bc15eed30ec6a5379cR119
>>>> >>> > >
>>>> >>> > > --
>>>> >>> > > Jacques Nadeau
>>>> >>> > > CTO and Co-Founder, Dremio
>>>> >>> > >
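For reference, here is a minimal Java sketch of how the hook proposed in this thread might be wired up. Only the names `PlannerIntegration`, `initialize(Planner, Phase)` and `StoragePlugin.getPlannerIntegrations()` come from the thread. Everything else is an assumption made for illustration: the `Phase` values, the `Planner` stand-in (which just records strings), the two plugin classes and the rule names are all hypothetical and are not Drill or Calcite APIs. The phase in which a Phoenix-style plugin would register its index is left as a constructor parameter, since the thread does not settle that question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class PlannerIntegrationSketch {

    /** Hypothetical phase names; the thread only asks for "concrete planning phases". */
    enum Phase { LOGICAL, PHYSICAL }

    /** Hypothetical stand-in for a planner: it only records what was registered. */
    static final class Planner {
        final List<String> rules = new ArrayList<>();
        final List<String> materializations = new ArrayList<>();
        void addRule(String name) { rules.add(name); }
        void addMaterialization(String name) { materializations.add(name); }
    }

    /** Interface shape taken from the proposal in the thread. */
    interface PlannerIntegration {
        void initialize(Planner planner, Phase phase);
    }

    interface StoragePlugin {
        PlannerIntegration getPlannerIntegrations();
    }

    /** Assumed JDBC-like plugin: contributes pushdown rules to the logical phase only. */
    static final class JdbcLikePlugin implements StoragePlugin {
        @Override
        public PlannerIntegration getPlannerIntegrations() {
            return (planner, phase) -> {
                if (phase == Phase.LOGICAL) {
                    planner.addRule("JdbcFilterPushdown");   // hypothetical rule name
                    planner.addRule("JdbcProjectPushdown");  // hypothetical rule name
                }
            };
        }
    }

    /** Assumed Phoenix-like plugin: registers its index as a materialization in one phase. */
    static final class PhoenixLikePlugin implements StoragePlugin {
        private final Phase indexPhase; // open question in the thread, so configurable here
        PhoenixLikePlugin(Phase indexPhase) { this.indexPhase = indexPhase; }
        @Override
        public PlannerIntegration getPlannerIntegrations() {
            return (planner, phase) -> {
                if (phase == indexPhase) {
                    planner.addMaterialization("PhoenixSecondaryIndex"); // hypothetical name
                }
            };
        }
    }

    /** Builds one planner per phase and lets every plugin initialize it for that phase. */
    static Map<Phase, Planner> initializePhases(List<StoragePlugin> plugins) {
        Map<Phase, Planner> planners = new EnumMap<>(Phase.class);
        for (Phase phase : Phase.values()) {
            Planner planner = new Planner();
            for (StoragePlugin plugin : plugins) {
                plugin.getPlannerIntegrations().initialize(planner, phase);
            }
            planners.put(phase, planner);
        }
        return planners;
    }

    public static void main(String[] args) {
        Map<Phase, Planner> planners = initializePhases(Arrays.asList(
                new JdbcLikePlugin(), new PhoenixLikePlugin(Phase.PHYSICAL)));
        for (Map.Entry<Phase, Planner> e : planners.entrySet()) {
            System.out.println(e.getKey() + " rules=" + e.getValue().rules
                    + " materializations=" + e.getValue().materializations);
        }
    }
}
```

The point of the sketch is that each plugin decides per phase what to contribute, so no plugin has to register all of its rules for all phases, and planner setup such as registering a materialization goes through the same call as rule registration.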
