Hi Aman,

Thought these entries might be related to our discussion about
non-covering indices just now:
https://issues.apache.org/jira/browse/CALCITE-772
https://issues.apache.org/jira/browse/CALCITE-773

Thanks,
Maryann

On Thu, Oct 22, 2015 at 2:48 PM, Aman Sinha wrote:

> Thanks Maryann and Jinfeng for your comments. I understand the Phoenix
> approach better now that Maryann clarified that the index is actually a
> projection of some or all columns (non primary key columns) of the
> table. In the relational world, this is similar to what systems such as
> Vertica have done.
>
> Aman
>
> On Thu, Oct 22, 2015 at 11:32 AM, Maryann Xue wrote:
>
>> Thank you JinFeng for the education on Drill planning! That probably
>> justifies putting secondary index into physical planning.
>> What I was trying to say was that secondary index is not "a faster
>> physical access mechanism", it is just a Phoenix table. And it makes a
>> big difference in planning related to Sort, Join and Aggregate as you
>> said. In the pure Calcite world, this is more of a Logical thing.
>>
>>
>> Thanks,
>> Maryann
>>
>> On Thu, Oct 22, 2015 at 2:26 PM, Jinfeng Ni wrote:
>>
>>> I do not know how Phoenix's planning works. For Drill, my
>>> understanding is during logical planning, "collation" trait is only
>>> used in SortRemoveRule, to remove the redundant sort operator. (Those
>>> "sort" operators are the ones created by Calcite for user-explicit
>>> "ORDER BY" / "LIMIT", not the "enforcer" created in physical
>>> planning).
>>>
>>> The "collation" trait would not have impact in logical planning for
>>> join / aggregation. The decision between sort-based vs hash-based
>>> join / aggregation is made in physical planning. At that stage, the
>>> "collation" would matter a lot, as it would mean whether Drill has to
>>> add an "enforcer" to get certain trait, in order to get a plan with
>>> sort-based join / aggregation.
>>>
>>> The "collation" trait acts like a physical property, it's more natural
>>> to expose "collation" in physical planning instead of logical
>>> planning, which focuses more on properties inherent in relational
>>> expression. Aman's view that secondary index is part of physical
>>> planning makes sense to me.
>>>
>>> On Thu, Oct 22, 2015 at 10:54 AM, Maryann Xue wrote:
>>> > Hi Aman Sinha,
>>> >
>>> > Yes, Phoenix uses materialization in Calcite to model its secondary
>>> > index querying. But it's not right to say "In that sense, it would
>>> > seem to fit into physical planning phase rather than logical, since
>>> > indexes are a faster physical access mechanism for a scan. The
>>> > logical properties of a table don't change due to presence of an
>>> > index."
>>> >
>>> > A secondary index in Phoenix is a projection of part or all of the
>>> > columns of the original table, and is usually indexed (and sorted)
>>> > on a different key other than the primary key of the original
>>> > table. The key in Phoenix table (HBase table) is crucial in two
>>> > ways:
>>> > 1. Filtering: the use of skip-scan or range-scan vs. full scan.
>>> > 2. Ordering
>>> >
>>> > The second aspect is represented in Calcite by "collation" trait,
>>> > which can make a radical difference in logical planning. Replacing
>>> > the original table with one of its indices might end up changing
>>> > the whole plan completely.
>>> >
>>> > I am not sure yet which stage the Phoenix materialization will
>>> > eventually go, but one certain thing is that it should be available
>>> > for all the general optimizations to take effect.
>>> >
>>> >
>>> > Thanks,
>>> > Maryann
>>> >
>>> > On Wed, Oct 14, 2015 at 12:55 PM, Aman Sinha wrote:
>>> >
>>> >> Catching up on this thread.
>>> >> Jacques, if I understand correctly, you are proposing that instead
>>> >> of the single point of initialization of rules when we instantiate
>>> >> FrameworkConfig (in DrillSqlWorker), we would have more entry
>>> >> points to plug into different phases of planning and storage
>>> >> plugins would register different sets of rules in these separate
>>> >> phases. It seems fine to me (assuming that there are no side
>>> >> effects where we somehow end up increasing the search space for
>>> >> the existing plans).
>>> >>
>>> >> When talking about the Phoenix integration or the JDBC storage
>>> >> plugin, I am curious about which phase(s) they would register the
>>> >> rules for. I believe Phoenix's materialized view usage in Calcite
>>> >> is actually for secondary indexing, not for materialized views per
>>> >> se. In that sense, it would seem to fit into physical planning
>>> >> phase rather than logical, since indexes are a faster physical
>>> >> access mechanism for a scan. The logical properties of a table
>>> >> don't change due to presence of an index.
>>> >>
>>> >> On the other hand, I think the JDBC plugin might register rules
>>> >> for logical phase since it would have filter and projection
>>> >> pushdowns that do change logical properties.
>>> >>
>>> >> Aman
>>> >>
>>> >>
>>> >> On Mon, Oct 12, 2015 at 5:36 PM, Hanifi Gunes wrote:
>>> >>
>>> >>> I would +1 (1-3) for sure.
>>> >>> I do not have much understanding of programs; however, additional
>>> >>> flexibility for storage plugin devs sounds cool in general when
>>> >>> used responsibly =) so +0 for (4)
>>> >>>
>>> >>>
>>> >>> -H+
>>> >>>
>>> >>> On Mon, Oct 12, 2015 at 4:12 PM, Jacques Nadeau wrote:
>>> >>>
>>> >>> > The dead air must mean that everyone is onboard with my
>>> >>> > recommendation
>>> >>> >
>>> >>> > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>> >>> >
>>> >>> > interface PlannerIntegration{
>>> >>> >   void initialize(Planner, Phase)
>>> >>> > }
>>> >>> >
>>> >>> > Right :D
>>> >>> >
>>> >>> > --
>>> >>> > Jacques Nadeau
>>> >>> > CTO and Co-Founder, Dremio
>>> >>> >
>>> >>> > On Fri, Oct 9, 2015 at 7:03 AM, Jacques Nadeau wrote:
>>> >>> >
>>> >>> > > A number of us were meeting last week to work through
>>> >>> > > integrating the Phoenix storage plugin. This plugin is
>>> >>> > > interesting because it also uses Calcite for planning. In
>>> >>> > > some ways, this should make integration easy. However, it
>>> >>> > > also allowed us to see certain constraints in how we expose
>>> >>> > > planner integration between storage plugins and Drill
>>> >>> > > internals. Currently, Drill asks the plugin to provide a set
>>> >>> > > of optimizer rules which it incorporates into one of the
>>> >>> > > many stages of planning. This is too constraining in three
>>> >>> > > ways:
>>> >>> > >
>>> >>> > > 1. it doesn't allow a plugin to decide which phase of
>>> >>> > > planning to integrate with. (This was definitely a problem
>>> >>> > > in the Phoenix case. Our hack solution for now is to
>>> >>> > > incorporate storage plugin rules in phases instead of just
>>> >>> > > one [1].)
>>> >>> > > 2. it doesn't allow arbitrary transformations. Calcite
>>> >>> > > provides a program concept.
>>> >>> > > It may be that a plugin needs to do some of its own work
>>> >>> > > using the Hep planner. Currently there isn't an elegant way
>>> >>> > > to do this in the context of the rule.
>>> >>> > > 3. There is no easy way to incorporate additional planner
>>> >>> > > initialization options. This was almost a problem in the
>>> >>> > > case of the JDBC plugin. It turned out that a hidden
>>> >>> > > integration using register() here [2] allowed us to continue
>>> >>> > > throughout the planning phases. However, we have to register
>>> >>> > > all the rules for all the phases of planning which is a bit
>>> >>> > > unclean. We're hitting the same problem in the case of
>>> >>> > > Phoenix where we need to register materialized views as part
>>> >>> > > of planner initialization but the hack from the JDBC case
>>> >>> > > won't really work.
>>> >>> > >
>>> >>> > > I suggest we update the interface to allow better support
>>> >>> > > for these types of integrations.
>>> >>> > >
>>> >>> > > These seem to be the main requirements:
>>> >>> > > 1. Expose concrete planning phases to storage plugins
>>> >>> > > 2. Allow a storage plugin to provide additional planner
>>> >>> > > initialization behavior
>>> >>> > > 3. Allow a storage plugin to provide rules to include in a
>>> >>> > > particular planning phase (merged with other rules during
>>> >>> > > that phase).
>>> >>> > > 4. (possibly) allow a storage plugin to provide
>>> >>> > > transformation programs that are to be executed in between
>>> >>> > > the concrete planning phases.
>>> >>> > >
>>> >>> > > Item (4) above is the most questionable to me as I wonder
>>> >>> > > whether or not this could simply be solved by creating a
>>> >>> > > transformation rule (or program rule in Calcite's
>>> >>> > > terminology) that creates an alternative tree and thus be
>>> >>> > > solved by (3).
>>> >>> > >
>>> >>> > > A simple solution might be (if we ignore #4):
>>> >>> > >
>>> >>> > > PlannerIntegration StoragePlugin.getPlannerIntegrations()
>>> >>> > >
>>> >>> > > interface PlannerIntegration{
>>> >>> > >   void initialize(Planner, Phase)
>>> >>> > > }
>>> >>> > >
>>> >>> > > This way, a storage plugin could register rules (or
>>> >>> > > materialized views) at setup time.
>>> >>> > >
>>> >>> > > What do others think?
>>> >>> > >
>>> >>> > > [1]
>>> >>> > > https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L145
>>> >>> > > [2]
>>> >>> > > https://github.com/jacques-n/drill/commit/d463f9098ef63b9a2844206950334cb16fc00327#diff-e67ba82ec2fbb8bc15eed30ec6a5379cR119
>>> >>> > >
>>> >>> > > --
>>> >>> > > Jacques Nadeau
>>> >>> > > CTO and Co-Founder, Dremio
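
For readers skimming the thread, the PlannerIntegration proposal quoted
above could be sketched roughly as follows. This is only an illustration
under stated assumptions, not Drill's or Calcite's actual API: the Phase
values, the Planner methods (addRule, addMaterialization), the
RecordingPlanner stub, the IndexedStorePlugin example and the setUp
helper are all hypothetical names invented here. Only the shape
"getPlannerIntegrations() returns something with initialize(Planner,
Phase)" comes from the thread. Which phase a materialization is
registered in is chosen arbitrarily in the sketch and is not meant to
settle the logical vs. physical question discussed above.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Self-contained sketch; none of these types are real Drill or Calcite classes. */
class PlannerIntegrationSketch {

    /** Hypothetical planning phases; the thread does not name the concrete phases. */
    enum Phase { LOGICAL, PHYSICAL }

    /** Hypothetical stand-in for the planner handle handed to a plugin. */
    interface Planner {
        void addRule(String ruleName);
        void addMaterialization(String viewName);
    }

    /** The interface proposed in the thread, with parameter names added. */
    interface PlannerIntegration {
        void initialize(Planner planner, Phase phase);
    }

    /** Hypothetical slice of a storage plugin that exposes its integration. */
    interface StoragePlugin {
        PlannerIntegration getPlannerIntegrations();
    }

    /** Planner stub that simply records what each plugin registered. */
    static final class RecordingPlanner implements Planner {
        final List<String> registered = new ArrayList<>();

        @Override public void addRule(String ruleName) {
            registered.add("rule:" + ruleName);
        }

        @Override public void addMaterialization(String viewName) {
            registered.add("mv:" + viewName);
        }
    }

    /** Example plugin that registers different things depending on the phase. */
    static final class IndexedStorePlugin implements StoragePlugin {
        @Override public PlannerIntegration getPlannerIntegrations() {
            return (planner, phase) -> {
                switch (phase) {
                    case LOGICAL:
                        // Arbitrary placement, chosen for illustration only.
                        planner.addMaterialization("secondary_index_as_table");
                        planner.addRule("filter_pushdown");
                        break;
                    case PHYSICAL:
                        planner.addRule("index_scan");
                        break;
                }
            };
        }
    }

    /** Host-side setup: one planner per phase, initialized by every plugin. */
    static Map<Phase, RecordingPlanner> setUp(List<StoragePlugin> plugins) {
        Map<Phase, RecordingPlanner> planners = new EnumMap<>(Phase.class);
        for (Phase phase : Phase.values()) {
            RecordingPlanner planner = new RecordingPlanner();
            for (StoragePlugin plugin : plugins) {
                plugin.getPlannerIntegrations().initialize(planner, phase);
            }
            planners.put(phase, planner);
        }
        return planners;
    }

    public static void main(String[] args) {
        Map<Phase, RecordingPlanner> planners = setUp(List.of(new IndexedStorePlugin()));
        planners.forEach((phase, planner) ->
                System.out.println(phase + " -> " + planner.registered));
    }
}
```

The point of the callback shape is that the plugin, not the host, decides
what to contribute to each phase, which covers requirements (1) to (3)
from the original message without the host having to know about rules
versus materialized views.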
