Thank you, Jinfeng, for the education on Drill planning! That probably justifies putting secondary indexes into physical planning. What I was trying to say was that a secondary index is not "a faster physical access mechanism"; it is just a Phoenix table, and it makes a big difference in planning related to Sort, Join and Aggregate, as you said. In the pure Calcite world, this is more of a logical thing.
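To illustrate why the sort key of an index matters to Sort planning, here is a minimal sketch. It is not Phoenix, Drill, or Calcite code: `Source`, `satisfies`, and the table, index, and column names are all hypothetical, and it assumes ascending order only. It shows that a required ordering is met without an extra sort only when it is a prefix of the source's sort key.

```java
import java.util.Arrays;
import java.util.List;

// Demo entry point: compares a base table and a secondary index
// against a required ordering.
class CollationSketch {
    public static void main(String[] args) {
        // Hypothetical base table, sorted on its primary key.
        Source table = new Source("ORDERS", Arrays.asList("order_id"));
        // Hypothetical secondary index, sorted on a different key.
        Source index = new Source("ORDERS_BY_CUSTOMER_IDX",
                Arrays.asList("customer_id", "order_id"));

        // Ordering required by, e.g., ORDER BY customer_id.
        List<String> required = Arrays.asList("customer_id");

        for (Source s : Arrays.asList(table, index)) {
            System.out.println(s.name + " needs extra sort: " + !s.satisfies(required));
        }
    }
}

// Hypothetical stand-in for a scannable relation with a known sort key.
class Source {
    final String name;
    final List<String> sortKey;

    Source(String name, List<String> sortKey) {
        this.name = name;
        this.sortKey = sortKey;
    }

    // True when the required ordering is a prefix of this source's sort key,
    // so rows already arrive in the required order.
    boolean satisfies(List<String> required) {
        if (required.size() > sortKey.size()) {
            return false;
        }
        return sortKey.subList(0, required.size()).equals(required);
    }
}
```

In this sketch the index satisfies the required ordering and the base table does not, which is the kind of difference that can change the rest of the plan.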
Thanks,
Maryann

On Thu, Oct 22, 2015 at 2:26 PM, Jinfeng Ni <[email protected]> wrote:

> I do not know how Phoenix's planning works. For Drill, my
> understanding is that during logical planning, the "collation" trait is
> only used in SortRemoveRule, to remove redundant sort operators. (Those
> "sort" operators are the ones created by Calcite for user-explicit
> "ORDER BY" / "LIMIT", not the "enforcers" created in physical
> planning.)
>
> The "collation" trait would not have an impact in logical planning for
> join / aggregation. The decision between sort-based vs hash-based
> join / aggregation is made in physical planning. At that stage, the
> "collation" would matter a lot, as it would determine whether Drill has
> to add an "enforcer" to get a certain trait, in order to get a plan with
> sort-based join / aggregation.
>
> The "collation" trait acts like a physical property; it's more natural
> to expose "collation" in physical planning instead of logical
> planning, which focuses more on properties inherent in the relational
> expression. Aman's view that secondary index is part of physical
> planning makes sense to me.
>
> On Thu, Oct 22, 2015 at 10:54 AM, Maryann Xue <[email protected]>
> wrote:
> > Hi Aman Sinha,
> >
> > Yes, Phoenix uses materialization in Calcite to model its secondary index
> > querying. But it's not right to say "In that sense, it would seem to fit
> > into physical planning phase rather than logical, since indexes are a
> > faster physical access mechanism for a scan. The logical properties of a
> > table don't change due to presence of an index."
> >
> > A secondary index in Phoenix is a projection of part or all of the
> > columns of the original table, and is usually indexed (and sorted) on a
> > key other than the primary key of the original table. The key in a
> > Phoenix table (HBase table) is crucial in two ways:
> > 1. Filtering: the use of skip-scan or range-scan vs. full scan.
> > 2. Ordering
> >
> > The second aspect is represented in Calcite by the "collation" trait,
> > which can make a radical difference in logical planning. Replacing the
> > original table with one of its indices might end up changing the whole
> > plan completely.
> >
> > I am not sure yet which stage the Phoenix materialization will eventually
> > go into, but one certain thing is that it should be available for all the
> > general optimizations to take effect.
> >
> >
> > Thanks,
> > Maryann
> >
> > On Wed, Oct 14, 2015 at 12:55 PM, Aman Sinha <[email protected]>
> > wrote:
> >
> >> Catching up on this thread. Jacques, if I understand correctly, you are
> >> proposing that instead of the single point of initialization of rules
> >> when we instantiate FrameworkConfig (in DrillSqlWorker), we would have
> >> more entry points to plug into different phases of planning, and storage
> >> plugins would register different sets of rules in these separate phases.
> >> It seems fine to me (assuming that there are no side effects where we
> >> somehow end up increasing the search space for the existing plans).
> >>
> >> When talking about the Phoenix integration or the JDBC storage plugin, I
> >> am curious about which phase(s) they would register the rules for. I
> >> believe Phoenix's materialized view usage in Calcite is actually for
> >> secondary indexing, not for materialized views per se. In that sense, it
> >> would seem to fit into physical planning phase rather than logical, since
> >> indexes are a faster physical access mechanism for a scan. The logical
> >> properties of a table don't change due to presence of an index.
> >>
> >> On the other hand, I think the JDBC plugin might register rules for the
> >> logical phase since it would have filter and projection pushdowns that do
> >> change logical properties.
> >>
> >> Aman
> >>
> >>
> >> On Mon, Oct 12, 2015 at 5:36 PM, Hanifi Gunes <[email protected]>
> >> wrote:
> >>
> >>> I would +1 (1-3) for sure. I do not have much understanding of programs;
> >>> however, additional flexibility for storage plugin devs sounds cool in
> >>> general when used responsibly =) so +0 for (4)
> >>>
> >>>
> >>> -H+
> >>>
> >>> On Mon, Oct 12, 2015 at 4:12 PM, Jacques Nadeau <[email protected]>
> >>> wrote:
> >>>
> >>> > The dead air must mean that everyone is onboard with my recommendation
> >>> >
> >>> > PlannerIntegration StoragePlugin.getPlannerIntegrations()
> >>> >
> >>> > interface PlannerIntegration{
> >>> >   void initialize(Planner, Phase)
> >>> > }
> >>> >
> >>> > Right :D
> >>> >
> >>> > --
> >>> > Jacques Nadeau
> >>> > CTO and Co-Founder, Dremio
> >>> >
> >>> > On Fri, Oct 9, 2015 at 7:03 AM, Jacques Nadeau <[email protected]>
> >>> > wrote:
> >>> >
> >>> > > A number of us were meeting last week to work through integrating the
> >>> > > Phoenix storage plugin. This plugin is interesting because it also
> >>> > > uses Calcite for planning. In some ways, this should make integration
> >>> > > easy. However, it also allowed us to see certain constraints on how we
> >>> > > expose planner integration between storage plugins and Drill
> >>> > > internals. Currently, Drill asks the plugin to provide a set of
> >>> > > optimizer rules which it incorporates into one of the many stages of
> >>> > > planning. This is too constraining in three ways:
> >>> > >
> >>> > > 1. It doesn't allow a plugin to decide which phase of planning to
> >>> > > integrate with. (This was definitely a problem in the Phoenix case.
> >>> > > Our hack solution for now is to incorporate storage plugin rules in
> >>> > > phases instead of just one [1].)
> >>> > > 2. It doesn't allow arbitrary transformations. Calcite provides a
> >>> > > program concept. It may be that a plugin needs to do some of its own
> >>> > > work using the Hep planner. Currently there isn't an elegant way to do
> >>> > > this in the context of the rule.
> >>> > > 3. There is no easy way to incorporate additional planner
> >>> > > initialization options. This was almost a problem in the case of the
> >>> > > JDBC plugin. It turned out that a hidden integration using register()
> >>> > > here [2] allowed us to continue throughout the planning phases.
> >>> > > However, we have to register all the rules for all the phases of
> >>> > > planning, which is a bit unclean. We're hitting the same problem in
> >>> > > the case of Phoenix, where we need to register materialized views as
> >>> > > part of planner initialization but the hack from the JDBC case won't
> >>> > > really work.
> >>> > >
> >>> > > I suggest we update the interface to allow better support for these
> >>> > > types of integrations.
> >>> > >
> >>> > > These seem to be the main requirements:
> >>> > > 1. Expose concrete planning phases to storage plugins
> >>> > > 2. Allow a storage plugin to provide additional planner initialization
> >>> > > behavior
> >>> > > 3. Allow a storage plugin to provide rules to include in a particular
> >>> > > planning phase (merged with other rules during that phase).
> >>> > > 4. (possibly) Allow a storage plugin to provide transformation
> >>> > > programs that are to be executed in between the concrete planning
> >>> > > phases.
> >>> > >
> >>> > > Item (4) above is the most questionable to me, as I wonder whether or
> >>> > > not this could simply be solved by creating a transformation rule (or
> >>> > > program rule in Calcite's terminology) that creates an alternative
> >>> > > tree and thus be solved by (3).
> >>> > >
> >>> > > A simple solution might be (if we ignore #4):
> >>> > >
> >>> > > PlannerIntegration StoragePlugin.getPlannerIntegrations()
> >>> > >
> >>> > > interface PlannerIntegration{
> >>> > >   void initialize(Planner, Phase)
> >>> > > }
> >>> > >
> >>> > > This way, a storage plugin could register rules (or materialized
> >>> > > views) at setup time.
> >>> > >
> >>> > > What do others think?
> >>> > >
> >>> > > [1]
> >>> > > https://github.com/apache/drill/blob/master/contrib/storage-jdbc/src/main/java/org/apache/drill/exec/store/jdbc/JdbcStoragePlugin.java#L145
> >>> > > [2]
> >>> > > https://github.com/jacques-n/drill/commit/d463f9098ef63b9a2844206950334cb16fc00327#diff-e67ba82ec2fbb8bc15eed30ec6a5379cR119
> >>> > >
> >>> > > --
> >>> > > Jacques Nadeau
> >>> > > CTO and Co-Founder, Dremio
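
For readers who want to see the shape of the proposed interface in compilable form, here is a minimal sketch. It is not Drill's actual API: `Planner`, `Phase` (with only two hypothetical values), `ExamplePlugin`, and the rule names are stand-ins invented for illustration, and rules are represented as plain strings rather than Calcite rule objects. Only the names `PlannerIntegration`, `initialize`, and `getPlannerIntegrations` come from the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Demo entry point: the planner asks the plugin to initialize each phase.
class PlannerIntegrationSketch {
    public static void main(String[] args) {
        Planner planner = new Planner();
        StoragePlugin plugin = new ExamplePlugin();
        PlannerIntegration integration = plugin.getPlannerIntegrations();
        for (Phase phase : Phase.values()) {
            integration.initialize(planner, phase);
        }
        for (Phase phase : Phase.values()) {
            System.out.println(phase + " -> " + planner.rulesFor(phase));
        }
    }
}

// Hypothetical planning phases; the real set of phases is an assumption here.
enum Phase { LOGICAL, PHYSICAL }

// Hypothetical stand-in for the planner: it only records rule names per phase.
class Planner {
    private final Map<Phase, List<String>> rules = new EnumMap<>(Phase.class);

    void addRule(Phase phase, String ruleName) {
        rules.computeIfAbsent(phase, p -> new ArrayList<>()).add(ruleName);
    }

    List<String> rulesFor(Phase phase) {
        List<String> found = rules.get(phase);
        return found == null ? Collections.<String>emptyList() : found;
    }
}

// Interface shape as proposed in the thread.
interface PlannerIntegration {
    void initialize(Planner planner, Phase phase);
}

// Method name taken from the thread; the rest of the plugin API is omitted.
interface StoragePlugin {
    PlannerIntegration getPlannerIntegrations();
}

// Hypothetical plugin that contributes a different rule to each phase.
class ExamplePlugin implements StoragePlugin {
    @Override
    public PlannerIntegration getPlannerIntegrations() {
        return (planner, phase) -> {
            switch (phase) {
                case LOGICAL:
                    planner.addRule(phase, "ExampleFilterPushdownRule");
                    break;
                case PHYSICAL:
                    planner.addRule(phase, "ExampleScanRule");
                    break;
            }
        };
    }
}
```

Because the plugin receives both the planner and the phase, it decides for itself which phases to take part in; other initialization work, such as registering materialized views, could presumably be hooked into the same callback.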
