Right now this type of work is done here: https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashAggPrule.java https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/AggPruleBase.java
With Distribution Trait application here: https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTraitDef.java To me, the easiest way to solve the Phoenix issue is by providing a rule that matches HashAgg and StreamAgg but requires Phoenix convention as input. It would replace everywhere but would only be plannable when it is the first phase of aggregation. Thoughts? -- Jacques Nadeau CTO and Co-Founder, Dremio On Thu, Oct 1, 2015 at 2:30 PM, Julian Hyde <[email protected]> wrote: > Phoenix is able to perform quite a few relational operations on the > region server: scan, filter, project, aggregate, sort (optionally with > limit). However, the sort and aggregate are necessarily "local". They > can only deal with data on that region server, and there needs to be a > further operation to combine the results from the region servers. > > The question is how to plan such queries. I think the answer is an > AggregateExchangeTransposeRule. > > The rule would spot an Aggregate on a data source that is split into > multiple locations (partitions) and split it into a partial Aggregate > that computes sub-totals and a summarizing Aggregate that combines > those totals. > > How does the planner know that the Aggregate needs to be split? Since > the data's distribution has changed, there would need to be an > Exchange operator. It is the Exchange operator that triggers the rule > to fire. > > There are some special cases. If the data is sorted as well as > partitioned (say because the local aggregate uses a sort-based > algorithm) we could maybe use a more efficient plan. And if the > partition key is the same as the aggregation key we don't need a > summarizing Aggregate, just a Union. > > It turns out not to be very Phoenix-specific. In the Drill-on-Phoenix > scenario, once the Aggregate has been pushed through the Exchange > (i.e. onto the drill-bit residing on the region server) we can then > push the DrillAggregate across the drill-to-phoenix membrane and make > it into a PhoenixServerAggregate that executes in the region server. > > Related issues: > * https://issues.apache.org/jira/browse/DRILL-3840 > * https://issues.apache.org/jira/browse/CALCITE-751 > > Julian >
