I think it's a good proposal to put Exchange/Distribution into the Calcite library.
Makes sense to me. +1

On Wed, Feb 11, 2015 at 12:45 PM, Julian Hyde <[email protected]> wrote:

> Drill guys: What do you think of the proposal?
>
> On Feb 11, 2015, at 11:34 AM, Ashutosh Chauhan <[email protected]>
> wrote:
>
> Overall proposal sounds good to me. +1
>
> On Tue, Feb 10, 2015 at 3:35 PM, Julian Hyde <[email protected]> wrote:
>
> I've had some discussions about adding an Exchange operator and
> Distribution trait to Hive's cost-based optimizer, which uses Calcite.
> Ashutosh has logged a bug [
> https://issues.apache.org/jira/browse/CALCITE-594 ] and pull request
> containing a proof-of-concept [
> https://github.com/apache/incubator-calcite/pull/52/files ].
>
> I know that Drill has a Distribution trait and several sub-classes of
> Exchange operator (DrillDistributionTrait, ExchangePrel,
> BroadcastExchangePrel, HashToMergeExchangePrel, HashToRandomExchangePrel,
> OrderedPartitionExchangePrel and SimpleMergeExchangePrel, in
>
> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical
> )
>
> I propose to create a Distribution trait and Exchange operator base class
> in Calcite, with the goal that both Drill and Hive would use them. (I am
> adopting Drill terminology -- Distribution rather than Partition, Exchange
> rather than Shuffle -- but I am pretty sure that the concepts are the
> same.)
>
> public abstract class Exchange extends SingleRel {
>   public final RelDistribution distribution;
>
>   protected Exchange(RelCluster cluster, RelTraitSet traitSet,
>       RelNode input, RelDistribution distribution) {
>     super(cluster, traitSet, input);
>     this.distribution = distribution;
>   }
> }
>
> public interface RelDistribution extends RelMultipleTrait {
>   enum DistributionType {
>     SINGLETON,
>     HASH_DISTRIBUTED,
>     RANGE_DISTRIBUTED,
>     RANDOM_DISTRIBUTED,
>     ROUND_ROBIN_DISTRIBUTED,
>     BROADCAST_DISTRIBUTED
>   }
>
>   public DistributionType getType();
>   public ImmutableIntList getFields();
> }
>
> Calcite would not contain any particular exchange algorithms. However,
> since it is common to combine sort and exchange, I would create a base
> class for it:
>
> public abstract class SortExchange extends Exchange {
>   public final Collation collation;
>
>   ...
> }
>
> The physical operators would remain in Drill/Hive and would likely be fully
> specified by the distribution and collation; they would not need any
> additional attributes. We would not be able to port
> DrillDistributionTraitDef.convert directly -- it would create a
> LogicalExchange (analogous to how RelCollationTraitDef.convert creates a
> LogicalSort) and then Drill rules would need to kick in to convert that to
> HashToRandomExchangePrel etc.
>
> I do not think that RelDistribution needs to be a "multiple" trait (compare
> with RelCollation extends RelMultipleTrait, which allows a RelNode to have
> more than one sort-order) but I may be wrong.
>
> The advantages of making Exchange a first-class operator and Distribution a
> trait are clear. We will be able to build a library of rules (e.g.
> FilterExchangePushRule, ExchangeRemoveRule), a RelMdDistribution metadata
> interface, and start working on stats and cost model.
>
> Drill and Hive stakeholders, please let me know what you think of this
> plan.
>
> Julian
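
To make the trait/operator split concrete, here is a rough, dependency-free sketch of how a distribution trait might drive insertion of an Exchange. It does not use Calcite, Drill or Hive classes: Distribution, Node, Scan, Exchange, satisfies and enforce below are all hypothetical stand-ins, and the "satisfies" test is assumed to be plain equality of type and fields, which is stricter than a real planner would likely want.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DistributionSketch {
  /** Mirrors the enum in the proposal. */
  public enum DistributionType {
    SINGLETON, HASH_DISTRIBUTED, RANGE_DISTRIBUTED,
    RANDOM_DISTRIBUTED, ROUND_ROBIN_DISTRIBUTED, BROADCAST_DISTRIBUTED
  }

  /** Hypothetical stand-in for the proposed RelDistribution trait. */
  public static final class Distribution {
    public final DistributionType type;
    public final List<Integer> fields;

    public Distribution(DistributionType type, List<Integer> fields) {
      this.type = type;
      this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    /** Assumption: a distribution satisfies a requirement only if identical. */
    public boolean satisfies(Distribution required) {
      return type == required.type && fields.equals(required.fields);
    }
  }

  /** Hypothetical stand-in for a relational expression carrying the trait. */
  public abstract static class Node {
    public final Distribution distribution;

    protected Node(Distribution distribution) {
      this.distribution = distribution;
    }
  }

  /** Leaf node that produces rows with a known distribution. */
  public static final class Scan extends Node {
    public Scan(Distribution distribution) {
      super(distribution);
    }
  }

  /** Single-input operator whose only job is to change the distribution. */
  public static final class Exchange extends Node {
    public final Node input;

    public Exchange(Node input, Distribution distribution) {
      super(distribution);
      this.input = input;
    }
  }

  /** Adds an Exchange only when the input does not already satisfy the requirement. */
  public static Node enforce(Node input, Distribution required) {
    return input.distribution.satisfies(required)
        ? input
        : new Exchange(input, required);
  }

  public static void main(String[] args) {
    Distribution hashOn0 =
        new Distribution(DistributionType.HASH_DISTRIBUTED, Arrays.asList(0));
    Distribution singleton =
        new Distribution(DistributionType.SINGLETON, Collections.<Integer>emptyList());

    Node scan = new Scan(hashOn0);
    // Already hash-distributed on field 0: no Exchange is added.
    System.out.println(enforce(scan, hashOn0) == scan);
    // A singleton is required: the scan is wrapped in an Exchange.
    System.out.println(enforce(scan, singleton) instanceof Exchange);
  }
}
```

In the proposal this decision would be made by the trait def's convert method and by planner rules rather than by a helper like enforce; the sketch only shows why an operator that is fully specified by its target distribution is enough to represent the conversion.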
