TL;DR Both top-down physical TraitSet request and bottom-up TraitSet derivation have their strongth and weakness, we propose on-demand TraitSet request to combine the above two, to reduce the number of plan alternatives that are genereated, especially in distributed system.
e.g. select * from foo join bar on f1=b1 and f2=b2 and f3=b3; In non-distributed system, we can generate a sort merge join, requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3. But if foo happens to be sorted by f3,f2,f1, we may miss the chance of making use of the delivered ordering of foo. Because if we require bar to be sorted by b3,b2,b1, we don't need to sort on foo anymore. There are so many choices, n!, not even considering asc/desc and null direction. We can't request all the possible traitsets in top-down way, and can't derive all the possible traitsets in bottom-up way either. We propose on-demand traitset request by adding a new type of metadata DerivedTraitSets into the built-in metadata system. List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery) In this metadata, every operator returns several possbile traitsets that may be derived from this operator. Using above query as an example, the tablescan on foo should return traiset with collation on f3, f2, f1. In physical implementation rules, e.g. the SortMergeJoinRule, it gets possible traitsets from both child operators, uses the join keys to eliminate useless traitsets, leaves out usefull traitsets, and requests corresponding traitset on the other child. This relies on the feature of AbstractConverter, which is turned off by default, due to performance issue [1]. Thoughts? [1] https://issues.apache.org/jira/browse/CALCITE-2970 Haisheng