Yes, in general collation is a better fit as a physical property than as a logical property of a plan node. As for places where it makes sense to treat it as a logical property, I agree with the ORDER BY comments, and these should be extended to window functions too:

  SELECT b, RANK() OVER (ORDER BY b) FROM table;

I would think the LogicalWindow should have collation on b.
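To make the window-function point concrete, here is a minimal Python sketch of how RANK() OVER (ORDER BY b) can be evaluated in a single pass once the input is ordered on b. This is not Calcite code: the helper name rank_over and the sample rows are made up for illustration. The only assumption is that the operator must see rows in b order, which is why a collation on b looks like a requirement of the window itself rather than an accident of the physical plan.

```python
def rank_over(rows, key):
    """Compute RANK() OVER (ORDER BY key) for a list of dict rows.

    Hypothetical helper for illustration only; it is not part of Calcite.
    """
    # The window needs its input ordered on the key; this sort makes the
    # collation requirement explicit.
    ordered = sorted(rows, key=lambda r: r[key])
    result = []
    prev = None
    rank = 0
    for position, row in enumerate(ordered, start=1):
        # Ties share a rank, and the rank after a tie skips ahead
        # (RANK semantics, not DENSE_RANK).
        if position == 1 or row[key] != prev:
            rank = position
        result.append((row[key], rank))
        prev = row[key]
    return result


rows = [{"b": 30}, {"b": 10}, {"b": 20}, {"b": 10}]
ranked = rank_over(rows, "b")
print(ranked)
```

Without the sort step the single pass would assign ranks by arrival order, so the ordering on b cannot be treated as optional here.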
Jinfeng, the subquery's ORDER BY can be dropped in some cases but not all. For instance, in the following query:

  SELECT a1 FROM (SELECT a1 FROM t1 WHERE .... ORDER BY a1) LIMIT 10;

the ORDER BY should not be dropped. There are other cases; this is one example.

Aman

On Fri, Jul 31, 2015 at 9:09 AM, Jinfeng Ni <[email protected]> wrote:

> I think it makes sense that LogicalAggregate does not have collation, since
> a LogicalAggregate could be implemented with different physical operators,
> either hash-based aggregation or sort-based aggregation. Only when
> LogicalAggregate is converted into a physical aggregator does it make sense
> to have collation, depending on which physical operator is used.
>
> The same thing could be applied to LogicalJoin, which could be implemented
> either as hash-join or sort-based join.
>
> At the logical level, the only collation will come from the top-level
> ORDER BY clause. In that sense, I feel that the ORDER BY clause in a
> SUBQUERY or VIEW probably should be removed in logical planning, since
> semantically it does not impact the query result.
>
> SELECT S.C1, T2.C4
> FROM (SELECT C1, C2, C3
>       FROM T1 ORDER BY C1) AS S JOIN
>       T2
>       ON S ...
> ORDER BY T2.C4;
>
> In Drill, we separate logical planning from physical planning, where the
> collation (together with the distribution trait) will matter in physical
> planning.
>
> On Fri, Jul 31, 2015 at 7:27 AM, Milinda Pathirage <[email protected]>
> wrote:
>
> > Thanks Julian for looking into this. Thanks Maryann for detecting the
> > issue in the CALCITE-783 patch.
> >
> > As I understand it, we only need the input's (input to aggregate)
> > order-related metadata at the level of the aggregate. I think I was wrong
> > in saying that LogicalAggregate discards collation metadata from its input
> > in CALCITE-784, given that the input is accessible from LogicalAggregate.
> > We will only need to do some calculations on the input's collation
> > metadata (or something similar) if we need to infer something about
> > LogicalAggregate to be used by operators which take the aggregate as an
> > input.
> >
> > Thanks
> > Milinda
> >
> > On Thu, Jul 30, 2015 at 11:32 PM, Maryann Xue <[email protected]>
> > wrote:
> >
> > > Thanks Julian for taking time to sort out all these requirements and
> > > rethink the model!
> > > Thank you Milinda! Really appreciate your quick response to the issue.
> > >
> > > On Thu, Jul 30, 2015 at 4:57 PM, Julian Hyde <[email protected]> wrote:
> > >
> > >> There are a few issues in play regarding collations (783, 784, 793;
> > >> see links below) and they seem to be overlapping. Maryann and Milinda
> > >> have been at odds with each other (in the politest possible way!)
> > >>
> > >> The cause is that they are both doing very interesting new work using
> > >> collation:
> > >> * Maryann is optimizing Phoenix plans to use secondary indexes. These
> > >> are tables that are project-sort materializations of a base table,
> > >> itself sorted.
> > >> * Milinda is planning Samza streaming-aggregation queries. A plan can
> > >> only be found if you know that the stream is sorted on one of the
> > >> aggregation keys, usually a time column.
> > >>
> > >> I spoke with Maryann about this today. I think that logical plans
> > >> should not have a sort order:
> > >> * In 783 and 784, I think I was wrong to allow logical RelNodes
> > >> (LogicalProject and LogicalAggregate) to have collations. Because they
> > >> are logical, they are inherently un-sorted. (But they may be based on a
> > >> table, say an ArrayTable, that does have a sort order.)
> > >> * In 793, Maryann was right to say that we should not bake in the
> > >> collation that a plan *happens to have* when the SQL is first
> > >> translated, because trying to find a physical plan with the same
> > >> collation restricts our options.
> > >>
> > >> But SQL ASTs should have a sort order (if the top node is an ORDER BY
> > >> clause, or if a table referenced in the FROM clause is a stream) and
> > >> physical RelNodes should also have a sort order.
> > >>
> > >> And Milinda’s logical plans need a concept similar to sorting. Maybe a
> > >> piece of metadata that this RelNode *could be sorted by X, Y if
> > >> desired*. Any table can, of course, be re-sorted into any order you
> > >> like, but a stream, which is infinite, can only be re-sorted to an
> > >> order that does not conflict with the order of the incoming data.
> > >>
> > >> I still need to roll up my sleeves and help these patient developers
> > >> (especially Milinda) get something working, but I hope it helps to
> > >> have a general direction.
> > >>
> > >> Julian
> > >>
> > >> * https://issues.apache.org/jira/browse/CALCITE-783 Infer collation of
> > >> Project using monotonicity
> > >> * https://issues.apache.org/jira/browse/CALCITE-784 LogicalAggregate's
> > >> create method discards any collation traits from input
> > >> * https://issues.apache.org/jira/browse/CALCITE-793 The compiler asks
> > >> for unnecessary collation trait on plan with materialized view
> > >> * https://issues.apache.org/jira/browse/CALCITE-825 Allow user to
> > >> specify sort order of an ArrayTable
> > >>
> >
> > --
> > Milinda Pathirage
> >
> > PhD Student | Research Assistant
> > School of Informatics and Computing | Data to Insight Center
> > Indiana University
> >
> > twitter: milindalakmal
> > skype: milinda.pathirage
> > blog: http://milinda.pathirage.org
