Right. In case ORDER BY is used with LIMIT in the subquery, we could not
drop ORDER BY.


On Fri, Jul 31, 2015 at 9:22 AM, Aman Sinha <[email protected]> wrote:

> Yes, in general collation is a better fit as a physical property rather
> than logical property of a plan node.  With regard to places where it makes
> sense to treat it as logical property, agree with the ORDER-BY comments and
> these should be extended to window functions too:
>    SELECT  b,  RANK() OVER (ORDER BY b) FROM table;
> I would think the LogicalWindow  should have collation on b.
>
> Jinfeng, the subquery's ORDER-BY can be dropped in some cases but not all..
> for instance in the following query:
>   SELECT  a1 FROM (SELECT a1 FROM t1 WHERE .... ORDER BY a1)  LIMIT 10;
> The OB should not be dropped.  There are other cases, this is one example.
>
> Aman
>
> On Fri, Jul 31, 2015 at 9:09 AM, Jinfeng Ni <[email protected]> wrote:
>
> > I think it makes sense that LogicalAggregate does not have collation,
> since
> > a LogicalAggregate could be implemented with different physical operator,
> > either hash-based aggregation, or sort-based aggregation. Only when
> > LogicalAggregate is converted into physical aggregator,  it makes sense
> to
> > have collation, depending on the which physical operator is used.
> >
> > Same thing could be applied to LogicalJoin, which could be implemented
> > either as hash-join, or sort-based join.
> >
> > At logical level, the only collation will come from the top level ORDER
> BY
> > clause.  In that sense, I feel that the ORDER BY clause in a SUBQUERY, or
> > VIEW probably should be removed in logical planning, since semantically
> it
> > does not impact query result.
> >
> >  SELECT   S.C1, T2.C4
> >  FROM  (SELECT C1, C2, C3
> >                FROM T1 ORDER BY C1) AS S JOIN
> >                T2
> > ON S  ...
> > ORDER BY T2.C4;
> >
> > In Drill, we separate logical planning from physical planning, where the
> > collation (together with distribution trait) will matter in physical
> > planing.
> >
> >
> >
> >
> > On Fri, Jul 31, 2015 at 7:27 AM, Milinda Pathirage <
> [email protected]>
> > wrote:
> >
> > > Thanks Julian for looking in to this. Thanks Maryann for detecting the
> > > issue in CALCITE-783 patch.
> > >
> > > As I understand we only need input's (input to aggregate) order related
> > > metadata at the level of aggregate. I think I was wrong saying that
> > > LogicalAggregate discards collation metadata from input in CALCITE-784
> > > given that input is accessible from LogicalAggregate. We will only need
> > to
> > > do some calculations on input's collation metadata (or something
> similar)
> > > if we need to infer something about LogicalAggregate to be use by
> > operators
> > > which take aggregate as an input.
> > >
> > > Thanks
> > > Milinda
> > >
> > > On Thu, Jul 30, 2015 at 11:32 PM, Maryann Xue <[email protected]>
> > > wrote:
> > >
> > > > Thanks Julian for taking time to sort out all these requirements and
> > > > rethink about the model!
> > > > Thank you Milinda! Really appreciate your quick response to the
> issue.
> > > >
> > > > On Thu, Jul 30, 2015 at 4:57 PM, Julian Hyde <[email protected]>
> wrote:
> > > >
> > > >> There are a few issues in play regarding collations (783, 784, 793;
> > see
> > > >> links below) and they seem to be overlapping. Maryann and Milinda
> have
> > > been
> > > >> at odds with each other (in the politest possible way!)
> > > >>
> > > >> The cause is that they are both doing very interesting new work
> using
> > > >> collation:
> > > >> * Maryann is optimizing Phoenix plans to use secondary indexes.
> These
> > > are
> > > >> tables that are project-sort materializations of a base table,
> itself
> > > >> sorted.
> > > >> * Milinda is planning Samza streaming-aggregation queries. A plan
> can
> > > >> only be found if you know that the stream is sorted on one of the
> > > >> aggregation keys, usually a time column.
> > > >>
> > > >> I spoke with Maryann about this today. I think that logical plans
> > should
> > > >> not have a sort order:
> > > >> * In 783 and 784, I think I was wrong to allow logical RelNodes
> > > >> (LogicalProject and LogicalAggregate) to have collations. Because
> they
> > > are
> > > >> logical, they are inherently un-sorted. (But they may be based on a
> > > table,
> > > >> say an ArrayTable, that does have a sort order.)
> > > >> * In 793, Maryann was right so say that we should not bake in the
> > > >> collation that a plan *happens to have* when the SQL is first
> > > translated,
> > > >> because trying to find a physical plan with the same collation
> > restricts
> > > >> our options.
> > > >>
> > > >> But SQL ASTs should have a sort order (if the top node is an ORDER
> BY
> > > >> clause, or if a table referenced in the FROM clause is a stream) and
> > > >> physical RelNodes should also have a sort order.
> > > >>
> > > >> And Milinda’s logical plans need a concept similar to sorting.
> Maybe a
> > > >> piece of metadata that this RelNode *could be sorted by X, Y if
> > > desired*.
> > > >> Any table can, of course, be re-sorted into any order you like, but
> a
> > > >> stream, which is infinite, can only be re-sorted to an order that
> does
> > > not
> > > >> conflict with the order of the incoming data.
> > > >>
> > > >> I still need to roll up my sleeves and help these patient developers
> > > >> (especially Milinda) get something working, but I hope it helps to
> > have
> > > a
> > > >> general direction.
> > > >>
> > > >> Julian
> > > >>
> > > >> * https://issues.apache.org/jira/browse/CALCITE-783 Infer collation
> > of
> > > >> Project using monotonicity
> > > >> * https://issues.apache.org/jira/browse/CALCITE-784
> > LogicalAggregate's
> > > >> create method discards any collation traits from input
> > > >> * https://issues.apache.org/jira/browse/CALCITE-793 The compiler
> asks
> > > >> for unnecessary collation trait on plan with materialized view
> > > >> * https://issues.apache.org/jira/browse/CALCITE-825 Allow user to
> > > >> specify sort order of an ArrayTable
> > > >>
> > > >
> > > >
> > >
> > >
> > > --
> > > Milinda Pathirage
> > >
> > > PhD Student | Research Assistant
> > > School of Informatics and Computing | Data to Insight Center
> > > Indiana University
> > >
> > > twitter: milindalakmal
> > > skype: milinda.pathirage
> > > blog: http://milinda.pathirage.org
> > >
> >
>

Reply via email to