There are a few issues in play regarding collations (783, 784 and 793; see links 
below), and they seem to overlap. Maryann and Milinda have been at odds 
with each other (in the politest possible way!).

The reason is that both of them are doing very interesting new work that relies 
on collation:
* Maryann is optimizing Phoenix plans to use secondary indexes. These are 
tables that are project-sort materializations of a base table, which is itself 
sorted.
* Milinda is planning Samza streaming-aggregation queries. A plan can only be 
found if you know that the stream is sorted on one of the aggregation keys, 
usually a time column.
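To see why the sort order matters in Milinda's case, here is a minimal sketch of 
aggregating a stream that is sorted on its grouping key. It is not Calcite or 
Samza code; the `Row` and `HourCount` records and the `aggregateSorted` method 
are hypothetical, and records need Java 16 or later. Because the input is sorted 
by hour, a group can be emitted as soon as a later hour arrives, holding only one 
group in memory. Without that guarantee no group could ever be closed on an 
infinite stream.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class SortedStreamAggregate {
  /** Hypothetical stream row: an hour bucket (the sort key) and a product. */
  record Row(long hour, String product) {}

  /** Hypothetical output row: the number of rows seen in one hour. */
  record HourCount(long hour, long count) {}

  /**
   * Counts rows per hour, assuming the input arrives sorted by hour.
   * Each group is passed to {@code emit} as soon as a later hour is seen.
   */
  static void aggregateSorted(Iterator<Row> input, Consumer<HourCount> emit) {
    boolean open = false;
    long currentHour = 0;
    long count = 0;
    while (input.hasNext()) {
      Row row = input.next();
      if (open && row.hour() < currentHour) {
        throw new IllegalStateException("input is not sorted by hour");
      }
      if (open && row.hour() != currentHour) {
        // The sort order guarantees no more rows for currentHour: emit it.
        emit.accept(new HourCount(currentHour, count));
        count = 0;
      }
      currentHour = row.hour();
      open = true;
      count++;
    }
    if (open) {
      // On a real, infinite stream the last group would stay open; this
      // finite demo input has ended, so flush it.
      emit.accept(new HourCount(currentHour, count));
    }
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row(10, "a"), new Row(10, "b"), new Row(11, "a"), new Row(13, "c"));
    List<HourCount> out = new ArrayList<>();
    aggregateSorted(rows.iterator(), out::add);
    System.out.println(out);
  }
}
```

The same loop run over input sorted on some other column would have to buffer 
every group until the end of the stream, which never comes.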

I spoke with Maryann about this today. I think that logical plans should not 
have a sort order:
* In 783 and 784, I think I was wrong to allow logical RelNodes (LogicalProject 
and LogicalAggregate) to have collations. Because they are logical, they are 
inherently unsorted. (But they may be based on a table, say an ArrayTable, 
that does have a sort order.)
* In 793, Maryann was right to say that we should not bake in the collation 
that a plan *happens to have* when the SQL is first translated, because trying 
to find a physical plan with the same collation restricts our options.

But SQL ASTs should have a sort order (if the top node is an ORDER BY clause, 
or if a table referenced in the FROM clause is a stream) and physical RelNodes 
should also have a sort order.

And Milinda’s logical plans need a concept similar to sorting: perhaps a piece 
of metadata saying that this RelNode *could be sorted by X, Y if desired*. Any 
table can,
of course, be re-sorted into any order you like, but a stream, which is 
infinite, can only be re-sorted to an order that does not conflict with the 
order of the incoming data.
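Here is a minimal sketch of that rule under a simplified model that is not 
Calcite's API: a sort order is a list of hypothetical `Key(field, descending)` 
entries, and `canResortStream` is a hypothetical helper. It assumes "does not 
conflict" means the requested order starts with the stream's leading key in the 
same direction; the remaining keys can then be sorted within each run of equal 
leading values, provided each run is finite. Any other order would require 
seeing the whole, infinite stream first.

```java
import java.util.List;

public class StreamResort {
  /** Hypothetical sort key: a field name and a direction. */
  record Key(String field, boolean descending) {}

  /**
   * Returns whether an infinite stream arriving sorted by {@code incoming}
   * can be re-sorted to {@code requested}. Simplifying assumption: only the
   * leading key must agree, and each run of equal leading-key values is
   * finite, so it can be buffered and sorted on the remaining keys.
   */
  static boolean canResortStream(List<Key> incoming, List<Key> requested) {
    if (requested.isEmpty()) {
      return true; // No order required: pass rows through as they arrive.
    }
    if (incoming.isEmpty()) {
      return false; // An unsorted infinite stream can never be fully sorted.
    }
    return incoming.get(0).equals(requested.get(0));
  }

  public static void main(String[] args) {
    List<Key> byRowtime = List.of(new Key("rowtime", false));
    // Sort within each rowtime value: achievable.
    System.out.println(canResortStream(byRowtime,
        List.of(new Key("rowtime", false), new Key("productId", false))));
    // Sort by productId alone: would need the whole stream.
    System.out.println(canResortStream(byRowtime,
        List.of(new Key("productId", false))));
  }
}
```

A finite table has no such restriction, which is why this check only makes sense 
as a property of streams.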

I still need to roll up my sleeves and help these patient developers 
(especially Milinda) get something working, but I hope it helps to have a 
general direction.

Julian

* https://issues.apache.org/jira/browse/CALCITE-783 Infer collation of Project 
using monotonicity
* https://issues.apache.org/jira/browse/CALCITE-784 LogicalAggregate's create 
method discards any collation traits from input
* https://issues.apache.org/jira/browse/CALCITE-793 The compiler asks for 
unnecessary collation trait on plan with materialized view
* https://issues.apache.org/jira/browse/CALCITE-825 Allow user to specify sort 
order of an ArrayTable
