Re: Collation meets relational algebra

Milinda Pathirage Fri, 31 Jul 2015 07:27:57 -0700

Thanks, Julian, for looking into this. Thanks, Maryann, for detecting the
issue in the CALCITE-783 patch.


As I understand it, at the level of the aggregate we only need the
order-related metadata of its input. I think I was wrong to say in
CALCITE-784 that LogicalAggregate discards collation metadata from its
input, given that the input is accessible from LogicalAggregate. We will
only need to do calculations on the input's collation metadata (or
something similar) if we need to infer something about LogicalAggregate to
be used by operators that take the aggregate as an input.
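
To make that concrete, here is a rough sketch of the kind of check I have in
mind. It is plain Java and does not use Calcite's API: the class name, the
method name, and the representation of a collation as a list of field
ordinals are all made up for illustration. It also assumes that "sorted on
one of the aggregation keys" means the input's leading sort key is a group
key, which is my simplification.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names are part of Calcite's API. */
class InputCollationSketch {

  /**
   * Returns whether the input's leading sort key is one of the group keys.
   *
   * @param inputCollation field ordinals the input is sorted by, most
   *                       significant first; empty means unsorted
   * @param groupKeys      field ordinals the aggregate groups by
   */
  static boolean sortedOnGroupKey(List<Integer> inputCollation,
      Set<Integer> groupKeys) {
    // Assumption: only the leading sort key matters for this check.
    return !inputCollation.isEmpty()
        && groupKeys.contains(inputCollation.get(0));
  }

  public static void main(String[] args) {
    // Stream sorted by field 0 (say, a time column), grouped by 0 and 3.
    System.out.println(sortedOnGroupKey(List.of(0), Set.of(0, 3)));  // true
    // Stream sorted by field 2, which is not a group key.
    System.out.println(sortedOnGroupKey(List.of(2), Set.of(0, 3)));  // false
    // Unsorted input.
    System.out.println(sortedOnGroupKey(List.of(), Set.of(0, 3)));   // false
  }
}
```

The point is only that the aggregate can ask its input for this information
when it needs it, rather than carrying a collation of its own.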

Thanks
Milinda

On Thu, Jul 30, 2015 at 11:32 PM, Maryann Xue <[email protected]> wrote:

> Thanks Julian for taking the time to sort out all these requirements and
> rethink the model!
> Thank you Milinda! Really appreciate your quick response to the issue.
>
> On Thu, Jul 30, 2015 at 4:57 PM, Julian Hyde <[email protected]> wrote:
>
>> There are a few issues in play regarding collations (783, 784, 793; see
>> links below) and they seem to be overlapping. Maryann and Milinda have been
>> at odds with each other (in the politest possible way!)
>>
>> The cause is that they are both doing very interesting new work using
>> collation:
>> * Maryann is optimizing Phoenix plans to use secondary indexes. These are
>> tables that are project-sort materializations of a base table, itself
>> sorted.
>> * Milinda is planning Samza streaming-aggregation queries. A plan can
>> only be found if you know that the stream is sorted on one of the
>> aggregation keys, usually a time column.
>>
>> I spoke with Maryann about this today. I think that logical plans should
>> not have a sort order:
>> * In 783 and 784, I think I was wrong to allow logical RelNodes
>> (LogicalProject and LogicalAggregate) to have collations. Because they are
>> logical, they are inherently un-sorted. (But they may be based on a table,
>> say an ArrayTable, that does have a sort order.)
>> * In 793, Maryann was right to say that we should not bake in the
>> collation that a plan *happens to have* when the SQL is first translated,
>> because trying to find a physical plan with the same collation restricts
>> our options.
>>
>> But SQL ASTs should have a sort order (if the top node is an ORDER BY
>> clause, or if a table referenced in the FROM clause is a stream) and
>> physical RelNodes should also have a sort order.
>>
>> And Milinda’s logical plans need a concept similar to sorting. Maybe a
>> piece of metadata that this RelNode *could be sorted by X, Y if desired*.
>> Any table can, of course, be re-sorted into any order you like, but a
>> stream, which is infinite, can only be re-sorted to an order that does not
>> conflict with the order of the incoming data.
>>
>> I still need to roll up my sleeves and help these patient developers
>> (especially Milinda) get something working, but I hope it helps to have a
>> general direction.
>>
>> Julian
>>
>> * https://issues.apache.org/jira/browse/CALCITE-783 Infer collation of
>> Project using monotonicity
>> * https://issues.apache.org/jira/browse/CALCITE-784 LogicalAggregate's
>> create method discards any collation traits from input
>> * https://issues.apache.org/jira/browse/CALCITE-793 The compiler asks
>> for unnecessary collation trait on plan with materialized view
>> * https://issues.apache.org/jira/browse/CALCITE-825 Allow user to
>> specify sort order of an ArrayTable
>>
>
>


-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org
