Re: Find Monotonic Column in GROUP BY

Julian Hyde Wed, 23 Nov 2016 09:16:14 -0800

Use collations, which are a kind of metadata:

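  // r is the relational expression to inspect (assumed to be assigned elsewhere)
  RelNode r;
  RelMetadataQuery mq = RelMetadataQuery.instance();
  // Each RelCollation is one sort order that the rows of r are known to satisfy
  List<RelCollation> collations = mq.collations(r);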

This example creates a new RelMetadataQuery, but an instance is expensive to 
create, and it holds data structures that cache intermediate results and 
prevent cycles.  So, if you already have a RelMetadataQuery instance (e.g. if 
you are implementing a metadata method), use it rather than creating a new one.

There are lots of other kinds of metadata, including lots of statistics. The 
methods on RelMetadataQuery[1] give you an idea of the built-in metadata, and 
you can also add your own metadata types.

Two things make “collations” of streams more complex:

1. It is the validator that determines whether a SQL query is valid. It works 
on the SqlNode tree, and information available from the catalog, before the 
first RelNode is created. The implication of this is that the monotonicity 
available to the validator is different from the collation metadata on 
RelNodes (though hopefully not too different).

2. At present, we validate based on “is sorted”. In future, to deal with the 
variety of streaming systems, and even hybrid problems like continuous ETL, we 
will want to validate based on “could be sorted”. For example, if your orderId 
is allocated from parallel sequence generators that are never more than 5 
minutes apart, then someone could say “group by floor(orderId / 1000)” if they 
are prepared for their query to have a 5 minute latency.
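
To make the “could be sorted” idea concrete, here is a small stand-alone Java 
sketch. It is not Calcite API: the class BoundedDisorderGroupBy and its methods 
add and advanceTo are hypothetical names, and it models “never more than 5 
minutes apart” as the assumption that a row arriving more than maxSkewMillis 
after another row always has a larger orderId. Under that assumption, a group 
floor(orderId / 1000) can be emitted once some sufficiently old row has an 
orderId at or beyond the end of that group's range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model (not part of Calcite): counts rows per
 * floor(orderId / bucketSize) over input that is only "almost sorted".
 *
 * Assumption: if row B arrives more than maxSkewMillis after row A,
 * then B.orderId > A.orderId.
 */
final class BoundedDisorderGroupBy {
  private final long maxSkewMillis;
  private final long bucketSize;
  // Every row seen so far as {arrivalMillis, orderId}. A real operator
  // would prune this list; the sketch keeps it for simplicity.
  private final List<long[]> arrivals = new ArrayList<>();
  // Groups that may still receive rows, keyed by floor(orderId / bucketSize).
  private final TreeMap<Long, Integer> openGroups = new TreeMap<>();

  BoundedDisorderGroupBy(long maxSkewMillis, long bucketSize) {
    this.maxSkewMillis = maxSkewMillis;
    this.bucketSize = bucketSize;
  }

  /** Records one row and counts it in its (still open) group. */
  void add(long arrivalMillis, long orderId) {
    arrivals.add(new long[] {arrivalMillis, orderId});
    openGroups.merge(Math.floorDiv(orderId, bucketSize), 1, Integer::sum);
  }

  /**
   * Returns, and removes, the groups that can no longer change.
   * Call this only after adding every row that arrived at or before nowMillis.
   */
  Map<Long, Integer> advanceTo(long nowMillis) {
    // Watermark: largest orderId among rows at least maxSkewMillis old.
    // By the assumption, every future row has an orderId above it.
    boolean haveWatermark = false;
    long watermark = Long.MIN_VALUE;
    for (long[] row : arrivals) {
      if (row[0] <= nowMillis - maxSkewMillis) {
        watermark = Math.max(watermark, row[1]);
        haveWatermark = true;
      }
    }
    TreeMap<Long, Integer> closed = new TreeMap<>();
    if (!haveWatermark) {
      return closed;
    }
    // Group g is complete when its largest possible orderId,
    // (g + 1) * bucketSize - 1, is at or below the watermark.
    long firstOpenGroup = Math.floorDiv(watermark + 1, bucketSize);
    Map<Long, Integer> complete = openGroups.headMap(firstOpenGroup, false);
    closed.putAll(complete);
    complete.clear(); // headMap is a view, so this removes them from openGroups
    return closed;
  }
}
```

With a 5 minute bound (300000 ms), a bucket size of 1000, and rows 998, 1003 
and 999 arriving one minute apart, group 0 is emitted with a count of 2 once 
row 1003 is 5 minutes old. That delay is the latency the query author has to 
accept.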

Both of these areas need some work over the next few months.

Julian

[1] https://calcite.apache.org/apidocs/org/apache/calcite/rel/metadata/RelMetadataQuery.html

> On Nov 23, 2016, at 4:04 AM, Chinmay Kolhatkar <[email protected]> wrote:
> 
> Dear Community,
> 
> I'm trying to add support for GROUP BY clause in Apache Apex-Calcite
> integration.
> 
> I am assuming that calcite knows which is the monotonic column because
> query fails to parse if there is no monotonic column present in group set.
> 
> Is there any way to find out which is the monotonic column in the GROUP BY
> clause from Aggregate/LogicalAggregate object?
> 
> Thanks,
> Chinmay.
