RelNodes know their sort order: call rel.getTraitSet().getTraits(RelCollationTraitDef.INSTANCE). That said, in non-streaming planning we start with unsorted relational expressions and deduce sort order later if it is beneficial (e.g. if it allows a sort-merge join) and it seems that we're currently doing that for streaming planning.
To see what I'm talking about, run StreamTest.testStreamGroupByHaving and put a break point in LogicalAggregate's constructor. It's traitSet contains only the empty collation. We need to fix that. Julian On Fri, Jun 26, 2015 at 11:40 AM, Milinda Pathirage <[email protected]> wrote: > Hi Yi, > > In this specific case ordering is declared in the schema. Quoting from > Calcite documentation > > ~~~~~~~~~~~~~~~~~~~~ > Monotonic columns need to be declared in the schema. The monotonicity is > enforced when records enter the stream and assumed by queries that read > from that stream. We recommend that you give each stream a timestamp column > called rowtime, but you can declare others, orderId, for example. > ~~~~~~~~~~~~~~~~~~~~ > > > > If we can propagate this ordering information to LogicalAggregate then we > can easily handle this. As I understand required information is accessible > to Calcite query planner. But in our case we need this information after we > get the query plan from Calcite. AFAIK, current API doesn't provide a way > to get this information in scenarios like above where ORDER BY is not > specified in the query (I am not 100% sure about ORDER BY case too. I need > to have a look at a query plan generated for a query with ORDER BY). > > Thanks > Milinda > > On Fri, Jun 26, 2015 at 2:30 PM, Yi Pan <[email protected]> wrote: > >> Hi, Milinda, >> >> I thought that in your example, the ordering field is given in GROUP BY. >> Are we missing a way to pass the ordering field(s) to the LogicalAggregate? >> >> -Yi >> >> On Fri, Jun 26, 2015 at 10:49 AM, Milinda Pathirage <[email protected] >> > >> wrote: >> >> > Hi Julian, >> > >> > Even though this is a general question across all the streaming >> aggregates >> > which utilize GROUP BY clause and a monotonic timestamp field for >> > specifying the window, but I am going to stick to most basic example >> (which >> > is from Calcite Streaming document). >> > >> > SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime, >> > productId, >> > COUNT(*) AS c, >> > SUM(units) AS units >> > FROM Orders >> > GROUP BY FLOOR(rowtime TO HOUR), productId; >> > >> > I was trying to implement an aggregate operator which handles tumbling >> > windows via the monotonic field in GROUP By clause in addition to the >> > general aggregations. I went in this path because I thought integrating >> > windowing aspects (at least for tumbling and hopping) into aggregate >> > operator will be easier than trying to extract the window spec from the >> > query plan for a query like above. But I hit a wall when trying to figure >> > out trigger condition for emitting aggregate results. I was initially >> > planning to detect new values for FLOOR(rowtime TO HOUR) and emit current >> > aggregate results for previous groups (I was thinking to keep old groups >> > around until we clean them up after a timeout). But when trying to >> > implement this I figured out that I don’t know how to check which GROUP >> BY >> > field is monotonic so that I only detect new values for the monotonic >> > field/fields, not for the all the other fields. I think this is not a >> > problem for tables because we have the whole input before computation and >> > we wait till we are done with the input before emitting the results. >> > >> > With regards to above can you please clarify following things: >> > >> > - Is the method I described above for handling streaming aggregates make >> > sense at all? >> > - Is there a way that I can figure out which fields/expressions in >> > LogicalAggregate are monotonic? >> > - Or can we write a rule to annotate or add extra metadata to >> > LogicalAggregate so that we can get monotonic fields in the GROUP By >> clause >> > >> > Thanks in advance >> > Milinda >> > >> > >> > -- >> > Milinda Pathirage >> > >> > PhD Student | Research Assistant >> > School of Informatics and Computing | Data to Insight Center >> > Indiana University >> > >> > twitter: milindalakmal >> > skype: milinda.pathirage >> > blog: http://milinda.pathirage.org >> > >> > > > > -- > Milinda Pathirage > > PhD Student | Research Assistant > School of Informatics and Computing | Data to Insight Center > Indiana University > > twitter: milindalakmal > skype: milinda.pathirage > blog: http://milinda.pathirage.org
