Thanks Julian. I'll give it a try. Milinda
On Fri, Jun 26, 2015 at 6:14 PM, Julian Hyde <[email protected]> wrote: > RelNodes know their sort order: call > rel.getTraitSet().getTraits(RelCollationTraitDef.INSTANCE). That said, > in non-streaming planning we start with unsorted relational > expressions and deduce sort order later if it is beneficial (e.g. if > it allows a sort-merge join) and it seems that we're currently doing > that for streaming planning. > > To see what I'm talking about, run StreamTest.testStreamGroupByHaving > and put a break point in LogicalAggregate's constructor. It's traitSet > contains only the empty collation. We need to fix that. > > Julian > > > On Fri, Jun 26, 2015 at 11:40 AM, Milinda Pathirage > <[email protected]> wrote: > > Hi Yi, > > > > In this specific case ordering is declared in the schema. Quoting from > > Calcite documentation > > > > ~~~~~~~~~~~~~~~~~~~~ > > Monotonic columns need to be declared in the schema. The monotonicity is > > enforced when records enter the stream and assumed by queries that read > > from that stream. We recommend that you give each stream a timestamp > column > > called rowtime, but you can declare others, orderId, for example. > > ~~~~~~~~~~~~~~~~~~~~ > > > > > > > > If we can propagate this ordering information to LogicalAggregate then we > > can easily handle this. As I understand required information is > accessible > > to Calcite query planner. But in our case we need this information after > we > > get the query plan from Calcite. AFAIK, current API doesn't provide a way > > to get this information in scenarios like above where ORDER BY is not > > specified in the query (I am not 100% sure about ORDER BY case too. I > need > > to have a look at a query plan generated for a query with ORDER BY). > > > > Thanks > > Milinda > > > > On Fri, Jun 26, 2015 at 2:30 PM, Yi Pan <[email protected]> wrote: > > > >> Hi, Milinda, > >> > >> I thought that in your example, the ordering field is given in GROUP BY. > >> Are we missing a way to pass the ordering field(s) to the > LogicalAggregate? > >> > >> -Yi > >> > >> On Fri, Jun 26, 2015 at 10:49 AM, Milinda Pathirage < > [email protected] > >> > > >> wrote: > >> > >> > Hi Julian, > >> > > >> > Even though this is a general question across all the streaming > >> aggregates > >> > which utilize GROUP BY clause and a monotonic timestamp field for > >> > specifying the window, but I am going to stick to most basic example > >> (which > >> > is from Calcite Streaming document). > >> > > >> > SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime, > >> > productId, > >> > COUNT(*) AS c, > >> > SUM(units) AS units > >> > FROM Orders > >> > GROUP BY FLOOR(rowtime TO HOUR), productId; > >> > > >> > I was trying to implement an aggregate operator which handles tumbling > >> > windows via the monotonic field in GROUP By clause in addition to the > >> > general aggregations. I went in this path because I thought > integrating > >> > windowing aspects (at least for tumbling and hopping) into aggregate > >> > operator will be easier than trying to extract the window spec from > the > >> > query plan for a query like above. But I hit a wall when trying to > figure > >> > out trigger condition for emitting aggregate results. I was initially > >> > planning to detect new values for FLOOR(rowtime TO HOUR) and emit > current > >> > aggregate results for previous groups (I was thinking to keep old > groups > >> > around until we clean them up after a timeout). But when trying to > >> > implement this I figured out that I don’t know how to check which > GROUP > >> BY > >> > field is monotonic so that I only detect new values for the monotonic > >> > field/fields, not for the all the other fields. I think this is not a > >> > problem for tables because we have the whole input before computation > and > >> > we wait till we are done with the input before emitting the results. > >> > > >> > With regards to above can you please clarify following things: > >> > > >> > - Is the method I described above for handling streaming aggregates > make > >> > sense at all? > >> > - Is there a way that I can figure out which fields/expressions in > >> > LogicalAggregate are monotonic? > >> > - Or can we write a rule to annotate or add extra metadata to > >> > LogicalAggregate so that we can get monotonic fields in the GROUP By > >> clause > >> > > >> > Thanks in advance > >> > Milinda > >> > > >> > > >> > -- > >> > Milinda Pathirage > >> > > >> > PhD Student | Research Assistant > >> > School of Informatics and Computing | Data to Insight Center > >> > Indiana University > >> > > >> > twitter: milindalakmal > >> > skype: milinda.pathirage > >> > blog: http://milinda.pathirage.org > >> > > >> > > > > > > > > -- > > Milinda Pathirage > > > > PhD Student | Research Assistant > > School of Informatics and Computing | Data to Insight Center > > Indiana University > > > > twitter: milindalakmal > > skype: milinda.pathirage > > blog: http://milinda.pathirage.org > -- Milinda Pathirage PhD Student | Research Assistant School of Informatics and Computing | Data to Insight Center Indiana University twitter: milindalakmal skype: milinda.pathirage blog: http://milinda.pathirage.org
