Re: Triggering emits for streaming window aggregates

Julian Hyde Fri, 26 Jun 2015 15:15:12 -0700

RelNodes know their sort order: call
rel.getTraitSet().getTraits(RelCollationTraitDef.INSTANCE). That said,
in non-streaming planning we start with unsorted relational
expressions and deduce sort order later if it is beneficial (e.g. if
it allows a sort-merge join) and it seems that we're currently doing
that for streaming planning.


To see what I'm talking about, run StreamTest.testStreamGroupByHaving
and put a break point in LogicalAggregate's constructor. It's traitSet
contains only the empty collation. We need to fix that.

Julian


On Fri, Jun 26, 2015 at 11:40 AM, Milinda Pathirage
<[email protected]> wrote:
> Hi Yi,
>
> In this specific case ordering is declared in the schema. Quoting from
> Calcite documentation
>
> ~~~~~~~~~~~~~~~~~~~~
> Monotonic columns need to be declared in the schema. The monotonicity is
> enforced when records enter the stream and assumed by queries that read
> from that stream. We recommend that you give each stream a timestamp column
> called rowtime, but you can declare others, orderId, for example.
> ~~~~~~~~~~~~~~~~~~~~
>
>
>
> If we can propagate this ordering information to LogicalAggregate then we
> can easily handle this. As I understand required information is accessible
> to Calcite query planner. But in our case we need this information after we
> get the query plan from Calcite. AFAIK, current API doesn't provide a way
> to get this information in scenarios like above where ORDER BY is not
> specified in the query (I am not 100% sure about ORDER BY case too. I need
> to have a look at a query plan generated for a query with ORDER BY).
>
> Thanks
> Milinda
>
> On Fri, Jun 26, 2015 at 2:30 PM, Yi Pan <[email protected]> wrote:
>
>> Hi, Milinda,
>>
>> I thought that in your example, the ordering field is given in GROUP BY.
>> Are we missing a way to pass the ordering field(s) to the LogicalAggregate?
>>
>> -Yi
>>
>> On Fri, Jun 26, 2015 at 10:49 AM, Milinda Pathirage <[email protected]
>> >
>> wrote:
>>
>> > Hi Julian,
>> >
>> > Even though this is a general question across all the streaming
>> aggregates
>> > which utilize GROUP BY clause and a monotonic timestamp field for
>> > specifying the window, but I am going to stick to most basic example
>> (which
>> > is from Calcite Streaming document).
>> >
>> > SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime,
>> >   productId,
>> >   COUNT(*) AS c,
>> >   SUM(units) AS units
>> > FROM Orders
>> > GROUP BY FLOOR(rowtime TO HOUR), productId;
>> >
>> > I was trying to implement an aggregate operator which handles tumbling
>> > windows via the monotonic field in GROUP By clause in addition to the
>> > general aggregations. I went in this path because I thought integrating
>> > windowing aspects (at least for tumbling and hopping) into aggregate
>> > operator will be easier than trying to extract the window spec from the
>> > query plan for a query like above. But I hit a wall when trying to figure
>> > out trigger condition for emitting aggregate results. I was initially
>> > planning to detect new values for FLOOR(rowtime TO HOUR) and emit current
>> > aggregate results for previous groups (I was thinking to keep old groups
>> > around until we clean them up after a timeout). But when trying to
>> > implement this I figured out that I don’t know how to check which GROUP
>> BY
>> > field is monotonic so that I only detect new values for the monotonic
>> > field/fields, not for the all the other fields. I think this is not a
>> > problem for tables because we have the whole input before computation and
>> > we wait till we are done with the input before emitting the results.
>> >
>> > With regards to above can you please clarify following things:
>> >
>> > - Is the method I described above for handling streaming aggregates make
>> > sense at all?
>> > - Is there a way that I can figure out which fields/expressions in
>> > LogicalAggregate are monotonic?
>> > - Or can we write a rule to annotate or add extra metadata to
>> > LogicalAggregate so that we can get monotonic fields in the GROUP By
>> clause
>> >
>> > Thanks in advance
>> > Milinda
>> >
>> >
>> > --
>> > Milinda Pathirage
>> >
>> > PhD Student | Research Assistant
>> > School of Informatics and Computing | Data to Insight Center
>> > Indiana University
>> >
>> > twitter: milindalakmal
>> > skype: milinda.pathirage
>> > blog: http://milinda.pathirage.org
>> >
>>
>
>
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org

Re: Triggering emits for streaming window aggregates

Reply via email to