Thanks Julian. I'll give it a try.

Milinda

On Fri, Jun 26, 2015 at 6:14 PM, Julian Hyde <[email protected]> wrote:

> RelNodes know their sort order: call
> rel.getTraitSet().getTraits(RelCollationTraitDef.INSTANCE). That said,
> in non-streaming planning we start with unsorted relational
> expressions and deduce sort order later if it is beneficial (e.g. if
> it allows a sort-merge join) and it seems that we're currently doing
> that for streaming planning.
>
> To see what I'm talking about, run StreamTest.testStreamGroupByHaving
> and put a break point in LogicalAggregate's constructor. It's traitSet
> contains only the empty collation. We need to fix that.
>
> Julian
>
>
> On Fri, Jun 26, 2015 at 11:40 AM, Milinda Pathirage
> <[email protected]> wrote:
> > Hi Yi,
> >
> > In this specific case ordering is declared in the schema. Quoting from
> > Calcite documentation
> >
> > ~~~~~~~~~~~~~~~~~~~~
> > Monotonic columns need to be declared in the schema. The monotonicity is
> > enforced when records enter the stream and assumed by queries that read
> > from that stream. We recommend that you give each stream a timestamp
> column
> > called rowtime, but you can declare others, orderId, for example.
> > ~~~~~~~~~~~~~~~~~~~~
> >
> >
> >
> > If we can propagate this ordering information to LogicalAggregate then we
> > can easily handle this. As I understand required information is
> accessible
> > to Calcite query planner. But in our case we need this information after
> we
> > get the query plan from Calcite. AFAIK, current API doesn't provide a way
> > to get this information in scenarios like above where ORDER BY is not
> > specified in the query (I am not 100% sure about ORDER BY case too. I
> need
> > to have a look at a query plan generated for a query with ORDER BY).
> >
> > Thanks
> > Milinda
> >
> > On Fri, Jun 26, 2015 at 2:30 PM, Yi Pan <[email protected]> wrote:
> >
> >> Hi, Milinda,
> >>
> >> I thought that in your example, the ordering field is given in GROUP BY.
> >> Are we missing a way to pass the ordering field(s) to the
> LogicalAggregate?
> >>
> >> -Yi
> >>
> >> On Fri, Jun 26, 2015 at 10:49 AM, Milinda Pathirage <
> [email protected]
> >> >
> >> wrote:
> >>
> >> > Hi Julian,
> >> >
> >> > Even though this is a general question across all the streaming
> >> aggregates
> >> > which utilize GROUP BY clause and a monotonic timestamp field for
> >> > specifying the window, but I am going to stick to most basic example
> >> (which
> >> > is from Calcite Streaming document).
> >> >
> >> > SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime,
> >> >   productId,
> >> >   COUNT(*) AS c,
> >> >   SUM(units) AS units
> >> > FROM Orders
> >> > GROUP BY FLOOR(rowtime TO HOUR), productId;
> >> >
> >> > I was trying to implement an aggregate operator which handles tumbling
> >> > windows via the monotonic field in GROUP By clause in addition to the
> >> > general aggregations. I went in this path because I thought
> integrating
> >> > windowing aspects (at least for tumbling and hopping) into aggregate
> >> > operator will be easier than trying to extract the window spec from
> the
> >> > query plan for a query like above. But I hit a wall when trying to
> figure
> >> > out trigger condition for emitting aggregate results. I was initially
> >> > planning to detect new values for FLOOR(rowtime TO HOUR) and emit
> current
> >> > aggregate results for previous groups (I was thinking to keep old
> groups
> >> > around until we clean them up after a timeout). But when trying to
> >> > implement this I figured out that I don’t know how to check which
> GROUP
> >> BY
> >> > field is monotonic so that I only detect new values for the monotonic
> >> > field/fields, not for the all the other fields. I think this is not a
> >> > problem for tables because we have the whole input before computation
> and
> >> > we wait till we are done with the input before emitting the results.
> >> >
> >> > With regards to above can you please clarify following things:
> >> >
> >> > - Is the method I described above for handling streaming aggregates
> make
> >> > sense at all?
> >> > - Is there a way that I can figure out which fields/expressions in
> >> > LogicalAggregate are monotonic?
> >> > - Or can we write a rule to annotate or add extra metadata to
> >> > LogicalAggregate so that we can get monotonic fields in the GROUP By
> >> clause
> >> >
> >> > Thanks in advance
> >> > Milinda
> >> >
> >> >
> >> > --
> >> > Milinda Pathirage
> >> >
> >> > PhD Student | Research Assistant
> >> > School of Informatics and Computing | Data to Insight Center
> >> > Indiana University
> >> >
> >> > twitter: milindalakmal
> >> > skype: milinda.pathirage
> >> > blog: http://milinda.pathirage.org
> >> >
> >>
> >
> >
> >
> > --
> > Milinda Pathirage
> >
> > PhD Student | Research Assistant
> > School of Informatics and Computing | Data to Insight Center
> > Indiana University
> >
> > twitter: milindalakmal
> > skype: milinda.pathirage
> > blog: http://milinda.pathirage.org
>



-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org

Reply via email to