Re: Questions about Stream SQL plan

Cody Innowhere Tue, 14 Jun 2016 01:17:31 -0700

Thanks Fabian, this clears up my confusion.

On Tue, Jun 14, 2016 at 3:21 PM, Fabian Hueske <[email protected]> wrote:


> Hi Cody,
>
> the monotone (or quasi-monotone) attribute required for grouping a stream
> in Calcite is a generalization of the timestamp/watermark concept in Flink.
> The timestamps in Flink are quasi-monotone, i.e., they are increasing but
> might be slightly out of order. This out-of-orderness is controlled by the
> watermarks.
> In order to compute consistent results using event-time you need a
> timestamp in Flink as well.
>
> Since Calcite envisions a StreamSQL that is fully compatible and integrated
> with SQL on static data sets, it generalizes the timestamp concept to
> (quasi-)monotone attributes, i.e., you could also compute a windowed query
> over a static table that is sorted on an arbitrary field.
>
> The keyBy attribute in Flink's DataStream API refers to an additional
> attribute on which the stream is grouped. If you have the query:
>
> SELECT STREAM TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime,
>   productId,
>   COUNT(*) AS c,
>   SUM(units) AS units
> FROM Orders
> GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId;
>
> The grouping would be expressed in Flink as
>
> stream.keyBy("productId").timeWindow(Time.hours(1)).apply(...)
>
> Here, productId is the partitioning key, and rowtime is the implicit
> timestamp in Flink (timestamps are metadata in Flink but actual
> columns in Calcite's model).
>
> I hope that clarifies the relationship of the monotone attributes and
> Flink's timestamp / watermark concept.
>
> Best, Fabian
>
>
>
>
> 2016-06-14 4:43 GMT+02:00 Cody Innowhere <[email protected]>:
>
> > Hi guys,
> > I went through the Stream SQL doc on the Calcite website and have a small
> > question about grouping. Calcite's grouping requires that a table column
> > be monotonic or quasi-monotonic, while in real-world cases we don't
> > necessarily have such fields in streams, unless we use a virtual field,
> > say, the emit timestamp of each stream message. A similar case would be
> > the Flink stream API: it has a KeyedStream, which is a kind of groupBy,
> > but it does not actually require the keyed field to be monotonic. So in
> > such cases, how do you plan to implement this?
> >
> > Also, I noticed that Calcite 1.8.0 has been released. There seem to be no
> > updates regarding Stream SQL in this release; do you have a plan or
> > roadmap for Stream SQL?
> >
> > Thanks~
> >
>
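
To make the "quasi-monotone" idea from Fabian's reply concrete, here is a minimal sketch of a bounded out-of-orderness watermark. It is a simplified model, not Flink's implementation: the class name WatermarkSketch, its methods, and the rule "watermark = highest timestamp seen minus a fixed bound" are assumptions chosen for illustration. Timestamps may arrive out of order, but an event that falls behind the watermark is treated as late.

```java
// Hypothetical, simplified model of a watermark (not Flink's actual classes).
class WatermarkSketch {
    // Assumed bound on how far out of order timestamps may arrive.
    private final long maxOutOfOrderness;
    private long maxTimestamp = Long.MIN_VALUE;

    WatermarkSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Watermark: no event with a smaller timestamp is expected any more.
    long currentWatermark() {
        return maxTimestamp == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestamp - maxOutOfOrderness;
    }

    // Returns true if the event is accepted, false if it is behind the watermark.
    boolean observe(long timestamp) {
        if (timestamp < currentWatermark()) {
            return false; // too late: violates the quasi-monotone bound
        }
        maxTimestamp = Math.max(maxTimestamp, timestamp);
        return true;
    }

    public static void main(String[] args) {
        WatermarkSketch w = new WatermarkSketch(5);
        long[] timestamps = {10, 7, 20, 12};
        for (long ts : timestamps) {
            System.out.println(ts + " accepted=" + w.observe(ts)
                    + " watermark=" + w.currentWatermark());
        }
    }
}
```

In this model a strictly monotone attribute is the special case where the bound is zero.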

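The grouping in Fabian's example (partition by productId, then one-hour tumbling windows on rowtime) can also be sketched without any Flink or Calcite dependency. This is only an illustration of the semantics of the query above on a finite batch of rows; the class TumbleSketch, the row layout {rowtime, productId, units}, and the string key format are hypothetical, and rowtime is assumed to be epoch milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), productId.
class TumbleSketch {
    static final long HOUR_MS = 3_600_000L;

    // End of the one-hour tumbling window containing rowtime (like TUMBLE_END).
    static long tumbleEnd(long rowtime) {
        return (Math.floorDiv(rowtime, HOUR_MS) + 1) * HOUR_MS;
    }

    // Each order is {rowtime, productId, units}.
    // Result maps "windowEnd/productId" to {COUNT(*), SUM(units)}.
    static Map<String, long[]> aggregate(long[][] orders) {
        Map<String, long[]> result = new TreeMap<>();
        for (long[] order : orders) {
            String key = tumbleEnd(order[0]) + "/" + order[1];
            long[] acc = result.computeIfAbsent(key, k -> new long[2]);
            acc[0] += 1;        // COUNT(*)
            acc[1] += order[2]; // SUM(units)
        }
        return result;
    }

    public static void main(String[] args) {
        long[][] orders = {
            {0L, 1, 2}, {1_800_000L, 1, 3}, {3_600_000L, 1, 4}, {10L, 2, 7}
        };
        for (Map.Entry<String, long[]> e : aggregate(orders).entrySet()) {
            System.out.println(e.getKey() + " c=" + e.getValue()[0]
                    + " units=" + e.getValue()[1]);
        }
    }
}
```

Here productId plays the role of the keyBy attribute and the window end is derived from rowtime. Unlike a real stream processor, the sketch sees all rows at once, so it needs no watermark to decide when a window is complete.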