Fabian,

Apologies for the late reply.

I would rather that the specification for streaming SQL was not too 
prescriptive for how late events were handled. Approaches 1, 2 and 3 are all 
viable, and engines can differentiate themselves by the strength of their 
support for this. But for the SQL to be considered valid, I think the validator 
just needs to know that it can make progress.

There is a large area of functionality I’d call “quality of service” (QoS). 
This includes latency, reliability, at-least-once, at-most-once or in-order 
guarantees, as well as the late-row-handling this thread is concerned with. 
What the QoS metrics have in common is that they are end-to-end. To deliver a 
high QoS to the consumer, the producer needs to conform to a high QoS. The QoS 
is beyond the control of the SQL statement. (Although you can ask what a SQL 
statement is able to deliver, given the upstream QoS guarantees.) QoS is best 
managed by the whole system, and in my opinion this is the biggest reason to 
have a DSMS.

For this reason, I would be inclined to put QoS constraints on the stream 
definition, not on the query. For example, taking latency as the QoS metric of 
interest, you could flag the Orders stream as “at most 10 ms latency between 
the record’s timestamp and the wall-clock time of the server receiving the 
records, and any records arriving after that time are logged and discarded”, 
and that QoS constraint applies to both producers and consumers.

Given a query Q ‘select stream * from Orders’, it is valid to ask “what is the 
expected latency of Q?” or tell the planner “produce an implementation of Q 
with a latency of no more than 15 ms, and if you cannot achieve that latency, 
fail”. You could even register Q in the system and tell the system to tighten 
up the latency of any upstream streams and the standing queries that populate 
them. But it’s not valid to say “execute Q with a latency of 15 ms”: the system 
may not be able to achieve it.

In summary: I would allow latency and late-row-handling and other QoS 
annotations in the query but it’s not the most natural or powerful place to put 
them.

Julian


> On Feb 6, 2016, at 1:28 AM, Fabian Hueske <[email protected]> wrote:
> 
> Excellent! I missed the punctuations in the todo list.
> 
> What kind of strategies do you have in mind to handle events that arrive
> too late? I see
> 1. dropping of late events
> 2. computing an updated window result for each late arriving
> element (implies that the window state is stored for a certain period
> before it is discarded)
> 3. computing a delta to the previous window result for each late arriving
> element (requires window state as well, not applicable to all aggregation
> types)
> 
> It would be nice if strategies to handle late-arrivers could be defined in
> the query.
> 
> I think the plans of the Flink community are quite well aligned with your
> ideas for SQL on Streams.
> Should we start by updating / extending the Stream document on the Calcite
> website to include the new window definitions (TUMBLE, HOP) and a
> discussion of punctuations/watermarks/time bounds?
> 
> Fabian
> 
> 
> 
> 
> 
> 
> 2016-02-06 2:35 GMT+01:00 Julian Hyde <[email protected]>:
> 
>> Let me rephrase: The *majority* of the literature, of which I cited
>> just one example, calls them punctuation, and a couple of recent
>> papers out of Mountain View doesn't change that.
>> 
>> There are some fine distinctions between punctuation, heartbeats,
>> watermarks and rowtime bounds, mostly in terms of how they are
>> generated and propagated, that matter little when planning the query.
>> 
>> On Fri, Feb 5, 2016 at 5:18 PM, Ted Dunning <[email protected]> wrote:
>>> On Fri, Feb 5, 2016 at 5:10 PM, Julian Hyde <[email protected]> wrote:
>>> 
>>>> Yes, watermarks, absolutely. The "to do" list has "punctuation", which
>>>> is the same thing. (Actually, I prefer to call it "rowtime bound"
>>>> because it is feels more like a dynamic constraint than a piece of
>>>> data, but the literature[1] calls them punctuation.)
>>>> 
>>> 
>>> Some of the literature calls them punctuation, other literature [1] calls
>>> them watermarks.
>>> 
>>> [1] http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
>> 

Reply via email to