Yep, I think you got the idea.

>Therefore to execute this query you need an unbounded memory for
>`order_item` and `product` relations, so it's not a good candidate for a
>streaming query, but let's put that aside, I'm interested in the semantics.
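
To make the quoted point concrete, here is a minimal sketch of how the join
from your example could be maintained incrementally. It is only an
illustration of the semantics, not how any particular engine implements it:
the class name, the method names and the ("+"/"-", row) encoding of records
and retractions are all made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical change encoding: ("+", row) is a new record,
# ("-", row) is a retraction. A row is (product_id, amount, name).
Change = Tuple[str, Tuple[int, int, str]]


class JoinEmitStream:
    """Maintains order_item JOIN product and returns the emitted changes."""

    def __init__(self) -> None:
        # Both sides are retained indefinitely: this is the unbounded state.
        self.products: Dict[int, str] = {}            # product_id -> name
        self.order_items: Dict[int, List[int]] = {}   # product_id -> amounts

    def insert_order_item(self, product_id: int, amount: int) -> List[Change]:
        self.order_items.setdefault(product_id, []).append(amount)
        name = self.products.get(product_id)
        if name is None:
            return []  # inner join: no matching product, nothing to emit
        return [("+", (product_id, amount, name))]

    def upsert_product(self, product_id: int, name: str) -> List[Change]:
        old = self.products.get(product_id)
        if old == name:
            return []  # nothing changed, nothing to emit
        self.products[product_id] = name
        out: List[Change] = []
        for amount in self.order_items.get(product_id, []):
            if old is not None:
                out.append(("-", (product_id, amount, old)))  # retract old row
            out.append(("+", (product_id, amount, name)))     # emit new row
        return out


q = JoinEmitStream()
first = q.upsert_product(1, "apple")         # no order items yet
second = q.insert_order_item(1, 5)           # one joined record
third = q.upsert_product(1, "green apple")   # retraction plus new record
print(first, second, third)
```

Neither dictionary can ever be pruned here, because any old order_item may
still have to be retracted when its product name changes.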

The watermark is a hint that the data up to a given point is complete, so
any data arriving after it can be treated as late. "EMIT STREAM" leaves room
to specify strategies for handling late data (e.g. a syntax to indicate that
all late data should be dropped). Spark, Flink, and Beam all offer similar
mechanisms to control the memory cost of unbounded datasets.
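
As a rough sketch of the idea (not the API of Spark, Flink or Beam), assume
the watermark simply trails the largest event time seen so far by a fixed
bound, and that late events are dropped. The class name and the
max_out_of_orderness parameter are hypothetical.

```python
from typing import Optional


class WatermarkTracker:
    """Tracks a watermark and rejects events that arrive behind it."""

    def __init__(self, max_out_of_orderness: int) -> None:
        # Assumed heuristic: watermark = max event time seen - this bound.
        self.max_out_of_orderness = max_out_of_orderness
        self.watermark: Optional[int] = None

    def accept(self, event_time: int) -> bool:
        """Return True for an on-time event, False for a late (dropped) one."""
        if self.watermark is not None and event_time < self.watermark:
            return False  # behind the watermark: late data, dropped
        candidate = event_time - self.max_out_of_orderness
        if self.watermark is None or candidate > self.watermark:
            self.watermark = candidate  # the watermark only moves forward
        return True


tracker = WatermarkTracker(max_out_of_orderness=2)
accepted = [t for t in [10, 9, 15, 11, 13] if tracker.accept(t)]
print(accepted)  # [10, 9, 15, 13]; 11 arrives behind watermark 13
```

State that only matters for event times below the watermark can then be
discarded, which is what keeps the memory bounded.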


-Rui

On Mon, Mar 23, 2020 at 8:41 AM Viliam Durina <[email protected]> wrote:

> I've read parts of the paper again and now I think I've finally got the
> idea. The semantics still works with plain relations: relational operators
> are applied as in instantaneous queries, and only the resulting relation is
> "streamed". With the EMIT STREAM clause, it is streamed in the form of
> records and retractions.
>
> So I'll try to make up an example for the query, let's repeat it for
> clarity:
>
>   CREATE TABLE order_item (product_id INT, amount INT, ...);
>   CREATE TABLE product (product_id INT PRIMARY KEY, name VARCHAR);
>
>   SELECT *
>   FROM order_item o
>     JOIN product p USING(product_id)
>   EMIT STREAM;
>
> When we execute the above query:
> - until any change occurs in either table, there's no output
> - if a new order_item is inserted:
>   -> a joined record is emitted with the order item and matching product
> - if a product name is updated:
>   -> for every matching order_item, a retraction and a new record are emitted
>
> Therefore to execute this query you need an unbounded memory for
> `order_item` and `product` relations, so it's not a good candidate for a
> streaming query, but let's put that aside, I'm interested in the semantics.
>
> Did I put the example correctly?
>
> Viliam
>
> On Sat, 21 Mar 2020 at 01:00, Julian Hyde <[email protected]> wrote:
>
> > Our thinking in the "One SQL to rule them all" paper [1] is that there
> > are not "streams" and "tables". Both product and order_item are
> > time-varying relations (TVRs).
> >
> > Whether it is a streaming query is determined by whether you specify
> > "EMIT STREAM" in the query, not by what objects are referenced in the
> > query.
> >
> > (There is a strong analogy between streaming queries and the
> > differentiation operation in differential calculus. Consider the
> > product rule in calculus: (uv)' = u'v + uv'. Similarly, for the join
> > of two time-varying relations u and v: istream(u join v) = (istream(u)
> > join v) union (u join istream(v)). So you see that we are using the
> > 'stream' of each side. I find this symmetric treatment of all TVRs to
> > be compelling.)
> >
> > Julian
> >
> > [1] https://arxiv.org/pdf/1905.12133.pdf
> >
> > On Fri, Mar 20, 2020 at 3:17 PM Viliam Durina <[email protected]>
> > wrote:
> > >
> > > > Does it matter which table is a stream? If the "STREAM" query runs
> > > > continuously, the output (table) from the query is a stream, and
> > > > likely this stream gives you delta updates periodically.
> > >
> > >
> > > In my understanding, it does. If both tables are a stream, you get a
> > > change stream from both. You're joining two change streams. So if
> > > there's a change in product name, a change event will occur and the
> > > change event should be joined to all previous (and future) change
> > > events on order_items matching that product. Similarly, if there's a
> > > new order_item, it should be joined with all previous change events on
> > > the matching product.
> > >
> > > The paper doesn't discuss queries with joins at all, so it's unclear
> > > to me how it's supposed to work. Maybe you could give an example for
> > > the above query, showing what happens when there's a change in
> > > order_item and when there's a change in product.
> > >
> > > Viliam
> > >
> > > --
> > > This message contains confidential information and is intended only for
> > > the individuals named. If you are not the named addressee you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately by e-mail if you have received this e-mail by mistake and
> > > delete this e-mail from your system. E-mail transmission cannot be
> > > guaranteed to be secure or error-free as information could be
> > > intercepted, corrupted, lost, destroyed, arrive late or incomplete, or
> > > contain viruses. The sender therefore does not accept liability for any
> > > errors or omissions in the contents of this message, which arise as a
> > > result of e-mail transmission. If verification is required, please
> > > request a hard-copy version. -Hazelcast
> >
>
>
> --
> Viliam Durina
> Jet Developer
>       hazelcast®
>
>   <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 | USA
> +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
>
