Yep I think you got the idea.
>Therefore to execute this query you need an unbounded memory for
>`order_item` and `product` relations, so it's not a good candidate for a
>streaming query, but let's put that aside, I'm interested in the semantics.

The watermark idea is used to give a hint about when data is complete, so
that any data arriving afterwards can be considered late. "EMIT STREAM"
leaves room to specify strategies for handling late data (e.g. a syntax to
indicate dropping all late data). Spark, Flink, and Beam all offer similar
mechanisms to control the memory cost for unbounded datasets.

-Rui

On Mon, Mar 23, 2020 at 8:41 AM Viliam Durina <[email protected]> wrote:

> I've read again parts of the paper and now I've finally got the idea, I
> hope. The semantics still works with plain relations, relational operators
> are applied as in instantaneous queries, just the resulting relation is
> "streamed", in case of EMIT STREAM clause in the form of records and
> retractions.
>
> So I'll try to make up an example for the query, let's repeat it for
> clarity:
>
> CREATE TABLE order_item (product_id INT, amount INT, ...);
> CREATE TABLE product (product_id INT PRIMARY KEY, name VARCHAR);
>
> SELECT *
> FROM order_item o
> JOIN product p USING(product_id)
> EMIT STREAM;
>
> When we execute the above query:
> - until any change occurs in either table, there's no output
> - if a new order_item is inserted:
>   -> a joined record is emitted with the order item and matching product
> - if a product name is updated
>   -> for every matching order_item a retraction and a new record is emitted
>
> Therefore to execute this query you need an unbounded memory for
> `order_item` and `product` relations, so it's not a good candidate for a
> streaming query, but let's put that aside, I'm interested in the semantics.
>
> Did I put the example correctly?
>
> Viliam
>
> On Sat, 21 Mar 2020 at 01:00, Julian Hyde <[email protected]> wrote:
>
> > Our thinking in the "One SQL to rule them all" paper [1] is that there
> > are not "streams" and "tables". Both product and order_items are
> > time-varying relations (TVRs).
> >
> > Whether it is a streaming query is determined by whether you specify
> > "EMIT STREAM" in the query, not by what objects are referenced in the
> > query.
> >
> > (There is a strong analogy between streaming queries and the
> > differentiation operation in differential calculus. Consider the
> > product rule in calculus: (uv)' = u'v + u.v'. If you want to compute
> > the join of two time-varying relations: istream(u join v) = (istream(u)
> > join v) union (u join istream(v)). So you see that we are using the
> > 'stream' of each side. I find this symmetric treatment of all TVRs to
> > be compelling.)
> >
> > Julian
> >
> > [1] https://arxiv.org/pdf/1905.12133.pdf
> >
> > On Fri, Mar 20, 2020 at 3:17 PM Viliam Durina <[email protected]>
> > wrote:
> > >
> > > > Does it matter which table is a stream? if the "STREAM" query runs
> > > > continuously, the output (table) from the query is a stream, and
> > > > likely this stream gives you delta updates periodically.
> > >
> > >
> > > In my understanding, it does. If both tables are a stream, you get a
> > > change stream from both. You're joining two change streams. So if
> > > there's a change in product name, a change event will occur and the
> > > change event should be joined to all previous (and future) change
> > > events on order_items matching that product. Similarly if there's a
> > > new order_item, it should be joined with all previous change events
> > > on the matching product.
> > >
> > > The paper doesn't discuss queries with joins at all. But it's unclear
> > > to me how it's supposed to work.
> > > Maybe if you could give an example for the above
> > > query and what happens when there's a change in order_item and when in
> > > product.
> > >
> > > Viliam
> > >
> > > --
> > > This message contains confidential information and is intended only for
> > > the individuals named. If you are not the named addressee you should not
> > > disseminate, distribute or copy this e-mail. Please notify the sender
> > > immediately by e-mail if you have received this e-mail by mistake and
> > > delete this e-mail from your system. E-mail transmission cannot be
> > > guaranteed to be secure or error-free as information could be
> > > intercepted, corrupted, lost, destroyed, arrive late or incomplete, or
> > > contain viruses. The sender therefore does not accept liability for any
> > > errors or omissions in the contents of this message, which arise as a
> > > result of e-mail transmission. If verification is required, please
> > > request a hard-copy version. -Hazelcast
>
> --
> Viliam Durina
> Jet Developer
> hazelcast®
>
> <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 | USA
> +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
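
For readers following the thread, the behaviour described for the order_item/product join can be sketched as a toy simulation. This is only an illustration under stated assumptions: the class `StreamingJoin` and its methods are hypothetical names made up here, not an API of Calcite, Beam, Flink or Spark, and the output format ('+' for a new record, '-' for a retraction) is an arbitrary choice. It keeps both relations in memory, which is exactly the unbounded state mentioned above.

```python
from collections import defaultdict


class StreamingJoin:
    """Toy incremental inner join of order_item and product on product_id.

    Hypothetical sketch: emits ('+', row) for a new joined record and
    ('-', row) for a retraction of a previously emitted record.
    """

    def __init__(self):
        self.products = {}                    # product_id -> name
        self.order_items = defaultdict(list)  # product_id -> [amount, ...]

    def insert_order_item(self, product_id, amount):
        # Change on the order_item side: join the delta with current product.
        self.order_items[product_id].append(amount)
        if product_id not in self.products:
            return []  # inner join: no matching product yet, nothing emitted
        return [("+", (product_id, amount, self.products[product_id]))]

    def upsert_product(self, product_id, name):
        # Change on the product side: join the delta with current order_item.
        old = self.products.get(product_id)
        if old == name:
            return []  # no actual change, so no output
        self.products[product_id] = name
        out = []
        for amount in self.order_items.get(product_id, []):
            if old is not None:
                out.append(("-", (product_id, amount, old)))  # retraction
            out.append(("+", (product_id, amount, name)))     # new record
        return out


if __name__ == "__main__":
    join = StreamingJoin()
    events = []
    events += join.upsert_product(1, "apple")        # no order items: no output
    events += join.insert_order_item(1, 5)           # one joined record
    events += join.upsert_product(1, "green apple")  # retraction + new record
    for event in events:
        print(event)
```

Each method handles a change on one side by joining it against the stored state of the other side, which loosely mirrors the two terms of the istream(u join v) identity quoted above. A real engine would also need deletes and some way to bound state (e.g. watermarks), which this sketch leaves out.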
