We currently use “SELECT STREAM …” but we should move to “SELECT … EMIT STREAM” at some point, to be consistent with the paper.
Your query is an excellent example of why “EMIT STREAM” is the right abstraction. order_item and product are time-varying relations (TVRs). The query

  SELECT STREAM * FROM order_item o JOIN product p USING(product_id)

is also a time-varying relation.

Suppose you execute that query at 10am and get a particular result. Then two new order items arrive, and the price of an existing product changes, and you execute again at 10.10am. The result will be different, reflecting changes to both the order_item and product TVRs.

This is good - under the “EMIT STREAM” semantics we are treating TVRs the same, even though one is more “stream-like” and the other is more “table-like”.

Now, if you have particular time-sensitive join semantics - such as wanting to join the order_item to the product table as it was at the time the order was placed - you will need to express that explicitly in the join condition.

Julian

> On Mar 19, 2020, at 6:39 AM, Viliam Durina <[email protected]> wrote:
>
> Hi all,
>
> I'm not sure if this is an appropriate forum, I want to discuss concepts in
> the "One SQL to Rule Them All" paper. The paper uses `EMIT STREAM` clause,
> but Calcite uses the `STREAM` keyword after `SELECT`, but in my
> understanding, these are the same.
>
> I'm wondering about the supposed semantics of the clause with multiple
> relations queried. It's supposed to produce a streaming output instead of
> instantaneous one. But IMO, you need to specify this property for each
> input relation.
>
> Take an example. Let's have ORDER_ITEM and PRODUCT relations.
>
> ORDER_ITEM(product_id, quantity, ...);
> PRODUCT(product_id, name);
>
> We want to join these and get a streaming result:
>
> SELECT STREAM * FROM order_item o JOIN product p USING(product_id)
>
> What should this query output? Intuitively, for each change event in
> order_item query it should find a matching product and emit that together.
> But it could be also interpreted that we treat the product as a
> time-varying relation: for each change event in order_item query all
> matching changes in product.
>
> I don't see which of the above behaviors should be chosen, provided both
> `order_item` and `product` relations can be accessed as point-in-time
> relations or time-varying relations.
>
> Regards,
> Viliam
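
As a rough sketch of the "express it explicitly in the join condition" point made in the reply: the query below assumes a hypothetical versioned relation product_history with valid_from/valid_to columns, and a hypothetical order_time column on order_item. None of these names appear in the thread; they are illustrative only, and this is one possible formulation rather than the only way to write such a join.

```sql
-- Hypothetical schema (assumed for illustration, not from the thread):
--   order_item(product_id, quantity, order_time)
--   product_history(product_id, name, price, valid_from, valid_to)
-- Each product_history row describes one version of a product and the
-- half-open interval [valid_from, valid_to) during which it was current.
SELECT STREAM o.product_id, o.quantity, o.order_time, p.name, p.price
FROM order_item o
JOIN product_history p
  ON p.product_id = o.product_id
  -- Time-sensitive part: pick the product version in effect when the
  -- order was placed, rather than whatever version is current now.
  AND o.order_time >= p.valid_from
  AND o.order_time < p.valid_to
```

Without the two interval predicates, the join result would change whenever either input changes, as described in the reply. With them, a later price change adds a new product_history row that no longer matches orders placed before it.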
