Re: STREAM keyword

Danny Chan Tue, 24 Mar 2020 22:40:35 -0700

In Apache Flink, we have a syntax:

… A JOIN B FOR SYSTEM_TIME AS OF A.PROC_TIME


This describes a stream A joined with a temporal table B, where each record
of A is joined only against the version of B that is current at the machine
(processing) time.
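
As a minimal sketch, reusing the order_item and product tables from Viliam's
example: assuming order_item declares a processing-time attribute (called
proc_time here; that name, and the order_id and product_name columns, are
hypothetical), the enrichment join might be written like this:

```sql
-- Sketch only: proc_time, order_id and product_name are assumed names,
-- not taken from the thread. proc_time must be a processing-time attribute.
SELECT o.order_id, o.product_id, p.product_name
FROM order_item AS o
JOIN product FOR SYSTEM_TIME AS OF o.proc_time AS p
  ON o.product_id = p.product_id;
```

As I understand it, each order_item row is joined against the product row as
it is when the order row is processed, and later changes to product do not
retract or update rows that were already emitted.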

Is that the case Viliam described?


Best,
Danny Chan
On Mar 25, 2020, at 12:46 AM +0800, Julian Hyde <[email protected]> wrote:
> You’re right that this is a problem.
>
> We’d need some way to say that you don’t care which version of the product 
> table you are joining against. One implication would be that if you replay 
> the query, and the product table has changed in the meantime, you are happy 
> to get different results.
>
> We could devise some syntax to add to the SQL. And/or we could add some 
> annotation to the product TVR. What do you think?
>
> Julian
>
>
> > On Mar 24, 2020, at 12:11 AM, Viliam Durina <[email protected]> wrote:
> >
> > So how would you do a simple stream enrichment query? That is, one that, for
> > each new record in an append-only relation, joins the matching record from a
> > mutable relation that is valid at processing time. This use case is common.
> > For example, in credit card fraud detection, for each transaction you look
> > up the cardholder statistics, merchant statistics, product statistics,
> > transaction history etc. that you have at hand at the moment the transaction
> > is processed, and the enriched record is then fed to a rule-based engine or
> > to an ML inference model. You're not interested in later updates to those
> > enrichment tables. In my understanding this is not possible with the
> > proposed semantics.
> >
> > For example, can you refer to the `undo`, `ptime` and `ver` columns in the
> > query itself? We could filter out rows where `ver > 0`:
> >
> > SELECT *
> > FROM (
> >   SELECT *
> >   FROM order_item o
> >   JOIN product p USING (product_id)
> >   EMIT STREAM
> > ) WHERE ver = 0;
> >
> > > You can optimize for the common events, and not use very much memory. For
> > > the rarer events, you can pay the cost of a disk I/O.
> > >
> >
> > With the particular query I don't think you can do this. Let's say the
> > `order_item` is backed by a Kafka topic - you might not have the full
> > history. And even if you do, the receiver of the query results is not
> > interested in retractions and new versions of all the zillions of orders
> > with updated product name. The desired output should be specified by the
> > query itself. And, for example, cardholder statistics could be updated with
> > each transaction in a feedback loop.
> >
> > Viliam
> >
>
