In Apache Flink, we have a syntax:

… A JOIN B FOR SYSTEM_TIME AS OF A.PROC_TIME

It describes a stream A joining a temporal table B, where we only want to join against the records of table B as of the current machine time. Is that the case Viliam described?

Best,
Danny Chan

On Mar 25, 2020, 12:46 AM +0800, Julian Hyde <[email protected]> wrote:
> You’re right that this is a problem.
>
> We’d need some way to say that you don’t care which version of the product
> table you are joining against. One implication would be that if you replay
> the query, and the product table has changed in the meantime, you are happy
> to get different results.
>
> We could devise some syntax to add to the SQL. And/or we could add some
> annotation to the product TVR. What do you think?
>
> Julian
>
>
> > On Mar 24, 2020, at 12:11 AM, Viliam Durina <[email protected]> wrote:
> >
> > So how would you do a simple stream enrichment query? That is, one that for
> > each new record in an append-only relation will join a matching record from
> > a mutable relation that's valid at the processing time? This use case is
> > common, for example in credit card fraud detection: for each transaction
> > you look up the cardholder statistics, merchant statistics, product
> > statistics, transaction history etc. that you have at hand at the moment
> > the transaction is processed, and the enriched record is then fed to a
> > rule-based engine or to an ML inference model. You're not interested in
> > later updates in those enrichment tables. In my understanding it is not
> > possible with the proposed semantics.
> >
> > For example, can you refer to the `undo`, `ptime` and `ver` columns in the
> > query itself? We could filter out rows where `ver > 0`:
> >
> > SELECT * FROM (
> >   SELECT *
> >   FROM order_item o
> >   JOIN product p USING(product_id)
> >   EMIT STREAM
> > ) WHERE ver = 0;
> >
> > > You can optimize for the common events, and not use very much memory. For
> > > the rarer events, you can pay the cost of a disk I/O.
> >
> > With the particular query I don't think you can do this. Let's say the
> > `order_item` is backed by a Kafka topic - you might not have the full
> > history. And even if you do, the receiver of the query results is not
> > interested in retractions and new versions of all the zillions of orders
> > with updated product name. The desired output should be specified by the
> > query itself. And, for example, cardholder statistics could be updated with
> > each transaction in a feedback loop.
> >
> > Viliam
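
For anyone following the thread, the enrichment semantics Viliam describes can be illustrated outside SQL. The Python below is only a rough sketch under assumed names: `OrderItem`, `enrich`, and the event tuples are hypothetical and are not part of Calcite, Flink, or the proposed syntax. Each order is joined once against whatever product row is current when the order is processed, and later product updates never retract or re-emit earlier output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderItem:
    order_id: int
    product_id: int


def enrich(events, product_table):
    """Processing-time lookup join over an interleaved event log.

    `events` is an iterable of ("upsert", product_id, name) and
    ("order", OrderItem) tuples, in the order they are processed.
    Each order is joined against the product row that is current at
    that moment; later upserts do not change rows already emitted.
    """
    output = []
    for event in events:
        if event[0] == "upsert":
            _, product_id, name = event
            product_table[product_id] = name  # mutate the enrichment table
        else:
            _, item = event
            name = product_table.get(item.product_id)
            if name is not None:  # inner join: skip orders with no match
                output.append((item.order_id, item.product_id, name))
    return output


events = [
    ("upsert", 1, "widget"),
    ("order", OrderItem(order_id=100, product_id=1)),
    ("upsert", 1, "widget v2"),  # arrives after order 100 was emitted
    ("order", OrderItem(order_id=101, product_id=1)),
]
result = enrich(events, {})
print(result)
```

Order 100 keeps the product name it saw when it was processed, even though the product is renamed afterwards. Replaying the same orders against a changed product table would give different results, which is the trade-off Julian mentions.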
