Re: [DISCUSS] FLIP-579: LATERAL SNAPSHOT Join

Leonard Xu Thu, 04 Jun 2026 04:06:29 -0700

The proposed metrics look good to me, thanks for the supplement.

Best,
Leonard


> On Jun 4, 2026, at 01:23, Fabian Hueske <[email protected]> wrote:
> 
> Thanks Leonard,
> 
> I've added a section about operator metrics to the proposal [1].
> If you have ideas for other useful metrics, please let me know and I'll add
> them.
> 
> Best, Fabian
> 
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421958523#FLIP579:LATERALSNAPSHOTJoin-Metrics
> 
> On Wed, Jun 3, 2026 at 10:18 AM Fabian Hueske <[email protected]> wrote:
> 
>> Thank you Hongshun for your feedback!
>> 
>> You are right that restricting the probe-side input to append-only changes
>> deviates from the existing event-time (and disabled proc-time) temporal
>> table joins.
>> I chose this restriction because processing time semantics do not work
>> well with retractions.
>> A probe record of +(k1, p1) could join against a build-side of (k1, v1),
>> while a later arriving probe-side retraction -(k1, p1) would join against
>> (k1, v2). The resulting retraction record of -(k1, p1, v2) would not match
>> the earlier insertion +(k1, p1, v1).
>> The only changelog format that would work well would be upsert changes
>> (+I, +UA, -D) with key-only deletes (under the assumption that the upsert
>> key is still unique after the join).
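
A minimal Python sketch of the retraction mismatch described above. It is not the operator from the FLIP: `proc_time_join` and the event tuples are hypothetical names used only for illustration, and the build side is assumed to hold a single latest value per key.

```python
def proc_time_join(events):
    """Join each probe change against the build-side value current at arrival."""
    build = {}  # join key -> latest build-side value (assumed: one row per key)
    out = []
    for side, op, key, payload in events:
        if side == "build":
            build[key] = payload  # upsert the build-side row
        elif key in build:
            # The probe change is joined with whatever version is current *now*.
            out.append((op, key, payload, build[key]))
    return out


events = [
    ("build", "+", "k1", "v1"),
    ("probe", "+", "k1", "p1"),  # insertion joins against v1
    ("build", "+", "k1", "v2"),  # build side changes in between
    ("probe", "-", "k1", "p1"),  # retraction joins against v2
]
result = proc_time_join(events)
for change in result:
    print(change)
```

The emitted retraction carries `v2` while the earlier insertion carried `v1`, so a downstream operator cannot pair the two records.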
>> 
>> I would like to keep the default behavior of only accepting append-only
>> inputs to prevent retraction mismatches.
>> At the same time, the operator implementation only needs to be slightly
>> adjusted to support arbitrary probe-side changes (if at all). The real
>> change would be in the planner / rules.
>> So allowing probe-side retraction input via a config option for
>> power-users who know what they are doing is certainly possible.
>> 
>> Do you think this should be part of the proposal or would you be fine to
>> leave this as future work?
>> 
>> Best, Fabian
>> 
>> On Wed, Jun 3, 2026 at 8:41 AM Leonard Xu <[email protected]> wrote:
>> 
>>> Hi Fabian,
>>> 
>>> Thanks for the detailed and thoughtful reply, and especially for agreeing
>>> to add operator metrics — that alone will significantly improve the
>>> debuggability of the join in production.
>>> 
>>> (1) On CPU spike / probe-side buffering
>>> 
>>> +1 for your reasoning on both points — deferring InputSelectable until
>>> the unaligned-checkpoint limitation is resolved, and leaving micro-batch
>>> transition out of v1. Watermark alignment + probe-side scan-offset is the
>>> principled clean approach, agreed.
>>> 
>>> (2) On Jark's backlog idea
>>> 
>>> Between the two paths in your reply, I'd lean toward option (a) — having
>>> the source connector emit a special WM (or a dedicated record attribute) at
>>> the end of backlog — as the long-term direction for exact flip-point
>>> semantics. It naturally fits what is almost certainly going to be the
>>> dominant production scenario: CDC sources (mysql-cdc, mongodb-cdc,
>>> postgres-cdc, ...) as the dimension build-side, all of which already have
>>> an explicit "snapshot finished → binlog start" boundary internally.
>>> 
>>> My suggestion is to ship FLIP-579 as proposed and collect user feedback
>>> on LATERAL SNAPSHOT Join from real production usage — that will give us
>>> solid evidence to prioritize this follow-up direction.
>>> 
>>> (3) On the vote
>>> 
>>> Once the metrics section is added and there are no further objections
>>> from others, I'm fine to start the vote.
>>> 
>>> 
>>> Best,
>>> Leonard
>>> 
>>> 
>>> 
>>>> On Jun 3, 2026, at 01:36, Fabian Hueske <[email protected]> wrote:
>>>> 
>>>> Thanks for your valuable feedback Jark and Leonard!
>>>> 
>>>> You are bringing up three of the tricky challenges that the new join needs
>>>> to deal with.
>>>> 
>>>> (1) Jark: The build-side flip point is not exact
>>>> 
>>>> This is correct. However, I would argue that a processing-time join does
>>>> not have exact guarantees anyway and can only produce roughly time-aligned
>>>> results. WM alignment of build and probe-side input should help to keep the
>>>> alignment somewhat close. Of course, this does not mean that we shouldn't
>>>> try to give as good guarantees as possible.
>>>> 
>>>> The primary mechanism for flipping from LOAD to JOIN phase is the
>>>> build-side watermark crossing a configured point in time. Watermarks are
>>>> used to track progress and completeness in Flink. Using them as a condition
>>>> to switch from LOAD to JOIN phase means that the build-side received at
>>>> least all changes up to that point in time. There might have been changes
>>>> with later timestamps as well. These could be buffered on the side to have
>>>> a stricter flip point, but IMO this additional data should be tolerable
>>>> under proc-time semantics.
>>>> 
>>>> If the build-side input becomes stale, the processing idle timeout flip
>>>> condition gets applied. The assumption is that the build-side source is
>>>> currently exhausted and all data was consumed but the WM didn't progress
>>>> far enough to exceed the flip point. In this case, we want to flip and
>>>> start the regular JOIN phase.
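
The two flip conditions above (build-side watermark reaching the configured point, or the build side being idle for too long) can be sketched in Python as follows. This is only an illustration under stated assumptions: `FlipTracker`, its fields, and the inclusive `>=` comparison are hypothetical and not taken from the FLIP, and only build-side records (not watermarks) are assumed to reset the idle timer.

```python
from dataclasses import dataclass


@dataclass
class FlipTracker:
    flip_timestamp: int    # event-time point the build-side watermark must reach
    idle_timeout_ms: int   # processing-time idle timeout for a stale build side
    last_build_activity_ms: int = 0
    phase: str = "LOAD"

    def on_build_record(self, now_ms: int) -> None:
        # Assumption: only records count as build-side activity.
        self.last_build_activity_ms = now_ms

    def on_build_watermark(self, watermark: int) -> None:
        # Primary condition: the build side has seen all changes up to the flip point.
        if self.phase == "LOAD" and watermark >= self.flip_timestamp:
            self.phase = "JOIN"

    def on_processing_time(self, now_ms: int) -> None:
        # Fallback condition: the build side looks exhausted, so flip anyway.
        idle_ms = now_ms - self.last_build_activity_ms
        if self.phase == "LOAD" and idle_ms >= self.idle_timeout_ms:
            self.phase = "JOIN"


by_watermark = FlipTracker(flip_timestamp=1000, idle_timeout_ms=500)
by_watermark.on_build_watermark(900)
phase_before = by_watermark.phase
by_watermark.on_build_watermark(1000)

by_timeout = FlipTracker(flip_timestamp=1000, idle_timeout_ms=500)
by_timeout.on_build_record(now_ms=100)
by_timeout.on_processing_time(now_ms=599)
phase_at_599 = by_timeout.phase
by_timeout.on_processing_time(now_ms=600)
```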
>>>> 
>>>> For the use case of sources with an exact flip point, users would need to
>>>> know the timestamp of the last backlog record (or compute it if they know
>>>> roughly how long it takes to scan the backlog if it is computed on-demand).
>>>> I agree, this is not very practical.
>>>> I can think of two options:
>>>> a) The source connector emits a special WM when it reaches the end of the
>>>> backlog. This would not require changes to the join operator but to the
>>>> source connectors.
>>>> b) The design of the SNAPSHOT function has the `load_completed_condition`
>>>> which is an extension point to add logic to determine the flip point.
>>>> 
>>>> 
>>>> (2) Jark: Buffering probe-side during LOAD phase
>>>> 
>>>> I think this is very similar to Leonard's point about "LOAD phase
>>>> backpressure" and also closely related to Leonard's point about
>>>> "Flip-point CPU spike".
>>>> 
>>>> This is indeed a potential problem. If used without care, the probe-side
>>>> state might grow very large. Before talking about possible ways to address
>>>> this problem, let me explain how I think that the join would be used.
>>>> 
>>>> A very common (maybe the most common?) use case should be to initialize the
>>>> build-side input up to time t_b and then start processing the probe-side
>>>> input from time t_p (with t_p = t_b, or slightly less than t_b) onwards. WM
>>>> alignment would help to roughly align build-side and probe-side inputs
>>>> (although not being perfectly aligned like the event-time join).
>>>> Initializing the build-side up to t_b and starting consuming the probe-side
>>>> from t_p with (t_p << t_b) would mean that the first probe-side records are
>>>> joined with much later versions of the build-side.
>>>> 
>>>> The first scenario (t_p = t_b) can be controlled with WM alignment and
>>>> scan-offset table hints on the probe-side input. Since the WMs of the
>>>> build-side input would be less than the WMs of the probe-side input, the
>>>> probe-side input should be throttled until the build-side has caught up
>>>> (which should be close to the flip point).
>>>> Other scenarios (including the t_p << t_b scenario) would benefit from an
>>>> idea [1] that is described in the future work section of the FLIP. That
>>>> mechanism would also be based on WM alignment and would need some
>>>> collaboration from the build-side source operator to indicate completeness
>>>> of the backlog.
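
A rough Python sketch of the throttling effect described for the t_p = t_b scenario. It models watermark alignment as "pause any source whose watermark is ahead of the slowest source by more than an allowed drift". `paused_sources` and its arguments are hypothetical names; this is a simplified model and not Flink's implementation.

```python
def paused_sources(watermarks, max_drift):
    """Return the sources that should stop reading until the others catch up."""
    slowest = min(watermarks.values())
    return sorted(name for name, wm in watermarks.items() if wm > slowest + max_drift)


# The build side is still loading its backlog, so the probe side is throttled.
during_load = paused_sources({"build": 100, "probe": 500}, max_drift=50)

# The build side has almost caught up, so the probe side may resume.
near_flip = paused_sources({"build": 480, "probe": 500}, max_drift=50)

print(during_load, near_flip)
```

While the probe side is paused, fewer probe records reach the operator and fewer need to be buffered during the LOAD phase.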
>>>> 
>>>> Using the InputSelectable interface is an idea that we also looked into (as
>>>> Gustavo already pointed out). Unfortunately, it is incompatible with
>>>> unaligned checkpoints and there are no other streaming operators that
>>>> implement the interface. I haven't looked in depth at the current
>>>> limitations, but if some of these would be resolved, it might be possible
>>>> to later extend the join operator. It might even work with relaxed
>>>> guarantees because we don't need to fully block the input but just throttle
>>>> it such that less probe-side data needs to be buffered.
>>>> 
>>>> The idea of limiting the size of the probe-side buffer with a config like
>>>> `max-buffer-size` sounds interesting. However, I'm not sure if applying
>>>> backpressure would really work because we still need to consume the
>>>> build-side to be able to reach the flip point and selective backpressure is
>>>> not possible without InputSelectable.
>>>> 
>>>> An earlier draft of the proposal described eager joining (now moved to the
>>>> future work section [2]). The idea is that a probe-side record would be
>>>> directly joined when a match was present or received during the LOAD phase.
>>>> After joining it wouldn't be put into state. This would of course
>>>> significantly reduce the state size and solve the issue of CPU spikes
>>>> during the transition but come at the cost of hard-to-explain semantics
>>>> (the current semantics are rather simple, we collect until the flip point
>>>> and join against that).
>>>> Also, the idea of eager joining was developed when the join operator was
>>>> still restricted to FK-PK joins (single build-side record per join key).
>>>> The design generalized this restriction to arbitrary joins which means
>>>> there might be more build-side matches for a probe-side record such that
>>>> the presence of a single join match (of a possibly earlier version) does
>>>> not guarantee completeness anymore. That's why the idea of eager joining
>>>> was discarded for now.
>>>> 
>>>> 
>>>> (3) Leonard: Flip-point CPU spikes
>>>> 
>>>> This is also a very valid concern. I would argue that the primary mechanism
>>>> to address this point should be to reduce the amount of buffered probe-side
>>>> records (see point (2) above).
>>>> 
>>>> I also thought about your idea to micro-batch the draining. In the design,
>>>> the transition join is triggered per-key by event-time timers that also
>>>> emit the "current" probe-side WM downstream. If we want to continue using
>>>> this mechanism, we would need to schedule multiple timers and probably use
>>>> some clever mechanism to gradually advance probe-side WMs while still
>>>> consuming records. There could be a separate "TRANSITION" phase during
>>>> which we still append to the probe-buffer but use the mechanism for atomic
>>>> build-side updates. However, this would significantly complicate the
>>>> design, affecting the control flow and recovery. Hence, I would first try to
>>>> address this issue by reducing the probe-side state.
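
For orientation, the bare idea of micro-batched draining can be sketched in Python as below. This is a deliberately naive illustration: `drain_in_batches` is a hypothetical helper, and it leaves out exactly the hard parts named above (timers, gradually advancing probe-side WMs, concurrent input, and recovery).

```python
def drain_in_batches(probe_buffer, build_state, batch_size):
    """Yield join results for buffered probe records, one batch at a time."""
    for start in range(0, len(probe_buffer), batch_size):
        batch = probe_buffer[start:start + batch_size]
        # Inner join of the batch against the build-side rows per key.
        yield [
            (key, row, match)
            for key, row in batch
            for match in build_state.get(key, [])
        ]


build_state = {"k1": ["v1"], "k2": ["v2a", "v2b"]}
probe_buffer = [("k1", "p1"), ("k2", "p2"), ("k3", "p3"), ("k1", "p4")]
batches = list(drain_in_batches(probe_buffer, build_state, batch_size=2))
for batch in batches:
    print(batch)
```

Each yielded batch is a bounded unit of work, which is what would spread the CPU load of the flip over several steps instead of one.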
>>>> 
>>>> If you have better ideas for how to use micro-batching during flip
>>>> transition, I'm very open to exploring those.
>>>> 
>>>> Leonard also brought up a very important point about metrics!
>>>> I will add a section on operator metrics that will help to understand the
>>>> state of the operator.
>>>> 
>>>> Please let me know your thoughts!
>>>> 
>>>> Best, Fabian
>>>> 
>>>> [1]
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421958523#FLIP579:LATERALSNAPSHOTJoin-ReduceBufferingofProbe-SideviaBuild-SideWatermarkSuppression
>>>> [2]
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421958523#FLIP579:LATERALSNAPSHOTJoin-EagerjoinmodeduringLOADphase
>>>> 
>>>> 
>>>> On Mon, Jun 1, 2026 at 4:44 PM Leonard Xu <[email protected]> wrote:
>>>> 
>>>>> Hi Fabian
>>>>> 
>>>>> Thanks for driving this FLIP and your kind patience. The motivation is
>>>>> spot-on, and the LOAD→JOIN two-phase design is the right structural fix for
>>>>> the FLINK-19830 initialization problem. Overall direction +1 from my side.
>>>>> 
>>>>> Besides Jark's idea about backlog and InputSelectable, which may need more
>>>>> prerequisites, I have two concerns about the current proposal:
>>>>> 
>>>>> 1. LOAD phase backpressure. The FLIP assumes "seconds to a few minutes"
>>>>> for build-side init, but nothing enforces it. Large build-side tables
>>>>> (e.g., 50M rows) + fast probe streams → unbounded probe-side state
>>>>> explosion. Should we add a config like max-buffer-size that applies
>>>>> backpressure when exceeded, or some metrics about the buffer, rather than
>>>>> silently piling up records?
>>>>> 
>>>>> 2. Flip-point CPU spike. Joining all buffered probe records × build-side
>>>>> state in one shot differs fundamentally from event-time join's incremental
>>>>> watermark-batched emission. In the worst case this could cause a
>>>>> TaskManager CPU spike and downstream shock. Worth considering micro-batch
>>>>> draining during flip transition?
>>>>> 
>>>>> Looking forward to your thoughts.
>>>>> 
>>>>> Best,
>>>>> Leonard
>>>>> 
>>>>>> On Jun 1, 2026, at 16:02, Fabian Hueske <[email protected]> wrote:
>>>>>> 
>>>>>> Hi Leonard,
>>>>>> 
>>>>>> Sorry, missed your email and already started the vote.
>>>>>> Let me put it on hold for now and continue discussing the proposal.
>>>>>> 
>>>>>> Looking forward to your comments,
>>>>>> Fabian
>>>>>> 
>>>>>> On Mon, Jun 1, 2026 at 9:56 AM Leonard Xu <[email protected]> wrote:
>>>>>> 
>>>>>>> @Fabian Thanks for driving this FLIP, and sorry for the late reply. It was
>>>>>>> due to personal reasons; I shouldn't miss such an important FLIP.
>>>>>>> 
>>>>>>> I'm reviewing the FLIP and will try to finish it today. Could you kindly
>>>>>>> wait a moment before starting the vote?
>>>>>>> 
>>>>>>> And sorry for interrupting your plan again.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Leonard
>>>>>>> 
>>>>>>>> On Jun 1, 2026, at 15:51, Fabian Hueske <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Thanks everyone for your comments on the FLIP.
>>>>>>>> I will start the vote.
>>>>>>>> 
>>>>>>>> Best, Fabian
>>>>>>>> 
>>>>>>>> On Thu, May 28, 2026 at 8:13 PM David Anderson <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Fabian,
>>>>>>>>> 
>>>>>>>>>> So, I don't think that we should buffer unmatched probe-side records
>>>>>>>>>> beyond the flip point.
>>>>>>>>> 
>>>>>>>>> Thanks for explaining your reasoning. Makes sense to me.
>>>>>>>>> 
>>>>>>>>> David
>>>>>>>>> 
>>>>>>>>> On Thu, May 28, 2026 at 6:55 PM Fabian Hueske <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Xingcan,
>>>>>>>>>> 
>>>>>>>>>> Thanks for your comments on the FLIP!
>>>>>>>>>> 
>>>>>>>>>> The join's behavior when starting from a savepoint is indeed an
>>>>>>> important
>>>>>>>>>> aspect to consider and the problem of a rapidly advancing
>>> dimension
>>>>>>>>>> (build-side) table is of course real.
>>>>>>>>>> 
>>>>>>>>>> I would argue that watermark alignment should significantly reduce
>>>>> the
>>>>>>>>>> impact of this.
>>>>>>>>>> If enabled, sources align their consumption based on their current
>>>>>>>>>> watermark such that the (presumably much smaller) build-side
>>> source
>>>>>>> would
>>>>>>>>>> be slowed down to the event-time progress of the probe-side.
>>>>>>>>>> While watermark alignment is not an "exact" mechanism, the
>>> semantics
>>>>> of
>>>>>>>>> the
>>>>>>>>>> new processing-time join also do not guarantee "exact" results.
>>>>>>>>>> At the same time, alignment should ensure that build and
>>> probe-side
>>>>> are
>>>>>>>>>> roughly aligned in event-time (without the strict guarantees that
>>> the
>>>>>>>>>> event-time temporal table join provides).
>>>>>>>>>> 
>>>>>>>>>> However, I really like your idea of starting in event-time mode
>>> and
>>>>>>>>>> flipping to processing-time after the initialization duration
>>> passed.
>>>>>>>>>> I'm not sure if it would fully address the problem you described.
>>> As
>>>>>>> you
>>>>>>>>>> said, users would need to be able to reconfigure the flip-point
>>> and
>>>>> I'm
>>>>>>>>> not
>>>>>>>>>> sure if there's a good mechanism for this yet.
>>>>>>>>>> But it might have some other properties that would be beneficial,
>>> so
>>>>>>> I'll
>>>>>>>>>> think about that.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Fabian
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Thu, May 28, 2026 at 6:21 PM Fabian Hueske <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks for your feedback David!
>>>>>>>>>>> 
>>>>>>>>>>>> One question: If I understand correctly, during the JOIN phase
>>> of
>>>>> an
>>>>>>>>>>> INNER
>>>>>>>>>>> join, if the desired build-side record is missing, nothing will
>>> be
>>>>>>>>>> emitted
>>>>>>>>>>> for the unmatched probe-side record. For an INNER join, I can
>>>>> imagine
>>>>>>>>>>> wanting to buffer unmatched probe-side records, expecting the
>>> build
>>>>>>>>> side
>>>>>>>>>>> will arrive soon. What's your thinking there?
>>>>>>>>>>> 
>>>>>>>>>>> Your understanding is correct. If a probe-side record arrives
>>> during
>>>>>>>>> LOAD
>>>>>>>>>>> phase but no matching build-side record is received,
>>>>>>>>>>> the probe-side record would be discarded without being joined
>>> during
>>>>>>>>> the
>>>>>>>>>>> transition from LOAD to JOIN.
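
A toy Python model of the behavior described above for INNER joins: probe records are buffered during LOAD, joined once at the flip, and dropped if no build-side match exists at that point. `SnapshotJoinSketch` is a hypothetical name, and keeping build rows in an append-only list per key is a simplifying assumption, not the FLIP's state layout.

```python
from collections import defaultdict


class SnapshotJoinSketch:
    def __init__(self):
        self.phase = "LOAD"
        self.build = defaultdict(list)  # join key -> build-side rows
        self.probe_buffer = []          # probe rows collected during LOAD

    def on_build(self, key, row):
        self.build[key].append(row)

    def on_probe(self, key, row):
        if self.phase == "LOAD":
            self.probe_buffer.append((key, row))  # defer until the flip
            return []
        return self._join(key, row)

    def flip(self):
        """Switch to JOIN and join every buffered probe row exactly once."""
        self.phase = "JOIN"
        out = []
        for key, row in self.probe_buffer:
            out.extend(self._join(key, row))  # unmatched rows produce nothing
        self.probe_buffer.clear()             # nothing is kept past the flip
        return out

    def _join(self, key, row):
        return [(key, row, b) for b in self.build.get(key, [])]


op = SnapshotJoinSketch()
op.on_build("k1", "v1")
op.on_probe("k1", "p1")   # buffered
op.on_probe("k2", "p2")   # buffered, but k2 has no build-side row yet
flipped = op.flip()
op.on_build("k2", "v2")   # arrives after the flip: too late for p2
later = op.on_probe("k2", "p3")
```

Here `p2` is dropped at the flip even though a matching build-side row arrives afterwards, which is the trade-off discussed in the surrounding messages.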
>>>>>>>>>>> 
>>>>>>>>>>> I would argue that users that want to prevent this, would need to
>>>>>>>>>>> configure a longer initialization time.
>>>>>>>>>>> IMO, dropping unmatched probe records is not a "bad" property of
>>>>> INNER
>>>>>>>>>>> joins but an essential part of their semantics. It might even be
>>>>>>>>> desired
>>>>>>>>>> by
>>>>>>>>>>> some users.
>>>>>>>>>>> If we would buffer probe-side records for INNER joins beyond the
>>>>>>>>>>> transition point, we:
>>>>>>>>>>> * would have different behaviors for INNER and LEFT joins
>>>>>>>>>>> * could not start to emit probe-side watermarks as long as there
>>> are
>>>>>>>>>> still
>>>>>>>>>>> probe-side records buffered (or at least not advance past them
>>>>> without
>>>>>>>>>>> emitting late data at a later point of time)
>>>>>>>>>>> * would either need another config knob to specify when to
>>> "really"
>>>>>>>>> clean
>>>>>>>>>>> up the probe-side state or keep such unmatched records forever in
>>>>>>> state
>>>>>>>>>> (we
>>>>>>>>>>> could also use state TTL...)
>>>>>>>>>>> 
>>>>>>>>>>> So, I don't think that we should buffer unmatched probe-side
>>> records
>>>>>>>>>>> beyond the flip point.
>>>>>>>>>>> 
>>>>>>>>>>> Best, Fabian
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, May 28, 2026 at 5:05 PM Xingcan Cui <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Fabian,
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for this FLIP! The two-phase design is excellent for
>>>>> avoiding
>>>>>>>>>>>> early-joining bugs while maintaining low-latency processing-time
>>>>>>>>>>>> semantics.
>>>>>>>>>>>> 
>>>>>>>>>>>> After thinking more about the proposal, I'd like to point out an
>>>>> edge
>>>>>>>>>> case
>>>>>>>>>>>> related to the initialization phase or recovery after prolonged
>>>>>>>>> downtime
>>>>>>>>>>>> (for example, when a job has been down for a day). While a
>>>>>>>>>> processing-time
>>>>>>>>>>>> join works well for live streaming, where results can reasonably
>>>>>>>>> depend
>>>>>>>>>> on
>>>>>>>>>>>> the immediate arrival order of live data, it does not work as
>>> well
>>>>>>> for
>>>>>>>>>>>> catch-up scenarios.
>>>>>>>>>>>> 
>>>>>>>>>>>> Currently, if a job initializes or restores from a checkpoint
>>>>> after a
>>>>>>>>>> long
>>>>>>>>>>>> downtime, the operator resumes directly in the processing-time
>>> join
>>>>>>>>>> phase.
>>>>>>>>>>>> During catch-up, however, the natural chronological arrival
>>> order
>>>>> of
>>>>>>>>> the
>>>>>>>>>>>> live data is completely lost. As a result, these replayed fact
>>>>>>> records
>>>>>>>>>> are
>>>>>>>>>>>> evaluated against the current machine time and may blindly join
>>>>> with
>>>>>>>>> the
>>>>>>>>>>>> rapidly advancing "current" dimension snapshot, rather than the
>>>>>>>>>> historical
>>>>>>>>>>>> versions they were originally supposed to match.
>>>>>>>>>>>> 
>>>>>>>>>>>> To handle this edge case, could we consider:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. changing the first phase into an event-time join phase, and
>>>>>>>>>>>> 
>>>>>>>>>>>> 2. allowing the operator to switch back to the first phase
>>> after a
>>>>>>>>>>>> restart?
>>>>>>>>>>>> 
>>>>>>>>>>>> For example, users could configure a timestamp threshold. Before
>>>>> the
>>>>>>>>>>>> watermark reaches that point, the operator would run as an
>>>>> event-time
>>>>>>>>>>>> versioned join to safely process the catch-up phase through
>>>>> watermark
>>>>>>>>>>>> alignment. Once the watermark passes the threshold, the operator
>>>>>>> could
>>>>>>>>>>>> purge the old multi-version state and seamlessly transition
>>> back to
>>>>>>>>> the
>>>>>>>>>>>> pure processing-time join phase for live traffic.
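
The threshold idea in the paragraph above can be sketched in Python for a single join key. All names (`VersionedDimension`, `lookup`, `threshold`) are hypothetical, and purging the multi-version state after the switch is not shown.

```python
from bisect import bisect_right


class VersionedDimension:
    """Versions of one dimension row, kept sorted by their timestamp."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def upsert(self, ts, value):
        i = bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def as_of(self, ts):
        # Latest version whose timestamp is <= ts, if any.
        i = bisect_right(self.timestamps, ts)
        return self.values[i - 1] if i else None

    def latest(self):
        return self.values[-1] if self.values else None


def lookup(dim, record_ts, watermark, threshold):
    if watermark < threshold:
        return dim.as_of(record_ts)  # catch-up: event-time versioned lookup
    return dim.latest()              # live traffic: processing-time lookup


dim = VersionedDimension()
dim.upsert(10, "v1")
dim.upsert(20, "v2")

catch_up = lookup(dim, record_ts=15, watermark=12, threshold=100)
live = lookup(dim, record_ts=15, watermark=100, threshold=100)
```

During catch-up the record with timestamp 15 sees the version valid at that time; once the watermark passes the threshold it sees whatever version is current.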
>>>>>>>>>>>> 
>>>>>>>>>>>> After a job restart, users could either update the target
>>> timestamp
>>>>>>> to
>>>>>>>>>>>> reset the operator back into the event-time phase, or leave it
>>>>>>>>> unchanged
>>>>>>>>>>>> to
>>>>>>>>>>>> continue operating in the processing-time phase.
>>>>>>>>>>>> 
>>>>>>>>>>>> I completely understand that this would introduce significant
>>>>>>>>> complexity
>>>>>>>>>>>> to
>>>>>>>>>>>> the operator's state management and lifecycle, so this is only a
>>>>>>>>>> tentative
>>>>>>>>>>>> proposal to explore whether it might be worth considering for
>>> the
>>>>>>>>>>>> long-term
>>>>>>>>>>>> robustness of the design.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> 
>>>>>>>>>>>> Xingcan
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, May 28, 2026 at 8:17 AM David Anderson <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm quite enthusiastic about this. I want to thank Fabian for
>>>>>>>>> putting
>>>>>>>>>>>>> together such a well-crafted FLIP. And I look forward to
>>> updating
>>>>>>>>> the
>>>>>>>>>>>>> awkward educational content this FLIP will make obsolete.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> To my mind, the syntax expresses the semantics of this join
>>> rather
>>>>>>>>>> well.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Until now, developers using event-time temporal joins sometimes
>>>>>>>>>>>> resorted to
>>>>>>>>>>>>> doing weird things with watermarks to handle a build side
>>> that's
>>>>>>>>>> mostly
>>>>>>>>>>>>> idle; this lateral snapshot join is clearly better -- not to
>>>>> mention
>>>>>>>>>> the
>>>>>>>>>>>>> added bonus of pre-loading the build table.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> One question: If I understand correctly, during the JOIN phase
>>> of
>>>>> an
>>>>>>>>>>>> INNER
>>>>>>>>>>>>> join, if the desired build-side record is missing, nothing
>>> will be
>>>>>>>>>>>> emitted
>>>>>>>>>>>>> for the unmatched probe-side record. For an INNER join, I can
>>>>>>>>> imagine
>>>>>>>>>>>>> wanting to buffer unmatched probe-side records, expecting the
>>>>> build
>>>>>>>>>> side
>>>>>>>>>>>>> will arrive soon. What's your thinking there?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> David
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, May 27, 2026 at 12:44 PM Fabian Hueske <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Gustavo and Timo for the positive feedback!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'd like to bump this thread up to collect more feedback.
>>>>>>>>>>>>>> If there are no more responses, I will start a vote on this
>>> FLIP
>>>>>>>>>> next
>>>>>>>>>>>>>> Monday, June 1st.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best, Fabian
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, May 21, 2026 at 12:15 PM Timo Walther <[email protected]> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi Fabian,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> thanks for proposing this FLIP. I agree that this join is
>>> super
>>>>>>>>>>>> common,
>>>>>>>>>>>>>>> after talking to many people at conferences, I could imagine
>>> it
>>>>>>>>>>>> will be
>>>>>>>>>>>>>>> one of the most used kinds of joins going forward.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Tightly coupling it with watermarks fits both from a
>>> semantical
>>>>>>>>>>>> point
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> view but also with other efforts such as FLIP-558
>>> (Improvements
>>>>>>>>> to
>>>>>>>>>>>>>>> SinkUpsertMaterializer and changelog disorder) [1]. In the
>>> near
>>>>>>>>>>>> future,
>>>>>>>>>>>>>>> we should work on more automated watermarking to power these
>>>>>>>>>>>>>>> watermark-based operators, but this is an orthogonal effort.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Overall I'm strongly +1 on this. Also +1 on the syntax
>>>>>>>>>> improvements
>>>>>>>>>>>> for
>>>>>>>>>>>>>>> lateral table functions by dropping the TABLE() wrapper.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-558%3A+Improvements+to+SinkUpsertMaterializer+and+changelog+disorder
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 18.05.26 11:47, Gustavo de Morais wrote:
>>>>>>>>>>>>>>>> Hi Fabian,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> In general a strong +1 for the feature, without getting into
>>>>>>>>> the
>>>>>>>>>>>>>> details
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>> the FLIP yet. This is a missing feature for years and I'm
>>>>>>>>> happy
>>>>>>>>>>>> that
>>>>>>>>>>>>>>> we're
>>>>>>>>>>>>>>>> putting the time to address this - while also getting rid of
>>>>>>>>>> some
>>>>>>>>>>>> of
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> hard restrictions we had. Thanks!
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>>>>> Gustavo
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Fri, 15 May 2026 at 16:39, Fabian Hueske <[email protected]> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I'd like to start a discussion on FLIP-579: LATERAL
>>> SNAPSHOT
>>>>>>>>>> Join
>>>>>>>>>>>>> [1].
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Enriching a stream with data from a (slowly changing)
>>> dynamic
>>>>>>>>>>>> table
>>>>>>>>>>>>>> is a
>>>>>>>>>>>>>>>>> super common use case.
>>>>>>>>>>>>>>>>> Flink SQL features Temporal Joins [2] to address these use
>>>>>>>>>> cases.
>>>>>>>>>>>>>>>>> However, SQL users can only use the event-time variant
>>> which
>>>>>>>>>> has
>>>>>>>>>>>>> many
>>>>>>>>>>>>>>>>> limitations (heavy dependency on frequent WM updates on
>>> both
>>>>>>>>>>>> inputs,
>>>>>>>>>>>>>>>>> build-side table requires a PK, the join predicate must
>>>>>>>>> include
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> build-side PK, etc).
>>>>>>>>>>>>>>>>> The processing-time temporal join is disabled (due to
>>>>>>>>>> build-side
>>>>>>>>>>>>>>>>> initialization issues [3]) and temporal table function
>>> joins
>>>>>>>>>> are
>>>>>>>>>>>>>>>>> only available in Table API.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> FLIP-579 proposes a new temporal join operator that
>>> operates
>>>>>>>>> in
>>>>>>>>>>>>>>>>> processing-time and addresses the limitations of the
>>> existing
>>>>>>>>>>>>>>>>> implementations:
>>>>>>>>>>>>>>>>> * initialization of the build-side before joining
>>>>>>>>>>>>>>>>> * no requirement of continuous, frequent build-side WMs
>>>>>>>>> (after
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> initialization completed)
>>>>>>>>>>>>>>>>> * no requirement for a PK on the build-side
>>>>>>>>>>>>>>>>> * table function-based syntax [4] via a built-in SNAPSHOT function
>>>>>>>>>>>>>>>>> (proposed in FLIP-517 [5])
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Fabian
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-579%3A+LATERAL+SNAPSHOT+Join
>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/sql/queries/joins/#temporal-joins
>>>>>>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/FLINK-19830
>>>>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/sql/queries/joins/#temporal-table-function-join
>>>>>>>>>>>>>>>>> [5]
>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-517%3A+Better+Handling+of+Dynamic+Table+Primitives+with+PTFs#FLIP517:BetterHandlingofDynamicTablePrimitiveswithPTFs-SNAPSHOTfortemporaljoins
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>>
