Hi, another idea that came to my mind. Instead of using a compacted topic, the buffer could use a non-compacted topic and regularly delete records before a given offset, as Streams does for repartition topics.

Best,
Bruno

On 05.06.23 21:48, Bruno Cadonna wrote:
Hi Victoria,

that is a good point! I think the topic needs to be a compacted topic to be able to get rid of records that are evicted from the buffer. So the key might be something composed of the record key, the timestamp, and a sequence number to distinguish between records with the same key and the same timestamp. Just an idea! Maybe Walker comes up with something better.

Best,
Bruno

On 05.06.23 20:38, Victoria Xia wrote:
Hi Walker,

Thanks for the latest updates! The KIP looks great. Just one question about the changelog topic for the join buffer: The KIP says "When a failure occurs the buffer will try to recover from an OffsetCheckpoint if possible. If not it will reload the buffer from a compacted change-log topic." This is a new changelog topic that will be introduced specifically for the join buffer, right? Why is the changelog topic compacted? What are the keys? I am confused because the buffer contains records from the stream side of the join, for which multiple records with the same key should be treated as separate updates that must all be tracked in the buffer, rather than as updates which replace each other.

Thanks,
Victoria

On Mon, Jun 5, 2023 at 1:47 AM Bruno Cadonna <cado...@apache.org> wrote:
Hi Walker,

Thanks once more for the updates to the KIP! Do you also plan to expose metrics for the buffer?

Best,
Bruno

On 02.06.23 17:16, Walker Carlson wrote:
Hello Bruno,

I think this covers your questions. Let me know what you think.

2. We can use a changelog topic. I think we can treat it like any other store and recover in the usual manner. Also, the implementation is on disk.

3. The description is in the public interfaces description. I will copy it into the proposed changes as well. This is a bit of an implementation detail that I didn't want to add into the KIP, but the record will be added to the buffer to keep the stream time consistent; it will just be ejected immediately. If this causes performance issues we will skip this step and track stream time separately.
I will update the KIP to say that stream time advances when a stream record enters the node. Also, yes, updated.

5. No, there is no difference right now; everything gets processed as it comes in and tries to find a record for its timestamp.

Walker

On Fri, Jun 2, 2023 at 6:41 AM Bruno Cadonna <cado...@apache.org> wrote:
Hi Walker,

Thanks for the updates!

2. It is still not clear to me how a failure is handled. I do not understand what you mean by "recover from an OffsetCheckpoint". My understanding is that the buffer needs to be replicated into its own Kafka topic. The input topic is not enough. The offset of a record is added to the offsets to commit once the record is streamed through the subtopology. That means once the record is added to the buffer its offset is added to the offsets to commit -- independently of whether the record was evicted from the buffer and sent to the join node or not.

Now, let's assume the following scenario:
1. A record is read from the input topic and added to the buffer, but not evicted to be processed by the join node.
2. When the processing of the subtopology finishes, the offset of the record is added to the offsets to commit.
3. A commit happens.
4. A failure happens.

After the failure the buffer is empty, but the record will not be read anymore from the input topic since its offset has already been committed. The record is lost.

One solution to avoid the loss is to recreate the buffer from a compacted Kafka topic as we do for suppression buffers. I do not think we need any offset checkpoint here since we keep the buffer in memory, right? Or do you plan to back the buffer with a persistent store? Even in that case, a compacted Kafka topic would be needed.

3. From the KIP it is still not clear to me what happens if a record is outside of the grace period. I guess the record that falls outside of the grace period will not be added to the buffer, but will be sent to the join node.
Since it is outside of the grace period it will also not increase stream time and it will not trigger an eviction. Also, the head of the buffer will not contain a record that needs to be evicted, since the timestamp of the head record will be within the interval of stream time minus grace period. Is this correct? Please add such a description to the KIP.

Furthermore, I think there is a mistake in the text: "... will dequeue when the record timestamp is greater than stream time plus the grace period". I guess that should be "... will dequeue when the record timestamp is less than (or equal to?) stream time minus the grace period".

5. What is the difference between not setting the grace period and setting it to zero? If there is a difference, why is there a difference?

Best,
Bruno

On 01.06.23 23:58, Walker Carlson wrote:
Hey Bruno,

Thanks for the feedback.

1) I will add this to the KIP, but stream time only advances when the buffer receives a new record.

2) You are correct, I will add a failure section to the KIP. Since the records won't change in the buffer from when they are read from the topic, they are replicated already.

3) I see that I'm outvoted on the dropping of records thing. We will pass them on and try to join them if possible. This might cause some null results, but increasing the table history retention should help with that.

4) I can add some to the KIP. But it's pretty directly adding whatever the grace period is to the latency. I don't see a way around it.

Walker

On Thu, Jun 1, 2023 at 5:23 AM Bruno Cadonna <cado...@apache.org> wrote:
Hi Walker,

thanks for the KIP! Here is my feedback:

1. It is still not clear to me when stream time for the buffer advances. What is the event that lets stream time advance? In the discussion, I do not understand what you mean by "The segment store already has an observed stream time, we advance based on that. That should only advance based on records that enter the store." Where does this segment store come from?
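The corrected dequeue rule discussed above can be sketched as a small standalone model. This is an illustrative sketch only, not Kafka Streams code; the names `JoinBufferModel`, `BufferedRecord`, and `add` are invented for the example. Records are held in timestamp order and dequeued once their timestamp is at most stream time minus the grace period:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of the proposed join buffer's eviction rule (hypothetical names,
// not the actual Kafka Streams implementation).
public class JoinBufferModel {
    public record BufferedRecord(String key, long timestamp) {}

    // Buffer ordered by record timestamp, smallest first.
    private final PriorityQueue<BufferedRecord> buffer =
            new PriorityQueue<>(Comparator.comparingLong(BufferedRecord::timestamp));
    private final long gracePeriodMs;
    private long streamTime = Long.MIN_VALUE;

    public JoinBufferModel(long gracePeriodMs) {
        this.gracePeriodMs = gracePeriodMs;
    }

    /** Adds a record, advances stream time, and returns records ready to join. */
    public List<BufferedRecord> add(BufferedRecord r) {
        streamTime = Math.max(streamTime, r.timestamp());
        buffer.add(r);
        List<BufferedRecord> evicted = new ArrayList<>();
        // Dequeue while the head's timestamp is <= stream time - grace period.
        while (!buffer.isEmpty()
                && buffer.peek().timestamp() <= streamTime - gracePeriodMs) {
            evicted.add(buffer.poll());
        }
        return evicted;
    }
}
```

With a grace period of 10, a record at timestamp 1 stays buffered until a later record pushes stream time to 11 or beyond; a record outside the grace period on arrival would, per the discussion above, bypass the buffer entirely (not modeled here).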
Anyways, I think it would be great to also state how stream time advances in the KIP.

2. How does the buffer behave in case of a failure? I think I understand that the buffer will use an implementation of TimeOrderedKeyValueBuffer and therefore the records in the buffer will be replicated to a topic in Kafka, but I am not completely sure. Could you elaborate on this in the KIP?

3. I agree with Matthias about dropping late records. We use grace periods in scenarios where records are grouped, like in windowed aggregations and windowed joins. The stream buffer you propose does not really group any records. It rather delays records and reorders them. I am not sure if grace period is the right naming/concept to apply here. Instead of dropping records that fall outside of the buffer's time interval, the join should skip the buffer and try to join the record immediately. In the end, a stream-table join is an unwindowed join, i.e., no grouping is applied to the records. What do you and other folks think about this proposal?

4. How does the proposed buffer affect processing latency? Could you please add some words about this to the KIP?

Best,
Bruno

On 31.05.23 01:49, Walker Carlson wrote:
Thanks for all the additional comments. I will either address them here or update the KIP accordingly.

I mentioned a follow-up KIP to add extra features before and in the responses. I will try to briefly summarize what options and optimizations I plan to include. If a concern is not covered in this list, I for sure talk about it below.

* Allowing non-versioned tables to still use the stream buffer
* Automatically materializing tables instead of forcing the user to do it
* Configurable in-memory buffer
* Order the records in offset order or in time order
* Non-memory-use buffer (offset order, delayed pull from stream)
* Time synced between stream and table side (maybe)
* Do not drop late records and process them as they come in instead

First, Victoria.

1) (One of your nits covers this, but you are correct, it doesn't make sense, so I removed that part of the example.) For those examples with the "bad" join results I said that without buffering the stream it would look like that, but that was incomplete. If the lookup was simply looking at the latest version of the table when the stream records came in, then those results were possible. If we are using the point-in-time lookup that versioned tables give us, then you are correct, the future results are not possible.

2) I'll get to this later, as Matthias brought up something related.

To your additional thoughts, I agree that we need to call those things out in the documentation. I'm writing up a follow-up KIP with a lot of the ideas we have discussed so that we can improve this feature beyond the base implementation if it's needed.

I addressed the nits in the KIP. I somehow missed the stream-table join processor improvement; it makes your first question make a lot more sense. Table history retention is a much cleaner way to describe it.

As to your mention of syncing the time for the table and stream: Matthias mentioned that as well, so I will address both here. I plan to bring that up in the future, but for now we will leave it out. I suppose it will be more useful after the table history retention is separable from the table grace period.

To address Matthias' comments.

You are correct in saying the in-memory store shouldn't cause any semantic concerns. My concern would be more with whether we limited the number of records in the buffer and what we would do if we hit said limits (emitting those records might be an issue; throwing an error and halting would not). I think we can leave this discussion to the follow-up KIP along with a few other options.

I will go through your proposals now.

- don't support non-versioned KTables
Sure, we can always expand this later on.
Will include as part of the improvement KIP.

- if grace period is added, users need to explicitly materialize the table as versioned (either directly, or upstream. Upstream only works if downstream tables "inherit" versioned semantics -- cf KIP-914)
Again, that works for me for now; if we find a use we can always add it later.

- the table's history retention time must be larger than the grace period (should be easy to check at runtime, when we build the topology)
Agreed.

- because switching from non-versioned to versioned stores is not backward compatible (cf KIP-914), users need to take care of this themselves, and this also implies that adding a grace period is not a backward compatible change (even if only via indirect means)
Sure, this works.

As to the dropping of late records, I'm not sure. On one hand I like not dropping things. But on the other I struggle to see how a user can filter out late records that might have incomplete join results. The point-in-time lookup will aggressively expire old data, and if new data has replaced it, it will return null outside of the retention. This seems like it could corrupt the integrity of the join output. Seeing that we drop late records on the table side as well, I would think it makes sense to drop late records on the stream buffer. I could be convinced otherwise, I suppose; I could see adding this as an option in a follow-up KIP. It would be very easy to implement either way. For now, unless anyone else objects, I'm going to stick with dropping the records for the sake of getting this KIP passed. It is functionally a small change to make and we can update it later if you feel strongly about it.

For the ordering: I have to say that it would be more complicated to implement it in offset order, if the goal is to get as many of the records validly joined as possible. Because we would process as things left the buffer, a sufficiently early record could hold up records that would otherwise be valid past the table history retention.
To fix this we could process by timestamp, then store in a second queue and emit by offset, but that would be a lot more complicated. If we didn't care about missing some valid joins, we could just have no store and pull from the topic at a delay, only caring about the timestamp of the next offset. For now I want to stick with the timestamp ordering, as it makes much more sense to me, but I would propose we add both of the other options I have laid out here in the follow-up KIP.

Lastly, I think having an empty store with zero grace period would be super simple and not costly, so we might as well make it even if nothing gets entered.

I hope that addresses all your concerns,
Walker

On Thu, May 25, 2023 at 9:50 AM Matthias J. Sax <mj...@apache.org> wrote:
Walker,

thanks for the updates. The KIP itself reads fine (of course Victoria made good comments about some phrases), but there are a couple of things from your latest reply I don't understand, and that I still think need some more discussion.

Lucas asked about an in-memory option and `WindowStoreSupplier`, and you mention "semantic concerns". There should not be any semantic difference from the underlying buffer implementation, so I am not sure what you mean here (also the relationship to suppress() is unclear to me)? -- I am ok with not making it configurable for now. We can always do it via a follow-up KIP, and keep interface changes limited for now.

Does it really make sense to allow a grace period if the table is non-versioned? You also say: "If table is not materialized it will materialize it as versioned." -- What history retention time would we pick for this case (also asked by Victoria)? Or should we rather not support this and force the user to materialize the table explicitly, and thus explicitly pick a history retention time? It's a tradeoff between usability and guiding users that there will be a significant impact on disk usage.
There are also compatibility concerns: If the table is not explicitly materialized in the old program, we would already need to materialize it also in the old program (of course, we would use a non-versioned store so far). Thus, if somebody adds a grace period, we cannot just switch the store type, as it would be a breaking change, potentially requiring an application reset, or following the upgrade path for versioned state stores, and also changing the program to explicitly materialize using a versioned store. Also note that we might not materialize the actual join table, but only an upstream table, and use `ValueGetter` to access the upstream data.

To this end, as you already mentioned, history retention of the table should be at least the grace period. You proposed to include this in a follow-up KIP, but I am wondering if it's a fundamental requirement and thus we should put a check in place right away and reject an invalid configuration? (It's always easier to lift restrictions than to introduce them later.) This would also imply that a non-versioned table cannot be supported, because it does not have a history retention that is larger than the grace period, and it maybe also answers the requirement about materialization: as we already always materialize something on the table side as a non-versioned store right now, it seems difficult to migrate the store to a versioned store. I.e., it might be ok to push the burden onto the user and say: if you start using a grace period, you also need to manually switch from non-versioned to versioned KTables. Doing stuff automatically under the hood is very complex for this case; if we push the burden onto the user, it might be ok to not complicate this KIP significantly.

To summarize the last two paragraphs, I would propose to:
- don't support non-versioned KTables
- if grace period is added, users need to explicitly materialize the table as versioned (either directly, or upstream.
Upstream only works if downstream tables "inherit" versioned semantics -- cf KIP-914)
- the table's history retention time must be larger than the grace period (should be easy to check at runtime, when we build the topology)
- because switching from non-versioned to versioned stores is not backward compatible (cf KIP-914), users need to take care of this themselves, and this also implies that adding a grace period is not a backward compatible change (even if only via indirect means)

About dropping late records: I am wondering if we should never drop a stream-side record for a left join, even if it's late? In general, one thing I have observed over the years is that it's easier to keep stuff and let users filter explicitly downstream (or make it configurable), instead of dropping proactively, because users have no good way to resurrect records that were already dropped.

For ordering, it sounds reasonable to me to only start with one implementation, and maybe make it configurable as a follow-up. However, I am wondering if starting with offset order might be the better option, as it seems to align more with what we do so far? So instead of storing records ordered by timestamp, we could just store them ordered by offset, and still "poll" from the buffer based on the head record's timestamp. Or would this complicate the implementation significantly?

I also think it's ok to not "sync" stream time between the table and the stream in this KIP, but we should consider doing this as a follow-up change (not sure if we would need a KIP or not for a change like this).

About increasing/decreasing the grace period: what you describe makes sense to me. If decreased, the next record would just trigger emitting a lot of records, and for an increase, the buffer would just need to "fill up" again. For reprocessing, getting a different result with a different grace period is expected, so that's ok IMHO. -- There seems to be one special corner case: grace period zero. For this case, we actually don't need any store, and the stream side could be stateless.
I think it can have the same behavior, but if we want to "add / remove" the store dynamically, we need to add specific code for it. For example, even if we start up with a grace period of zero, we would need to check if there is a local store, and still emit everything in it, before we can ditch the store (not sure if that's even easily done at all). Or: we would need to have a store for _all_ cases, even if grace period is zero (the store would be empty all the time though), to avoid super complex code?

-Matthias

On 5/25/23 10:53 AM, Lucas Brutschy wrote:
Hi Walker,

thanks for your responses. That makes sense. I guess there is always the option to make the implementation more configurable later on, if users request it. Also thanks for the clarifications. From my side, the KIP is good to go.

Cheers,
Lucas

On Wed, May 24, 2023 at 11:54 PM Victoria Xia <victoria....@confluent.io.invalid> wrote:
Thanks for the updates, Walker! Looks great, though I do have a couple questions about the latest updates:

1. The new example says that without stream-side buffering, "ex" and "fy" are possible join results. How could those join results happen? The example versioned table suggests that table record "x" has timestamp 2, and table record "y" has timestamp 3. If stream record "e" has timestamp 1, then it can never be joined against record "x", and similarly for stream record "f" with timestamp 2 being joined against "y".

2. I see in your replies above that "If table is not materialized it will materialize it as versioned" but I don't see this called out in the KIP -- seems worth calling out. Also, what will the history retention for the versioned table be? Will it be the same as the join grace period, or will it be greater?

And some additional thoughts:

Sounds like there are a few things users should watch out for when enabling the stream-side buffer:
- Records will get "stuck" if there are no newer records to advance stream time.
- If there are large gaps between the timestamps of stream-side records, then it's possible that versioned store history retention will have expired by the time a record is evicted from the join buffer, leading to a join "miss." For example, if the join grace period and table history retention are both 10, and records come in the order:
  table side t0 with ts=0
  stream side s1 with ts=1 <-- enters buffer
  table side t10 with ts=10
  table side t20 with ts=20
  stream side s21 with ts=21 <-- evicts record s1 from buffer, but versioned store no longer contains data for ts=1 due to history retention having elapsed
This will result in the join result (s1, null) even though it should've been (s1, t0), due to t0 having been expired from the versioned store already.
- Out-of-order records from the stream side will be reordered, and late records will be dropped.

I don't think any of these are reasons to not go forward with this KIP, but it'd be good to call them out in the eventual documentation to decrease the chance users get tripped up.

> We could maybe do an improvement later to advance stream time from the table side as well, but that might be debatable as we might get more late records.

Yes, the likelihood of late records increases but also the likelihood of "join misses" due to versioned store history retention having elapsed decreases, which feels important for certain use cases. Either way, agreed that it can be a discussion for the future, as incorporating this would substantially complicate the implementation.

Also a couple nits:
- The KIP currently says "We recently added versioned tables which allow the table side of the join [...] but it is not taken advantage of in joins," but this doesn't seem true?
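The "join miss" timeline above can be reproduced with a toy model of a single-key versioned store. This is a hypothetical sketch (the class `JoinMissExample` and its methods are invented names; real versioned stores are keyed and segmented); it only shows why the point-in-time lookup for s1 returns null once history retention has elapsed:

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model: versions of ONE table key, with a point-in-time lookup that
// fails once the requested timestamp falls outside history retention.
public class JoinMissExample {
    private final TreeMap<Long, String> versions = new TreeMap<>(); // validFrom -> value
    private final long historyRetention;
    private long streamTime = Long.MIN_VALUE;

    public JoinMissExample(long historyRetention) {
        this.historyRetention = historyRetention;
    }

    public void put(long ts, String value) {
        streamTime = Math.max(streamTime, ts);
        versions.put(ts, value);
    }

    /** Point-in-time lookup; returns null once asOf is older than retention allows. */
    public String get(long asOf) {
        if (asOf < streamTime - historyRetention) {
            return null; // history already expired
        }
        Map.Entry<Long, String> e = versions.floorEntry(asOf);
        return e == null ? null : e.getValue();
    }
}
```

With retention 10: right after t0 arrives, `get(1)` returns "t0"; but once t10 and t20 have advanced the table's stream time to 20, `get(1)` returns null, which is exactly the (s1, null) result described above.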
If the table of a stream-table join is versioned, then the DSL's stream-table join processor will automatically perform timestamped lookups into the table, in order to take advantage of the new timestamp-aware store to provide better join semantics.
- The KIP mentions "grace period" for versioned stores in a number of places but I think you actually mean "history retention"? The two happen to be the same today (it is not an option for users to configure the two separately) but this need not be true in the future. "History retention" governs how far back in time reads may occur, which is the relevant parameter for performing lookups as part of the stream-table join. "Grace period" in the context of versioned stores refers to how far back in time out-of-order writes may occur, which probably isn't directly relevant for introducing a stream-side buffer, though it's also possible I've overlooked something. (As a bonus, switching from "table grace period" in the KIP to "table history retention" also helps to clarify/distinguish that it's a different parameter from the "join grace period," which I could see being confusing to readers. :) )

Cheers,
Victoria

On Thu, May 18, 2023 at 1:43 PM Walker Carlson <wcarl...@confluent.io.invalid> wrote:
Hey all,

Thanks for the comments, they gave me a lot to think about. I'll try to address them all in order. I have made some updates to the KIP related to them, but I mention where below.

Lucas

Good idea about the example. I added a simple one.

1) I have thought about including options for the underlying buffer configuration, one of which might be adding an in-memory option. My biggest concern is about the semantic guarantees. This isn't like suppress or with windows, where producing incomplete results is relatively harmless. Here we would possibly be producing incorrect results. I also would like to keep the interface changes as simple as I can.
Making more than this change to Joined, I feel, could make this more complicated than it needs to be. If we really want to, I could see adding a grace() option with a BufferConfig in there or something, but I would rather not.

2) The buffer will be independent of whether the table is versioned or not. If the table is not materialized it will materialize it as versioned. It might make sense to do a follow-up KIP where we force the retention period of the versioned store to be greater than whatever the max of the stream buffer is.

Victoria

1) Yes, records will exit in timestamp order, not in offset order.

2) Late records will be dropped (late as in out of the grace period). From my understanding that is the point of a grace period, no? Doesn't the same thing happen with versioned stores?

3) The segment store already has an observed stream time; we advance based on that. That should only advance based on records that enter the store. So yes, only stream-side records. We could maybe do an improvement later to advance stream time from the table side as well, but that might be debatable as we might get more late records. Anyways, I would rather have that as a separate discussion.

> in memory option?

We can do that; for the buffer I plan to use the TimeOrderedKeyValueBuffer interface, which already has an in-memory implementation, so it would be simple. I said more in my answer to Lucas's question. The concern I have with buffer configs or in-memory is complicating the interface. Also semantic guarantees, but in-memory shouldn't affect that.

Matthias

1) Fixed out-of-order vs late terminology in the KIP.

2) I was referring to having a stream. So after this KIP we can have a buffered stream or a normal one. For the table we can use a versioned table or a normal table.

3) Good call-out. I clarified this as "If the table side uses a materialized version store, it can store multiple versions of each record within its defined grace period."
and modified the rest of the paragraph a bit.

4) I get preserving offset ordering, but if the stream is buffered to join on timestamp instead of offset, doesn't it already seem like we care more about time in this case? If we end up adding more options it might make sense to do this. Maybe offset-order processing can be a follow-up? I'll add a section for this in Rejected Alternatives. I think it makes sense to do something like this, but maybe in a follow-up.

5) I hadn't thought about this. I suppose if they changed this in an upgrade, the next record would either evict a lot of records (if the grace period decreased) or there would be a pause until the new grace period is reached. Increasing is a bit more problematic, especially if the table grace period and retention time stay the same. If the data is reprocessed after a change like that then there would be different results, but I feel like that would be expected after such a change. What do you think should happen?

Hopefully this answers your questions!

Walker

On Mon, May 8, 2023 at 11:32 AM Matthias J. Sax <mj...@apache.org> wrote:
Thanks for the KIP! Also some questions/comments from my side:

10) Notation: you use the term "late data" but I think you mean out-of-order. We reserve the term "late" for records that arrive after the grace period has passed, and thus, "late == out-of-order data that is dropped".

20) "There is only one option from the stream side and only recently is there a second option on the table side." What are those options? Victoria already asked about the table side, but I am also not sure what option you mean for the stream side?

30) "If the table side uses a materialized version store the value is the latest by stream time rather than by offset within its defined grace period." The phrase "the value is the latest by stream time" is confusing -- in the end, a versioned store holds multiple versions, not just one.

40) I am also wondering about ordering.
In general, KS tries to preserve offset order during processing (with some exceptions, when offset-order preservation is not clearly defined). Given that the stream-side buffer is really just a "linear buffer", we could easily preserve offset order. But I also see a benefit of re-ordering and emitting out-of-order data right away when read (instead of blocking it behind in-order records that are not ready yet). -- It might even be a possibility to let users pick an emit strategy, e.g. "EmitStrategy.preserveOffsets" (name just a placeholder). The KIP should explain this in more detail and also discuss different options and mention them in "Rejected alternatives" in case we don't want to include them.

50) What happens when users change the grace period? Especially when they turn it on/off (but also increasing/decreasing is an interesting point)? I think we should try to support this if possible; the "Compatibility" section needs to cover switching on/off in more detail.

-Matthias

On 5/2/23 2:06 PM, Victoria Xia wrote:
Cool KIP, Walker! Thanks for sharing this proposal. A few clarifications:

1. Is the order that records exit the buffer in necessarily the same as the order that records enter the buffer in, or no? Based on the description in the KIP, it sounds like the answer is no, i.e., records will exit the buffer in increasing timestamp order, which means that they may be reordered (even for the same key) compared to the input order.

2. What happens if the join grace period is nonzero, and a stream-side record arrives with a timestamp that is older than the current stream time minus the grace period? Will this record trigger a join result, or will it be dropped? Based on the description for what happens when the join grace period is set to zero, it sounds like the late record will be dropped, even if the join grace period is nonzero. Is that true?

3. What could cause stream time to advance, for purposes of removing records from the join buffer?
For example, will new records arriving on the table side of the join cause stream time to advance? From the KIP it sounds like only stream-side records will advance stream time -- does that mean that the join processor itself will have to track this stream time?

Also +1 to Lucas's question about what options will be available for configuring the join buffer. Will users have the option to choose whether they want the buffer to be in-memory vs persistent?

- Victoria

On Fri, Apr 28, 2023 at 11:54 AM Lucas Brutschy <lbruts...@confluent.io.invalid> wrote:
Hi Walker,

thanks for the KIP! We definitely need this. I have two questions:

- Have you considered allowing the customization of the underlying buffer implementation? As I can see, `StreamJoined` lets you customize the underlying store via a `WindowStoreSupplier`. Would it make sense for `Joined` to have this as well? I can imagine one may want to limit the number of records in the buffer, for example. If we hit the maximum, the only option would be to drop semantic guarantees, but users may still want to do this.

- With "second option on the table side" you are referring to versioned tables, right? Will the buffer on the stream side behave any differently whether the table side is versioned or not?

Finally, I think a simple example in the motivation section could help non-experts understand the KIP.

Best,
Lucas

On Tue, Apr 25, 2023 at 9:13 PM Walker Carlson <wcarl...@confluent.io.invalid> wrote:
Hello everybody,

I have a stream proposal to improve the stream-table join by adding a grace period and buffer to the stream side of the join to allow processing in timestamp order, matching the recent improvements of the versioned tables. Please take a look here <https://cwiki.apache.org/confluence/x/lAs0Dw> and share your thoughts.

best,
Walker
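The ordering behavior debated throughout the thread (records enter the buffer in offset order but exit in timestamp order once the grace period elapses) can be illustrated with a small standalone model. `ReorderDemo` and `runThrough` are invented names for this sketch; it is not Kafka Streams code, and it models the timestamp-ordered variant Walker settled on, not the offset-ordered alternative Matthias raised:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model: feed records (offset, timestamp) through a timestamp-ordered
// buffer with a grace period, and collect them in the order they are emitted.
public class ReorderDemo {
    public record Rec(long offset, long ts) {}

    public static List<Rec> runThrough(List<Rec> input, long grace) {
        PriorityQueue<Rec> buffer =
                new PriorityQueue<>(Comparator.comparingLong(Rec::ts));
        List<Rec> output = new ArrayList<>();
        long streamTime = Long.MIN_VALUE;
        for (Rec r : input) {
            streamTime = Math.max(streamTime, r.ts());
            buffer.add(r);
            // Emit everything whose timestamp is at most streamTime - grace.
            while (!buffer.isEmpty() && buffer.peek().ts() <= streamTime - grace) {
                output.add(buffer.poll());
            }
        }
        return output;
    }
}
```

For example, with a grace period of 5 and input timestamps 5, 1, 3, 20 (offsets 0..3), the out-of-order records at timestamps 1 and 3 are emitted before the record at timestamp 5, i.e., output is in timestamp order rather than offset order, matching Victoria's first question and Walker's answer.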