Re: [DISCUSS] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

Bruno Cadonna Mon, 01 Apr 2024 13:33:50 -0700

Hi Walker and Matthias,

(2)

That is exactly my point about having a compile time error versus aruntime error. The added flexibility as proposed by Matthias sounds goodto me.

Regarding the Named parameter, I was not aware that the processor thatwrites records to the global state store is named according to the namepassed in by Consumed. I thought Consumed strictly specifies the namesof source processors. So I am fine with not having an overload with aNamed parameter.


Best,
Bruno

On 3/31/24 11:30 AM, Matthias J. Sax wrote:

Two more follow up thoughts:

(1) I am still not a big fan of the boolean parameter we introduce. Didyou consider to use different method names, like`addReadOnlyGlobalStore()` (for the optimized method, that would notreprocess data on restore), and maybe add `addModifiableGlobalStore()`(not a good name, but we cannot re-use existing `addGlobalStore()` --maybe somebody else has a good idea about a better `addXxxGlobalStore`that would describe it well).

(2) I was thinking about Bruno's comment to limit the scope the storebuilder for the optimized case. I think we should actually do somethingabout it, because in the end, the runtime (ie, the `Processor` we hardwire) would need to pick a store it supports and cast to thecorresponding store? If the cast fails, we hit a runtime exception, butby putting the store we cast to into the signature we can actuallyconvert it into a compile time error what seems better. -- If we want,we could make it somewhat flexible and support both `KeyValueStore` and`TimestampedKeyValueStore` -- ie, the signature would be `KeyValueStore`but we explicitly check if the builder gives us a`TimestampedKeyValueStore` instance and use it properly.

If putting the signature does not work for some reason, we should atleast clearly call it out in the JavaDocs what store type is expected.




-Matthias



On 3/28/24 5:05 PM, Walker Carlson wrote:

Hey all,

Thanks for the feedback Bruno, Almog and Matthias!

Almog: I like the idea, but I agree with Matthais. I actually looked at
that ticket a bit when doing this and found that while similar they are
actually pretty unrelated codewise. I would love to see it get taken care
of.

Bruno and Matthias: The Named parameter doesn't really make sense tome to

put it here. The store in the Store builder is already named through what
Matthais described and the processor doesn't actually have a name. That
would be the processor node that gets named via the Named parameter  (in

the DSL) and the internal streams builder uses the consumed object tomake

a source name. I think we should just keep the Consumed object and used
that for the processor node name.

As for the limitation of the store builder interface I don't think it is
necessary. It could be something we add later if we really want to.

Anyways I think we are getting close enough to consensus that I'mgoing to

open a vote and hopefully we can get it voted on soon!

best,
Walker

On Thu, Mar 28, 2024 at 5:55 AM Matthias J. Sax <[email protected]> wrote:

Hey,

looking into the API, I am wondering why we would need to add an
overload talking a `Named` parameter?

StreamsBuilder.addGlobalStore() (and .addGlobalTable()) already takes a
`Consumed` parameter that allows to set a name.

2.
I do not understand what you mean with "maximum flexibility". The

built-in processor needs to assume a given state store interface. That

means, users have to provide a state store that offers thatinterface. If

they do not they will get a runtime exception. If we require a store
builder for a given interface, we can catch the mistake at compile time.
Let me know whether I misunderstood something.

Yes, we could catch it at runtime. But I guess what I was trying to say
is different: I was trying to say, we should not limit the API to always
require a specific store, such that global stores can only be of a
certain type. Global Stores should be allowed to be of any type. Hence,
if we add a built-in processor, it can only be one option, and we always
need to support custom processor, and might also want to try to allow
the restore optimization for custom processor (and thus other store
types), not just for our built-in processor (and our built-in stores).
Coupling the optimization to built-in stores would prevent us to apply
the optimization to custom stores.



@Almog: interesting idea. I tend to think that both issues are
orthogonal. If users pick to apply the optimization "added" by this KIP,
the bug you mentioned would still apply to global stores, and thus this
KIP is not addressing the issue you mentioned.

Personally, I also think that we don't need a KIP to fix the ticket you
mentioned? In the end, we need to skip records during restore, and it
seems it does not make sense to make this configurable?



-Matthias


On 3/26/24 4:24 PM, Almog Gavra wrote:

Thanks for the thoughts Bruno!

Do you mean a API to configure restoration instead of boolean flag

reprocessOnRestore?

Yes, this is exactly the type of thing I was musing (but I don'thave any

concrete suggestions). It feels like that would give the flexibility to

do

things like the motivation section of the KIP (allow bulk loading of
records without reprocessing) while also solving other limitations.

I'm supportive of the KIP as-is but was hoping somebody with more
experience would have a sudden inspiration for how to solve both issues
with one API! Anyway, I'll slide back into the lurking shadows for now

and

let the discussion continue :)

Cheers,
Almog

On Tue, Mar 26, 2024 at 4:22 AM Bruno Cadonna <[email protected]>

wrote:

Hi Almog,

Do you mean a API to configure restoration instead of boolean flag
reprocessOnRestore?

Do you already have an idea?

The proposal in the KIP is focused on the processor that updates the
global state whereas in the case of GlobalKTable and source KTable the

issues lies in the deserialization of records from the inputtopics, but

only if the deserialization error handler is configured to drop the
problematic record. Additionally, for source KTable the source topic

optimization must be turned on to run into the issue. I amwondering how

a unified API for global stores, GlobalKTable, and source KTable might
look like.

While it is an interesting question, I am in favor of deferringthis to

a separate KIP.

Best,
Bruno

On 3/26/24 12:49 AM, Almog Gavra wrote:

Hello Folk!

Glad to see improvements to the GlobalKTables in discussion! I think

they

deserve more love :)

Scope creep alert (which I'm generally against and certainly still

support

this KIP without but I want to see if there's an elegant way toaddress
both problems). The KIP mentions that "Now the restore is done by
reprocessing using an instance from the customer processor supplier"

which

I suppose fixed a long-standing bug (
https://issues.apache.org/jira/browse/KAFKA-8037) but only for

GlobalKTables and not for normal KTables that use thesource-changelog

optimization. Since this API could be used to signal "I want to

reprocess

on restore" I'm wondering whether it makes sense to design thisAPI in

way that could be extended for KTables as well so a fix forKAFKA-8037

would be possible with the same mechanism. Thoughts?

Cheers,
Almog

On Mon, Mar 25, 2024 at 11:06 AM Walker Carlson
<[email protected]> wrote:

Hey Bruno,

1) I'm actually not sure why that is in there. It certainly doesn't

match

the convention. Best to remove it and match the other methods.

2) Yeah, I thought about it but I'm not convinced it is a necessary
restriction. It might be useful for the already definedprocessors but

then

they might as well use the `globalTable` method. I think the addstate
store option should go for maximum flexibility.

Best,
Walker



On Fri, Mar 22, 2024 at 10:01 AM Bruno Cadonna <[email protected]>

wrote:

Hi Walker,

A couple of follow-up questions.

1.
Why do you propose to explicitly pass a parameter "storeName" in
StreamsBuilder#addGlobalStore?
The StoreBuilder should already provide a name for the store, if I
understand the code correctly.
I would avoid using the same name for the source node and the state
store, because it limits the flexibility in naming. Why do you not

use

Named for the name of the source node?

2.

Did you consider Matthias' proposal to restrict the type of thestore

builder to `StoreBuilder<TimestampedKeyValueStore>` (or even
`StoreBuilder<? extends TimestampedKeyValueStore>`) for the case

where

the processor is built-in?


Best,
Bruno

On 3/13/24 11:05 PM, Walker Carlson wrote:

Thanks for the feedback Bruno, Matthias, and Lucas!
There is a decent amount but I'm going to try and just hit themajor

points

as I would like to keep this change simple.

I've made corrections for the mistakes pointed out. Thanks for the
suggestions everyone.

The main sticking point seems to be with the method of signalling

the

restore behavior. It seems we can all agree with how the APIshould
look
with the default option we are adding. I think keeping theoption to
load
directly from the topic into the store is a good idea. It is much

more

performant and could make a simple metric collector processor much
simpler.
I think something that Matthais said about creating a specialclass

of

processors for the global stores helps me think about the issue. I
tend
to
fall into the category that we should keep global stores opento thepossibility of having child nodes in the future. I don't reallysee
the
downside of having that as an option. It might not be best fora lot
of
cases, but something simple could be very useful to put in thePAPI.
I like the idea of having a `GlobalStoreParameters` but only if we
decide
to make the processor need to extend an interface like
'GobalStoreProcessor`. If not that seems excessive.

As of right now I don't see a better option than having a boolean

flag

for

the reprocessOnRestore option. I expanded the description in the

docs

so

hope that helps.

I am more than willing to take other ideas on it.

thanks,
Walker

Re: [DISCUSS] KIP-1024: Make the restore behavior of GlobalKTables with custom processors configureable

Reply via email to