Hi Matthias,
Originally I had written a really long answer against deprecations of the method. When I finished writing, I realised that I actually agree with all your points. As you said, if a user is worried about how a table store is persisted, she will almost always also be worried about how the subscription store is persisted. Thus we should deprecate the method with only table store materialization. I will change the KIP to reflect that. I will also add that we need to provide information to the users-docs about subscription stores, since I couldn’t find anything on Kafka documentation about it. One small question - with us deprecating the API call, should we still have a bug-fix to piggyback on the materialisation method provided to the table-store? Not sure if you plan to have it removed at 3.0.0 or would like to have a longer deprecation time. Cheers, Marco Lotz On Wed, Apr 14, 2021 at 11:29 PM Matthias J. Sax <mj...@apache.org> wrote: > If you argue this way, would the consequence not be that we would need > to add even more overloads that allow to only use Materialized for the > subscription store? > > -> I only want to switch to in-memory for the subscription store: Why do > I need to set Materialized for the main store? > > Btw: the above is not something I would really argue for. > > > The surface area of our public API is quite big and tend to overwhelm > users, and thus I would like to keep it small is possible. > > > > Must the user know about the subscription-stores beforehand? If yes, then > > indeed we should deprecate. > > I tend to argue "yes". > > > Keeping the old methods seems to be a rare use case IMHO. It seems most > likely that user want to overwrite all stores. And thus, it seems to be > error prone: I want to overwrite the stores, but I forget to overwrite > the subscription store. > > And even if one wants to keep RocksDB for the subscription stores only, > they can still pass in both Materialized object using the new methods. > > > > -Matthias > > > On 4/14/21 1:23 PM, Marco Aurélio Lotz wrote: > > Hi Matthias, > > > > Thanks for reviewing it. That's a nice question. I can provide you the > > answer as a user of the API - but of course it very much depends on what > > the original intention towards the user is. > > The only reason I found out it was opinionated was when my tests using > > in-memory materialization started failing due to the state-store clean-up > > bug > > < > https://stackoverflow.com/questions/50602512/failed-to-delete-the-state-directory-in-ide-for-kafka-stream-application > > > > in Windows systems. > > > > To be honest, I think I wouldn't have investigated how subscription > stores > > are used without seeing that my system couldn't handle the expected data. > > Must the user know about the subscription-stores beforehand? If yes, then > > indeed we should deprecate. > > > > Having said that, it seems to me that deprecating would not be consistent > > with other parts of the API that have parameterized methods hiding > > complexity. > > E.g. from the top of my head, join() will make a rocksDB materialization > if > > no-materialization method is provided. > > > > public <V1, R> KTable<K, R> join(final KTable<K, V1> other, final > > ValueJoiner<? super V, ? super V1, ? extends R> joiner0 > > > > > > I see this as something equivalent. If the user didn't provide all the > > methods, shouldn't it be fine to fallback to a default materialization > > method as the other sections of the API already does? > > In that scenario, there's no need to deprecate and we just need to make a > > reasonable guess towards it. Using the same method as the table-store > > sounds like a reasonable guess in this scenario. > > > > In my perspective as a user, I would prefer to have simpler methods > knowing > > I would be losing some fine-grain configuration. Does it sound reasonable > > to you? > > > > Cheers, > > Marco Lotz > > > > On Wed, Apr 14, 2021 at 8:47 PM Matthias J. Sax <mj...@apache.org> > wrote: > > > >> I just re-read the KIP, and I am wondering why we would *only add* new > >> methods. > >> > >> Is it an expected use case to only change the main stores but not the > >> subscription stores? > >> > >> Wondering if we should deprecate the existing methods that only accept a > >> single `Materialized` parameter? > >> > >> > >> -Matthias > >> > >> On 4/9/21 1:32 PM, Marco Aurélio Lotz wrote: > >>> Hi everyone, > >>> > >>> I am fine with sticking with Materialized and adding the info to the > >>> javadoc then - so we keep the footprint smaller. > >>> I will then update the KIP to match what we discussed here and send it > >> for > >>> a vote next week. > >>> > >>> I will raise a new bug-fix ticket and change KAFKA-10383 > >>> <https://issues.apache.org/jira/browse/KAFKA-10383> to become a > feature > >> - > >>> if that's ok. > >>> > >>> Cheers, > >>> Marco > >>> > >>> On Wed, Apr 7, 2021 at 4:15 AM Matthias J. Sax <mj...@apache.org> > wrote: > >>> > >>>> Just catching up here. > >>>> > >>>> I agree that we have two issue, and the first (align subscription > store > >>>> to main store) can be done as a bug-fix. > >>>> > >>>> For the KIP (that addressed the second), I tend to agree that reusing > >>>> `Materialized` might be better as it would keep the API surface area > >>>> smaller. > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> On 4/6/21 8:48 AM, John Roesler wrote: > >>>>> Hi Marco, > >>>>> > >>>>> Just a quick clarification: I just reviewed the Materialized class. > It > >>>> looks like the only undesirable members are: > >>>>> 1. Retention > >>>>> 2. Key/Value serdes > >>>>> > >>>>> The underlying store type would be “KeyValueStore<Bytes,byte[]>” , > for > >>>> which case the withRetention javadoc already says it’s ignored. > >>>>> > >>>>> Perhaps we could just stick with Materialized by adding a note to the > >>>> Key/Value serdes setters that they are ignored for FKJoin subscription > >>>> stores? > >>>>> > >>>>> Not as elegant as a new config class, but these config classes > actually > >>>> bring a fair amount of complexity, so it might be nice to avoid a new > >> one. > >>>>> > >>>>> Thanks, > >>>>> John > >>>>> > >>>>> On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote: > >>>>>> Hi John / Guozhang, > >>>>>> > >>>>>> If I correctly understood John's message, he agrees on having the > two > >>>>>> scenarios (piggy-back and api extension). In my view, these two > >>>> scenarios > >>>>>> are separate tasks - the first one is a bug-fix and the second is an > >>>>>> improvement on the current API. > >>>>>> > >>>>>> - bug-fix: On the current API, we change its implementation to piggy > >>>> back > >>>>>> on the materialization method provided to the materialized > parameter. > >>>> This > >>>>>> way it will not be opinionated anymore and will not force RocksDb > >>>>>> persistence for subscription store. Thus an in-memory materialized > >>>>>> parameter would imply an in-memory subscription store, for example. > >>>> From my > >>>>>> understanding, the original implementation tried to be as > unopionated > >>>>>> towards storage methods as possible - and the current implementation > >> is > >>>> not > >>>>>> allowing that. Was that the case? We would still need to add this > >>>>>> modification to the update notes, since it may affect some > >> deployments. > >>>>>> > >>>>>> - improvement: We extend the API to allow a user to fine tune > >> different > >>>>>> materialization methods for subscription and join store. This is > done > >> by > >>>>>> adding a new parameter to the associated methods. > >>>>>> > >>>>>> Does it sound reasonable to you Guozhang? > >>>>>> On your question, does it make sense for an user to decide retention > >>>>>> policies (withRetention method) or caching for subscription stores? > I > >>>> can > >>>>>> see why to finetune Logging for example, but in a first moment not > >> these > >>>>>> other behaviours. That's why I am unsure about using Materialized > >> class. > >>>>>> > >>>>>> @John, I will update the KIP with your points as soon as we clarify > >>>> this. > >>>>>> > >>>>>> Cheers, > >>>>>> Marco > >>>>>> > >>>>>> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wangg...@gmail.com> > >>>> wrote: > >>>>>> > >>>>>>> Thanks Marco / John, > >>>>>>> > >>>>>>> I think the arguments for not piggy-backing on the existing > >>>> Materialized > >>>>>>> makes sense; on the other hand, if we go this route should we just > >> use > >>>> a > >>>>>>> separate Materialized than using an extended / > >>>>>>> narrowed-scoped MaterializedSubscription since it seems we want to > >>>> allow > >>>>>>> users to fully customize this store? > >>>>>>> > >>>>>>> Guozhang > >>>>>>> > >>>>>>> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vvcep...@apache.org> > >>>> wrote: > >>>>>>> > >>>>>>>> Thanks Marco, > >>>>>>>> > >>>>>>>> Sorry if I caused any trouble! > >>>>>>>> > >>>>>>>> I don’t remember what I was thinking before, but reasoning about > it > >>>> now, > >>>>>>>> you might need the fine-grained choice if: > >>>>>>>> > >>>>>>>> 1. The number or size of records in each partition of both tables > is > >>>>>>>> small(ish), but the cardinality of the join is very high. Then you > >>>> might > >>>>>>>> want an in-memory table store, but an on-disk subscription store. > >>>>>>>> > >>>>>>>> 2. The number or size of records is very large, but the join > >>>> cardinality > >>>>>>>> is low. Then you might need an on-disk table store, but an > in-memory > >>>>>>>> subscription store. > >>>>>>>> > >>>>>>>> 3. You might want a different kind (or differently configured) > store > >>>> for > >>>>>>>> the subscription store, since it’s access pattern is so different. > >>>>>>>> > >>>>>>>> If you buy these, it might be good to put the justification into > the > >>>> KIP. > >>>>>>>> > >>>>>>>> I’m in favor of the default you’ve proposed. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> John > >>>>>>>> > >>>>>>>> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote: > >>>>>>>>> Hi Guozhang, > >>>>>>>>> > >>>>>>>>> Apologies for the late answer. Originally that was my proposal - > to > >>>>>>>>> piggyback on the provided materialisation method ( > >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-10383). > >>>>>>>>> John Roesler suggested to us to provide even further fine tuning > on > >>>> API > >>>>>>>>> level parameters. Maybe we could see this as two sides of the > same > >>>>>>> coin: > >>>>>>>>> > >>>>>>>>> - On the current API, we change it to piggy back on the > >>>> materialization > >>>>>>>>> method provided to the join store. > >>>>>>>>> - We extend the API to allow a user to fine tune different > >>>>>>>> materialization > >>>>>>>>> methods for subscription and join store. > >>>>>>>>> > >>>>>>>>> What do you think? > >>>>>>>>> > >>>>>>>>> Cheers, > >>>>>>>>> Marco > >>>>>>>>> > >>>>>>>>> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wangg...@gmail.com > > > >>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Thanks Marco, > >>>>>>>>>> > >>>>>>>>>> Just a quick thought: what if we reuse the existing Materialized > >>>>>>>> object for > >>>>>>>>>> both subscription and join stores, instead of introducing a new > >>>>>>> param / > >>>>>>>>>> class? > >>>>>>>>>> > >>>>>>>>>> Guozhang > >>>>>>>>>> > >>>>>>>>>> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz < > >>>>>>>> cont...@marcolotz.com> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi folks, > >>>>>>>>>>> > >>>>>>>>>>> I would like to invite everyone to discuss further KIP-718: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>> > >>>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated > >>>>>>>>>>> > >>>>>>>>>>> I welcome all feedback on it. > >>>>>>>>>>> > >>>>>>>>>>> Kind Regards, > >>>>>>>>>>> Marco Lotz > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -- > >>>>>>>>>> -- Guozhang > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> -- Guozhang > >>>>>>> > >>>>>> > >>>> > >>> > >> > > >