Hi Matthias,

Originally I had written a really long answer against deprecations of the
method. When I finished writing, I realised that I actually agree with all
your points.


As you said, if a user is worried about how a table store is persisted, she
will almost always also be worried about how the subscription store is
persisted. Thus we should deprecate the method with only table store
materialization.


I will change the KIP to reflect that. I will also add that we need to
provide information to the users-docs about subscription stores, since I
couldn’t find anything on Kafka documentation about it.


One small question - with us deprecating the API call, should we still have
a bug-fix to piggyback on the materialisation method provided to the
table-store? Not sure if you plan to have it removed at 3.0.0 or would like
to have a longer deprecation time.


Cheers,

Marco Lotz

On Wed, Apr 14, 2021 at 11:29 PM Matthias J. Sax <mj...@apache.org> wrote:

> If you argue this way, would the consequence not be that we would need
> to add even more overloads that allow to only use Materialized for the
> subscription store?
>
> -> I only want to switch to in-memory for the subscription store: Why do
> I need to set Materialized for the main store?
>
> Btw: the above is not something I would really argue for.
>
>
> The surface area of our public API is quite big and tend to overwhelm
> users, and thus I would like to keep it small is possible.
>
>
> > Must the user know about the subscription-stores beforehand? If yes, then
> > indeed we should deprecate.
>
> I tend to argue "yes".
>
>
> Keeping the old methods seems to be a rare use case IMHO. It seems most
> likely that user want to overwrite all stores. And thus, it seems to be
> error prone: I want to overwrite the stores, but I forget to overwrite
> the subscription store.
>
> And even if one wants to keep RocksDB for the subscription stores only,
> they can still pass in both Materialized object using the new methods.
>
>
>
> -Matthias
>
>
> On 4/14/21 1:23 PM, Marco Aurélio Lotz wrote:
> > Hi Matthias,
> >
> > Thanks for reviewing it. That's a nice question. I can provide you the
> > answer as a user of the API - but of course it very much depends on what
> > the original intention towards the user is.
> > The only reason I found out it was opinionated was when my tests using
> > in-memory materialization started failing due to the state-store clean-up
> > bug
> > <
> https://stackoverflow.com/questions/50602512/failed-to-delete-the-state-directory-in-ide-for-kafka-stream-application
> >
> > in Windows systems.
> >
> > To be honest, I think I wouldn't have investigated how subscription
> stores
> > are used without seeing that my system couldn't handle the expected data.
> > Must the user know about the subscription-stores beforehand? If yes, then
> > indeed we should deprecate.
> >
> > Having said that, it seems to me that deprecating would not be consistent
> > with other parts of the API that have parameterized methods hiding
> > complexity.
> > E.g. from the top of my head, join() will make a rocksDB materialization
> if
> > no-materialization method is provided.
> >
> > public <V1, R> KTable<K, R> join(final KTable<K, V1> other, final
> > ValueJoiner<? super V, ? super V1, ? extends R> joiner0
> >
> >
> > I see this as something equivalent. If the user didn't provide all the
> > methods, shouldn't it be fine to fallback to a default materialization
> > method as the other sections of the API already does?
> > In that scenario, there's no need to deprecate and we just need to make a
> > reasonable guess towards it. Using the same method as the table-store
> > sounds like a reasonable guess in this scenario.
> >
> > In my perspective as a user, I would prefer to have simpler methods
> knowing
> > I would be losing some fine-grain configuration. Does it sound reasonable
> > to you?
> >
> > Cheers,
> > Marco Lotz
> >
> > On Wed, Apr 14, 2021 at 8:47 PM Matthias J. Sax <mj...@apache.org>
> wrote:
> >
> >> I just re-read the KIP, and I am wondering why we would *only add* new
> >> methods.
> >>
> >> Is it an expected use case to only change the main stores but not the
> >> subscription stores?
> >>
> >> Wondering if we should deprecate the existing methods that only accept a
> >> single `Materialized` parameter?
> >>
> >>
> >> -Matthias
> >>
> >> On 4/9/21 1:32 PM, Marco Aurélio Lotz wrote:
> >>> Hi everyone,
> >>>
> >>> I am fine with sticking with Materialized and adding the info to the
> >>> javadoc then - so we keep the footprint smaller.
> >>> I will then update the KIP to match what we discussed here and send it
> >> for
> >>> a vote next week.
> >>>
> >>> I will raise a new bug-fix ticket and change KAFKA-10383
> >>> <https://issues.apache.org/jira/browse/KAFKA-10383> to become a
> feature
> >> -
> >>> if that's ok.
> >>>
> >>> Cheers,
> >>> Marco
> >>>
> >>> On Wed, Apr 7, 2021 at 4:15 AM Matthias J. Sax <mj...@apache.org>
> wrote:
> >>>
> >>>> Just catching up here.
> >>>>
> >>>> I agree that we have two issue, and the first (align subscription
> store
> >>>> to main store) can be done as a bug-fix.
> >>>>
> >>>> For the KIP (that addressed the second), I tend to agree that reusing
> >>>> `Materialized` might be better as it would keep the API surface area
> >>>> smaller.
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>> On 4/6/21 8:48 AM, John Roesler wrote:
> >>>>> Hi Marco,
> >>>>>
> >>>>> Just a quick clarification: I just reviewed the Materialized class.
> It
> >>>> looks like the only undesirable members are:
> >>>>> 1. Retention
> >>>>> 2. Key/Value serdes
> >>>>>
> >>>>> The underlying store type would be “KeyValueStore<Bytes,byte[]>” ,
> for
> >>>> which case the withRetention javadoc already says it’s ignored.
> >>>>>
> >>>>> Perhaps we could just stick with Materialized by adding a note to the
> >>>> Key/Value serdes setters that they are ignored for FKJoin subscription
> >>>> stores?
> >>>>>
> >>>>> Not as elegant as a new config class, but these config classes
> actually
> >>>> bring a fair amount of complexity, so it might be nice to avoid a new
> >> one.
> >>>>>
> >>>>> Thanks,
> >>>>> John
> >>>>>
> >>>>> On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote:
> >>>>>> Hi John / Guozhang,
> >>>>>>
> >>>>>> If I correctly understood John's message, he agrees on having the
> two
> >>>>>> scenarios (piggy-back and api extension). In my view, these two
> >>>> scenarios
> >>>>>> are separate tasks - the first one is a bug-fix and the second is an
> >>>>>> improvement on the current API.
> >>>>>>
> >>>>>> - bug-fix: On the current API, we change its implementation to piggy
> >>>> back
> >>>>>> on the materialization method provided to the materialized
> parameter.
> >>>> This
> >>>>>> way it will not be opinionated anymore and will not force RocksDb
> >>>>>> persistence for subscription store. Thus an in-memory materialized
> >>>>>> parameter would imply an in-memory subscription store, for example.
> >>>> From my
> >>>>>> understanding, the original implementation tried to be as
> unopionated
> >>>>>> towards storage methods as possible - and the current implementation
> >> is
> >>>> not
> >>>>>> allowing that. Was that the case? We would still need to add this
> >>>>>> modification to the update notes, since it may affect some
> >> deployments.
> >>>>>>
> >>>>>> - improvement: We extend the API to allow a user to fine tune
> >> different
> >>>>>> materialization methods for subscription and join store. This is
> done
> >> by
> >>>>>> adding a new parameter to the associated methods.
> >>>>>>
> >>>>>> Does it sound reasonable to you Guozhang?
> >>>>>> On your question, does it make sense for an user to decide retention
> >>>>>> policies (withRetention method) or caching for subscription stores?
> I
> >>>> can
> >>>>>> see why to finetune Logging for example, but in a first moment not
> >> these
> >>>>>> other behaviours. That's why I am unsure about using Materialized
> >> class.
> >>>>>>
> >>>>>> @John, I will update the KIP with your points as soon as we clarify
> >>>> this.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Marco
> >>>>>>
> >>>>>> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wangg...@gmail.com>
> >>>> wrote:
> >>>>>>
> >>>>>>> Thanks Marco / John,
> >>>>>>>
> >>>>>>> I think the arguments for not piggy-backing on the existing
> >>>> Materialized
> >>>>>>> makes sense; on the other hand, if we go this route should we just
> >> use
> >>>> a
> >>>>>>> separate Materialized than using an extended /
> >>>>>>> narrowed-scoped MaterializedSubscription since it seems we want to
> >>>> allow
> >>>>>>> users to fully customize this store?
> >>>>>>>
> >>>>>>> Guozhang
> >>>>>>>
> >>>>>>> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vvcep...@apache.org>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks Marco,
> >>>>>>>>
> >>>>>>>> Sorry if I caused any trouble!
> >>>>>>>>
> >>>>>>>> I don’t remember what I was thinking before, but reasoning about
> it
> >>>> now,
> >>>>>>>> you might need the fine-grained choice if:
> >>>>>>>>
> >>>>>>>> 1. The number or size of records in each partition of both tables
> is
> >>>>>>>> small(ish), but the cardinality of the join is very high. Then you
> >>>> might
> >>>>>>>> want an in-memory table store, but an on-disk subscription store.
> >>>>>>>>
> >>>>>>>> 2. The number or size of records is very large, but the join
> >>>> cardinality
> >>>>>>>> is low. Then you might need an on-disk table store, but an
> in-memory
> >>>>>>>> subscription store.
> >>>>>>>>
> >>>>>>>> 3. You might want a different kind (or differently configured)
> store
> >>>> for
> >>>>>>>> the subscription store, since it’s access pattern is so different.
> >>>>>>>>
> >>>>>>>> If you buy these, it might be good to put the justification into
> the
> >>>> KIP.
> >>>>>>>>
> >>>>>>>> I’m in favor of the default you’ve proposed.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> John
> >>>>>>>>
> >>>>>>>> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> >>>>>>>>> Hi Guozhang,
> >>>>>>>>>
> >>>>>>>>> Apologies for the late answer. Originally that was my proposal -
> to
> >>>>>>>>> piggyback on the provided materialisation method (
> >>>>>>>>> https://issues.apache.org/jira/browse/KAFKA-10383).
> >>>>>>>>> John Roesler suggested to us to provide even further fine tuning
> on
> >>>> API
> >>>>>>>>> level parameters. Maybe we could see this as two sides of the
> same
> >>>>>>> coin:
> >>>>>>>>>
> >>>>>>>>> - On the current API, we change it to piggy back on the
> >>>> materialization
> >>>>>>>>> method provided to the join store.
> >>>>>>>>> - We extend the API to allow a user to fine tune different
> >>>>>>>> materialization
> >>>>>>>>> methods for subscription and join store.
> >>>>>>>>>
> >>>>>>>>> What do you think?
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Marco
> >>>>>>>>>
> >>>>>>>>> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wangg...@gmail.com
> >
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Marco,
> >>>>>>>>>>
> >>>>>>>>>> Just a quick thought: what if we reuse the existing Materialized
> >>>>>>>> object for
> >>>>>>>>>> both subscription and join stores, instead of introducing a new
> >>>>>>> param /
> >>>>>>>>>> class?
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
> >>>>>>>> cont...@marcolotz.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi folks,
> >>>>>>>>>>>
> >>>>>>>>>>> I would like to invite everyone to discuss further KIP-718:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> >>>>>>>>>>>
> >>>>>>>>>>> I welcome all feedback on it.
> >>>>>>>>>>>
> >>>>>>>>>>> Kind Regards,
> >>>>>>>>>>> Marco Lotz
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> -- Guozhang
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> -- Guozhang
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> >
>

Reply via email to