Thanks for the comments, Matthias. I like both suggestions regarding the names and I'll update the KIP accordingly.
On Tue, Sep 17, 2019 at 11:22 AM Matthias J. Sax <matth...@confluent.io> wrote: > Thanks Bill! > > Using a new configuration object was suggested by John in the original > DISCUSS thread already. We rejected it because we wanted to avoid a new > configuration class. However, given your analysis, it seems it's > actually the right choice to introduce a new configuration class. > > Hence, overall I am +1 on the proposal. > > > Some nits about names (as always :)) > > - `StreamJoined.as()` does not sound right for the `StoreSupplier` > overload. Maybe it's better to call that static method `with` (not 100% > sure) > > - Should we use plural instead of singular -> `Stream{s}Joined` > -> or keep singular but call it `StreamJoin` > `Joined` seems to refer to the input stream(s) while `Join` would > refer to the join-operator > > Thoughts? > > > You suggest to deprecate existing overload what I support -- can you > list deprecated method in the "Public Interface" section? > > > > -Matthias > > > > On 9/17/19 9:39 AM, Guozhang Wang wrote: > > Hello Bill, > > > > Thanks for the updated KIP. I made a pass on the StreamJoined section. > Just > > a quick question from user's perspective: if a user wants to provide a > > customized store-supplier, she is forced to also provide a name via the > > store-supplier. So there's no way to say "I want to provide my own store > > engine but let the library decide its name", is that right? > > > > > > Guozhang > > > > > > On Tue, Sep 17, 2019 at 8:53 AM Bill Bejeck <bbej...@gmail.com> wrote: > > > >> Bumping this discussion as we need to re-vote before the KIP deadline. > >> > >> On Fri, Sep 13, 2019 at 10:29 AM Bill Bejeck <bbej...@gmail.com> wrote: > >> > >>> Hi All, > >>> > >>> While working on the implementation of KIP-479, some issues came to > light > >>> that the KIP as written won't work. I have updated the KIP with a > >> solution > >>> I believe will solve the original problem as well as address the > >>> impediment to the initial approach. > >>> > >>> This update is a significant change, so please review the updated KIP > >>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+StreamJoined+config+object+to+Join > >> and > >>> provide feedback. After we conclude the discussion, there will be a > >>> re-vote. > >>> > >>> Thanks! > >>> Bill > >>> > >>> On Wed, Jul 17, 2019 at 7:01 PM Guozhang Wang <wangg...@gmail.com> > >> wrote: > >>> > >>>> Hi Bill, thanks for your explanations. I'm on board with your decision > >>>> too. > >>>> > >>>> > >>>> Guozhang > >>>> > >>>> On Wed, Jul 17, 2019 at 10:20 AM Bill Bejeck <bbej...@gmail.com> > wrote: > >>>> > >>>>> Thanks for the response, John. > >>>>> > >>>>>> If I can offer my thoughts, it seems better to just document on the > >>>>>> Stream join javadoc for the Materialized parameter that it will not > >>>>>> make the join result queriable. I'm not opposed to the queriable > >> flag > >>>>>> in general, but introducing it is a much larger consideration that > >> has > >>>>>> previously derailed this KIP discussion. In the interest of just > >>>>>> closing the gap and keeping the API change small, it seems better to > >>>>>> just go with documentation for now. > >>>>> > >>>>> I agree with your statement here. IMHO the most important goal of > >> this > >>>> KIP > >>>>> is to not breaking existing users and gain some consistency of the > >> API. > >>>>> > >>>>> I'll update the KIP accordingly. > >>>>> > >>>>> -Bill > >>>>> > >>>>> On Tue, Jul 16, 2019 at 11:55 AM John Roesler <j...@confluent.io> > >>>> wrote: > >>>>> > >>>>>> Hi Bill, > >>>>>> > >>>>>> Thanks for driving this KIP toward a conclusion. I'm on board with > >>>>>> your decision. > >>>>>> > >>>>>> You didn't mention whether you're still proposing to add the > >>>>>> "queriable" flag to the Materialized config object, or just document > >>>>>> that a Stream join is never queriable. Both options have come up > >>>>>> earlier in the discussion, so it would be good to pin this down. > >>>>>> > >>>>>> If I can offer my thoughts, it seems better to just document on the > >>>>>> Stream join javadoc for the Materialized parameter that it will not > >>>>>> make the join result queriable. I'm not opposed to the queriable > >> flag > >>>>>> in general, but introducing it is a much larger consideration that > >> has > >>>>>> previously derailed this KIP discussion. In the interest of just > >>>>>> closing the gap and keeping the API change small, it seems better to > >>>>>> just go with documentation for now. > >>>>>> > >>>>>> Thanks again, > >>>>>> -John > >>>>>> > >>>>>> On Thu, Jul 11, 2019 at 2:45 PM Bill Bejeck <bbej...@gmail.com> > >>>> wrote: > >>>>>>> > >>>>>>> Thanks all for the great discussion so far. > >>>>>>> > >>>>>>> Everyone has made excellent points, and I appreciate the detail > >>>>> everyone > >>>>>>> has put into their arguments. > >>>>>>> > >>>>>>> However, after carefully evaluating all the points made so far, > >>>>> creating > >>>>>> an > >>>>>>> overload with Materialized is still my #1 option. > >>>>>>> My reasoning for saying so is two-fold: > >>>>>>> > >>>>>>> 1. It's a small change, and IMHO since it's consistent with our > >>>>>> current > >>>>>>> API concerning state store usage, the cognitive load on users > >>>> will > >>>>> be > >>>>>>> minimal. > >>>>>>> 2. It achieves the most important goal of this KIP, namely to > >>>> close > >>>>>> the > >>>>>>> gap of naming state stores independently of the join operator > >>>> name. > >>>>>>> > >>>>>>> Additionally, I agree with the points made by Matthias earlier (I > >>>>> realize > >>>>>>> there is some overlap here). > >>>>>>> > >>>>>>>> - the main purpose of this KIP is to close the naming gap what > >> we > >>>>>> achieve > >>>>>>>> - we can allow people to use the new in-memory store > >>>>>>>> - we allow people to enable/disable caching > >>>>>>>> - we unify the API > >>>>>>>> - we decouple querying from naming > >>>>>>>> - it's a small API change > >>>>>>> > >>>>>>> Although it's not a perfect solution, IMHO the positives of using > >>>>>>> Materialize far outweigh the negatives, and from what we've > >>>> discussed > >>>>> so > >>>>>>> far, anything we implement seems to involve an additional change > >>>> down > >>>>> the > >>>>>>> road. > >>>>>>> > >>>>>>> If others are still strongly opposed to using Materialized, my > >> other > >>>>>>> preferences would be > >>>>>>> > >>>>>>> 1. Add a "withStoreName" to Joined. Although I agree with > >>>> Guozhang > >>>>>> that > >>>>>>> having a parameter that only applies to one use-case would be > >>>>> clumsy. > >>>>>>> 2. Add a String overload for naming the store, but this would > >> be > >>>> my > >>>>>>> least favorite option as IMHO this seems to be a step backward > >>>> from > >>>>>> why we > >>>>>>> introduced configuration objects in the first place. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Bill > >>>>>>> > >>>>>>> On Thu, Jun 27, 2019 at 4:45 PM Matthias J. Sax < > >>>> matth...@confluent.io > >>>>>> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Thanks for the KIP Bill! > >>>>>>>> > >>>>>>>> Great discussion to far. > >>>>>>>> > >>>>>>>> About John's idea about querying upstream stores and don't > >>>>> materialize > >>>>>> a > >>>>>>>> store: I agree with Bill that this seems to be an orthogonal > >>>>> question, > >>>>>>>> and it might be better to treat it as an independent > >> optimization > >>>> and > >>>>>>>> exclude from this KIP. > >>>>>>>> > >>>>>>>>> What should be the behavior if there is no store > >>>>>>>>> configured (e.g., if Materialized with only serdes) and > >>>> querying is > >>>>>>>>> enabled? > >>>>>>>> > >>>>>>>> IMHO, this could be an error case. If one wants to query a > >> store, > >>>>> they > >>>>>>>> need to provide a name -- if you don't know the name, how would > >>>> you > >>>>>>>> actually query the store (even if it would be possible to get > >> the > >>>>> name > >>>>>>>> from the `TopologyDescription`, it seems clumsy). > >>>>>>>> > >>>>>>>> If we don't want to throw an error, materializing seems to be > >> the > >>>>> right > >>>>>>>> option, to exclude "query optimization" from this KIP. I would > >> be > >>>> ok > >>>>>>>> with this option, even if it's clumsy to get the name from > >>>>>>>> `TopologyDescription`; hence, I would prefer to treat it as an > >>>> error. > >>>>>>>> > >>>>>>>>> To get back to the current behavior, users would have to > >>>>>>>>> add a "bytes store supplier" to the Materialized to indicate > >>>> that, > >>>>>>>>> yes, they really want a state store there. > >>>>>>>> > >>>>>>>> This sound like a quite subtle semantic difference on how to use > >>>> the > >>>>>>>> API. Might be hard to explain to users. I would prefer to not > >>>>>> introduce it. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> About Guozhang's points: > >>>>>>>> > >>>>>>>> 1a) That is actually a good point. However, I believe we cannot > >>>> get > >>>>>>>> around this issue easily, and it seems ok to me, to expose the > >>>> actual > >>>>>>>> store type we are using. (More thoughts later.) > >>>>>>>> > >>>>>>>> 1b) I don't see an issue with allowing users to query all > >> stores? > >>>>> What > >>>>>>>> is the rational behind it? What do we gain by not allowing it? > >>>>>>>> > >>>>>>>> 2) While I understand what you are saying, we also want/need to > >>>> have > >>>>> a > >>>>>>>> way in the PAPI to allow users adding "internal/private" > >>>>> non-queryable > >>>>>>>> stores to a topology. That's possible via > >>>>>>>> `Materialized#withQueryingDisabled()`. We could also update > >>>>>>>> `Topology#addStateStore(StoreBuilder, boolean isQueryable, > >>>>> String...)` > >>>>>>>> to address this. Again, I agree with Bill that the current API > >> is > >>>>> built > >>>>>>>> in a certain way, and if we want to change it, it should be a > >>>>> separate > >>>>>>>> KIP, as it seems to be an orthogonal concern. > >>>>>>>> > >>>>>>>>> Instead, we just restrict KIP-307 to NOT > >>>>>>>>> use the Joined.name for state store names and always use > >>>> internal > >>>>>> names > >>>>>>>> as > >>>>>>>>> well, which admittedly indeed leaves a hole of not being able > >> to > >>>>>> cover > >>>>>>>> all > >>>>>>>>> internal names here > >>>>>>>> > >>>>>>>> I think it's important to close this gap. Naming entities seems > >>>> to a > >>>>>>>> binary feature: if there is a gap, the feature is more or less > >>>>> useless, > >>>>>>>> rendering KIP-307 void. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> I like John's detailed list of required features and what > >>>>>>>> Materialized/WindowByteStoreSuppliers offers. My take is, that > >>>> adding > >>>>>>>> Materialized including the required run-time checks is the best > >>>>> option > >>>>>>>> we have, for the following reasons: > >>>>>>>> > >>>>>>>> - the main purpose of this KIP is to close the naming gap what > >> we > >>>>>> achieve > >>>>>>>> - we can allow people to use the new in-memory store > >>>>>>>> - we allow people to enable/disable caching > >>>>>>>> - we unify the API > >>>>>>>> - we decouple querying from naming > >>>>>>>> - it's a small API change > >>>>>>>> > >>>>>>>> Adding an overload and only passing in a name, would address the > >>>> main > >>>>>>>> purpose of the KIP. However, it falls short on all the other > >>>>> "goodies". > >>>>>>>> As you mentioned, passing in `Materialized` might not be perfect > >>>> and > >>>>>>>> maybe we need to deprecate is at some point; but this is also > >> true > >>>>> for > >>>>>>>> passing in just a name. > >>>>>>>> > >>>>>>>> I am also not convinced, that a `StreamJoinStore` would resolve > >>>> all > >>>>> the > >>>>>>>> issues. In the end, as long as we are using a `WindowedStore` > >>>>>>>> internally, we need to expose this "implemenation detail" to > >>>> users to > >>>>>>>> allow them to plug in a custom store. Adding `Materialized` seem > >>>> to > >>>>> be > >>>>>>>> the best short-term fix from my point of view. > >>>>>>>> > >>>>>>>> > >>>>>>>> -Matthias > >>>>>>>> > >>>>>>>> > >>>>>>>> On 6/27/19 9:56 AM, Guozhang Wang wrote: > >>>>>>>>> Hi John, > >>>>>>>>> > >>>>>>>>> I actually feels better about a new interface but I'm not sure > >>>> if > >>>>> we > >>>>>>>> would > >>>>>>>>> need the full configuration of store / log / cache, now or in > >>>> the > >>>>>> future > >>>>>>>>> ever for stream-stream join. > >>>>>>>>> > >>>>>>>>> Right now I feel that 1) we want to improve our implementation > >>>> of > >>>>>>>>> stream-stream join, and potentially also allow users to > >>>> customize > >>>>>> this > >>>>>>>>> implementation but with a more suitable interface than the > >>>> current > >>>>>>>>> WindowStore interface, how to do that is less clear and > >>>>>> execution-wise > >>>>>>>> it's > >>>>>>>>> (arguably..) not urgent; 2) we want to close the last gap > >>>>>> (Stream-stream > >>>>>>>>> join) of allowing users to specify all internal names to help > >> on > >>>>>> backward > >>>>>>>>> compatibility, which is urgent. > >>>>>>>>> > >>>>>>>>> Therefore if we want to unblock 2) from 1) in the near term, I > >>>> feel > >>>>>>>>> slightly inclined to just add overload functions that takes > >> in a > >>>>>> store > >>>>>>>> name > >>>>>>>>> for stream-stream joins only -- and admittedly, in the future > >>>> this > >>>>>>>> function > >>>>>>>>> maybe deprecated -- i.e. if we have to do something that we > >> "may > >>>>>> regret" > >>>>>>>> in > >>>>>>>>> the future, I'd like to pick the least intrusive option. > >>>>>>>>> > >>>>>>>>> About `Joined#withStoreName`: since the Joined class itself is > >>>> also > >>>>>> used > >>>>>>>> in > >>>>>>>>> other join types, I feel less comfortable to have a > >>>>>>>> `Joined#withStoreName` > >>>>>>>>> which is only going to be used by stream-stream join. Or > >> maybe I > >>>>> miss > >>>>>>>>> something here about the "latter" case that you are referring > >>>> to? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Guozhang > >>>>>>>>> > >>>>>>>>> On Mon, Jun 24, 2019 at 12:16 PM John Roesler < > >>>> j...@confluent.io> > >>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Thanks Guozhang, > >>>>>>>>>> > >>>>>>>>>> Yep. Maybe we can consider just exactly what the join needs: > >>>>>>>>>> > >>>>>>>>>>> the WindowStore<Bytes, byte[]> itself is fine, if overly > >>>> broad, > >>>>>>>>>>> since the only two methods we need are `window.put(key, > >> value, > >>>>>>>>>>> context().timestamp())` and `WindowStoreIterator<V2> iter = > >>>>>>>>>>> window.fetch(key, timeFrom, timeTo)`. > >>>>>>>>>> > >>>>>>>>>> One "middle ground" would be to extract _this_ into a new > >> store > >>>>>>>>>> interface, which only supports these API calls, like > >>>>>>>>>> StreamJoinStore<K, V>. This would give us the latitude we > >> need > >>>> to > >>>>>>>>>> efficiently support the exact operation without concerning > >>>>> ourselves > >>>>>>>>>> with all the other things a WindowStore can do (which are > >>>>>> unreachable > >>>>>>>>>> for the join use case). It would also let us drop "store > >>>>> duplicates" > >>>>>>>>>> from the main WindowStore interface, since it only exists to > >>>>> support > >>>>>>>>>> the join use case. > >>>>>>>>>> > >>>>>>>>>> If we were to add a new StreamJoinStore interface, then it'd > >> be > >>>>>>>>>> straightforward how we could add also > >>>>>>>>>> `Materialized.as(StreamJoinBytesStoreSupplier)` and use > >>>>>> Materialized, > >>>>>>>>>> or alternatively add the ability to set the bytes store on > >>>> Joined. > >>>>>>>>>> > >>>>>>>>>> Personally, I'm kind of leaning toward the latter (and also > >>>> doing > >>>>>>>>>> `Joined#withStoreName`), since adding the new interface to > >>>>>>>>>> Materialized then also pollutes the interface for its > >> _actual_ > >>>> use > >>>>>>>>>> case of materializing a table view. Of course, to solve the > >>>>>> immediate > >>>>>>>>>> problem, all we need is the store name, but we might feel > >>>> better > >>>>>> about > >>>>>>>>>> adding the store name to Joined if we _also_ feel like in the > >>>>>> future, > >>>>>>>>>> we would add store/log/cache configuration to Joined as well. > >>>>>>>>>> > >>>>>>>>>> -John > >>>>>>>>>> > >>>>>>>>>> On Mon, Jun 24, 2019 at 12:56 PM Guozhang Wang < > >>>>> wangg...@gmail.com> > >>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Hello John, > >>>>>>>>>>> > >>>>>>>>>>> My main concern is exactly the first point at the bottom of > >>>> your > >>>>>>>> analysis > >>>>>>>>>>> here: "* configure the bytes store". I'm not sure if using a > >>>>> window > >>>>>>>> bytes > >>>>>>>>>>> store would be ideal for stream-stream windowed join; e.g. > >> we > >>>>> could > >>>>>>>>>>> consider two dimensional list sorted by timestamps and then > >> by > >>>>>> keys to > >>>>>>>> do > >>>>>>>>>>> the join, whereas a windowed bytes store is basically sorted > >>>> by > >>>>> key > >>>>>>>>>> first, > >>>>>>>>>>> then by timestamp. If we expose the Materialized to let user > >>>> pass > >>>>>> in a > >>>>>>>>>>> windowed bytes store, then we would need to change that if > >> we > >>>>> want > >>>>>> to > >>>>>>>>>>> replace it with a different implementation interface. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Guozhang > >>>>>>>>>>> > >>>>>>>>>>> On Mon, Jun 24, 2019 at 8:59 AM John Roesler < > >>>> j...@confluent.io> > >>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hey Guozhang and Bill, > >>>>>>>>>>>> > >>>>>>>>>>>> For what it's worth, I agree with you both! > >>>>>>>>>>>> > >>>>>>>>>>>> I think it might help the discussion to look concretely at > >>>> what > >>>>>>>>>>>> Materialized does: > >>>>>>>>>>>> * set a WindowBytesStoreSupplier > >>>>>>>>>>>> * set a name > >>>>>>>>>>>> * set the key/value serdes > >>>>>>>>>>>> * disable/enable/configure change-logging > >>>>>>>>>>>> * disable/enable caching > >>>>>>>>>>>> * configure retention > >>>>>>>>>>>> > >>>>>>>>>>>> Further, looking into the WindowBytesStoreSupplier, the > >>>>> interface > >>>>>> lets > >>>>>>>>>> you: > >>>>>>>>>>>> * get the segment interval > >>>>>>>>>>>> * get the window size > >>>>>>>>>>>> * get whether "duplicates" are enabled > >>>>>>>>>>>> * get the retention period > >>>>>>>>>>>> * (obviously) get a WindowStore<Bytes, byte[]> > >>>>>>>>>>>> > >>>>>>>>>>>> We know that Materialized isn't exactly what we need for > >>>> stream > >>>>>> joins, > >>>>>>>>>>>> but we can see how close Materialized is to what we need. > >> If > >>>> it > >>>>> is > >>>>>>>>>>>> close, maybe we can use it and document the gaps, and if it > >>>> is > >>>>> not > >>>>>>>>>>>> close, then maybe we should just add what we need to > >> Joined. > >>>>>>>>>>>> Stream Join's requirements for its stores: > >>>>>>>>>>>> * a multimap store (i.e., it keeps duplicates) for storing > >>>>> general > >>>>>>>>>>>> (not windowed) keyed records associated with their > >> insertion > >>>>>> time, and > >>>>>>>>>>>> allows efficient time-bounded lookups and also efficient > >>>> purges > >>>>>> of old > >>>>>>>>>>>> data. > >>>>>>>>>>>> ** Note, a properly configured WindowBytesStoreSupplier > >>>>> satisfies > >>>>>> this > >>>>>>>>>>>> requirement, and the interface supports the queries we need > >>>> to > >>>>>> verify > >>>>>>>>>>>> the configuration at run-time > >>>>>>>>>>>> * set a name for the store > >>>>>>>>>>>> * do _not_ set the serdes (they are already set in Joined) > >>>>>>>>>>>> * logging could be configurable (set to enabled now) > >>>>>>>>>>>> * caching could be configurable (set to enabled now) > >>>>>>>>>>>> * do _not_ configure retention (determined by JoinWindows) > >>>>>>>>>>>> > >>>>>>>>>>>> So, out of six capabilities for Materialized, there are two > >>>> we > >>>>>> don't > >>>>>>>>>>>> want (serdes and retention). These would become run-time > >>>> checks > >>>>>> if we > >>>>>>>>>>>> use it. > >>>>>>>>>>>> > >>>>>>>>>>>> A third questionable capability is to provide a > >>>>>>>>>>>> WindowBytesStoreSupplier. Looking at whether the > >>>>>>>>>>>> WindowBytesStoreSupplier is the right interface for Stream > >>>> Join: > >>>>>>>>>>>> * configuring segment interval is fine > >>>>>>>>>>>> * should _not_ configure window size (it's determined by > >>>>>> JoinWindows) > >>>>>>>>>>>> * duplicates _must_ be enabled > >>>>>>>>>>>> * retention should be _at least_ windowSize + gracePeriod, > >>>> but > >>>>>> note > >>>>>>>>>>>> that (unlike for Table window stores) there is no utility > >> in > >>>>>> having a > >>>>>>>>>>>> longer retention time. > >>>>>>>>>>>> * the WindowStore<Bytes, byte[]> itself is fine, if overly > >>>>> broad, > >>>>>>>>>>>> since the only two methods we need are `window.put(key, > >>>> value, > >>>>>>>>>>>> context().timestamp())` and `WindowStoreIterator<V2> iter = > >>>>>>>>>>>> window.fetch(key, timeFrom, timeTo)`. > >>>>>>>>>>>> > >>>>>>>>>>>> Thus, flattening out the overlap for > >> WindowBytesStoreSupplier > >>>>>> onto the > >>>>>>>>>>>> overlap for Materialized, we have 9 capabilities total > >> (note > >>>>>> retention > >>>>>>>>>>>> is duplicated), we have 4 that we don't want: > >>>>>>>>>>>> * do _not_ set the serdes (they are already set in Joined) > >>>>>>>>>>>> * do _not_ configure retention (determined by JoinWindows) > >>>>>>>>>>>> * should _not_ configure window size (it's determined by > >>>>>> JoinWindows) > >>>>>>>>>>>> * duplicates _must_ be enabled > >>>>>>>>>>>> > >>>>>>>>>>>> These gaps would have to be covered with run-time checks if > >>>> we > >>>>>> re-use > >>>>>>>>>>>> Materialized and WindowStoreBytesStoreSupplier both. Maybe > >>>> this > >>>>>> sounds > >>>>>>>>>>>> bad, but consider the other side, that we get 5 new > >>>> capabilities > >>>>>> we > >>>>>>>>>>>> don't require, but are still pretty nice: > >>>>>>>>>>>> * configure the bytes store > >>>>>>>>>>>> * set a name for the store > >>>>>>>>>>>> * configure caching > >>>>>>>>>>>> * configure logging > >>>>>>>>>>>> * configure segment interval > >>>>>>>>>>>> > >>>>>>>>>>>> Not sure where this nets us out, but it's food for thought. > >>>>>>>>>>>> -John > >>>>>>>>>>>> > >>>>>>>>>>>> On Sun, Jun 23, 2019 at 7:52 PM Guozhang Wang < > >>>>> wangg...@gmail.com > >>>>>>> > >>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Hi Bill, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I think by giving a Materialized param into stream-stream > >>>> join, > >>>>>> it's > >>>>>>>>>> okay > >>>>>>>>>>>>> (though still ideal) to say "we still would not expose the > >>>>> store > >>>>>> for > >>>>>>>>>>>>> queries", but it would sound a bit awkward to say "we > >> would > >>>>> also > >>>>>>>>>> ignore > >>>>>>>>>>>>> whatever the passed in store supplier but just use our > >>>> default > >>>>>> ones" > >>>>>>>>>> -- > >>>>>>>>>>>>> again the concern is that, if in the future we'd want to > >>>> change > >>>>>> the > >>>>>>>>>>>> default > >>>>>>>>>>>>> implementation of join algorithm which no longer rely on a > >>>>> window > >>>>>>>>>> store > >>>>>>>>>>>>> with deduping enabled, then we need to change this API > >>>> again by > >>>>>>>>>> changing > >>>>>>>>>>>>> the store supplier type. > >>>>>>>>>>>>> > >>>>>>>>>>>>> If we do want to fill this hole for stream-stream join, I > >>>> feel > >>>>>> just > >>>>>>>>>>>> adding > >>>>>>>>>>>>> a String typed store-name would even be less > >>>> future-intrusive > >>>>> if > >>>>>> we > >>>>>>>>>>>> expect > >>>>>>>>>>>>> this parameter to be modified later. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Does that makes sense? > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Sat, Jun 22, 2019 at 12:51 PM Bill Bejeck < > >>>>> bbej...@gmail.com> > >>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks for the comments John and Guozhang, I'll address > >>>> each > >>>>>> one of > >>>>>>>>>>>> your > >>>>>>>>>>>>>> comments in turn. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> John, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I'm wondering about a missing quadrant from the truth > >>>> table > >>>>>>>>>> involving > >>>>>>>>>>>>>>> whether a Materialized is stored or not and querying is > >>>>>>>>>>>>>>> enabled/disabled... What should be the behavior if there > >>>> is > >>>>> no > >>>>>>>>>> store > >>>>>>>>>>>>>>> configured (e.g., if Materialized with only serdes) and > >>>>>> querying > >>>>>>>>>> is > >>>>>>>>>>>>>> enabled? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> It seems we have two choices: > >>>>>>>>>>>>>>> 1. we can force creation of a state store in this case, > >> so > >>>>> the > >>>>>>>>>> store > >>>>>>>>>>>>>>> can be used to serve the queries > >>>>>>>>>>>>>>> 2. we can provide just a queriable view, basically > >>>> letting IQ > >>>>>>>>>> query > >>>>>>>>>>>>>>> into the "KTableValueGetter", which would transparently > >>>>>>>>>> construct the > >>>>>>>>>>>>>>> query response by applying the operator logic to the > >>>> upstream > >>>>>>>>>> state > >>>>>>>>>>>> if > >>>>>>>>>>>>>>> the operator state isn't already stored. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I agree with your assertion about a missing quadrant from > >>>> the > >>>>>> truth > >>>>>>>>>>>> table. > >>>>>>>>>>>>>> Additionally, I too like the concept of a queriable view. > >>>>> But I > >>>>>>>>>> think > >>>>>>>>>>>> that > >>>>>>>>>>>>>> goes a bit beyond the scope of this KIP and would like to > >>>>> pursue > >>>>>>>>>> that > >>>>>>>>>>>>>> feature as follow-on work. Also thinking about this KIP > >>>> some > >>>>>>>>>> more, I'm > >>>>>>>>>>>>>> thinking of the changes to Materialized might be a reach > >> as > >>>>>> well. > >>>>>>>>>>>>>> Separating the naming from a store and its queryable > >> state > >>>>> seems > >>>>>>>>>> like a > >>>>>>>>>>>>>> complex issue in and of itself and should be treated > >>>>>> accordingly. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> So here's what I'm thinking now. We add Materialzied to > >>>> Join, > >>>>>> but > >>>>>>>>>> for > >>>>>>>>>>>> now, > >>>>>>>>>>>>>> we internally disable querying. I know this breaks our > >>>>> current > >>>>>>>>>>>> semantic > >>>>>>>>>>>>>> approach, but I think it's crucial that we do two things > >> in > >>>>> this > >>>>>>>>>> KIP > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 1. Break the naming of the state stores from Joined to > >>>>>>>>>>>> Materialized, so > >>>>>>>>>>>>>> the naming of state stores follows our current pattern > >>>> and > >>>>>>>>>> enables > >>>>>>>>>>>>>> upgrades > >>>>>>>>>>>>>> from 2.3 to 2.4 > >>>>>>>>>>>>>> 2. Offer the ability to configure the state stores of > >>>> the > >>>>>> join, > >>>>>>>>>> even > >>>>>>>>>>>>>> providing a different implementation (i.e. in-memory) > >> if > >>>>>>>>>> desired. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> With that in mind I'm considering changing the KIP to > >>>> remove > >>>>> the > >>>>>>>>>>>> changes to > >>>>>>>>>>>>>> Materialized, and we document very clearly that by > >>>> providing a > >>>>>>>>>>>> Materialized > >>>>>>>>>>>>>> object with a name is only for naming the state store, > >>>> hence > >>>>> the > >>>>>>>>>>>> changelog > >>>>>>>>>>>>>> topics and any possible configurations of the store, but > >>>> this > >>>>>> store > >>>>>>>>>>>> *will > >>>>>>>>>>>>>> not be available for IQ.* > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> WDYT? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Guozhang, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 1. About not breaking compatibility of stream-stream > >> join > >>>>>>>>>>>> materialized > >>>>>>>>>>>>>>> stores: I think this is a valid issue to tackle, but > >> after > >>>>>>>>>> thinking > >>>>>>>>>>>> about > >>>>>>>>>>>>>>> it once more I'm not sure if exposing Materialized would > >>>> be a > >>>>>>>>>> good > >>>>>>>>>>>>>> solution > >>>>>>>>>>>>>>> here. My rationles: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 1.a For stream-stream join, our current usage of > >>>> window-store > >>>>>> is > >>>>>>>>>> not > >>>>>>>>>>>>>> ideal, > >>>>>>>>>>>>>>> and we want to modify it in the near future to be more > >>>>>>>>>> efficient. Not > >>>>>>>>>>>>>>> allowing users to override such state store backend > >> gives > >>>> us > >>>>>> such > >>>>>>>>>>>> freedom > >>>>>>>>>>>>>>> (which was also considered in the original DSL design), > >>>>> whereas > >>>>>>>>>>>> getting a > >>>>>>>>>>>>>>> Materialized<WindowStore> basically kicks out that > >> freedom > >>>>> out > >>>>>>>>>> of the > >>>>>>>>>>>>>>> window. > >>>>>>>>>>>>>>> 1.b For strema-stream join, in our original design we > >>>> intend > >>>>> to > >>>>>>>>>>>> "never" > >>>>>>>>>>>>>>> want users to query the state, since it is just for > >>>> buffering > >>>>>> the > >>>>>>>>>>>>>> upcoming > >>>>>>>>>>>>>>> records from the stream. Now I know that some users may > >>>>> indeed > >>>>>>>>>> want > >>>>>>>>>>>> to > >>>>>>>>>>>>>>> query it from the debugging perspective, but still I > >>>>> concerned > >>>>>>>>>> about > >>>>>>>>>>>>>>> whether leveraging IQ for debugging purposes would be > >> the > >>>>> right > >>>>>>>>>>>> solution > >>>>>>>>>>>>>>> here. And adding Materialized object opens the door to > >> let > >>>>>> users > >>>>>>>>>>>> query > >>>>>>>>>>>>>>> about it (unless we did something intentionally to still > >>>>>> forbids > >>>>>>>>>> it), > >>>>>>>>>>>>>> which > >>>>>>>>>>>>>>> also restricts us in the future. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2. About the coupling between Materialized.name() and > >>>>>> queryable: > >>>>>>>>>>>> again I > >>>>>>>>>>>>>>> think this is a valid issue. But I'm not sure if the > >>>> current > >>>>>>>>>>>>>>> "withQuerryingDisabled / Enabled" at Materialized is the > >>>> best > >>>>>>>>>>>> approach. > >>>>>>>>>>>>>>> Here I think I agree with John, that generally speaking > >>>> it's > >>>>>>>>>> better > >>>>>>>>>>>> be a > >>>>>>>>>>>>>>> control function on the `KTable` itself, rather than on > >>>>>>>>>>>> `Materialized`, > >>>>>>>>>>>>>> so > >>>>>>>>>>>>>>> fixing it via adding functions through `Materialized` > >>>> seems > >>>>>> not a > >>>>>>>>>>>> natural > >>>>>>>>>>>>>> approach either. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I understand your thoughts here, and up to a point, I > >> agree > >>>>> with > >>>>>>>>>> you. > >>>>>>>>>>>>>> But concerning not providing Materialized as it may > >>>> restrict > >>>>> us > >>>>>> in > >>>>>>>>>> the > >>>>>>>>>>>>>> future for delivering different implementations, I'm > >>>> wondering > >>>>>> if > >>>>>>>>>> we > >>>>>>>>>>>> are > >>>>>>>>>>>>>> doing some premature optimization here. > >>>>>>>>>>>>>> My rationale for saying so > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 1. I think the cost of not allowing the naming of > >> state > >>>>>> stores > >>>>>>>>>> for > >>>>>>>>>>>> joins > >>>>>>>>>>>>>> is too big of a gap to leave. IMHO for joins to > >> follow > >>>>> the > >>>>>>>>>> current > >>>>>>>>>>>>>> pattern of using Materialized for naming state stores > >>>> would > >>>>>> be > >>>>>>>>>> what > >>>>>>>>>>>> most > >>>>>>>>>>>>>> users would expect to use. As I said in my comments > >>>>> above, I > >>>>>>>>>> think > >>>>>>>>>>>> we > >>>>>>>>>>>>>> should *not include* the changes to Materialized and > >>>>> enforce > >>>>>>>>>> named > >>>>>>>>>>>>>> stores for joins as unavailable for IQ. > >>>>>>>>>>>>>> 2. We'll still have the join methods available > >> without a > >>>>>>>>>>>> Materialized > >>>>>>>>>>>>>> allowing us to do something different internally if a > >>>>>>>>>> Materialized > >>>>>>>>>>>> is > >>>>>>>>>>>>>> not > >>>>>>>>>>>>>> provided. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Overall, I'm thinking maybe we should still use two > >> stones > >>>>>> rather > >>>>>>>>>>>> than > >>>>>>>>>>>>>> one > >>>>>>>>>>>>>>> to kill these two birds, and probably for this KIP we > >> just > >>>>>> focus > >>>>>>>>>> on > >>>>>>>>>>>> 1) > >>>>>>>>>>>>>>> above. And for that I'd like to not expose the > >>>> Materialized > >>>>>>>>>> either > >>>>>>>>>>>> for > >>>>>>>>>>>>>>> rationales that I've listed above. Instead, we just > >>>> restrict > >>>>>>>>>> KIP-307 > >>>>>>>>>>>> to > >>>>>>>>>>>>>> NOT > >>>>>>>>>>>>>>> use the Joined.name for state store names and always use > >>>>>> internal > >>>>>>>>>>>> names > >>>>>>>>>>>>>> as > >>>>>>>>>>>>>>> well, which admittedly indeed leaves a hole of not being > >>>> able > >>>>>> to > >>>>>>>>>>>> cover > >>>>>>>>>>>>>> all > >>>>>>>>>>>>>>> internal names here, but now I feel this `hole` may > >>>> better be > >>>>>>>>>> filled > >>>>>>>>>>>> by, > >>>>>>>>>>>>>>> e.g. not creating changelog topics but just use the > >>>> upstream > >>>>> to > >>>>>>>>>>>>>>> re-bootstrap the materialized store, more concretely: > >> when > >>>>>>>>>>>> materializing > >>>>>>>>>>>>>>> the store, try to piggy-back the changelog topic on an > >>>>> existing > >>>>>>>>>>>> topic, > >>>>>>>>>>>>>> e.g. > >>>>>>>>>>>>>>> a) if the stream is coming directly from some source > >> topic > >>>>>>>>>> (including > >>>>>>>>>>>>>>> repartition topic), make that as changelog topic and if > >>>> it is > >>>>>>>>>>>> repartition > >>>>>>>>>>>>>>> topic change the retention / data purging policy > >>>> necessarily > >>>>> as > >>>>>>>>>>>> well; b) > >>>>>>>>>>>>>> if > >>>>>>>>>>>>>>> the stream is coming from some stateless operators, > >>>> delegate > >>>>>> that > >>>>>>>>>>>>>> stateless > >>>>>>>>>>>>>>> operator to the parent stream similar as a); if the > >>>> stream is > >>>>>>>>>> coming > >>>>>>>>>>>> from > >>>>>>>>>>>>>> a > >>>>>>>>>>>>>>> stream-stream join which is the only stateful operator > >>>> that > >>>>> can > >>>>>>>>>>>> result in > >>>>>>>>>>>>>> a > >>>>>>>>>>>>>>> stream, consider merging the join into multi-way joins > >>>> (yes, > >>>>>>>>>> this is > >>>>>>>>>>>> a > >>>>>>>>>>>>>> very > >>>>>>>>>>>>>>> hand-wavy thought, but the point here is that we do not > >>>> try > >>>>> to > >>>>>>>>>>>> tackle it > >>>>>>>>>>>>>>> now but leave it for a better solution :). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I really like this idea! I agree with you in that this > >>>>> approach > >>>>>>>>>> to too > >>>>>>>>>>>>>> much for adding in this KIP, but we could pick it up > >> later > >>>> and > >>>>>>>>>>>> leverage the > >>>>>>>>>>>>>> Optimization framework to accomplish this re-use. > >>>>>>>>>>>>>> Again, while I agree we should break the naming of join > >>>> state > >>>>>>>>>> stores > >>>>>>>>>>>> from > >>>>>>>>>>>>>> KIP-307, IMHO it's something we should fix now as it will > >>>> be > >>>>> the > >>>>>>>>>> last > >>>>>>>>>>>> piece > >>>>>>>>>>>>>> we can provide to give users the ability to completely > >> make > >>>>>> their > >>>>>>>>>>>>>> topologies "upgrade proof" when adding additional > >>>> operations. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks again to both of you for comments and I look > >>>> forward to > >>>>>>>>>> hearing > >>>>>>>>>>>> back > >>>>>>>>>>>>>> from you. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>> Bill > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:33 PM Guozhang Wang < > >>>>>> wangg...@gmail.com> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hello Bill, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Thanks for the KIP. Glad to see that we can likely > >>>> shooting > >>>>> two > >>>>>>>>>> birds > >>>>>>>>>>>>>> with > >>>>>>>>>>>>>>> one stone. I have some concerns though about those "two > >>>>> birds" > >>>>>>>>>>>>>> themselves: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 1. About not breaking compatibility of stream-stream > >> join > >>>>>>>>>>>> materialized > >>>>>>>>>>>>>>> stores: I think this is a valid issue to tackle, but > >> after > >>>>>>>>>> thinking > >>>>>>>>>>>> about > >>>>>>>>>>>>>>> it once more I'm not sure if exposing Materialized would > >>>> be a > >>>>>>>>>> good > >>>>>>>>>>>>>> solution > >>>>>>>>>>>>>>> here. My rationles: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 1.a For stream-stream join, our current usage of > >>>> window-store > >>>>>> is > >>>>>>>>>> not > >>>>>>>>>>>>>> ideal, > >>>>>>>>>>>>>>> and we want to modify it in the near future to be more > >>>>>>>>>> efficient. Not > >>>>>>>>>>>>>>> allowing users to override such state store backend > >> gives > >>>> us > >>>>>> such > >>>>>>>>>>>> freedom > >>>>>>>>>>>>>>> (which was also considered in the original DSL design), > >>>>> whereas > >>>>>>>>>>>> getting a > >>>>>>>>>>>>>>> Materialized<WindowStore> basically kicks out that > >> freedom > >>>>> out > >>>>>>>>>> of the > >>>>>>>>>>>>>>> window. > >>>>>>>>>>>>>>> 1.b For strema-stream join, in our original design we > >>>> intend > >>>>> to > >>>>>>>>>>>> "never" > >>>>>>>>>>>>>>> want users to query the state, since it is just for > >>>> buffering > >>>>>> the > >>>>>>>>>>>>>> upcoming > >>>>>>>>>>>>>>> records from the stream. Now I know that some users may > >>>>> indeed > >>>>>>>>>> want > >>>>>>>>>>>> to > >>>>>>>>>>>>>>> query it from the debugging perspective, but still I > >>>>> concerned > >>>>>>>>>> about > >>>>>>>>>>>>>>> whether leveraging IQ for debugging purposes would be > >> the > >>>>> right > >>>>>>>>>>>> solution > >>>>>>>>>>>>>>> here. And adding Materialized object opens the door to > >> let > >>>>>> users > >>>>>>>>>>>> query > >>>>>>>>>>>>>>> about it (unless we did something intentionally to still > >>>>>> forbids > >>>>>>>>>> it), > >>>>>>>>>>>>>> which > >>>>>>>>>>>>>>> also restricts us in the future. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2. About the coupling between Materialized.name() and > >>>>>> queryable: > >>>>>>>>>>>> again I > >>>>>>>>>>>>>>> think this is a valid issue. But I'm not sure if the > >>>> current > >>>>>>>>>>>>>>> "withQuerryingDisabled / Enabled" at Materialized is the > >>>> best > >>>>>>>>>>>> approach. > >>>>>>>>>>>>>>> Here I think I agree with John, that generally speaking > >>>> it's > >>>>>>>>>> better > >>>>>>>>>>>> be a > >>>>>>>>>>>>>>> control function on the `KTable` itself, rather than on > >>>>>>>>>>>> `Materialized`, > >>>>>>>>>>>>>> so > >>>>>>>>>>>>>>> fixing it via adding functions through `Materialized` > >>>> seems > >>>>>> not a > >>>>>>>>>>>> natural > >>>>>>>>>>>>>>> approach either. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Overall, I'm thinking maybe we should still use two > >> stones > >>>>>> rather > >>>>>>>>>>>> than > >>>>>>>>>>>>>> one > >>>>>>>>>>>>>>> to kill these two birds, and probably for this KIP we > >> just > >>>>>> focus > >>>>>>>>>> on > >>>>>>>>>>>> 1) > >>>>>>>>>>>>>>> above. And for that I'd like to not expose the > >>>> Materialized > >>>>>>>>>> either > >>>>>>>>>>>> for > >>>>>>>>>>>>>>> rationales that I've listed above. Instead, we just > >>>> restrict > >>>>>>>>>> KIP-307 > >>>>>>>>>>>> to > >>>>>>>>>>>>>> NOT > >>>>>>>>>>>>>>> use the Joined.name for state store names and always use > >>>>>> internal > >>>>>>>>>>>> names > >>>>>>>>>>>>>> as > >>>>>>>>>>>>>>> well, which admittedly indeed leaves a hole of not being > >>>> able > >>>>>> to > >>>>>>>>>>>> cover > >>>>>>>>>>>>>> all > >>>>>>>>>>>>>>> internal names here, but now I feel this `hole` may > >>>> better be > >>>>>>>>>> filled > >>>>>>>>>>>> by, > >>>>>>>>>>>>>>> e.g. not creating changelog topics but just use the > >>>> upstream > >>>>> to > >>>>>>>>>>>>>>> re-bootstrap the materialized store, more concretely: > >> when > >>>>>>>>>>>> materializing > >>>>>>>>>>>>>>> the store, try to piggy-back the changelog topic on an > >>>>> existing > >>>>>>>>>>>> topic, > >>>>>>>>>>>>>> e.g. > >>>>>>>>>>>>>>> a) if the stream is coming directly from some source > >> topic > >>>>>>>>>> (including > >>>>>>>>>>>>>>> repartition topic), make that as changelog topic and if > >>>> it is > >>>>>>>>>>>> repartition > >>>>>>>>>>>>>>> topic change the retention / data purging policy > >>>> necessarily > >>>>> as > >>>>>>>>>>>> well; b) > >>>>>>>>>>>>>> if > >>>>>>>>>>>>>>> the stream is coming from some stateless operators, > >>>> delegate > >>>>>> that > >>>>>>>>>>>>>> stateless > >>>>>>>>>>>>>>> operator to the parent stream similar as a); if the > >>>> stream is > >>>>>>>>>> coming > >>>>>>>>>>>>>> from a > >>>>>>>>>>>>>>> stream-stream join which is the only stateful operator > >>>> that > >>>>> can > >>>>>>>>>>>> result > >>>>>>>>>>>>>> in a > >>>>>>>>>>>>>>> stream, consider merging the join into multi-way joins > >>>> (yes, > >>>>>>>>>> this is > >>>>>>>>>>>> a > >>>>>>>>>>>>>> very > >>>>>>>>>>>>>>> hand-wavy thought, but the point here is that we do not > >>>> try > >>>>> to > >>>>>>>>>>>> tackle it > >>>>>>>>>>>>>>> now but leave it for a better solution :). > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Guozhang > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Wed, Jun 19, 2019 at 11:41 AM John Roesler < > >>>>>> j...@confluent.io > >>>>>>>>>>> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hi Bill, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks for the KIP! Awesome job catching this > >> unexpected > >>>>>>>>>>>> consequence > >>>>>>>>>>>>>>>> of the prior KIPs before it was released. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> The proposal looks good to me. On top of just fixing > >> the > >>>>>>>>>> problem, > >>>>>>>>>>>> it > >>>>>>>>>>>>>>>> seems to address two other pain points: > >>>>>>>>>>>>>>>> * that naming a state store automatically causes it to > >>>>> become > >>>>>>>>>>>>>> queriable. > >>>>>>>>>>>>>>>> * that there's currently no way to configure the bytes > >>>> store > >>>>>>>>>> for > >>>>>>>>>>>> join > >>>>>>>>>>>>>>>> windows. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> It's awesome that we can fix this issue and two others > >>>> with > >>>>>> one > >>>>>>>>>>>>>> feature. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I'm wondering about a missing quadrant from the truth > >>>> table > >>>>>>>>>>>> involving > >>>>>>>>>>>>>>>> whether a Materialized is stored or not and querying is > >>>>>>>>>>>>>>>> enabled/disabled... What should be the behavior if > >> there > >>>> is > >>>>> no > >>>>>>>>>>>> store > >>>>>>>>>>>>>>>> configured (e.g., if Materialized with only serdes) and > >>>>>>>>>> querying is > >>>>>>>>>>>>>>>> enabled? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> It seems we have two choices: > >>>>>>>>>>>>>>>> 1. we can force creation of a state store in this case, > >>>> so > >>>>> the > >>>>>>>>>>>> store > >>>>>>>>>>>>>>>> can be used to serve the queries > >>>>>>>>>>>>>>>> 2. we can provide just a queriable view, basically > >>>> letting > >>>>> IQ > >>>>>>>>>> query > >>>>>>>>>>>>>>>> into the "KTableValueGetter", which would transparently > >>>>>>>>>> construct > >>>>>>>>>>>> the > >>>>>>>>>>>>>>>> query response by applying the operator logic to the > >>>>> upstream > >>>>>>>>>>>> state if > >>>>>>>>>>>>>>>> the operator state isn't already stored. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Offhand, it seems like the second is actually a pretty > >>>>> awesome > >>>>>>>>>>>>>>>> capability. But it might have an awkward interaction > >> with > >>>>> the > >>>>>>>>>>>> current > >>>>>>>>>>>>>>>> semantics. Presently, if I provide a > >>>> Materialized.withName, > >>>>> it > >>>>>>>>>>>> implies > >>>>>>>>>>>>>>>> that querying should be enabled AND that the view > >> should > >>>>>>>>>> actually > >>>>>>>>>>>> be > >>>>>>>>>>>>>>>> stored in a state store. Under option 2 above, this > >>>> behavior > >>>>>>>>>> would > >>>>>>>>>>>>>>>> change to NOT provision a state store and instead just > >>>>> consult > >>>>>>>>>> the > >>>>>>>>>>>>>>>> ValueGetter. To get back to the current behavior, users > >>>>> would > >>>>>>>>>> have > >>>>>>>>>>>> to > >>>>>>>>>>>>>>>> add a "bytes store supplier" to the Materialized to > >>>> indicate > >>>>>>>>>> that, > >>>>>>>>>>>>>>>> yes, they really want a state store there. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Behavior changes are always kind of scary, but I think > >> in > >>>>> this > >>>>>>>>>>>> case, > >>>>>>>>>>>>>>>> it might actually be preferable. In the event where > >> only > >>>> the > >>>>>>>>>> name > >>>>>>>>>>>> is > >>>>>>>>>>>>>>>> provided, it means that people just wanted to make the > >>>>>>>>>> operation > >>>>>>>>>>>>>>>> result queriable. If we automatically convert this to a > >>>>>>>>>> non-stored > >>>>>>>>>>>>>>>> view, then simply upgrading results in the same > >>>> observable > >>>>>>>>>> behavior > >>>>>>>>>>>>>>>> and semantics, but a linear reduction in local storage > >>>>>>>>>> requirements > >>>>>>>>>>>>>>>> and disk i/o, as well as a corresponding linear > >>>> reduction in > >>>>>>>>>> memory > >>>>>>>>>>>>>>>> usage both on and off heap. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> What do you think? > >>>>>>>>>>>>>>>> -John > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Tue, Jun 18, 2019 at 9:21 PM Bill Bejeck < > >>>>>> bbej...@gmail.com > >>>>>>>>>>> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> All, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I'd like to start a discussion for adding a > >> Materialized > >>>>>>>>>>>>>> configuration > >>>>>>>>>>>>>>>>> object to KStream.join for naming state stores > >> involved > >>>> in > >>>>>>>>>> joins. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>> > >>>>> > >>>> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+Materialized+to+Join > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Your comments and suggestions are welcome. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>> Bill > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> -- > >>>>>>>>>>>>> -- Guozhang > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> -- > >>>>>>>>>>> -- Guozhang > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >>>> -- > >>>> -- Guozhang > >>>> > >>> > >> > > > > > >