Thanks Matthias / Victoria, both bullet points make sense to me.
On Thu, Mar 16, 2023 at 10:39 AM Victoria Xia <victoria....@confluent.io.invalid> wrote: > > Thanks for your comments, Matthias! > > > For stream-table joins, I think we need to elaborate that a `get(k, ts)` > call now might return `null` if the history retention of the store is too > short. > > Great callout -- I agree we should definitely clarify this in the KIP and > mention it in the eventual docs as well. > > When a call to `get(k, ts)` returns null, there's not really a good way to > distinguish whether it's because the timestamp is outside of the store's > history retention or if it's because there's actually no record version for > the key at the specified timestamp. Determining this from the processor > would require (1) exposing the store's history retention to the processor, > and (2) reconciling the fact that state stores today (including the new > versioned store implementation) track their own observed stream time > separate from processor time. > > In light of this, I think your proposal to treat a null from `get(k, ts)` > due to history retention having been exceeded the same as we'd treat any > other null makes sense, and is also our only viable option right now. I'll > call this out in the docs so users are aware that their choice of history > retention has this implication. > > > For left-table-table joins, there seems to be no special impact, but it > should be called out, too. The lookup itself does not go into the history > of the table so no change here (as we don't have the "query older than > history case") > > Yup, we're on the same page. Using a versioned store for table-table joins > results in the semantic change that the join result will include the > latest-by-timestamp record rather than the latest-by-offset record, but no > timestamped lookups (i.e., `get(key, ts)` calls) are used in the process so > there is no concern about history retention having elapsed and affecting > join results. (The only implication of history retention for this use case > is indirect, since history retention doubles as grace period for the store. > Because grace period is per store instance, which has task-level > granularity, that means if grace period is set too low then the latest > record for one key could be dropped from the store if another key has > already advanced the store's observed stream time past the grace period by > the time that this record is seen.) > > I will update the KIP with these additional notes. > > Thanks, > Victoria > > On Wed, Mar 15, 2023 at 7:16 PM Matthias J. Sax <mj...@apache.org> wrote: > > > Thanks for the KIP! Great to see a first step towards using the new > > versioned stores! > > > > I think the described tradeoffs make sense and I like make a pragmatic > > step into the right direction, and avoid boiling the ocean. Thus, I > > agree to the proposed solution. > > > > One minor thing, that I believe just need clarification in the KIP (does > > not seem to be a change to the KIP itself): > > > > For stream-table joins, I think we need to elaborate that a `get(k, ts)` > > call now might return `null` if the history retention of the store is > > too short. For inner-joins it would result in no output record (ie, > > stream input record is dropped). Would be good to have it mentioned in > > the KIP explicitly. > > > > We should also discuss how left-joins should work for this case. I think > > it's ok (better) to include the stream record in the result if the > > lookup returns `null` -- either because no key exist in the exiting > > history for the provided timestamp, or (the actual case in question) > > because we query older than available history. If you agree, can we add > > this to the KIP? > > > > For left-table-table joins, there seems to be no special impact, but it > > should be called out, too. The lookup itself does not go into the > > history of the table so no change here (as we don't have the "query > > older than history case") -- and for out-of-order records, we just > > "drop" them anyway, so no change for left-joins either I believe. > > > > > > -Matthias > > > > > > > > On 3/15/23 2:00 PM, Guozhang Wang wrote: > > > Sounds good to me. Thanks! > > > > > > On Wed, Mar 15, 2023 at 12:07 PM Victoria Xia > > > <victoria....@confluent.io.invalid> wrote: > > >> > > >> Thanks for kicking off the discussion, John and Guozhang! > > >> > > >>> Just one thing that might be out of scope: if users want to enable the > > >> versioned table feature across the topology, should we allow them to do > > it > > >> via a single config rather than changing the materialized object at each > > >> place? > > >> > > >> Yes, I think this would be a great usability improvement and am in > > favor of > > >> introducing such a config. As long as the config defaults to using > > >> unversioned stores (which makes sense anyway), there will be no > > >> compatibility concerns with introducing the config in a future release. > > >> It's out of scope for this particular KIP as a result, but can > > hopefully be > > >> introduced as part of the next release after 3.5. > > >> > > >> Best, > > >> Victoria > > >> > > >> On Wed, Mar 15, 2023 at 10:49 AM Guozhang Wang < > > guozhang.wang...@gmail.com> > > >> wrote: > > >> > > >>> Thanks Victoria for the great writeup, with a thorough analysis and > > >>> trade-offs. I do not have any major questions about the proposal. > > >>> > > >>> Just one thing that might be out of scope: if users want to enable the > > >>> versioned table feature across the topology, should we allow them to > > >>> do it via a single config rather than changing the materialized object > > >>> at each place? Maybe we can defer that for future discussions, but > > >>> just want to hear your thoughts. > > >>> > > >>> Anyways, I think this proposal is great just as-is even if we agree to > > >>> do the configuration improvement later. > > >>> > > >>> > > >>> Guozhang > > >>> > > >>> On Thu, Mar 9, 2023 at 7:52 PM John Roesler <vvcep...@apache.org> > > wrote: > > >>>> > > >>>> Thanks for the KIP, Victoria! > > >>>> > > >>>> I had some questions/concerns, but you addressed them in the Rejected > > >>> Alternatives section. Thanks for the thorough proposal! > > >>>> > > >>>> -John > > >>>> > > >>>> On Thu, Mar 9, 2023, at 18:59, Victoria Xia wrote: > > >>>>> Hi everyone, > > >>>>> > > >>>>> I have a proposal for updating Kafka Streams's stream-table join and > > >>>>> table-table join semantics for the new versioned key-value state > > stores > > >>>>> introduced in KIP-889 > > >>>>> < > > >>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores > > >>>> . > > >>>>> Would love to hear your thoughts and suggestions. > > >>>>> > > >>>>> > > >>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-914%3A+Join+Processor+Semantics+for+Versioned+Stores > > >>>>> > > >>>>> Thanks, > > >>>>> Victoria > > >>> > >