No specific comments, but I just wanted to mention I like the direction of the KIP. My team is a big user of "transform" methods because of the ability to chain them, and I have always found the terminology challenging to explain alongside "process". It felt like one concept with two names. So moving towards a single API that is powerful enough to handle both use cases seems absolutely correct to me.
Paul On Mon, Feb 14, 2022 at 1:12 PM Jorge Esteban Quilcate Otoya < quilcate.jo...@gmail.com> wrote: > Got it. Thanks John, this make sense. > > I've updated the KIP to include the deprecation of: > > - KStream#transform > - KStream#transformValues > - KStream#flatTransform > - KStream#flatTransformValues > > > > On Fri, 11 Feb 2022 at 15:16, John Roesler <vvcep...@apache.org> wrote: > > > Thanks, Jorge! > > > > I think it’ll be better to keep this KIP focused on KStream methods only. > > I suspect that the KTable methods may be more complicated than just that > > proposed replacement, but it’ll also be easier to consider that question > in > > isolation. > > > > The nice thing about just deprecating the KStream methods and not the > > Transform* interfaces is that you can keep your proposal just scoped to > > KStream and not have any consequences for the rest of the DSL. > > > > Thanks again, > > John > > > > On Fri, Feb 11, 2022, at 06:43, Jorge Esteban Quilcate Otoya wrote: > > > Thanks, John. > > > > > >> 4) I agree that we shouldn't deprecate the Transformer* > > > classes, but do you think we should deprecate the > > > KStream#transform* methods? I'm curious if there's any > > > remaining reason to have those methods, or if your KIP > > > completely obviates them. > > > > > > Good catch. > > > I considered that deprecating `Transformer*` and `transform*` would go > > hand > > > in hand — maybe it happened similarly with old `Processor` and > `process`? > > > Though deprecating only `transform*` operations could be a better > signal > > > for users than non deprecating anything at all and pave the way to it's > > > deprecation. > > > > > > Should this deprecation also consider including > `KTable#transformValues`? > > > The approach proposed on the KIP: > > > `ktable.toStream().processValues().toTable()` seems fair to me, though > I > > > will have to test it further. > > > > > > I'm happy to update the KIP if there's some consensus around this. > > > Will add the deprecation notes these days and wait for any additional > > > feedback on this topic before wrapping up the KIP. > > > > > > > > > On Fri, 11 Feb 2022 at 04:03, John Roesler <vvcep...@apache.org> > wrote: > > > > > >> Thanks for the update, Jorge! > > >> > > >> I just read over the KIP again, and I'm in support. One more > > >> question came up for me, though: > > >> > > >> 4) I agree that we shouldn't deprecate the Transformer* > > >> classes, but do you think we should deprecate the > > >> KStream#transform* methods? I'm curious if there's any > > >> remaining reason to have those methods, or if your KIP > > >> completely obviates them. > > >> > > >> Thanks, > > >> -John > > >> > > >> On Thu, 2022-02-10 at 21:32 +0000, Jorge Esteban Quilcate > > >> Otoya wrote: > > >> > Thank you both for your feedback! > > >> > > > >> > I have added the following note on punctuation: > > >> > > > >> > ``` > > >> > NOTE: The key validation can be defined when processing the message. > > >> > Though, with punctuations it won't be possible to define the key for > > >> > validation before forwarding, therefore it won't be possible to > > forward > > >> > from punctuation. > > >> > This is similar behavior to how `ValueTransformer`s behave at the > > moment. > > >> > ``` > > >> > > > >> > Also make it explicit also that we are going to apply referencial > > >> equality > > >> > for key validation. > > >> > > > >> > I hope this is covering all your feedback, let me know if I'm > missing > > >> > anything. > > >> > > > >> > Cheers, > > >> > Jorge. > > >> > > > >> > On Wed, 9 Feb 2022 at 22:19, Guozhang Wang <wangg...@gmail.com> > > wrote: > > >> > > > >> > > I'm +1 on John's point 3) for punctuations. > > >> > > > > >> > > And I think if people are on the same page that a reference > equality > > >> check > > >> > > per record is not a huge overhead, I think doing that enforcement > is > > >> better > > >> > > than documentations and hand-wavy undefined behaviors. > > >> > > > > >> > > > > >> > > Guozhang > > >> > > > > >> > > On Wed, Feb 9, 2022 at 11:27 AM John Roesler <vvcep...@apache.org > > > > >> wrote: > > >> > > > > >> > > > Thanks for the KIP Jorge, > > >> > > > > > >> > > > I'm in support of your proposal. > > >> > > > > > >> > > > 1) > > >> > > > I do agree with Guozhang's point (1). I think the cleanest > > >> > > > approach. I think it's cleaner and better to keep the > > >> > > > enforcement internal to the framework than to introduce a > > >> > > > public API or context wrapper for processors to use > > >> > > > explicitly. > > >> > > > > > >> > > > 2) I tend to agree with you on this one; I think the > > >> > > > equality check ought to be fast enough in practice. > > >> > > > > > >> > > > 3) I think this is implicit, but should be explicit in the > > >> > > > KIP: For the `processValues` API, because the framework sets > > >> > > > the key on the context before calling `process` and then > > >> > > > unsets it afterwards, there will always be no key set during > > >> > > > task puctuation. Therefore, while processors may still > > >> > > > register punctuators, they will not be able to forward > > >> > > > anything from them. > > >> > > > > > >> > > > This is functionally equivalent to the existing > > >> > > > transformers, by the way, that are also forbidden to forward > > >> > > > anything during punctuation. > > >> > > > > > >> > > > For what it's worth, I think this is the best tradeoff. > > >> > > > > > >> > > > The only alternative I see is not to place any restriction > > >> > > > on forwarded keys at all and just document that if users > > >> > > > don't maintain proper partitioning, they'll get undefined > > >> > > > behavior. That might be more powerful, but it's also a > > >> > > > usability problem. > > >> > > > > > >> > > > Thanks, > > >> > > > -John > > >> > > > > > >> > > > On Wed, 2022-02-09 at 11:34 +0000, Jorge Esteban Quilcate > > >> > > > Otoya wrote: > > >> > > > > Thanks Guozhang. > > >> > > > > > > >> > > > > > Does `ValueProcessorContext` have to be a public API? It > seems > > >> to me > > >> > > > > that this can be completely abstracted away from user > interfaces > > >> as an > > >> > > > > internal class > > >> > > > > > > >> > > > > Totally agree. No intention to add these as public APIs. Will > > >> update > > >> > > the > > >> > > > > KIP to reflect this. > > >> > > > > > > >> > > > > > in the past the rationale for enforcing it at the > > >> > > > > interface layer rather than do runtime checks is that it is > more > > >> > > > efficient. > > >> > > > > > I'm not sure how much overhead it may incur to check if the > > key > > >> did > > >> > > not > > >> > > > > change: if it is just a reference equality check maybe it's > > okay. > > >> > > What's > > >> > > > > your take on this? > > >> > > > > > > >> > > > > Agree, reference equality should cover this validation and the > > >> overhead > > >> > > > > impact should not be meaningful. > > >> > > > > Will update the KIP to reflect this as well. > > >> > > > > > > >> > > > > > > >> > > > > On Tue, 8 Feb 2022 at 19:05, Guozhang Wang < > wangg...@gmail.com> > > >> wrote: > > >> > > > > > > >> > > > > > Hello Jorge, > > >> > > > > > > > >> > > > > > Thanks for bringing this KIP! I think this is a nice idea to > > >> consider > > >> > > > using > > >> > > > > > a single overloaded function name for #process, just a > couple > > >> quick > > >> > > > > > questions after reading the proposal: > > >> > > > > > > > >> > > > > > 1) Does `ValueProcessorContext` have to be a public API? It > > >> seems to > > >> > > me > > >> > > > > > that this can be completely abstracted away from user > > interfaces > > >> as > > >> > > an > > >> > > > > > internal class, and we call the `setKey` before calling > > >> > > > user-instantiated > > >> > > > > > `process` function, and then in its overridden `forward` it > > can > > >> just > > >> > > > check > > >> > > > > > if the key changes or not. > > >> > > > > > 2) Related to 1) above, in the past the rationale for > > enforcing > > >> it at > > >> > > > the > > >> > > > > > interface layer rather than do runtime checks is that it is > > more > > >> > > > efficient. > > >> > > > > > I'm not sure how much overhead it may incur to check if the > > key > > >> did > > >> > > not > > >> > > > > > change: if it is just a reference equality check maybe it's > > okay. > > >> > > > What's > > >> > > > > > your take on this? > > >> > > > > > > > >> > > > > > > > >> > > > > > Guozhang > > >> > > > > > > > >> > > > > > On Tue, Feb 8, 2022 at 5:17 AM Jorge Esteban Quilcate Otoya > < > > >> > > > > > quilcate.jo...@gmail.com> wrote: > > >> > > > > > > > >> > > > > > > Hi Dev team, > > >> > > > > > > > > >> > > > > > > I'd like to start a new discussion thread on Kafka Streams > > >> KIP-820: > > >> > > > > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > >> > > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-820%3A+Extend+KStream+process+with+new+Processor+API > > >> > > > > > > > > >> > > > > > > This KIP is aimed to extend the current `KStream#process` > > API > > >> to > > >> > > > return > > >> > > > > > > output values that could be chained across the topology, > as > > >> well as > > >> > > > > > > introducing a new `KStream#processValues` to use processor > > >> while > > >> > > > > > validating > > >> > > > > > > keys haven't change and repartition is not required. > > >> > > > > > > > > >> > > > > > > Looking forward to your feedback. > > >> > > > > > > > > >> > > > > > > Regards, > > >> > > > > > > Jorge. > > >> > > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > -- > > >> > > > > > -- Guozhang > > >> > > > > > > > >> > > > > > >> > > > > > >> > > > > >> > > -- > > >> > > -- Guozhang > > >> > > > > >> > > >> > > >