mjsax commented on PR #16684: URL: https://github.com/apache/kafka/pull/16684#issuecomment-2253080256
Ah sorry. Mixed up the handler and the context... My bad. > > do we have to consider that the the first point of this comment https://github.com/apache/kafka/pull/16684#issuecomment-2250417070 is enough ? > > Checking in ProcessorNode#process if rawRecord != null before accessing the raw key and the raw value. It is the safest approach for me and will avoid crashing to NPE. The drawback is the processing exception handler can end up with no sourceRawKey nor sourceRawValue while the values are actually available in upstream Yes, that is what I meant here: > Thus, making a step back, I am wondering why we not just pass in the current key/value (or full Record) into the handler? Of course, for doing a DQL which is a follow up feature we want to build on top, having something unserialized at hand might not be ideal, but at the source-node level we should always be able to pass in the unserialized source data. -- Should we change the handler to pass in both current input Record and source raw key/value (making both sourceRawKey and sourceRawValue type Optional<byte[]>)? In the end, messing with the store cache seems to be brittle, and not solve the problem for all cases? Do we really think it would be the right way forward? (Also, I think that caches are only an issue in the DSL, for which we couple cache flushing and forwarding -- for the PAPI, caching and forwarding is independent, and there won't be a reason to add any raw record, but it would just be a waste of memory.) I think there is too many corner cases right now, and we might just want to go the path of least resistance. So just doing the `null` check seems reasonable to me for now. But as said, I would change to `Optional` type to highlight was both `sourceRawKey` and `sourceRawValue` may be `null`. > The drawback is the processing exception handler can end up with no sourceRawKey nor sourceRawValue while the values are actually available in upstream Well, if we find good way later, to set both correctly, it would just be an improvement we can do anytime. The API contract just says, may or may not be there (if we use `Optional`) but we don't define when it will be there and when not, so we can change it IMHO w/o the need of a KIP or anything. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
