Hi Chris, Thanks for the explanation. This clears my thoughts. I can now agree that the concerns are totally different for SMTs and predicates. I also do agree with you that this might encourage SMTs to be poorly designed.
Do you see this worth considering just for the filter use case? ['errors.tolerance' would make it drop from the pipeline instead of skipping from the transformation chain]. Maybe by further limiting its functionality to not support nested fields. I can now discard the other use-cases I mentioned in the KIP in favor of "Improving the corresponding SMTs". *---* *Thanks and Regards,* *Kumud Kumar Srivatsava Tirupati* *Ph : +91-8686073938* On Fri, 3 Jun 2022 at 18:04, Chris Egerton <fearthecel...@gmail.com> wrote: > Hi Kumud, > > Responses inline. > > > But, I still believe this logic of predicate checking shouldn't be made a > part of each individual SMT. After all, that is what the predicates are for > right? > > I don't quite agree. I think the benefit of predicates is that they can > allow you to selectively apply a transformation based on conditions that > don't directly relate to the behavior of the transformation. For example, > "mask field 'ID' if the 'SSN' header is present". The use cases identified > in the KIP don't quite align with this; instead, they're more about working > around undesirable behavior in transformations that makes them difficult to > use in some pipelines. If an SMT is difficult to use, it's better to just > improve the SMT directly instead of adding a layer of indirection on top of > it. > > > `HasField` predicate is something very similar but is more powerful > because > it allows users to skip transformations at SMT level or drop them from the > transformation chain altogether using the current ` > org.apache.kafka.connect.transforms.Filter`. So, the use cases are not just > limited to skipping some SMTs. > > This use case should be included in the KIP if it's a significant > motivating factor for adding this predicate. But it's also possible to drop > entire records from a pipeline today by setting the 'errors.tolerance' > property to 'all' and allowing all records that cause exceptions to be > thrown in the transformation chain to be skipped/written to a DLQ/logged. > > Ultimately, I'm worried that we're encouraging bad habits with SMT > developers by establishing a precedent where core logic that most users > would expect to be present is left out in favor of a predicate, which makes > things harder to configure and can confound new users by adding a layer of > complexity in the form of another pluggable interface. > > Cheers, > > Chris > > On Thu, Jun 2, 2022 at 4:24 AM Kumud Kumar Srivatsava Tirupati < > kumudkumartirup...@gmail.com> wrote: > >> Hi Chris, >> Thanks for your comments. >> >> I am totally aligned with your comment on nested field names which include >> dots. I will incorporate the same based on how the KIP-821 discussion goes >> (maybe this parser could be a utility that can be reused in other areas as >> well). >> >> But, I still believe this logic of predicate checking shouldn't be made a >> part of each individual SMT. After all, that is what the predicates are >> for >> right? >> If you see, the current predicates that we have are: >> org.apache.kafka.connect.transforms.predicates.TopicNameMatches >> org.apache.kafka.connect.transforms.predicates.HasHeaderKey >> org.apache.kafka.connect.transforms.predicates.RecordIsTombstone >> They only allow you filter/transform based on the metadata of the record. >> >> `HasField` predicate is something very similar but is more powerful >> because >> it allows users to skip transformations at SMT level or drop them from the >> transformation chain altogether using the current ` >> org.apache.kafka.connect.transforms.Filter`. So, the use cases are not >> just >> limited to skipping some SMTs. >> If this makes sense, I should probably add this and give an example in the >> KIP as well. >> >> *---* >> *Thanks and Regards,* >> *Kumud Kumar Srivatsava Tirupati* >> >> >> On Wed, 1 Jun 2022 at 07:09, Chris Egerton <fearthecel...@gmail.com> >> wrote: >> >> > Hi Kumud, >> > >> > Thanks for the KIP. I'm a little bit skeptical about the necessity for >> this >> > predicate but I think we may be able to satisfy your requirements with a >> > slightly different approach. The motivation section deals largely with >> > skipping the invocation of SMTs that expect a certain field to be >> present >> > in a record, and will fail if it is not present. This seems like a >> > reasonable use case; even when your data has a fixed schema, optional >> > fields are still possible, and preventing SMTs from being used on >> optional >> > fields seems unnecessarily restrictive. In fact, it seems like such a >> > reasonable use case that I wonder if it'd be worth investing in new >> > SMT-level properties to handle this case, instead of requiring users to >> > configure a predicate separately alongside them? Something like an >> > "on.field.missing" property with options of "skip", "fail", and/or >> "warn" >> > could do the trick. >> > >> > It's also worth noting that the proposed syntax for the "field.path" >> > property in the HasField predicate would make it impossible to refer to >> > field names that have dots in them. It hasn't been finalized yet but if >> we >> > do end up going this route, we should probably use the same syntax >> that's >> > agreed upon for KIP-821 [1]. >> > >> > [1] - >> > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures >> > >> > Cheers, >> > >> > Chris >> > >> > On Thu, May 26, 2022 at 5:28 AM Sagar <sagarmeansoc...@gmail.com> >> wrote: >> > >> > > Hi Kumud, >> > > >> > > Thanks for that. I don't have any other comments at this point on the >> > KIP. >> > > LGTM overall. >> > > >> > > Thanks! >> > > Sagar. >> > > >> > > On Wed, May 25, 2022 at 5:14 PM Sagar <sagarmeansoc...@gmail.com> >> wrote: >> > > >> > > > Thanks for the KIP Kumud. >> > > > >> > > > Can you please add a couple of examples on how this would behave >> with >> > > > different combinations. I think that way it would be easier to >> > > understand. >> > > > >> > > > Thanks! >> > > > Sagar. >> > > > >> > > > On Wed, May 25, 2022 at 4:59 PM Kumud Kumar Srivatsava Tirupati < >> > > > kumudkumartirup...@gmail.com> wrote: >> > > > >> > > >> Hi all, >> > > >> I have written a KIP for having a new HasField predicate for kafka >> > > connect >> > > >> transforms and would like to start a discussion. Please share your >> > > >> thoughts >> > > >> on the same. >> > > >> >> > > >> >> > > >> >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-845%3A+%27HasField%27+predicate+for+kafka+connect >> > > >> >> > > >> *---* >> > > >> *Thanks and Regards,* >> > > >> *Kumud Kumar Srivatsava Tirupati* >> > > >> >> > > > >> > > >> > >> >