Hi Kumud,

Responses inline.

> But, I still believe this logic of predicate checking shouldn't be made a
part of each individual SMT. After all, that is what the predicates are for
right?

I don't quite agree. I think the benefit of predicates is that they can
allow you to selectively apply a transformation based on conditions that
don't directly relate to the behavior of the transformation. For example,
"mask field 'ID' if the 'SSN' header is present". The use cases identified
in the KIP don't quite align with this; instead, they're more about working
around undesirable behavior in transformations that makes them difficult to
use in some pipelines. If an SMT is difficult to use, it's better to just
improve the SMT directly instead of adding a layer of indirection on top of
it.

> `HasField` predicate is something very similar but is more powerful
because
it allows users to skip transformations at SMT level or drop them from the
transformation chain altogether using the current `
org.apache.kafka.connect.transforms.Filter`. So, the use cases are not just
limited to skipping some SMTs.

This use case should be included in the KIP if it's a significant
motivating factor for adding this predicate. But it's also possible to drop
entire records from a pipeline today by setting the 'errors.tolerance'
property to 'all' and allowing all records that cause exceptions to be
thrown in the transformation chain to be skipped/written to a DLQ/logged.

Ultimately, I'm worried that we're encouraging bad habits with SMT
developers by establishing a precedent where core logic that most users
would expect to be present is left out in favor of a predicate, which makes
things harder to configure and can confound new users by adding a layer of
complexity in the form of another pluggable interface.

Cheers,

Chris

On Thu, Jun 2, 2022 at 4:24 AM Kumud Kumar Srivatsava Tirupati <
kumudkumartirup...@gmail.com> wrote:

> Hi Chris,
> Thanks for your comments.
>
> I am totally aligned with your comment on nested field names which include
> dots. I will incorporate the same based on how the KIP-821 discussion goes
> (maybe this parser could be a utility that can be reused in other areas as
> well).
>
> But, I still believe this logic of predicate checking shouldn't be made a
> part of each individual SMT. After all, that is what the predicates are for
> right?
> If you see, the current predicates that we have are:
> org.apache.kafka.connect.transforms.predicates.TopicNameMatches
> org.apache.kafka.connect.transforms.predicates.HasHeaderKey
> org.apache.kafka.connect.transforms.predicates.RecordIsTombstone
> They only allow you filter/transform based on the metadata of the record.
>
> `HasField` predicate is something very similar but is more powerful because
> it allows users to skip transformations at SMT level or drop them from the
> transformation chain altogether using the current `
> org.apache.kafka.connect.transforms.Filter`. So, the use cases are not just
> limited to skipping some SMTs.
> If this makes sense, I should probably add this and give an example in the
> KIP as well.
>
> *---*
> *Thanks and Regards,*
> *Kumud Kumar Srivatsava Tirupati*
>
>
> On Wed, 1 Jun 2022 at 07:09, Chris Egerton <fearthecel...@gmail.com>
> wrote:
>
> > Hi Kumud,
> >
> > Thanks for the KIP. I'm a little bit skeptical about the necessity for
> this
> > predicate but I think we may be able to satisfy your requirements with a
> > slightly different approach. The motivation section deals largely with
> > skipping the invocation of SMTs that expect a certain field to be present
> > in a record, and will fail if it is not present. This seems like a
> > reasonable use case; even when your data has a fixed schema, optional
> > fields are still possible, and preventing SMTs from being used on
> optional
> > fields seems unnecessarily restrictive. In fact, it seems like such a
> > reasonable use case that I wonder if it'd be worth investing in new
> > SMT-level properties to handle this case, instead of requiring users to
> > configure a predicate separately alongside them? Something like an
> > "on.field.missing" property with options of "skip", "fail", and/or "warn"
> > could do the trick.
> >
> > It's also worth noting that the proposed syntax for the "field.path"
> > property in the HasField predicate would make it impossible to refer to
> > field names that have dots in them. It hasn't been finalized yet but if
> we
> > do end up going this route, we should probably use the same syntax that's
> > agreed upon for KIP-821 [1].
> >
> > [1] -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> >
> > Cheers,
> >
> > Chris
> >
> > On Thu, May 26, 2022 at 5:28 AM Sagar <sagarmeansoc...@gmail.com> wrote:
> >
> > > Hi Kumud,
> > >
> > > Thanks for that. I don't have any other comments at this point on the
> > KIP.
> > > LGTM overall.
> > >
> > > Thanks!
> > > Sagar.
> > >
> > > On Wed, May 25, 2022 at 5:14 PM Sagar <sagarmeansoc...@gmail.com>
> wrote:
> > >
> > > > Thanks for the KIP Kumud.
> > > >
> > > > Can you please add a couple of examples on how this would behave with
> > > > different combinations. I think that way it would be easier to
> > > understand.
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > > > On Wed, May 25, 2022 at 4:59 PM Kumud Kumar Srivatsava Tirupati <
> > > > kumudkumartirup...@gmail.com> wrote:
> > > >
> > > >> Hi all,
> > > >> I have written a KIP for having a new HasField predicate for kafka
> > > connect
> > > >> transforms and would like to start a discussion. Please share your
> > > >> thoughts
> > > >> on the same.
> > > >>
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-845%3A+%27HasField%27+predicate+for+kafka+connect
> > > >>
> > > >> *---*
> > > >> *Thanks and Regards,*
> > > >> *Kumud Kumar Srivatsava Tirupati*
> > > >>
> > > >
> > >
> >
>

Reply via email to