Hi,

I've updated the KIP to reflect recent discussions. Please note the
following:

1. I've described a HasHeaderKey predicate rather than HeaderKeyMatches
because at the moment Headers doesn't expose the whole set of headers, only
those with a specific name. Exposing this would significantly increase the
scope of this KIP but relatively little extra benefit.
2. I used org.apache.kafka.connect.transformer.predicates rather than
org.apache.kafka.connect.predicates, since I thought it better reflected
the use of predicates within transforms. I'm flexible on this one if
someone has a good case for the original name. For example the original
package name would be more appropriate if we were expecting Connectors to
make use of Predicates somehow.
3. I've dropped the special case of not needing to provide a FQN for
predicate classes in that package, since this isn't something supported by
transformations themselves as far as I know. I like the idea, but it seemed
inconsistent to need it for transformations but not for predicates.
4. Chris, I've used your suggestions for a validate() method for the extra
configurability needed. I've also added this to Predicate (even though I
don't need it) for consistency with Connector and Transformation.
5. For negating the result of a predicate I decided it was clearer to just
have a "negate" config parameter, supported by all predicates, than to
reach for more punctuation.

I'd be grateful for any more feedback people might have.

Kind regards,

Tom

On Tue, Apr 28, 2020 at 3:50 PM Andrew Schofield <andrew_schofi...@live.com>
wrote:

> Hi Tom,
> I think we should go for a consistent naming convention for the
> predicates, maybe so they
> read nicely if you say "IF" first. Here's a suggestion for predicates that
> the KIP could introduce:
>  - TopicNameMatches
>  - HeaderKeyMatches
>  - RecordIsTombstone
>
> On naming, I also suggest using
> org.apache.kafka.connect.predicates.Predicate
> as the name for the interface.
>
> I hadn't settled on a preference for negation. I suppose there are three
> results
> from evaluating a predicate - true, false or an exception caught by Kafka
> Connect.
> I suggest the exception case is always a predicate fail so the guarded
> transformation
> is skipped. Initially I'd preferred having the predicate be configured to
> return a
> negated result to give a "not match" behaviour. Thinking again now, I
> prefer something like:
>
> transforms: t1
> transforms.t1.?type: RecordIsTombstone
> transforms.t1.type: org.apache.kafka.connect.transforms.InsertField$Value
> transforms.t1.static.field: deleted
> transforms.t1.static.value: true
>
> The "?" means the transform guarded by the predicate runs in the Predicate
> returns true.
>
> transforms: t2
> transforms.t2.?!type: org.apache.kafka.connect.predicates.TopicNameMatch
> transforms.t2.?regex: my-prefix-.*
> transforms.t2.type: org.apache.kafka.connect.transforms.ExtractField$Key
> transforms.t2.field: c1
>
> The "?!" means the transform guarded by the predicate runs if the Predicate
> returns false.
>
> I think adding a Filter SMT that uses a Predicate is a nice idea.
>
> Cheers,
> Andrew
>
> On 28/04/2020, 09:44, "Tom Bentley" <tbent...@redhat.com> wrote:
>
>     Hi Andrew and Chris,
>
>     Firstly, thanks for the input.
>
>     To me, the ?type syntax has some pros and cons. On the pros side:
>     * it's pretty succinct
>     * it's flexible enough for the use cases we've identified so far.
>
>     On the cons side:
>     * it's a bit cryptic; I don't think people who didn't know it already
> would
>     be able to guess what ?type meant.
>     * it doesn't work so well if there is a whole chain of transformations
>     which need to be applied conditionally (that can still be done, of
> course,
>     but the user has to duplicate the configuration for each SMT). Nor
> does it
>     work if you need something akin to a "nested if" (you could fake it
> using
>     And), but we don't really expect people to need that.
>
>     Andrew, you didn't give an example of a negated match. Would you use
>     something like !?type for that, or did you have another idea?
>
>     On balance, I think this is a reasonable trade off.
>
>     I agree that exposing RecordPredicate as a public API, while providing
>     common implementation(s) would allow most use cases to be covered
> without
>     coding, but also allow people with specific unaddressed needs to
> implement
>     their own without having to write a streaming application to do so.
> Again,
>     I think this is a reasonable trade off.
>
>     It would be helpful to have a consensus about exactly which condition
>     implementations the KIP should provide. I think we all agree that
>     TopicMatches (or should it be TopicNameMatches?) should be included.
>     HasHeader (or should it be HasHeaderWithName or HasHeaderMatching for a
>     regex version?), And and Or all seem like they could be useful, even
> though
>     no one has pointed to concrete use cases. It might also be worth
>     considering that it would be natural to add a Filter SMT which filtered
>     records using a RecordPredicate. Perhaps that provides motivation for
>     conditions on the record key as well? If people think the Filter SMT
> would
>     be useful I'm willing to include it in this KIP, since it seems
>     straightforward.
>
>     If people are happy with this approach, I'll update the KIP.
>
>     Kind regards,
>
>     Tom
>
>     On Sun, Apr 26, 2020 at 3:00 PM Andrew Schofield <
> andrew_schofi...@live.com>
>     wrote:
>
>     > Hi,
>     > I'm suggesting that RecordPredicate be another pluggable interface
> that
>     > users
>     > could conceivably implement on their own, and that the KIP
> introduces the
>     > most likely ones as supplied implementations, much as there are SMT
>     > implementations such as HoistField. I don't really see users
> implementing
>     > their own RecordPredicate, but the supplied ones would probably have
> a
>     > common
>     > interface and it seems a pity not to expose that as an external API.
>     >
>     > SMT is essentially a toolkit for Kafka Connect and I think
> RecordPredicate
>     > would be a nice minor enhancement and I'd be happy to contribute
> code.
>     >
>     > In order to make it simpler to configure, I suggest that the class
> name
>     > configured
>     > using "?type" would be assumed to be in package
>     > org.apache.kafka.connect.predicates
>     > in the absence of a package name so that shorter names such as
>     > "TopicMatches"
>     > could be used.
>     >
>     > Cheers,
>     > Andrew Schofield
>     > IBM Event Streams
>     >
>     > On 25/04/2020, 21:22, "Christopher Egerton" <chr...@confluent.io>
> wrote:
>     >
>     >     Hi Andrew,
>     >
>     >     I know a DSL seems like overkill, and I'm not attached to it by
> any
>     > means,
>     >     but I do think it serves a vital purpose in that it allows
> people who
>     > don't
>     >     know how or have the time to write Java code to act on data being
>     > processed
>     >     by connectors.
>     >
>     >     Are you proposing that the "RecordPredicate" be another pluggable
>     > interface
>     >     that users could then implement on their own? Or, would this be
> purely
>     > a
>     >     syntactic expansion to SMT configuration with some subset of
> hard-coded
>     >     predicates provided by the framework? I think there's value in
> the
>     > latter,
>     >     but the former doesn't seem like it'd bring much to the table as
> far as
>     >     users are concerned, and for developers, it is an improvement,
> but not
>     > a
>     >     major one.
>     >
>     >     Cheers,
>     >
>     >     Chris
>     >
>     >     On Fri, Apr 24, 2020 at 12:04 PM Andrew Schofield <
>     >     schofieldandr...@gmail.com> wrote:
>     >
>     >     > I wonder whether we're getting a bit overcomplicated with
> this. I
>     > think all
>     >     > that's required here is to add an optional guard predicate for
> a
>     >     > Transformation.
>     >     > The predicate cannot end the Transformation chain, but it can
> allow
>     > or
>     >     > prevent a particular
>     >     > Transformation from running.
>     >     >
>     >     > How about this as syntax?
>     >     > transforms: extractInt
>     >     > transforms.extractInt.?type:
>     >     > org.apache.kafka.connect.predicates.TopicMatches
>     >     > transforms.extractInt.?regex: my-prefix-.*
>     >     > transforms.extractInt.type:
>     >     > org.apache.kafka.connect.transforms.ExtractField$Key
>     >     > transforms.extractInt.field: c1
>     >     >
>     >     > The idea is that a Transformation can have an optional
>     > RecordPredicate.
>     >     > The RecordPredicate
>     >     > can have configuration items, in a similar way to a
> Transformation.
>     > The
>     >     > syntax of
>     >     > using a '?' prefix to separate configuration for the
> RecordPredicate
>     > from
>     >     > the configuration
>     >     > for the Transformation could conceivably clash with an existing
>     >     > Transformation
>     >     > but the chances are minimal.
>     >     >
>     >     > This syntax doesn't allow for an 'else', but if the KIP offers
> say a
>     >     > TopicMatches predicate
>     >     > then that can be configured to return FALSE if the predicate
> matches.
>     >     >
>     >     > I feel that a DSL for SMTs is overkill. If you need something
> that
>     >     > complex, it's
>     >     > perhaps too complex for a transformation chain and it's really
> a
>     > streaming
>     >     > application.
>     >     >
>     >     > Andrew Schofield
>     >     > IBM Event Streams
>     >     >
>     >     > On 2020/04/08 21:39:31, Christopher Egerton <
> chr...@confluent.io>
>     > wrote:
>     >     > > Hi Tom,
>     >     > >
>     >     > > With regards to the potential Transformation::validate
> method, I
>     > don't
>     >     > > quite follow your objections. The AbstractHerder class,
>     > ConnectorConfig
>     >     > > class, and embedding of ConfigDefs that happens with the two
> is all
>     >     > > internal logic and we're allowed to modify it however we
> want, as
>     > long as
>     >     > > it doesn't alter any public-facing APIs (and if it does, we
> just
>     > have to
>     >     > > document those changes in a KIP). We don't have to embed the
>     > ConfigDef
>     >     > for
>     >     > > a transformation chain inside another ConfigDef if the API
> we want
>     > to
>     >     > > present to our users doesn't play well with that part of the
> code
>     > base.
>     >     > > Additionally, beyond aligning with the existing API provided
> for
>     >     > > connectors, another advantage is that it becomes possible to
>     > validate
>     >     > > properties for a configuration in the context of all other
>     > properties, so
>     >     > > to be clear, it's not just about preserving what may be
> perceived
>     > as a
>     >     > > superficial similarity and comes with real functional
> benefits
>     > that can't
>     >     > > be provided (at least, not as easily) by dynamic
> construction of a
>     >     > > ConfigDef object.
>     >     > >
>     >     > > As far as the new proposal goes, I hate to say it, but I
> think
>     > we're
>     >     > stuck
>     >     > > with the worst of both worlds here. Adding the new
> RecordPredicate
>     >     > > interface seems like it defeats the whole purpose of SMTs,
> which
>     > is to
>     >     > > allow manipulation of a connector's event stream by users
> who don't
>     >     > > necessarily know how or have the time to write Java code of
> their
>     > own.
>     >     > This
>     >     > > is also why I'm in favor of adding a lightweight DSL for the
>     > condition;
>     >     > > emphasizing readability for people who aren't very familiar
> with
>     > Connect
>     >     > > and just want to get something going quickly should be a
> priority
>     > for
>     >     > SMTs.
>     >     > > But if that's not going to happen with this KIP, I'd still
> choose
>     > the
>     >     > > simpler, less-flexible approach initially outlined, in order
> to
>     > keep
>     >     > things
>     >     > > simple for people creating connectors and try to let them
>     > accomplish what
>     >     > > they want via configuration, not code.
>     >     > >
>     >     > > With regards to the question about where the line should be
> drawn
>     > and how
>     >     > > much is too much and comparisons to other stream processing
>     > frameworks, I
>     >     > > think the nature of SMTs draws the line quite nicely: you
> can only
>     >     > process
>     >     > > one message at a time. There's plenty of use cases out there
> for
>     >     > > heavier-duty processing frameworks like Kafka Streams, with
>     > aggregate
>     >     > > operations, joining of streams, expanding a single message
> into
>     > multiple
>     >     > > messages, etc. With SMTs, none of this is possible; the
> general
>     > use case
>     >     > is
>     >     > > to filter and clean a stream of data. If any of the
> heavier-weight
>     > logic
>     >     > > provided by, e.g., Streams, isn't required for a project, it
>     > should be
>     >     > > possible to get along with just a collection of sink
> connectors,
>     > source
>     >     > > connectors, a converter or two, and SMTs that smooth over any
>     > differences
>     >     > > in data format between what source connectors produce and
> what sink
>     >     > > connectors expect. This is why I'm comfortable suggesting
> heavy
>     > expansion
>     >     > > of the out-of-the-box SMTs that we provide with Connect; as
> long as
>     >     > they're
>     >     > > reasonable to configure, they can greatly reduce the
> operational
>     > burden
>     >     > on
>     >     > > anyone running and/or using a Connect cluster since they
> might
>     > entirely
>     >     > > replace additional services that would otherwise be required.
>     >     > >
>     >     > > All that said--this is a huge ask of someone who just wants
> to
>     > support
>     >     > the
>     >     > > SMT equivalent of an "if" statement, and it's totally
>     > understandable if
>     >     > > that's too much to ask. The one concern I have left is that
> if we
>     > expand
>     >     > > the SMT in the future, there become compatibility concerns
> since
>     > SMTs are
>     >     > > pretty tightly-coupled with the worker on which they run
> (although
>     >     > > technically, with some classpath/plugin path shuffling, you
> can run
>     >     > > different versions of the out-of-the-box SMTs from the
> Connect
>     > worker on
>     >     > > which they're run). Someone might write a connector config
> with the
>     >     > > If/Conditional/Whatever SMT with a condition type that works
> on one
>     >     > worker,
>     >     > > but that doesn't work on another worker that's running an
> earlier
>     > version
>     >     > > of Connect. This is why I'm in favor of adding extra
> predicates now
>     >     > instead
>     >     > > of later; if we're going to implement what has the potential
> to be
>     > a bit
>     >     > of
>     >     > > a Swiss army knife SMT with room for future expansion of
>     > configuration,
>     >     > and
>     >     > > we can think of a reasonable way to add that functionality
> now, it
>     > seems
>     >     > > better for users and administrators of Connect to try to do
> that
>     > now
>     >     > > instead of later.
>     >     > >
>     >     > > Cheers,
>     >     > >
>     >     > > Chris
>     >     > >
>     >     > > On Wed, Apr 8, 2020 at 9:45 AM Tom Bentley <
> tbent...@redhat.com>
>     > wrote:
>     >     > >
>     >     > > > Since no one objected I've updated the KIP with the
> revised way
>     > to
>     >     > > > configure this transformation.
>     >     > > >
>     >     > > > On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <
> tbent...@redhat.com>
>     >     > wrote:
>     >     > > >
>     >     > > > > To come back about a point Chris made:
>     >     > > > >
>     >     > > > > 1. Instead of the new "ConfigDef config(Map<String,
> String>
>     > props)"
>     >     > > > method,
>     >     > > > >> what would you think about adopting a similar approach
> as the
>     >     > framework
>     >     > > > >> uses with connectors, by adding a "Config
> validate(Map<String,
>     >     > String>
>     >     > > > >> props)" method that can perform custom validation
> outside of
>     > what
>     >     > can be
>     >     > > > >> performed by the ConfigDef's single-property-at-a-time
>     > validation?
>     >     > It
>     >     > > > may
>     >     > > > >> be a little heavyweight for use with this particular
> SMT, but
>     > it'd
>     >     > > > provide
>     >     > > > >> more flexibility for other SMT implementations and would
>     > mirror an
>     >     > API
>     >     > > > >> that
>     >     > > > >> developers targeting the framework are likely already
>     > familiar with.
>     >     > > > >>
>     >     > > > >
>     >     > > > > The validate() + config() approach taken for Connectors
>     > doesn't quite
>     >     > > > work
>     >     > > > > for Transformations.
>     >     > > > >
>     >     > > > > The default Connector.validate() basically just calls
>     >     > > > > `config().validate(connectorConfigs)` and returns a
> Config.
>     >     > Separately
>     >     > > > the
>     >     > > > > AbstractHeader also calls `config()` and uses the Config
> and
>     >     > ConfigDef to
>     >     > > > > build the ConfigInfos. The contract is not really
> defined well
>     >     > enough in
>     >     > > > > the javadoc, but this means a connector can use
> validate() to
>     > build
>     >     > the
>     >     > > > > ConfigDef which it will return from config().
>     >     > > > >
>     >     > > > > For Transformations we don't really want validate() to
> perform
>     >     > validation
>     >     > > > > at all since ultimately the transformations config will
> be
>     > embedded
>     >     > in
>     >     > > > the
>     >     > > > > connector's and will be validated by the connector
> itself. It
>     >     > wouldn't be
>     >     > > > > harmful if it _did_ perform validation, just
> unnecessary. But
>     > having
>     >     > a
>     >     > > > > validate() method that ought not to validate() seems
> like a
>     > recipe
>     >     > for
>     >     > > > > confusion.
>     >     > > > >
>     >     > > > > Also problematic is that there's no use for the Config
>     > returned from
>     >     > > > > Transformations.validate().
>     >     > > > >
>     >     > > > > So I'm not convinced that the superficial similarity
> really
>     > gains
>     >     > > > anything.
>     >     > > > >
>     >     > > > > Kind regards,
>     >     > > > >
>     >     > > > > Tom
>     >     > > > >
>     >     > > > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <
>     > tbent...@redhat.com>
>     >     > wrote:
>     >     > > > >
>     >     > > > >> Hi,
>     >     > > > >>
>     >     > > > >> Hi all,
>     >     > > > >>
>     >     > > > >> Thanks for the discussion so far.
>     >     > > > >>
>     >     > > > >> It seems a bit weird to me that when configuring the
>     > Conditional SMT
>     >     > > > with
>     >     > > > >> a DSL you would use a concise, intuitive DSL for
> expressing
>     > the
>     >     > > > condition,
>     >     > > > >> but not for the transforms that it's guarding. It also
> seems
>     >     > natural, if
>     >     > > > >> you support this for conditionally applying SMTs, that
> you'd
>     > soon
>     >     > want
>     >     > > > to
>     >     > > > >> support the same thing for a generic filter transformer.
>     > Then, if
>     >     > you
>     >     > > > can
>     >     > > > >> configure a filter transformation using this DSL, it
> becomes
>     > odd
>     >     > that
>     >     > > > you
>     >     > > > >> can't do this for mapping transformations. I think it
> would
>     > be a
>     >     > > > mistake to
>     >     > > > >> go specifying an expression language for conditions when
>     > really that
>     >     > > > might
>     >     > > > >> just be part of a language for transformations.
>     >     > > > >>
>     >     > > > >> I think it would be possible today to write an SMT which
>     > allowed
>     >     > you to
>     >     > > > >> express the transformation in a DSL. Concretely, it's
>     > possible to
>     >     > > > imagine a
>     >     > > > >> single transformation taking a DSL something like this:
>     >     > > > >>
>     >     > > > >>   compose(
>     >     > > > >>     Flatten.Key(delimiter: ':'),
>     >     > > > >>     If(condition: TopicMatches(pattern: "blah-.*"),
>     >     > > > >>        then: Flatten.Value(delimiter: '/')))
>     >     > > > >>
>     >     > > > >> All I've really done here, beyond what was already
> proposed,
>     > is
>     >     > rename
>     >     > > > >> Conditional to If, imagine a couple more bits of syntax
>     >     > > > >> (SMTs being constructed by class name and named argument
>     > invocation
>     >     > > > >> syntax to support SMT constructors) and add a higher
> level
>     > compose()
>     >     > > > >> function for chaining SMTs (which could easily be
> replaced
>     > with
>     >     > brace
>     >     > > > >> delimited blocks).
>     >     > > > >>
>     >     > > > >> That may be a discussion we should have, but I think in
>     > _this_ KIP
>     >     > we
>     >     > > > >> should focus on the conditional part, since the above
> example
>     >     > hopefully
>     >     > > > >> shows that it would be possible to reuse it in a DSL if
> there
>     > was
>     >     > > > appetite
>     >     > > > >> for that.
>     >     > > > >>
>     >     > > > >> With that in mind, and since my original suggestion for
> an
>     >     > abbreviated
>     >     > > > >> config syntax didn't appeal to people, I propose that we
>     > stick with
>     >     > the
>     >     > > > >> existing norms for configuring this.
>     >     > > > >> The previous example would look like this:
>     >     > > > >>
>     >     > > > >> transformation: flattenKey,if
>     >     > > > >> transformation.flattenKey.type: Flatten.Key
>     >     > > > >> transformation.flattenKey.delimiter: :
>     >     > > > >> transformation.if.type: If
>     >     > > > >> transformation.if.condition.type: TopicMatches
>     >     > > > >> transformation.if.condition.pattern: blah-.*
>     >     > > > >> transformation.if.then: flattenValue
>     >     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
>     >     > > > >> transformation.if.then.flattenValue.delimiter: /
>     >     > > > >>
>     >     > > > >> Also, I'm inclined to think we should stick to just
> supporting
>     >     > > > >> TopicMatches and Not, since we've not identified an
> actual
>     > need for
>     >     > > > >> 'has-header', 'or' and 'and'.
>     >     > > > >>
>     >     > > > >> An example of the usage of Not would look like this:
>     >     > > > >>
>     >     > > > >> transformation: flattenKey,if
>     >     > > > >> transformation.flattenKey.type: Flatten.Key
>     >     > > > >> transformation.flattenKey.delimiter: :
>     >     > > > >> transformation.if.type: If
>     >     > > > >> transformation.if.condition.type: Not
>     >     > > > >> transformation.if.condition.operand: startsWithBlah
>     >     > > > >> transformation.if.condition.operand.startsWithBlah.type:
>     >     > TopicMatches
>     >     > > > >>
> transformation.if.condition.operand.startsWithBlah.pattern:
>     > blah-.*
>     >     > > > >> transformation.if.then: flattenValue
>     >     > > > >> transformation.if.then.flattenValue.type: Flatten.Value
>     >     > > > >> transformation.if.then.flattenValue.delimiter: /
>     >     > > > >>
>     >     > > > >> Hopefully we can agree that this allows the conditional
> SMT
>     > part to
>     >     > make
>     >     > > > >> progress without getting bogged down.
>     >     > > > >>
>     >     > > > >> Thoughts? If this seems broadly reasonable, I'll update
> the
>     > KIP.
>     >     > > > >>
>     >     > > > >> Kind regards,
>     >     > > > >>
>     >     > > > >> Tom
>     >     > > > >>
>     >     > > > >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <
>     > gun...@hibernate.org
>     >     > >
>     >     > > > >> wrote:
>     >     > > > >>
>     >     > > > >>> Hi all,
>     >     > > > >>>
>     >     > > > >>> Thanks a lot for this initiative, Tom!
>     >     > > > >>>
>     >     > > > >>> To shed some light, the use case where this first came
> up,
>     > were
>     >     > issues
>     >     > > > >>> we saw with SMTs being applied to the different topics
>     > produced by
>     >     > the
>     >     > > > >>> Debezium change data capture connectors. There are
> different
>     > kinds
>     >     > of
>     >     > > > >>> topics (for change data, schema history, heartbeats
> etc.)
>     > and the
>     >     > > > >>> record structure to expect may vary between those.
> Hence we
>     > saw
>     >     > issues
>     >     > > > >>> with SMTs like ExtractField, which for instance only
> should
>     > be
>     >     > applied
>     >     > > > >>> to all change data topics but not the other ones.
>     >     > > > >>>
>     >     > > > >>> I like the overall approach; for Debezium's purposes,
> the
>     > simple
>     >     > topic
>     >     > > > >>> matching and negation operators would be sufficient
> already.
>     > I
>     >     > agree
>     >     > > > >>> with Chris and would prefer one single condition
> attribute,
>     > which
>     >     > > > >>> contains a single condition or potentially a logical
>     > expression
>     >     > with
>     >     > > > >>> not, and, etc. I think it's less ambiguous, in
> particular
>     > when it
>     >     > > > >>> comes to ordering of the different conditions and
>     > determining their
>     >     > > > >>> precedence.
>     >     > > > >>>
>     >     > > > >>> Would love to see this feature in one or another way in
>     > Connect.
>     >     > > > >>>
>     >     > > > >>> Best,
>     >     > > > >>>
>     >     > > > >>> --Gunnar
>     >     > > > >>>
>     >     > > > >>>
>     >     > > > >>>
>     >     > > > >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
>     >     > > > >>> tbent...@redhat.com>:
>     >     > > > >>> >
>     >     > > > >>> > Hi Chris and Sönke,
>     >     > > > >>> >
>     >     > > > >>> > Using the numbering from Chris's email...
>     >     > > > >>> >
>     >     > > > >>> > 1. That's a good point, I'll see what is needed to
> make
>     > that
>     >     > work.
>     >     > > > >>> >
>     >     > > > >>> > 2. I'm happy enough to add support for "and" and
> "or" as
>     > part of
>     >     > this
>     >     > > > >>> KIP
>     >     > > > >>> > if people can see a need for it.
>     >     > > > >>> >
>     >     > > > >>> > In a similar vein, I was wondering about whether it
> would
>     > be
>     >     > > > worthwhile
>     >     > > > >>> > having the equivalent of an "else" clause (what's in
> the
>     > KIP is
>     >     > > > >>> basically
>     >     > > > >>> > an "if" statement). Without support for "else" I
> think
>     > people
>     >     > would
>     >     > > > >>> often
>     >     > > > >>> > need two conditionals, with the condition of one
> being the
>     >     > negation
>     >     > > > of
>     >     > > > >>> the
>     >     > > > >>> > condition of another.
>     >     > > > >>> >
>     >     > > > >>> > 3. I can see the attraction of an expression
> language. The
>     > pros
>     >     > > > include
>     >     > > > >>> > being terse and familiar to programmers and
> potentially
>     > very
>     >     > flexible
>     >     > > > >>> if
>     >     > > > >>> > that's needed in the future. I had a play and
> implemented
>     > it
>     >     > using
>     >     > > > >>> ANTLR
>     >     > > > >>> > and it's not difficult to write a grammar and
> implement the
>     >     > functions
>     >     > > > >>> we've
>     >     > > > >>> > already discussed and get decent error messages when
> the
>     >     > expression
>     >     > > > is
>     >     > > > >>> > malformed. So on the one hand I quite like the idea.
> On
>     > the other
>     >     > > > hand
>     >     > > > >>> it
>     >     > > > >>> > feels like overkill for the use cases that have
> actually
>     > been
>     >     > > > >>> identified so
>     >     > > > >>> > far.
>     >     > > > >>> >
>     >     > > > >>> > @Sönke what do you make of the expression language
> idea?
>     >     > > > >>> >
>     >     > > > >>> > Kind regards,
>     >     > > > >>> >
>     >     > > > >>> > Tom
>     >     > > > >>> >
>     >     > > > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
>     >     > > > >>> chr...@confluent.io>
>     >     > > > >>> > wrote:
>     >     > > > >>> >
>     >     > > > >>> > > Hi Tom,
>     >     > > > >>> > >
>     >     > > > >>> > > This looks great and I'd love to see the
> out-of-the-box
>     > SMTs
>     >     > become
>     >     > > > >>> even
>     >     > > > >>> > > more powerful with the improvements you've
> proposed! Got
>     > a few
>     >     > > > >>> remarks and
>     >     > > > >>> > > would be interested in your thoughts:
>     >     > > > >>> > >
>     >     > > > >>> > > 1. Instead of the new "ConfigDef config(Map<String,
>     > String>
>     >     > props)"
>     >     > > > >>> method,
>     >     > > > >>> > > what would you think about adopting a similar
> approach
>     > as the
>     >     > > > >>> framework
>     >     > > > >>> > > uses with connectors, by adding a "Config
>     > validate(Map<String,
>     >     > > > >>> String>
>     >     > > > >>> > > props)" method that can perform custom validation
>     > outside of
>     >     > what
>     >     > > > >>> can be
>     >     > > > >>> > > performed by the ConfigDef's
> single-property-at-a-time
>     >     > validation?
>     >     > > > >>> It may
>     >     > > > >>> > > be a little heavyweight for use with this
> particular
>     > SMT, but
>     >     > it'd
>     >     > > > >>> provide
>     >     > > > >>> > > more flexibility for other SMT implementations and
> would
>     >     > mirror an
>     >     > > > >>> API that
>     >     > > > >>> > > developers targeting the framework are likely
> already
>     > familiar
>     >     > > > with.
>     >     > > > >>> > > 2. The possibility for adding the logical operators
>     > "and" and
>     >     > "or"
>     >     > > > is
>     >     > > > >>> > > mentioned, but only as a potential future change
> and not
>     > one
>     >     > > > >>> proposed by
>     >     > > > >>> > > this KIP. Why not include those operators sooner
> rather
>     > than
>     >     > later?
>     >     > > > >>> > > 3. The syntax for named conditions that are then
>     > referenced in
>     >     > > > >>> logical
>     >     > > > >>> > > operators is... tricky. It took me a few attempts
> to
>     > grok the
>     >     > > > example
>     >     > > > >>> > > provided in the KIP after reading Sönke's question
> about
>     > the
>     >     > > > example
>     >     > > > >>> for
>     >     > > > >>> > > negation. What about a more sophisticated but less
>     > verbose
>     >     > syntax
>     >     > > > >>> that
>     >     > > > >>> > > supports a single configuration for the condition,
> even
>     > with
>     >     > > > logical
>     >     > > > >>> > > operators? I'm thinking something like
>     >     > > > >>> > > "transforms.conditionalExtract.condition:
>     >     > > > not(has-header:my-header)"
>     >     > > > >>> > > instead of the
> "transforms.conditionalExtract.condition:
>     >     > > > >>> not:hasMyHeader"
>     >     > > > >>> > > and
> "transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > has-header:my-header" properties. If support for a
>     > logical
>     >     > "and" is
>     >     > > > >>> added,
>     >     > > > >>> > > this could then be expanded to something like
>     >     > > > >>> > > "transforms.conditionalExtract.condition:
>     >     > > > and(has-header(my-header),
>     >     > > > >>> > > not(topic-matches(my-prefix-.*)))". There would be
>     > additional
>     >     > > > >>> complexity
>     >     > > > >>> > > here with the need to escape parentheses and
> commas that
>     > are
>     >     > > > >>> intended to be
>     >     > > > >>> > > treated literally (as part of a header name, for
> example)
>     >     > instead
>     >     > > > of
>     >     > > > >>> as
>     >     > > > >>> > > part of the syntax for the condition itself, but a
> little
>     >     > > > additional
>     >     > > > >>> > > complexity for edge cases like that may be
> warranted if
>     > it
>     >     > heavily
>     >     > > > >>> reduces
>     >     > > > >>> > > complexity for the common cases. The rationale for
> the
>     > proposed
>     >     > > > >>> > > parentheses-based syntax here instead of what's
>     > mentioned in
>     >     > the
>     >     > > > KIP
>     >     > > > >>> > > (something like "and: <condition1>, <condition2>")
> is to
>     > help
>     >     > with
>     >     > > > >>> > > readability; we probably wouldn't need that with
> the
>     > approach
>     >     > of
>     >     > > > >>> naming
>     >     > > > >>> > > conditions via separate properties, but things may
> get a
>     > little
>     >     > > > >>> nasty with
>     >     > > > >>> > > literal conditions included there, especially with
> the
>     >     > possibility
>     >     > > > of
>     >     > > > >>> > > nesting. Finally, the shift in syntax here should
> make it
>     >     > easier to
>     >     > > > >>> handle
>     >     > > > >>> > > cases like the header value matching brought up by
>     > Sönke; it
>     >     > might
>     >     > > > >>> look
>     >     > > > >>> > > something like "header-matches(header-name,
>     >     > header-value-pattern)".
>     >     > > > >>> > >
>     >     > > > >>> > > Cheers,
>     >     > > > >>> > >
>     >     > > > >>> > > Chris
>     >     > > > >>> > >
>     >     > > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <
>     >     > tbent...@redhat.com>
>     >     > > > >>> wrote:
>     >     > > > >>> > >
>     >     > > > >>> > > > Hi Sönke,
>     >     > > > >>> > > >
>     >     > > > >>> > > > Thanks for taking a look.
>     >     > > > >>> > > >
>     >     > > > >>> > > > Let me answer in reverse order, since I think it
> might
>     > make
>     >     > more
>     >     > > > >>> sense
>     >     > > > >>> > > that
>     >     > > > >>> > > > way...
>     >     > > > >>> > > >
>     >     > > > >>> > > > Also, while writing that, I noticed that you
> have a
>     > version
>     >     > with
>     >     > > > >>> and
>     >     > > > >>> > > > > without "name" for your transformation in the
> KIP:
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > has-header:my-header
>     >     > > > >>> > > > > and
>     >     > > > >>> > > > > transforms.conditionalExtract.condition:
>     >     > has-header:my-header
>     >     > > > >>> > > > >
>     >     > > > >>> > > >
>     >     > > > >>> > > > The example
>     >     > > > >>> > > >     transforms.conditionalExtract.condition:
>     >     > has-header:my-header
>     >     > > > >>> > > > is a "has-header" condition (the prefix of the
> config
>     > value),
>     >     > > > >>> which will
>     >     > > > >>> > > > match records with a "my-header" header (given
> in the
>     >     > suffix).
>     >     > > > >>> > > >
>     >     > > > >>> > > > The other example given is:
>     >     > > > >>> > > >     transforms.conditionalExtract.condition:
>     > not:hasMyHeader
>     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > > has-header:my-header
>     >     > > > >>> > > > The root of the condition is a "not" condition
> (the
>     > prefix
>     >     > of the
>     >     > > > >>> value
>     >     > > > >>> > > for
>     >     > > > >>> > > > the transforms.conditionalExtract.condition key)
> of
>     > another
>     >     > named
>     >     > > > >>> > > condition
>     >     > > > >>> > > > called "hasMyHeader" (the suffix). Any name
> could be
>     > used
>     >     > for the
>     >     > > > >>> other
>     >     > > > >>> > > > condition. That other condition is configured at
>     >     > > > >>> > > >
>     > "transforms.conditionalExtract.condition.<conditionName>".
>     >     > That
>     >     > > > >>> condition
>     >     > > > >>> > > > is a "has-header" condition (the prefix), which
> will
>     > match
>     >     > > > records
>     >     > > > >>> with a
>     >     > > > >>> > > > "my-header" header (given in the suffix). So the
>     >     > "has-header:"
>     >     > > > >>> condition
>     >     > > > >>> > > > type would always require a suffix, as would the
> "not:"
>     >     > condition
>     >     > > > >>> type.
>     >     > > > >>> > > > Hypothetically you could have a "true" condition
> type
>     > (which
>     >     > > > would
>     >     > > > >>> not
>     >     > > > >>> > > > require a suffix), and the hypothetical binary
>     > conditions
>     >     > "and:"
>     >     > > > >>> and
>     >     > > > >>> > > "or:"
>     >     > > > >>> > > > would require a pair of other condition names.
>     >     > > > >>> > > >
>     >     > > > >>> > > > So what's proposed is a scheme for encoding
> conditions
>     > where
>     >     > the
>     >     > > > >>> > > condition
>     >     > > > >>> > > > type is the prefix of the value of some
> "....condition"
>     >     > config
>     >     > > > >>> key, and
>     >     > > > >>> > > the
>     >     > > > >>> > > > optional suffix provides parameters for the
> condition.
>     > This
>     >     > makes
>     >     > > > >>> those
>     >     > > > >>> > > > parameters a bit inflexible, but is relatively
>     > succinct.
>     >     > > > >>> > > >
>     >     > > > >>> > > > This leads on to your first point. You're right
> that
>     > use
>     >     > cases
>     >     > > > >>> might
>     >     > > > >>> > > appear
>     >     > > > >>> > > > which need other conditions, and we should make
> it
>     > flexible
>     >     > > > enough
>     >     > > > >>> to be
>     >     > > > >>> > > > able to cope with future use cases. On the other
> hand,
>     > I was
>     >     > > > >>> concerned
>     >     > > > >>> > > that
>     >     > > > >>> > > > we end up with something which is quite
> complicated to
>     >     > configure.
>     >     > > > >>> (There
>     >     > > > >>> > > > comes a point where it might makes more sense
> for the
>     > user to
>     >     > > > >>> write their
>     >     > > > >>> > > > own SMT).
>     >     > > > >>> > > >
>     >     > > > >>> > > > Just of the top of my head it might look like:
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> type:has-header
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > > header-name:my-header
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > field-value:my-value
>     >     > > > >>> > > > >
>     >     > > > >>> > > >
>     >     > > > >>> > > > That won't work because the format is basically a
>     >     > > > >>> Properties/Map<String,
>     >     > > > >>> > > > String> and what you've suggested has duplicate
> keys.
>     >     > > > >>> > > >
>     >     > > > >>> > > > One thing I did briefly consider what the
> ability to
>     > treat
>     >     > > > >>> conditions as
>     >     > > > >>> > > > Configurable objects in their own right (opening
> up the
>     >     > > > >>> possibility of
>     >     > > > >>> > > > people supplying their own Conditions, just like
> they
>     > can
>     >     > supply
>     >     > > > >>> their
>     >     > > > >>> > > own
>     >     > > > >>> > > > SMTs). That might be configured something like
> this:
>     >     > > > >>> > > >
>     >     > > > >>> > > >     transforms.conditionalExtract.condition: not
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.not.type:
>     > Not
>     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.not.negated: foo
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.foo.type:
>     >     > > > >>> HasHeaderWithValue
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.foo.header:
>     >     > my-header
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.foo.value:
>     >     > my-value
>     >     > > > >>> > > >
>     >     > > > >>> > > > I didn't propose that I couldn't see the use
> cases to
>     > justify
>     >     > > > this
>     >     > > > >>> kind
>     >     > > > >>> > > of
>     >     > > > >>> > > > complexity, especially as the common case would
> surely
>     > be
>     >     > > > matching
>     >     > > > >>> > > against
>     >     > > > >>> > > > topic name (to be honest I wasn't completely
> convinced
>     > by the
>     >     > > > need
>     >     > > > >>> for
>     >     > > > >>> > > > "has-header"). In the current version of the KIP
>     > that's just
>     >     > > > >>> > > >
>     >     > > > >>> > > >     transforms.conditionalExtract.condition:
>     > topic-matches:
>     >     > > > >>> my-prefix-.*
>     >     > > > >>> > > >
>     >     > > > >>> > > > but using the more flexible scheme that would
> require
>     >     > something
>     >     > > > >>> more like
>     >     > > > >>> > > > this:
>     >     > > > >>> > > >
>     >     > > > >>> > > >     transforms.conditionalExtract.condition: bar
>     >     > > > >>> > > >
>  transforms.conditionalExtract.condition.bar.type:
>     >     > TopicMatch
>     >     > > > >>> > > >
>     >  transforms.conditionalExtract.condition.bar.pattern:
>     >     > > > >>> my-prefix-.*
>     >     > > > >>> > > >
>     >     > > > >>> > > > If people know of use cases which would justify
> more
>     >     > > > >>> sophistication, I'm
>     >     > > > >>> > > > happy to reconsider.
>     >     > > > >>> > > >
>     >     > > > >>> > > > Thanks again for taking a look!
>     >     > > > >>> > > >
>     >     > > > >>> > > > Tom
>     >     > > > >>> > > >
>     >     > > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
>     >     > > > >>> > > > <soenke.lie...@opencore.com.invalid> wrote:
>     >     > > > >>> > > >
>     >     > > > >>> > > > > Hi Tom,
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > sounds useful to me, thanks for the KIP!
>     >     > > > >>> > > > > The only thought that I had while reading was
> that
>     > this
>     >     > will
>     >     > > > >>> probably
>     >     > > > >>> > > > raise
>     >     > > > >>> > > > > questions about more involved conditions fairly
>     > quickly.
>     >     > For
>     >     > > > >>> example
>     >     > > > >>> > > the
>     >     > > > >>> > > > > "has-header" will cause an appetite for
> conditions
>     > like
>     >     > > > >>> > > > > "this-header-has-that-value".
>     >     > > > >>> > > > > This would necessitate two parameters to be
> passed
>     > into the
>     >     > > > >>> condition,
>     >     > > > >>> > > > > which I think is not currently included in the
> KIP.
>     > I am
>     >     > not
>     >     > > > >>> saying add
>     >     > > > >>> > > > > this now, but might it make sense to discuss a
>     > concept of
>     >     > how
>     >     > > > >>> that
>     >     > > > >>> > > might
>     >     > > > >>> > > > > look now, to avoid potential changes later on.
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Just of the top of my head it might look like:
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> type:has-header
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > > header-name:my-header
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > field-value:my-value
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Also, while writing that, I noticed that you
> have a
>     > version
>     >     > > > with
>     >     > > > >>> and
>     >     > > > >>> > > > > without "name" for your transformation in the
> KIP:
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
> transforms.conditionalExtract.condition.hasMyHeader:
>     >     > > > >>> > > has-header:my-header
>     >     > > > >>> > > > > and
>     >     > > > >>> > > > > transforms.conditionalExtract.condition:
>     >     > has-header:my-header
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Is this intentional and the name is optional?
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Best regards,
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > Sönke
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <
>     >     > tbent...@redhat.com>
>     >     > > > >>> wrote:
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > Does anyone have any comments, feedback or
>     > thoughts about
>     >     > > > this?
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > Thanks,
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > Tom
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley
> <
>     >     > > > >>> tbent...@redhat.com>
>     >     > > > >>> > > > > wrote:
>     >     > > > >>> > > > > >
>     >     > > > >>> > > > > > > Hi,
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > > > > I've opened KIP-585 which is intended to
> provide
>     > a
>     >     > > > mechanism
>     >     > > > >>> to
>     >     > > > >>> > > > > > > conditionally apply SMTs in Kafka Connect.
> I'd be
>     >     > grateful
>     >     > > > >>> for any
>     >     > > > >>> > > > > > > feedback:
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > >
>     >     > > > >>> > >
>     >     > > > >>>
>     >     > > >
>     >     >
>     >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > > > > Many thanks,
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > > > > Tom
>     >     > > > >>> > > > > > >
>     >     > > > >>> > > > >
>     >     > > > >>> > > >
>     >     > > > >>> > >
>     >     > > > >>>
>     >     > > > >>>
>     >     > > >
>     >     > >
>     >     >
>     >
>     >
>
>

Reply via email to