Re: [DISCUSS] KIP-585: Conditional SMT

Tom Bentley Wed, 08 Apr 2020 09:45:22 -0700

Since no one objected I've updated the KIP with the revised way to
configure this transformation.


On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <tbent...@redhat.com> wrote:

> To come back about a point Chris made:
>
> 1. Instead of the new "ConfigDef config(Map<String, String> props)" method,
>> what would you think about adopting a similar approach as the framework
>> uses with connectors, by adding a "Config validate(Map<String, String>
>> props)" method that can perform custom validation outside of what can be
>> performed by the ConfigDef's single-property-at-a-time validation? It may
>> be a little heavyweight for use with this particular SMT, but it'd provide
>> more flexibility for other SMT implementations and would mirror an API
>> that
>> developers targeting the framework are likely already familiar with.
>>
>
> The validate() + config() approach taken for Connectors doesn't quite work
> for Transformations.
>
> The default Connector.validate() basically just calls
> `config().validate(connectorConfigs)` and returns a Config. Separately the
> AbstractHeader also calls `config()` and uses the Config and ConfigDef to
> build the ConfigInfos. The contract is not really defined well enough in
> the javadoc, but this means a connector can use validate() to build the
> ConfigDef which it will return from config().
>
> For Transformations we don't really want validate() to perform validation
> at all since ultimately the transformations config will be embedded in the
> connector's and will be validated by the connector itself. It wouldn't be
> harmful if it _did_ perform validation, just unnecessary. But having a
> validate() method that ought not to validate() seems like a recipe for
> confusion.
>
> Also problematic is that there's no use for the Config returned from
> Transformations.validate().
>
> So I'm not convinced that the superficial similarity really gains anything.
>
> Kind regards,
>
> Tom
>
> On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <tbent...@redhat.com> wrote:
>
>> Hi,
>>
>> Hi all,
>>
>> Thanks for the discussion so far.
>>
>> It seems a bit weird to me that when configuring the Conditional SMT with
>> a DSL you would use a concise, intuitive DSL for expressing the condition,
>> but not for the transforms that it's guarding. It also seems natural, if
>> you support this for conditionally applying SMTs, that you'd soon want to
>> support the same thing for a generic filter transformer. Then, if you can
>> configure a filter transformation using this DSL, it becomes odd that you
>> can't do this for mapping transformations. I think it would be a mistake to
>> go specifying an expression language for conditions when really that might
>> just be part of a language for transformations.
>>
>> I think it would be possible today to write an SMT which allowed you to
>> express the transformation in a DSL. Concretely, it's possible to imagine a
>> single transformation taking a DSL something like this:
>>
>>   compose(
>>     Flatten.Key(delimiter: ':'),
>>     If(condition: TopicMatches(pattern: "blah-.*"),
>>        then: Flatten.Value(delimiter: '/')))
>>
>> All I've really done here, beyond what was already proposed, is rename
>> Conditional to If, imagine a couple more bits of syntax
>> (SMTs being constructed by class name and named argument invocation
>> syntax to support SMT constructors) and add a higher level compose()
>> function for chaining SMTs (which could easily be replaced with brace
>> delimited blocks).
>>
>> That may be a discussion we should have, but I think in _this_ KIP we
>> should focus on the conditional part, since the above example hopefully
>> shows that it would be possible to reuse it in a DSL if there was appetite
>> for that.
>>
>> With that in mind, and since my original suggestion for an abbreviated
>> config syntax didn't appeal to people, I propose that we stick with the
>> existing norms for configuring this.
>> The previous example would look like this:
>>
>> transformation: flattenKey,if
>> transformation.flattenKey.type: Flatten.Key
>> transformation.flattenKey.delimiter: :
>> transformation.if.type: If
>> transformation.if.condition.type: TopicMatches
>> transformation.if.condition.pattern: blah-.*
>> transformation.if.then: flattenValue
>> transformation.if.then.flattenValue.type: Flatten.Value
>> transformation.if.then.flattenValue.delimiter: /
>>
>> Also, I'm inclined to think we should stick to just supporting
>> TopicMatches and Not, since we've not identified an actual need for
>> 'has-header', 'or' and 'and'.
>>
>> An example of the usage of Not would look like this:
>>
>> transformation: flattenKey,if
>> transformation.flattenKey.type: Flatten.Key
>> transformation.flattenKey.delimiter: :
>> transformation.if.type: If
>> transformation.if.condition.type: Not
>> transformation.if.condition.operand: startsWithBlah
>> transformation.if.condition.operand.startsWithBlah.type: TopicMatches
>> transformation.if.condition.operand.startsWithBlah.pattern: blah-.*
>> transformation.if.then: flattenValue
>> transformation.if.then.flattenValue.type: Flatten.Value
>> transformation.if.then.flattenValue.delimiter: /
>>
>> Hopefully we can agree that this allows the conditional SMT part to make
>> progress without getting bogged down.
>>
>> Thoughts? If this seems broadly reasonable, I'll update the KIP.
>>
>> Kind regards,
>>
>> Tom
>>
>> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gun...@hibernate.org>
>> wrote:
>>
>>> Hi all,
>>>
>>> Thanks a lot for this initiative, Tom!
>>>
>>> To shed some light, the use case where this first came up, were issues
>>> we saw with SMTs being applied to the different topics produced by the
>>> Debezium change data capture connectors. There are different kinds of
>>> topics (for change data, schema history, heartbeats etc.) and the
>>> record structure to expect may vary between those. Hence we saw issues
>>> with SMTs like ExtractField, which for instance only should be applied
>>> to all change data topics but not the other ones.
>>>
>>> I like the overall approach; for Debezium's purposes, the simple topic
>>> matching and negation operators would be sufficient already. I agree
>>> with Chris and would prefer one single condition attribute, which
>>> contains a single condition or potentially a logical expression with
>>> not, and, etc. I think it's less ambiguous, in particular when it
>>> comes to ordering of the different conditions and determining their
>>> precedence.
>>>
>>> Would love to see this feature in one or another way in Connect.
>>>
>>> Best,
>>>
>>> --Gunnar
>>>
>>>
>>>
>>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley <
>>> tbent...@redhat.com>:
>>> >
>>> > Hi Chris and Sönke,
>>> >
>>> > Using the numbering from Chris's email...
>>> >
>>> > 1. That's a good point, I'll see what is needed to make that work.
>>> >
>>> > 2. I'm happy enough to add support for "and" and "or" as part of this
>>> KIP
>>> > if people can see a need for it.
>>> >
>>> > In a similar vein, I was wondering about whether it would be worthwhile
>>> > having the equivalent of an "else" clause (what's in the KIP is
>>> basically
>>> > an "if" statement). Without support for "else" I think people would
>>> often
>>> > need two conditionals, with the condition of one being the negation of
>>> the
>>> > condition of another.
>>> >
>>> > 3. I can see the attraction of an expression language. The pros include
>>> > being terse and familiar to programmers and potentially very flexible
>>> if
>>> > that's needed in the future. I had a play and implemented it using
>>> ANTLR
>>> > and it's not difficult to write a grammar and implement the functions
>>> we've
>>> > already discussed and get decent error messages when the expression is
>>> > malformed. So on the one hand I quite like the idea. On the other hand
>>> it
>>> > feels like overkill for the use cases that have actually been
>>> identified so
>>> > far.
>>> >
>>> > @Sönke what do you make of the expression language idea?
>>> >
>>> > Kind regards,
>>> >
>>> > Tom
>>> >
>>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton <
>>> chr...@confluent.io>
>>> > wrote:
>>> >
>>> > > Hi Tom,
>>> > >
>>> > > This looks great and I'd love to see the out-of-the-box SMTs become
>>> even
>>> > > more powerful with the improvements you've proposed! Got a few
>>> remarks and
>>> > > would be interested in your thoughts:
>>> > >
>>> > > 1. Instead of the new "ConfigDef config(Map<String, String> props)"
>>> method,
>>> > > what would you think about adopting a similar approach as the
>>> framework
>>> > > uses with connectors, by adding a "Config validate(Map<String,
>>> String>
>>> > > props)" method that can perform custom validation outside of what
>>> can be
>>> > > performed by the ConfigDef's single-property-at-a-time validation?
>>> It may
>>> > > be a little heavyweight for use with this particular SMT, but it'd
>>> provide
>>> > > more flexibility for other SMT implementations and would mirror an
>>> API that
>>> > > developers targeting the framework are likely already familiar with.
>>> > > 2. The possibility for adding the logical operators "and" and "or" is
>>> > > mentioned, but only as a potential future change and not one
>>> proposed by
>>> > > this KIP. Why not include those operators sooner rather than later?
>>> > > 3. The syntax for named conditions that are then referenced in
>>> logical
>>> > > operators is... tricky. It took me a few attempts to grok the example
>>> > > provided in the KIP after reading Sönke's question about the example
>>> for
>>> > > negation. What about a more sophisticated but less verbose syntax
>>> that
>>> > > supports a single configuration for the condition, even with logical
>>> > > operators? I'm thinking something like
>>> > > "transforms.conditionalExtract.condition: not(has-header:my-header)"
>>> > > instead of the "transforms.conditionalExtract.condition:
>>> not:hasMyHeader"
>>> > > and "transforms.conditionalExtract.condition.hasMyHeader:
>>> > > has-header:my-header" properties. If support for a logical "and" is
>>> added,
>>> > > this could then be expanded to something like
>>> > > "transforms.conditionalExtract.condition: and(has-header(my-header),
>>> > > not(topic-matches(my-prefix-.*)))". There would be additional
>>> complexity
>>> > > here with the need to escape parentheses and commas that are
>>> intended to be
>>> > > treated literally (as part of a header name, for example) instead of
>>> as
>>> > > part of the syntax for the condition itself, but a little additional
>>> > > complexity for edge cases like that may be warranted if it heavily
>>> reduces
>>> > > complexity for the common cases. The rationale for the proposed
>>> > > parentheses-based syntax here instead of what's mentioned in the KIP
>>> > > (something like "and: <condition1>, <condition2>") is to help with
>>> > > readability; we probably wouldn't need that with the approach of
>>> naming
>>> > > conditions via separate properties, but things may get a little
>>> nasty with
>>> > > literal conditions included there, especially with the possibility of
>>> > > nesting. Finally, the shift in syntax here should make it easier to
>>> handle
>>> > > cases like the header value matching brought up by Sönke; it might
>>> look
>>> > > something like "header-matches(header-name, header-value-pattern)".
>>> > >
>>> > > Cheers,
>>> > >
>>> > > Chris
>>> > >
>>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tbent...@redhat.com>
>>> wrote:
>>> > >
>>> > > > Hi Sönke,
>>> > > >
>>> > > > Thanks for taking a look.
>>> > > >
>>> > > > Let me answer in reverse order, since I think it might make more
>>> sense
>>> > > that
>>> > > > way...
>>> > > >
>>> > > > Also, while writing that, I noticed that you have a version with
>>> and
>>> > > > > without "name" for your transformation in the KIP:
>>> > > > >
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > has-header:my-header
>>> > > > > and
>>> > > > > transforms.conditionalExtract.condition: has-header:my-header
>>> > > > >
>>> > > >
>>> > > > The example
>>> > > >     transforms.conditionalExtract.condition: has-header:my-header
>>> > > > is a "has-header" condition (the prefix of the config value),
>>> which will
>>> > > > match records with a "my-header" header (given in the suffix).
>>> > > >
>>> > > > The other example given is:
>>> > > >     transforms.conditionalExtract.condition: not:hasMyHeader
>>> > > >     transforms.conditionalExtract.condition.hasMyHeader:
>>> > > > has-header:my-header
>>> > > > The root of the condition is a "not" condition (the prefix of the
>>> value
>>> > > for
>>> > > > the transforms.conditionalExtract.condition key) of another named
>>> > > condition
>>> > > > called "hasMyHeader" (the suffix). Any name could be used for the
>>> other
>>> > > > condition. That other condition is configured at
>>> > > > "transforms.conditionalExtract.condition.<conditionName>". That
>>> condition
>>> > > > is a "has-header" condition (the prefix), which will match records
>>> with a
>>> > > > "my-header" header (given in the suffix). So the "has-header:"
>>> condition
>>> > > > type would always require a suffix, as would the "not:" condition
>>> type.
>>> > > > Hypothetically you could have a "true" condition type (which would
>>> not
>>> > > > require a suffix), and the hypothetical binary conditions "and:"
>>> and
>>> > > "or:"
>>> > > > would require a pair of other condition names.
>>> > > >
>>> > > > So what's proposed is a scheme for encoding conditions where the
>>> > > condition
>>> > > > type is the prefix of the value of some "....condition" config
>>> key, and
>>> > > the
>>> > > > optional suffix provides parameters for the condition. This makes
>>> those
>>> > > > parameters a bit inflexible, but is relatively succinct.
>>> > > >
>>> > > > This leads on to your first point. You're right that use cases
>>> might
>>> > > appear
>>> > > > which need other conditions, and we should make it flexible enough
>>> to be
>>> > > > able to cope with future use cases. On the other hand, I was
>>> concerned
>>> > > that
>>> > > > we end up with something which is quite complicated to configure.
>>> (There
>>> > > > comes a point where it might makes more sense for the user to
>>> write their
>>> > > > own SMT).
>>> > > >
>>> > > > Just of the top of my head it might look like:
>>> > > > >
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> type:has-header
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > > header-name:my-header
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > field-value:my-value
>>> > > > >
>>> > > >
>>> > > > That won't work because the format is basically a
>>> Properties/Map<String,
>>> > > > String> and what you've suggested has duplicate keys.
>>> > > >
>>> > > > One thing I did briefly consider what the ability to treat
>>> conditions as
>>> > > > Configurable objects in their own right (opening up the
>>> possibility of
>>> > > > people supplying their own Conditions, just like they can supply
>>> their
>>> > > own
>>> > > > SMTs). That might be configured something like this:
>>> > > >
>>> > > >     transforms.conditionalExtract.condition: not
>>> > > >     transforms.conditionalExtract.condition.not.type: Not
>>> > > >     transforms.conditionalExtract.condition.not.negated: foo
>>> > > >     transforms.conditionalExtract.condition.foo.type:
>>> HasHeaderWithValue
>>> > > >     transforms.conditionalExtract.condition.foo.header: my-header
>>> > > >     transforms.conditionalExtract.condition.foo.value: my-value
>>> > > >
>>> > > > I didn't propose that I couldn't see the use cases to justify this
>>> kind
>>> > > of
>>> > > > complexity, especially as the common case would surely be matching
>>> > > against
>>> > > > topic name (to be honest I wasn't completely convinced by the need
>>> for
>>> > > > "has-header"). In the current version of the KIP that's just
>>> > > >
>>> > > >     transforms.conditionalExtract.condition: topic-matches:
>>> my-prefix-.*
>>> > > >
>>> > > > but using the more flexible scheme that would require something
>>> more like
>>> > > > this:
>>> > > >
>>> > > >     transforms.conditionalExtract.condition: bar
>>> > > >     transforms.conditionalExtract.condition.bar.type: TopicMatch
>>> > > >     transforms.conditionalExtract.condition.bar.pattern:
>>> my-prefix-.*
>>> > > >
>>> > > > If people know of use cases which would justify more
>>> sophistication, I'm
>>> > > > happy to reconsider.
>>> > > >
>>> > > > Thanks again for taking a look!
>>> > > >
>>> > > > Tom
>>> > > >
>>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
>>> > > > <soenke.lie...@opencore.com.invalid> wrote:
>>> > > >
>>> > > > > Hi Tom,
>>> > > > >
>>> > > > > sounds useful to me, thanks for the KIP!
>>> > > > > The only thought that I had while reading was that this will
>>> probably
>>> > > > raise
>>> > > > > questions about more involved conditions fairly quickly. For
>>> example
>>> > > the
>>> > > > > "has-header" will cause an appetite for conditions like
>>> > > > > "this-header-has-that-value".
>>> > > > > This would necessitate two parameters to be passed into the
>>> condition,
>>> > > > > which I think is not currently included in the KIP. I am not
>>> saying add
>>> > > > > this now, but might it make sense to discuss a concept of how
>>> that
>>> > > might
>>> > > > > look now, to avoid potential changes later on.
>>> > > > >
>>> > > > > Just of the top of my head it might look like:
>>> > > > >
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> type:has-header
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > > header-name:my-header
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > field-value:my-value
>>> > > > >
>>> > > > >
>>> > > > > Also, while writing that, I noticed that you have a version with
>>> and
>>> > > > > without "name" for your transformation in the KIP:
>>> > > > >
>>> > > > > transforms.conditionalExtract.condition.hasMyHeader:
>>> > > has-header:my-header
>>> > > > > and
>>> > > > > transforms.conditionalExtract.condition: has-header:my-header
>>> > > > >
>>> > > > >
>>> > > > > Is this intentional and the name is optional?
>>> > > > >
>>> > > > >
>>> > > > > Best regards,
>>> > > > >
>>> > > > > Sönke
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tbent...@redhat.com>
>>> wrote:
>>> > > > > >
>>> > > > > > Does anyone have any comments, feedback or thoughts about this?
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > > Tom
>>> > > > > >
>>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley <
>>> tbent...@redhat.com>
>>> > > > > wrote:
>>> > > > > >
>>> > > > > > > Hi,
>>> > > > > > >
>>> > > > > > > I've opened KIP-585 which is intended to provide a mechanism
>>> to
>>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful
>>> for any
>>> > > > > > > feedback:
>>> > > > > > >
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
>>> > > > > > >
>>> > > > > > > Many thanks,
>>> > > > > > >
>>> > > > > > > Tom
>>> > > > > > >
>>> > > > >
>>> > > >
>>> > >
>>>
>>>

Re: [DISCUSS] KIP-585: Conditional SMT

Reply via email to