Since no one objected I've updated the KIP with the revised way to configure this transformation.
On Mon, Apr 6, 2020 at 2:52 PM Tom Bentley <tbent...@redhat.com> wrote: > To come back about a point Chris made: > > 1. Instead of the new "ConfigDef config(Map<String, String> props)" method, >> what would you think about adopting a similar approach as the framework >> uses with connectors, by adding a "Config validate(Map<String, String> >> props)" method that can perform custom validation outside of what can be >> performed by the ConfigDef's single-property-at-a-time validation? It may >> be a little heavyweight for use with this particular SMT, but it'd provide >> more flexibility for other SMT implementations and would mirror an API >> that >> developers targeting the framework are likely already familiar with. >> > > The validate() + config() approach taken for Connectors doesn't quite work > for Transformations. > > The default Connector.validate() basically just calls > `config().validate(connectorConfigs)` and returns a Config. Separately the > AbstractHeader also calls `config()` and uses the Config and ConfigDef to > build the ConfigInfos. The contract is not really defined well enough in > the javadoc, but this means a connector can use validate() to build the > ConfigDef which it will return from config(). > > For Transformations we don't really want validate() to perform validation > at all since ultimately the transformations config will be embedded in the > connector's and will be validated by the connector itself. It wouldn't be > harmful if it _did_ perform validation, just unnecessary. But having a > validate() method that ought not to validate() seems like a recipe for > confusion. > > Also problematic is that there's no use for the Config returned from > Transformations.validate(). > > So I'm not convinced that the superficial similarity really gains anything. > > Kind regards, > > Tom > > On Mon, Apr 6, 2020 at 2:29 PM Tom Bentley <tbent...@redhat.com> wrote: > >> Hi, >> >> Hi all, >> >> Thanks for the discussion so far. >> >> It seems a bit weird to me that when configuring the Conditional SMT with >> a DSL you would use a concise, intuitive DSL for expressing the condition, >> but not for the transforms that it's guarding. It also seems natural, if >> you support this for conditionally applying SMTs, that you'd soon want to >> support the same thing for a generic filter transformer. Then, if you can >> configure a filter transformation using this DSL, it becomes odd that you >> can't do this for mapping transformations. I think it would be a mistake to >> go specifying an expression language for conditions when really that might >> just be part of a language for transformations. >> >> I think it would be possible today to write an SMT which allowed you to >> express the transformation in a DSL. Concretely, it's possible to imagine a >> single transformation taking a DSL something like this: >> >> compose( >> Flatten.Key(delimiter: ':'), >> If(condition: TopicMatches(pattern: "blah-.*"), >> then: Flatten.Value(delimiter: '/'))) >> >> All I've really done here, beyond what was already proposed, is rename >> Conditional to If, imagine a couple more bits of syntax >> (SMTs being constructed by class name and named argument invocation >> syntax to support SMT constructors) and add a higher level compose() >> function for chaining SMTs (which could easily be replaced with brace >> delimited blocks). >> >> That may be a discussion we should have, but I think in _this_ KIP we >> should focus on the conditional part, since the above example hopefully >> shows that it would be possible to reuse it in a DSL if there was appetite >> for that. >> >> With that in mind, and since my original suggestion for an abbreviated >> config syntax didn't appeal to people, I propose that we stick with the >> existing norms for configuring this. >> The previous example would look like this: >> >> transformation: flattenKey,if >> transformation.flattenKey.type: Flatten.Key >> transformation.flattenKey.delimiter: : >> transformation.if.type: If >> transformation.if.condition.type: TopicMatches >> transformation.if.condition.pattern: blah-.* >> transformation.if.then: flattenValue >> transformation.if.then.flattenValue.type: Flatten.Value >> transformation.if.then.flattenValue.delimiter: / >> >> Also, I'm inclined to think we should stick to just supporting >> TopicMatches and Not, since we've not identified an actual need for >> 'has-header', 'or' and 'and'. >> >> An example of the usage of Not would look like this: >> >> transformation: flattenKey,if >> transformation.flattenKey.type: Flatten.Key >> transformation.flattenKey.delimiter: : >> transformation.if.type: If >> transformation.if.condition.type: Not >> transformation.if.condition.operand: startsWithBlah >> transformation.if.condition.operand.startsWithBlah.type: TopicMatches >> transformation.if.condition.operand.startsWithBlah.pattern: blah-.* >> transformation.if.then: flattenValue >> transformation.if.then.flattenValue.type: Flatten.Value >> transformation.if.then.flattenValue.delimiter: / >> >> Hopefully we can agree that this allows the conditional SMT part to make >> progress without getting bogged down. >> >> Thoughts? If this seems broadly reasonable, I'll update the KIP. >> >> Kind regards, >> >> Tom >> >> On Fri, Apr 3, 2020 at 5:55 PM Gunnar Morling <gun...@hibernate.org> >> wrote: >> >>> Hi all, >>> >>> Thanks a lot for this initiative, Tom! >>> >>> To shed some light, the use case where this first came up, were issues >>> we saw with SMTs being applied to the different topics produced by the >>> Debezium change data capture connectors. There are different kinds of >>> topics (for change data, schema history, heartbeats etc.) and the >>> record structure to expect may vary between those. Hence we saw issues >>> with SMTs like ExtractField, which for instance only should be applied >>> to all change data topics but not the other ones. >>> >>> I like the overall approach; for Debezium's purposes, the simple topic >>> matching and negation operators would be sufficient already. I agree >>> with Chris and would prefer one single condition attribute, which >>> contains a single condition or potentially a logical expression with >>> not, and, etc. I think it's less ambiguous, in particular when it >>> comes to ordering of the different conditions and determining their >>> precedence. >>> >>> Would love to see this feature in one or another way in Connect. >>> >>> Best, >>> >>> --Gunnar >>> >>> >>> >>> Am Do., 2. Apr. 2020 um 18:48 Uhr schrieb Tom Bentley < >>> tbent...@redhat.com>: >>> > >>> > Hi Chris and Sönke, >>> > >>> > Using the numbering from Chris's email... >>> > >>> > 1. That's a good point, I'll see what is needed to make that work. >>> > >>> > 2. I'm happy enough to add support for "and" and "or" as part of this >>> KIP >>> > if people can see a need for it. >>> > >>> > In a similar vein, I was wondering about whether it would be worthwhile >>> > having the equivalent of an "else" clause (what's in the KIP is >>> basically >>> > an "if" statement). Without support for "else" I think people would >>> often >>> > need two conditionals, with the condition of one being the negation of >>> the >>> > condition of another. >>> > >>> > 3. I can see the attraction of an expression language. The pros include >>> > being terse and familiar to programmers and potentially very flexible >>> if >>> > that's needed in the future. I had a play and implemented it using >>> ANTLR >>> > and it's not difficult to write a grammar and implement the functions >>> we've >>> > already discussed and get decent error messages when the expression is >>> > malformed. So on the one hand I quite like the idea. On the other hand >>> it >>> > feels like overkill for the use cases that have actually been >>> identified so >>> > far. >>> > >>> > @Sönke what do you make of the expression language idea? >>> > >>> > Kind regards, >>> > >>> > Tom >>> > >>> > On Wed, Apr 1, 2020 at 9:49 PM Christopher Egerton < >>> chr...@confluent.io> >>> > wrote: >>> > >>> > > Hi Tom, >>> > > >>> > > This looks great and I'd love to see the out-of-the-box SMTs become >>> even >>> > > more powerful with the improvements you've proposed! Got a few >>> remarks and >>> > > would be interested in your thoughts: >>> > > >>> > > 1. Instead of the new "ConfigDef config(Map<String, String> props)" >>> method, >>> > > what would you think about adopting a similar approach as the >>> framework >>> > > uses with connectors, by adding a "Config validate(Map<String, >>> String> >>> > > props)" method that can perform custom validation outside of what >>> can be >>> > > performed by the ConfigDef's single-property-at-a-time validation? >>> It may >>> > > be a little heavyweight for use with this particular SMT, but it'd >>> provide >>> > > more flexibility for other SMT implementations and would mirror an >>> API that >>> > > developers targeting the framework are likely already familiar with. >>> > > 2. The possibility for adding the logical operators "and" and "or" is >>> > > mentioned, but only as a potential future change and not one >>> proposed by >>> > > this KIP. Why not include those operators sooner rather than later? >>> > > 3. The syntax for named conditions that are then referenced in >>> logical >>> > > operators is... tricky. It took me a few attempts to grok the example >>> > > provided in the KIP after reading Sönke's question about the example >>> for >>> > > negation. What about a more sophisticated but less verbose syntax >>> that >>> > > supports a single configuration for the condition, even with logical >>> > > operators? I'm thinking something like >>> > > "transforms.conditionalExtract.condition: not(has-header:my-header)" >>> > > instead of the "transforms.conditionalExtract.condition: >>> not:hasMyHeader" >>> > > and "transforms.conditionalExtract.condition.hasMyHeader: >>> > > has-header:my-header" properties. If support for a logical "and" is >>> added, >>> > > this could then be expanded to something like >>> > > "transforms.conditionalExtract.condition: and(has-header(my-header), >>> > > not(topic-matches(my-prefix-.*)))". There would be additional >>> complexity >>> > > here with the need to escape parentheses and commas that are >>> intended to be >>> > > treated literally (as part of a header name, for example) instead of >>> as >>> > > part of the syntax for the condition itself, but a little additional >>> > > complexity for edge cases like that may be warranted if it heavily >>> reduces >>> > > complexity for the common cases. The rationale for the proposed >>> > > parentheses-based syntax here instead of what's mentioned in the KIP >>> > > (something like "and: <condition1>, <condition2>") is to help with >>> > > readability; we probably wouldn't need that with the approach of >>> naming >>> > > conditions via separate properties, but things may get a little >>> nasty with >>> > > literal conditions included there, especially with the possibility of >>> > > nesting. Finally, the shift in syntax here should make it easier to >>> handle >>> > > cases like the header value matching brought up by Sönke; it might >>> look >>> > > something like "header-matches(header-name, header-value-pattern)". >>> > > >>> > > Cheers, >>> > > >>> > > Chris >>> > > >>> > > On Wed, Apr 1, 2020 at 9:00 AM Tom Bentley <tbent...@redhat.com> >>> wrote: >>> > > >>> > > > Hi Sönke, >>> > > > >>> > > > Thanks for taking a look. >>> > > > >>> > > > Let me answer in reverse order, since I think it might make more >>> sense >>> > > that >>> > > > way... >>> > > > >>> > > > Also, while writing that, I noticed that you have a version with >>> and >>> > > > > without "name" for your transformation in the KIP: >>> > > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader: >>> > > has-header:my-header >>> > > > > and >>> > > > > transforms.conditionalExtract.condition: has-header:my-header >>> > > > > >>> > > > >>> > > > The example >>> > > > transforms.conditionalExtract.condition: has-header:my-header >>> > > > is a "has-header" condition (the prefix of the config value), >>> which will >>> > > > match records with a "my-header" header (given in the suffix). >>> > > > >>> > > > The other example given is: >>> > > > transforms.conditionalExtract.condition: not:hasMyHeader >>> > > > transforms.conditionalExtract.condition.hasMyHeader: >>> > > > has-header:my-header >>> > > > The root of the condition is a "not" condition (the prefix of the >>> value >>> > > for >>> > > > the transforms.conditionalExtract.condition key) of another named >>> > > condition >>> > > > called "hasMyHeader" (the suffix). Any name could be used for the >>> other >>> > > > condition. That other condition is configured at >>> > > > "transforms.conditionalExtract.condition.<conditionName>". That >>> condition >>> > > > is a "has-header" condition (the prefix), which will match records >>> with a >>> > > > "my-header" header (given in the suffix). So the "has-header:" >>> condition >>> > > > type would always require a suffix, as would the "not:" condition >>> type. >>> > > > Hypothetically you could have a "true" condition type (which would >>> not >>> > > > require a suffix), and the hypothetical binary conditions "and:" >>> and >>> > > "or:" >>> > > > would require a pair of other condition names. >>> > > > >>> > > > So what's proposed is a scheme for encoding conditions where the >>> > > condition >>> > > > type is the prefix of the value of some "....condition" config >>> key, and >>> > > the >>> > > > optional suffix provides parameters for the condition. This makes >>> those >>> > > > parameters a bit inflexible, but is relatively succinct. >>> > > > >>> > > > This leads on to your first point. You're right that use cases >>> might >>> > > appear >>> > > > which need other conditions, and we should make it flexible enough >>> to be >>> > > > able to cope with future use cases. On the other hand, I was >>> concerned >>> > > that >>> > > > we end up with something which is quite complicated to configure. >>> (There >>> > > > comes a point where it might makes more sense for the user to >>> write their >>> > > > own SMT). >>> > > > >>> > > > Just of the top of my head it might look like: >>> > > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader: >>> type:has-header >>> > > > > transforms.conditionalExtract.condition.hasMyHeader: >>> > > > header-name:my-header >>> > > > > transforms.conditionalExtract.condition.hasMyHeader: >>> > > field-value:my-value >>> > > > > >>> > > > >>> > > > That won't work because the format is basically a >>> Properties/Map<String, >>> > > > String> and what you've suggested has duplicate keys. >>> > > > >>> > > > One thing I did briefly consider what the ability to treat >>> conditions as >>> > > > Configurable objects in their own right (opening up the >>> possibility of >>> > > > people supplying their own Conditions, just like they can supply >>> their >>> > > own >>> > > > SMTs). That might be configured something like this: >>> > > > >>> > > > transforms.conditionalExtract.condition: not >>> > > > transforms.conditionalExtract.condition.not.type: Not >>> > > > transforms.conditionalExtract.condition.not.negated: foo >>> > > > transforms.conditionalExtract.condition.foo.type: >>> HasHeaderWithValue >>> > > > transforms.conditionalExtract.condition.foo.header: my-header >>> > > > transforms.conditionalExtract.condition.foo.value: my-value >>> > > > >>> > > > I didn't propose that I couldn't see the use cases to justify this >>> kind >>> > > of >>> > > > complexity, especially as the common case would surely be matching >>> > > against >>> > > > topic name (to be honest I wasn't completely convinced by the need >>> for >>> > > > "has-header"). In the current version of the KIP that's just >>> > > > >>> > > > transforms.conditionalExtract.condition: topic-matches: >>> my-prefix-.* >>> > > > >>> > > > but using the more flexible scheme that would require something >>> more like >>> > > > this: >>> > > > >>> > > > transforms.conditionalExtract.condition: bar >>> > > > transforms.conditionalExtract.condition.bar.type: TopicMatch >>> > > > transforms.conditionalExtract.condition.bar.pattern: >>> my-prefix-.* >>> > > > >>> > > > If people know of use cases which would justify more >>> sophistication, I'm >>> > > > happy to reconsider. >>> > > > >>> > > > Thanks again for taking a look! >>> > > > >>> > > > Tom >>> > > > >>> > > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau >>> > > > <soenke.lie...@opencore.com.invalid> wrote: >>> > > > >>> > > > > Hi Tom, >>> > > > > >>> > > > > sounds useful to me, thanks for the KIP! >>> > > > > The only thought that I had while reading was that this will >>> probably >>> > > > raise >>> > > > > questions about more involved conditions fairly quickly. For >>> example >>> > > the >>> > > > > "has-header" will cause an appetite for conditions like >>> > > > > "this-header-has-that-value". >>> > > > > This would necessitate two parameters to be passed into the >>> condition, >>> > > > > which I think is not currently included in the KIP. I am not >>> saying add >>> > > > > this now, but might it make sense to discuss a concept of how >>> that >>> > > might >>> > > > > look now, to avoid potential changes later on. >>> > > > > >>> > > > > Just of the top of my head it might look like: >>> > > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader: >>> type:has-header >>> > > > > transforms.conditionalExtract.condition.hasMyHeader: >>> > > > header-name:my-header >>> > > > > transforms.conditionalExtract.condition.hasMyHeader: >>> > > field-value:my-value >>> > > > > >>> > > > > >>> > > > > Also, while writing that, I noticed that you have a version with >>> and >>> > > > > without "name" for your transformation in the KIP: >>> > > > > >>> > > > > transforms.conditionalExtract.condition.hasMyHeader: >>> > > has-header:my-header >>> > > > > and >>> > > > > transforms.conditionalExtract.condition: has-header:my-header >>> > > > > >>> > > > > >>> > > > > Is this intentional and the name is optional? >>> > > > > >>> > > > > >>> > > > > Best regards, >>> > > > > >>> > > > > Sönke >>> > > > > >>> > > > > >>> > > > > >>> > > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley <tbent...@redhat.com> >>> wrote: >>> > > > > > >>> > > > > > Does anyone have any comments, feedback or thoughts about this? >>> > > > > > >>> > > > > > Thanks, >>> > > > > > >>> > > > > > Tom >>> > > > > > >>> > > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley < >>> tbent...@redhat.com> >>> > > > > wrote: >>> > > > > > >>> > > > > > > Hi, >>> > > > > > > >>> > > > > > > I've opened KIP-585 which is intended to provide a mechanism >>> to >>> > > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful >>> for any >>> > > > > > > feedback: >>> > > > > > > >>> > > > > >>> > > > > >>> > > > >>> > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT >>> > > > > > > >>> > > > > > > Many thanks, >>> > > > > > > >>> > > > > > > Tom >>> > > > > > > >>> > > > > >>> > > > >>> > > >>> >>>