Ben Ellis created KAFKA-13320:
---------------------------------
Summary: Suggestion: SMT support for null key/value should be
documented
Key: KAFKA-13320
URL: https://issues.apache.org/jira/browse/KAFKA-13320
Project: Kafka
Issue Type: Wish
Components: KafkaConnect
Reporter: Ben Ellis
While working with a JDBC Sink Connector, I noticed that some SMTs choke on a
tombstone (null value) while others handle tombstones fine.
For example:
```
"transforms": "flattenKey,valueToJSON,wrapValue,addTimestamp",
"transforms.flattenKey.type": "org.apache.kafka.connect.transforms.Flatten$Key",
"transforms.flattenKey.delimiter": "_",
"transforms.valueToJSON.type": "com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Value",
"transforms.valueToJSON.schemas.enable": "false",
"transforms.valueToJSON.predicate": "tombstone",
"transforms.valueToJSON.negate": true,
"transforms.wrapValue.type": "org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.wrapValue.field": "matrix",
"transforms.wrapValue.predicate": "tombstone",
"transforms.wrapValue.negate": true,
"transforms.addTimestamp.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addTimestamp.timestamp.field": "message_timestamp",
"predicates": "tombstone",
"predicates.tombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
```
To avoid the cryptic error “java.lang.ClassCastException: class
java.util.HashMap cannot be cast to class org.apache.kafka.connect.data.Struct”
when processing a tombstone record, I had to add a negated predicate of
RecordIsTombstone for ToJSON (community SMT) and HoistField, but did not need
to add that to InsertField.
Digging in the source, I find that InsertField handles the case where key or
value is null:
https://github.com/a0x8o/kafka/blob/f8237749f6ad34c09154f807e53273be64e1261e/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/InsertField.java#L130
^ Thanks to this, there's no need to add a predicate to skip InsertField$Value
when value is null.
It would help if the docs listed how the individual SMTs behave when dealing
with a null key/value.
Of course we can always find this out by trial and error or by studying the
source code.
But if we were to make a best practice of describing how an SMT handles null
key/value, that would have two benefits:
1) Save developers time when working with the official (shipped with Kafka) SMTs
2) Inspire developers who write their own SMTs to likewise document how they
handle null key/value
Perhaps a standard way of dealing with nulls ("no-op if key/value is null")
could be promoted, and SMT authors would only need to document their behavior
when it differs.
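To illustrate what a "no-op if value is null" convention could look like, here
is a minimal sketch. It deliberately does not use the Connect API: SimpleRecord,
noOpOnNullValue and hoistValue are hypothetical stand-ins invented for this
example, and hoistValue is only a toy analogue of HoistField$Value. The idea is
that a transform checks for a null value first and returns the record untouched,
so no RecordIsTombstone predicate would be needed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class NullSafeTransformSketch {

    /** Hypothetical stand-in for a Connect record: just a key and a value. */
    static final class SimpleRecord {
        final Object key;
        final Object value;

        SimpleRecord(Object key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Wraps a value transform so that tombstones (null values) pass through unchanged. */
    static UnaryOperator<SimpleRecord> noOpOnNullValue(UnaryOperator<SimpleRecord> delegate) {
        return record -> record.value == null ? record : delegate.apply(record);
    }

    /** Toy analogue of HoistField$Value: wraps the value in a single-entry map. */
    static UnaryOperator<SimpleRecord> hoistValue(String field) {
        return record -> {
            Map<String, Object> wrapped = new HashMap<>();
            wrapped.put(field, record.value);
            return new SimpleRecord(record.key, wrapped);
        };
    }

    public static void main(String[] args) {
        UnaryOperator<SimpleRecord> safeHoist = noOpOnNullValue(hoistValue("matrix"));

        SimpleRecord tombstone = new SimpleRecord("id-1", null);
        SimpleRecord regular = new SimpleRecord("id-2", "payload");

        // The tombstone is returned as-is; the regular record gets wrapped.
        System.out.println(safeHoist.apply(tombstone).value); // null
        System.out.println(safeHoist.apply(regular).value);   // {matrix=payload}
    }
}
```

An SMT that followed this convention would only need to document its null
handling when it does something other than pass the record through.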