[ https://issues.apache.org/jira/browse/KAFKA-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163021#comment-17163021 ]

Almog Gavra commented on KAFKA-8037:
------------------------------------

There are lots of threads going on in this discussion, but re: whether the optimization should be opt-in or opt-out:

[~ableegoldman] [~mjsax] - while I agree with you in theory that these serdes (ones that have side effects and/or are asymmetric) should be discouraged, I don't think that's a realistic possibility. Some of the most popular serdes have these properties:

* All Confluent Schema Registry serdes have side effects on serialization
* Avro reader/writer schemas are built to be asymmetric (and that's how they handle schema evolution)
* JSON serdes are asymmetric if you allow "additional properties"

Moreover, many users might not even know whether their serde is symmetric or has side effects, and I think that makes it very difficult to require users to opt out as opposed to allowing them to opt in.


> KTable restore may load bad data
> --------------------------------
>
>                 Key: KAFKA-8037
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8037
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Minor
>              Labels: pull-request-available
>
> If an input topic contains bad data, users can specify a
> `deserialization.exception.handler` to drop corrupted records on read.
> However, this mechanism may be by-passed on restore. Assume a
> `builder.table()` call reads and drops a corrupted record. If the table state
> is lost and restored from the changelog topic, the corrupted record may be
> copied into the store, because on restore plain bytes are copied.
> If the KTable is used in a join, an internal `store.get()` call to lookup the
> record would fail with a deserialization exception if the value part cannot
> be deserialized.
> GlobalKTables are affected, too (cf. KAFKA-7663 that may allow a fix for
> GlobalKTable case). It's unclear to me atm, how this issue could be addressed
> for KTables though.
> Note, that user state stores are not affected, because they always have a
> dedicated changelog topic (and don't reuse an input topic) and thus the
> corrupted record would not be written into the changelog.
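
To make the restore problem in the quoted description concrete, here is a minimal toy model in Python. It is only a sketch and does not use any Kafka Streams API: the "topic" is a plain list of key/bytes pairs, and `deserialize`, `read_with_handler`, `restore_raw`, and `get` are hypothetical helpers standing in for the serde, the normal read path with a drop-on-error handler, the byte-copying restore path, and a `store.get()` lookup.

```python
import json


def deserialize(raw: bytes) -> dict:
    """Hypothetical value deserializer; raises ValueError on corrupted bytes."""
    return json.loads(raw.decode("utf-8"))


def read_with_handler(topic):
    """Normal read path: records that fail to deserialize are dropped."""
    store = {}
    for key, raw in topic:
        try:
            deserialize(raw)
        except ValueError:
            continue  # stand-in for a handler that skips corrupted records
        store[key] = raw  # the store itself keeps serialized bytes
    return store


def restore_raw(topic):
    """Restore path: bytes are copied without being deserialized first."""
    return {key: raw for key, raw in topic}


def get(store, key):
    """Lookup that deserializes on read, as a join would need to."""
    return deserialize(store[key])


# Toy input topic that doubles as the changelog; record "b" is corrupted.
topic = [("a", b'{"v": 1}'), ("b", b"{not json"), ("c", b'{"v": 3}')]

processed = read_with_handler(topic)  # "b" is dropped on read
restored = restore_raw(topic)         # "b" is copied into the store
```

In this model `processed` never contains the bad record, while `restored` does, so a later `get(restored, "b")` raises even though the read path had filtered that record out. Deserializing during restore would close the gap, but it is exactly the step that becomes risky when a serde is asymmetric or has side effects, which is the concern raised in the comment.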