[jira] [Comment Edited] (KAFKA-8037) KTable restore may load bad data

Bruno Cadonna (Jira) Thu, 23 Jul 2020 05:07:35 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17163485#comment-17163485
 ]


Bruno Cadonna edited comment on KAFKA-8037 at 7/23/20, 12:04 PM:
-----------------------------------------------------------------

I find the idea of putting records in its original byte representation into 
state stores on source tables in combination with the deserialization of 
records during restoration promising. With this we would have in the state 
store the same data as in the source topic minus the bad data.  

Having this two mechanism in place would allow us to switch on the optimization 
by default without any further restrictions on serdes since only the 
deserializer is used and never the serializer.  


was (Author: cadonna):
I find the idea of putting records in its original byte representation into 
state stores on source tables in combination with the deserialization of 
records during restoration interesting. With this we would have in the state 
store the same data as in the source topic minus the bad data.  

Having this two mechanism in place would allow us to switch on the optimization 
by default without any further restrictions on serdes since only the 
deserializer is used and never the serializer.  

> KTable restore may load bad data
> --------------------------------
>
>                 Key: KAFKA-8037
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8037
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Priority: Minor
>              Labels: pull-request-available
>
> If an input topic contains bad data, users can specify a 
> `deserialization.exception.handler` to drop corrupted records on read. 
> However, this mechanism may be by-passed on restore. Assume a 
> `builder.table()` call reads and drops a corrupted record. If the table state 
> is lost and restored from the changelog topic, the corrupted record may be 
> copied into the store, because on restore plain bytes are copied.
> If the KTable is used in a join, an internal `store.get()` call to lookup the 
> record would fail with a deserialization exception if the value part cannot 
> be deserialized.
> GlobalKTables are affected, too (cf. KAFKA-7663 that may allow a fix for 
> GlobalKTable case). It's unclear to me atm, how this issue could be addressed 
> for KTables though.
> Note, that user state stores are not affected, because they always have a 
> dedicated changelog topic (and don't reuse an input topic) and thus the 
> corrupted record would not be written into the changelog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Comment Edited] (KAFKA-8037) KTable restore may load bad data

Reply via email to