[ https://issues.apache.org/jira/browse/KAFKA-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787302#comment-16787302 ]

ASF GitHub Bot commented on KAFKA-8037:
---------------------------------------

pkleindl commented on pull request #6394: KAFKA-8037: Check deserialization at 
global state store restoration
URL: https://github.com/apache/kafka/pull/6394
 
 
   This change aims to prevent records that cannot be deserialized from being 
loaded into a global state store during restore. Currently, such records are 
filtered out during normal processing but are applied unchecked during restore, 
which causes an exception later when the value is read from the store.
   The change copies the logic from the GlobalStateUpdateTask and builds a list 
of deserializers to use during restoration.
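   In spirit, the restore-time check works like this (a minimal stand-alone 
simulation, not the actual Kafka Streams code; the integer value format and the 
`HandlerResponse` enum are hypothetical stand-ins for the real serdes and 
exception-handler responses):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RestoreFilterSketch {

    // Hypothetical value deserializer: values are expected to be UTF-8 integers.
    static int deserializeValue(byte[] raw) {
        return Integer.parseInt(new String(raw, StandardCharsets.UTF_8));
    }

    // Stand-in for the deserialization exception handler's decision.
    enum HandlerResponse { FAIL, CONTINUE }

    // Restore loop: keep only records whose value deserializes cleanly.
    static List<byte[]> restore(List<byte[]> rawRecords, HandlerResponse onError) {
        List<byte[]> restored = new ArrayList<>();
        for (byte[] raw : rawRecords) {
            try {
                deserializeValue(raw);  // validation only; the store keeps raw bytes
                restored.add(raw);
            } catch (RuntimeException e) {
                if (onError == HandlerResponse.FAIL) {
                    // LogAndFail-style handler: abort the restore.
                    throw new IllegalStateException("Deserialization failed during restore", e);
                }
                // LogAndContinue-style handler: drop the corrupted record.
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        List<byte[]> changelog = List.of(
            "1".getBytes(StandardCharsets.UTF_8),
            "oops".getBytes(StandardCharsets.UTF_8),  // corrupted value
            "3".getBytes(StandardCharsets.UTF_8));
        // With a continue-style handler, the corrupted record is skipped.
        System.out.println(restore(changelog, HandlerResponse.CONTINUE).size()); // prints 2
    }
}
```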
   
   The GlobalStateManagerImplTest was extended to cover the case that a 
StreamsException is thrown when a record that cannot be deserialized is 
processed with the default LogAndFailExceptionHandler.
   GlobalStateManagerImplLogAndContinueTest was added; it contains one new test 
that uses the LogAndContinueExceptionHandler and verifies that a corrupted 
record can be skipped during restoration.
   Copying all tests is not ideal, but I found no easy way to override the 
default exception handler for just the one case.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> KTable restore may load bad data
> --------------------------------
>
>                 Key: KAFKA-8037
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8037
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Patrik Kleindl
>            Priority: Minor
>              Labels: pull-request-available
>
> If an input topic contains bad data, users can specify a 
> `deserialization.exception.handler` to drop corrupted records on read. 
> However, this mechanism may be bypassed on restore. Assume a 
> `builder.table()` call reads and drops a corrupted record. If the table state 
> is lost and restored from the changelog topic, the corrupted record may be 
> copied into the store, because on restore plain bytes are copied.
> If the KTable is used in a join, an internal `store.get()` call to look up 
> the record would fail with a deserialization exception if the value part 
> cannot be deserialized.
> GlobalKTables are affected, too (cf. KAFKA-7663, which may allow a fix for 
> the GlobalKTable case). It's unclear to me at the moment how this issue could 
> be addressed for KTables, though.
> Note that user state stores are not affected, because they always have a 
> dedicated changelog topic (and don't reuse an input topic), and thus the 
> corrupted record would not be written into the changelog.
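The exception handler mentioned in the description is set via Streams 
configuration. A minimal sketch of that setup, using only standard-library 
`Properties` (the config key and handler class name are assumed from the Kafka 
Streams 2.x API; in a real application this `Properties` object would be passed 
to `KafkaStreams`):

```java
import java.util.Properties;

public class HandlerConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The handler decides, per corrupted record, whether the application
        // fails fast or logs the record and skips it.
        props.put("default.deserialization.exception.handler",
                  "org.apache.kafka.streams.errors.LogAndContinueExceptionHandler");
        // Prints the configured handler class name.
        System.out.println(props.getProperty("default.deserialization.exception.handler"));
    }
}
```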



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
