Sophie Blee-Goldman created KAFKA-10102:
-------------------------------------------

             Summary: Source node references not updated after rebuilding 
topology
                 Key: KAFKA-10102
                 URL: https://issues.apache.org/jira/browse/KAFKA-10102
             Project: Kafka
          Issue Type: Bug
          Components: streams
            Reporter: Sophie Blee-Goldman
             Fix For: 2.6.0


Luckily this bug was caught by 
RegexSourceIntegrationTest#testRegexRecordsAreProcessedAfterReassignment – we 
saw it fail with an NPE during SourceNode#deserializeKey, implying that the key 
deserializer was null which in turns implies that the source node was never 
initialized.

This can happen when a task is updated with new regex matched topic partitions. 
In order to update the topology with the new input partitions, we actually just 
create an entirely new topology from scratch including building new source node 
objects. We then re/initialize this new topology once the task is resumed.

The problem is that the task's RecordQueues save a reference to the 
corresponding source node, and use this to pass polled records into the 
topology. But the RecordQueues aren't updated with the newly built source nodes 
and still point to the original nodes.

If the task had not completed restoration before being updated with new 
partitions, it would never have initialized the original topology or source 
nodes, resulting in an NPE when the RecordQueue passes a record to the old, 
uninitialized source node.

This is the only specific known bug, but I haven't checked the entire code base 
so it's possible there are other node references saved that might result in 
bugs. We should try and avoid rebuilding an entirely new topology if at all 
possible, and see if we can just update the input partitions only where 
necessary



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to