chorslev opened a new pull request, #12540: URL: https://github.com/apache/kafka/pull/12540
…h standby replicas and default acceptable lag.

The attached stream is an extension of the well-known de-duplication example often used in Kafka. The goal is to assign a unique identifier to each key in the input topic and pass the key and identifier to the output topic. If a key has been observed before, the ID assigned to it earlier is reused.

The input and the transform step of the test case are both deterministic, so under exactly-once semantics the output should be deterministic as well. The test demonstrates that a rebalancing event can break the exactly-once semantics, which results in IDs being reused.

The issue can be reliably reproduced on an i7-8750H CPU @ 2.20GHz × 12 with 32 GiB memory when using caching and the default acceptable lag for standby replicas. When caching is disabled OR the acceptable lag is set to 0, the test no longer breaks. The underlying cause is unknown, so it is also unknown whether these settings fix the problem or merely hide it.

A similar issue has been reported here: https://stackoverflow.com/questions/69038181/kafka-streams-aggregation-data-loss-between-instance-restarts-and-rebalances

Flow of the test case:
1) Produce 3000 messages.
2) Start a stream.
3) Wait for the 3000 messages to be processed.
4) Start a new stream and wait for it to start syncing.
5) Produce 60,000 messages.
6) Wait for 5 seconds.
7) Start a new thread, which should trigger a rebalancing event.
8) Wait until the entire log is processed by the stream.
9) Check the uniqueness of the assigned IDs.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
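For readers who want to see what the final step of the flow (checking the uniqueness of the assigned IDs) could look like, here is a minimal sketch in plain Java. It is not taken from the PR: the class name `IdUniquenessCheck`, the method `findViolations`, and the choice of `String` keys and `Long` IDs are all assumptions made for illustration. It takes the (key, ID) pairs read back from the output topic and reports two kinds of violation: a key that received more than one ID, and an ID that was handed to more than one key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR: validates (key, id) pairs
// that a test has already consumed from the output topic.
final class IdUniquenessCheck {

    // Returns a human-readable description of every violation found;
    // an empty list means each key maps to exactly one ID and no ID is shared.
    static List<String> findViolations(List<Map.Entry<String, Long>> output) {
        Map<String, Long> idByKey = new HashMap<>();
        Map<Long, String> keyById = new HashMap<>();
        List<String> violations = new ArrayList<>();

        for (Map.Entry<String, Long> record : output) {
            String key = record.getKey();
            Long id = record.getValue();

            // A key seen earlier must keep the ID it was given the first time.
            Long previousId = idByKey.putIfAbsent(key, id);
            if (previousId != null && !previousId.equals(id)) {
                violations.add("key " + key + " received ids " + previousId + " and " + id);
            }

            // An ID must never be handed out to two different keys.
            String previousKey = keyById.putIfAbsent(id, key);
            if (previousKey != null && !previousKey.equals(key)) {
                violations.add("id " + id + " reused for keys " + previousKey + " and " + key);
            }
        }
        return violations;
    }
}
```

A test following this approach would collect every record from the output topic after step 8 and fail if `findViolations` returns a non-empty list. Repeated records for the same key with the same ID are accepted, since the stream is expected to reuse the ID for a key it has already seen.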
