[ https://issues.apache.org/jira/browse/KAFKA-17789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Michaud updated KAFKA-17789:
------------------------------------
    Description:
In an application with multiple clients, each with multiple threads, when the app is started with empty local storage (without resetting the whole application), only some of the clients restore the changelog topics.
The non-restoring clients are also unable to shut down gracefully.

Reproduction steps
> I'm putting all the actual details here; I'm going to make a project to
> reproduce it locally, and I'll link it in this ticket.
 * Run the app in a Kubernetes environment with multiple pods (5), giving 5 streams clients, with enough data or little enough CPU that restoration takes a long time (enough to see the issue after 1 or 2 minutes)
 * Let the app consume the input topics until it is live (no lag on input or internal topics)
 * Stop the app
 * Clear out the local storage
 * Restart and see that only 2 or 3 clients are restoring, the others consuming nothing
 * Bonus: stop the clients; the stuck clients will not close and will keep sending heartbeats and answering any rebalance assignment

Related slack discussion: https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1728296887560369

> State updater stuck when starting with empty state folder
> ---------------------------------------------------------
>
>                 Key: KAFKA-17789
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17789
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>            Reporter: Antoine Michaud
>            Priority: Critical
>             Fix For: 4.0.0
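
For the "clear out the local storage" reproduction step, a minimal sketch of one way to empty a state directory while the app is stopped. This is an assumption about how the step could be done, not the reporter's actual procedure: the class name `StateDirCleaner` is hypothetical, and the directory passed in is whatever the application's `state.dir` setting points to. It uses only the Java standard library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kafka: empties a local state directory.
final class StateDirCleaner {

    /** Deletes everything below stateDir but keeps the directory itself. */
    static void clear(Path stateDir) throws IOException {
        if (!Files.isDirectory(stateDir)) {
            return; // nothing to clear
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(stateDir)) {
            // Reverse order lists children before their parent directories,
            // so every directory is already empty when it is deleted.
            entries = walk.filter(p -> !p.equals(stateDir))
                          .sorted(Comparator.reverseOrder())
                          .collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: StateDirCleaner <state-dir>");
            return;
        }
        clear(Path.of(args[0]));
    }
}
```

Run it only while the Streams instance is stopped. Kafka Streams also offers `KafkaStreams#cleanUp()`, which removes the local state for the application and likewise must be called while the instance is not running.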