wanglihui-git opened a new issue #12385: URL: https://github.com/apache/druid/issues/12385
### Affected Version

0.18.1 to 0.22.1

### Description

Because of the large volume of data in our production environment, our Kafka cluster has to use single-replica topics. When a Kafka node goes down, Kafka indexing tasks cannot be started. A supervisor that is already running normally keeps running, but after a reset operation it cannot run either.

If this happens in production and the Kafka node cannot be recovered in a short time, how can the reliability of the Druid tasks be improved?

The following is a screenshot of my test. The error message is: `Timeout of 60000ms expired before the position for partition topic-0 could be determined`. After a while, the supervisor's state changed to `LOST_CONTACT_WITH_STREAM`.

![Uploading image.png…]()

Thank you!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
