wanglihui-git opened a new issue #12385:
URL: https://github.com/apache/druid/issues/12385


   ### Affected Version
   0.18.1 to 0.22.1
   
   ### Description
   Because of the large data volume in our production environment, our Kafka 
cluster has to use single-replica topics. When a Kafka node goes down, new 
Kafka indexing tasks cannot start. A Supervisor that is already running keeps 
running, but after a reset operation it can no longer run either.
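
   To make the failure mode concrete, here is a minimal sketch in plain Python. It is not Druid or Kafka code: the function name, the broker IDs, and the replica layout are all made up for illustration, and the only assumption is that a partition is unreadable once none of its replicas sits on a live broker.

```python
# Hypothetical illustration only: not a Druid or Kafka API.
# Maps partition id -> list of broker ids that hold a replica.
def offline_partitions(replicas_by_partition, live_brokers):
    """Return partitions with no replica on a live broker (no leader possible)."""
    live = set(live_brokers)
    return sorted(
        partition
        for partition, replicas in replicas_by_partition.items()
        if not live.intersection(replicas)
    )


# Assumed layout: a 3-partition topic with replication factor 1.
single = {0: [1], 1: [2], 2: [3]}
# The same topic with replication factor 2, for comparison.
replicated = {0: [1, 2], 1: [2, 3], 2: [3, 1]}

# Broker 1 goes down; brokers 2 and 3 stay up.
print(offline_partitions(single, live_brokers=[2, 3]))      # [0]
print(offline_partitions(replicated, live_brokers=[2, 3]))  # []
```

   With a single replica, losing one broker leaves some partitions with no leader at all, so any consumer that has to look up a position on those partitions can only wait until it times out.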
   
   If this happens in production and the Kafka node cannot be recovered 
quickly, how can Druid ingestion tasks be made more reliable?
   
   Below is a screenshot from my test. The error message is: 'Timeout of 
60000ms expired before the position for partition topic-0 could be 
determined'. After a while, the Supervisor's state changed to 
'LOST_CONTACT_WITH_STREAM'.
   
![image](https://user-images.githubusercontent.com/41256589/161210254-6b53cf2e-e43f-4472-9a89-328ce45a1529.png)
   
   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


