pchang388 opened a new issue, #12701: URL: https://github.com/apache/druid/issues/12701
Apologies if this breaks any rules, but I tried the Druid forums without much success, so I am trying here to reach a different audience. Relevant information is below, with more details in the Druid forum post.

* Druid Version: 0.22.1
* Kafka Ingestion (idempotent producer)
* Overlord type: remote

https://www.druidforum.org/t/kafka-ingestion-peon-tasks-success-but-overlord-shows-failure/7374

In general, when we run all our tasks, we start seeing issues between the Overlord and the MM/Peons. Often the Peon shows that the task was successful, but the Overlord believes it failed and tries to shut it down. The Overlord also becomes sluggish: it takes a while to recognize completed tasks and tasks that are trying to start, which seems to point to a communication/coordination failure between the Overlord and the MM/Peons. We even see task assignment timeouts between the Overlord and the MM (PT10M; the default is PT5M).

The only thing that seems to help is reducing the number of tasks running concurrently by suspending certain supervisors. That also suggests the 3 Druid services are struggling to handle the load of our current ingestion. But according to system metrics, resource usage is not hitting any limits, and there is still spare compute available.

This is odd, since there are probably many users ingesting more data per hour than we do, and we don't see this type of issue in their discussions or white papers. Any help will definitely be appreciated.
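For reference, the workaround of suspending supervisors can be scripted. The sketch below is only illustrative and assumes the Overlord's supervisor suspend endpoint (`POST /druid/indexer/v1/supervisor/<supervisorId>/suspend`) is reachable without authentication. The host name, port, and supervisor IDs are hypothetical placeholders, not values from this cluster.

```python
import urllib.request
from urllib.parse import quote


def suspend_url(overlord_base: str, supervisor_id: str) -> str:
    """Build the Overlord URL that suspends a single supervisor."""
    base = overlord_base.rstrip("/")
    # Percent-encode the ID so unusual characters cannot alter the path.
    return f"{base}/druid/indexer/v1/supervisor/{quote(supervisor_id, safe='')}/suspend"


def suspend_supervisors(overlord_base, supervisor_ids, timeout=30):
    """POST a suspend request per supervisor; return {id: HTTP status}."""
    results = {}
    for sid in supervisor_ids:
        req = urllib.request.Request(
            suspend_url(overlord_base, sid), data=b"", method="POST"
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            results[sid] = resp.status
    return results


# Hypothetical usage (host and IDs are assumptions):
# suspend_supervisors("http://overlord.example:8090", ["clicks-kafka", "events-kafka"])
```

Suspending one supervisor at a time in this way makes it easier to note at which task count the Overlord starts lagging again.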
