didip opened a new issue #11764: URL: https://github.com/apache/druid/issues/11764
### Description

First of all, I am aware of the indexer's kill setting; we are using it right now to prune everything older than 24 hours. But I am afraid that is not good enough.

When using `index_parallel`, if your input data is huge, you need to configure each ingestion task with a high `maxNumConcurrentSubTasks` (between 100 and 300). With settings like that, the indexer generates a ton of garbage inside the `druid_tasks` table: thousands of `sub_task` rows with `state = SUCCESS`. And once more than 10,000 of these pile up, it breaks Druid.

### Suggestions

1. Druid needs to refactor the `druid_tasks` table to stop dumping blobs into it. That table needs proper columns and indices.
2. The indexer needs a setting to specifically clean up `sub_task` entries. Users legitimately want to keep old `index_parallel` records for debugging, but successful `sub_task` rows should be cleanable aggressively.
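For context, the age-based retention mentioned above is configured through the coordinator's task-cleanup properties. The sketch below shows roughly what a 24-hour retention looks like in `runtime.properties`; the property names and exact semantics should be verified against the Druid configuration reference for your version:

```properties
# Enable periodic deletion of completed task logs and their metadata entries.
druid.indexer.logs.kill.enabled=true
# Retain roughly 24 hours of completed tasks (value is in milliseconds).
druid.indexer.logs.kill.durationToRetain=86400000
# How often the cleanup job runs, in milliseconds (every 6 hours here).
druid.indexer.logs.kill.delay=21600000
```

Note that this prunes purely by age: there is no way to target only successful `sub_task` rows while keeping the parent `index_parallel` records, which is the gap suggestion 2 is asking to close.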
