didip opened a new issue #11764:
URL: https://github.com/apache/druid/issues/11764


   ### Description
   
First of all, I am aware of the indexer's kill setting. We are using it right now to prune everything older than 24 hours.
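   For context, that retention is configured along these lines (an illustrative sketch; verify the exact property names and values against your Druid version's docs):
   
   ```properties
   # Sketch of Overlord/Indexer runtime.properties for task cleanup
   # enable periodic deletion of old task logs and related metadata entries
   druid.indexer.logs.kill.enabled=true
   # retain roughly 24 hours of task history (milliseconds)
   druid.indexer.logs.kill.durationToRetain=86400000
   # how often the cleanup job runs (milliseconds)
   druid.indexer.logs.kill.delay=3600000
   ```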
   
   But I am afraid that is not good enough. When using `index_parallel`, if your input data is huge, you need to configure each ingestion task with a high `maxNumConcurrentSubTasks` (between 100 and 300).
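   Concretely, the tuning section of such an ingestion spec ends up looking something like this (illustrative values only):
   
   ```json
   {
     "type": "index_parallel",
     "tuningConfig": {
       "type": "index_parallel",
       "maxNumConcurrentSubTasks": 200
     }
   }
   ```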
   
   With such settings, the indexer generates a ton of trash inside the `druid_tasks` table: thousands of `sub_task` rows with `state = SUCCESS`. Once these pile up to more than 10000 rows, it breaks Druid.
   
   
   ### Suggestions
   
   1. Druid needs to refactor the `druid_tasks` table to stop dumping blobs 
into it. That table needs proper columns and indices.
   
   2. The indexer needs a setting to specifically clean up `sub_task` entries. Users legitimately want to keep old `index_parallel` records for debugging, but successful `sub_task` entries should be cleanable aggressively.
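   
   In the meantime, aggressive cleanup could in principle be done directly against the metadata store, roughly like this (a hypothetical sketch, not an endorsed procedure; column names depend on your metadata-store schema and Druid version, so verify them and back up the table first):
   
   ```sql
   -- Hypothetical manual cleanup sketch: delete inactive (completed) task rows
   -- older than 24 hours. Check your actual druid_tasks schema before running
   -- anything like this against a production metadata store.
   DELETE FROM druid_tasks
   WHERE active = false
     AND created_date < NOW() - INTERVAL '24 hours';
   ```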
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


