kroeders commented on issue #9940:
URL: https://github.com/apache/druid/issues/9940#issuecomment-635367098


   Hi @jihoonson, thank you for the feedback; I think this is the key design 
issue here. For some context, we run an internal fork of Druid whose users are 
on another team, and from their perspective they are regularly cleaning up 
stale data. At a high level, the problem is that a single logical kill 
operation (e.g. delete all traffic data from the last day) deletes thousands 
of segments serially, which is visibly slow.
   
   So we could check the segment count and issue multiple kill tasks 
(specifying a batch-size parameter instead of a thread-count parameter), or 
even one task per segment. As a general rule I want to use existing 
concurrency features whenever possible, but if I understand Tasks correctly 
they are fairly heavyweight: each one is launched into a separate process and 
shows up in the ingestion view. I didn't want to end up in a degenerate case 
with one task per segment, so we looked at a more lightweight approach to 
speeding up this task. From the user's perspective, parallelizing inside the 
task keeps the UI output in sync with the user's input and adds minimal 
overhead.
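   To make the idea concrete, here is a minimal Java sketch of the lightweight approach, under the assumption that the kill task holds the full list of segment IDs: per-segment deletes are fanned out across a bounded thread pool inside one task rather than issued serially. `deleteSegment()` is a hypothetical stand-in for the per-segment deep-storage delete (e.g. an HDFS request), not Druid's actual task API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelKillSketch
{
  // Hypothetical placeholder for the per-segment deep-storage delete;
  // in the HDFS case this would be a single remote delete request.
  static boolean deleteSegment(String segmentId)
  {
    return true;
  }

  // Submit one delete job per segment to a bounded pool and wait for all
  // of them to finish, returning the number of successful deletes.
  static int killSegments(List<String> segmentIds, int numThreads) throws Exception
  {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      List<Callable<Boolean>> jobs = new ArrayList<>();
      for (String id : segmentIds) {
        jobs.add(() -> deleteSegment(id));
      }
      int deleted = 0;
      for (Future<Boolean> result : pool.invokeAll(jobs)) {
        if (result.get()) {
          deleted++;
        }
      }
      return deleted;
    }
    finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception
  {
    List<String> ids = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      ids.add("segment-" + i);
    }
    // All 1000 simulated deletes run across 8 threads instead of serially.
    System.out.println(killSegments(ids, 8));  // prints 1000
  }
}
```

   The batch-size alternative would instead partition `segmentIds` across several tasks; the sketch above keeps everything in one task so the UI still shows a single kill operation.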
   
   I'm new to both Druid and this team, but we have a fairly large existing 
cluster with HDFS deep storage. In that configuration, a delete should just 
issue a request to HDFS and be a fairly lightweight operation for Druid; 
local storage will be a different story.
   
   I should have access to a dev cluster shortly, where I can gather metrics 
and try different approaches.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
