kroeders commented on issue #9940: URL: https://github.com/apache/druid/issues/9940#issuecomment-635367098
Hi @jihoonson, thank you for the feedback. I think this is the key design issue here.

For some context, we run an internal fork of Druid, and its users are on another team. From their perspective, they are regularly cleaning up stale data. At a high level, the problem is that one logical kill operation (e.g. delete all traffic from the last day) leads to thousands of segments being deleted serially, which is visibly slow.

One option is to check the segment count and issue multiple kill tasks (with a batch size parameter instead of a thread count parameter), or even one task per segment. As a general rule I want to use existing concurrency features whenever possible, but if I understand tasks correctly, they are fairly heavyweight: each one is launched in a separate process and shows up in the ingestion window. I didn't want to end up in the degenerate case of one task per segment, so we looked at a more lightweight way to speed up this task. From a user perspective, parallelizing inside the task keeps the UI output in sync with the user's input and adds minimal overhead.

I'm new to both Druid and this team, but we have a fairly large existing cluster with HDFS deep storage. In that configuration a delete should just be a request to HDFS, which is a lightweight operation for Druid; local storage will be a different story. I should have access to a dev cluster shortly, where I can collect metrics and try different approaches.
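To make the "parallelize inside the task" idea concrete, here is a minimal sketch, assuming the kill task already has the list of segment locations in hand and some per-segment delete call. It is not Druid's actual API: `ParallelSegmentKiller`, `deleteAll`, `deleteOne`, and `numThreads` are hypothetical names for illustration only. A fixed-size pool caps how many delete requests hit deep storage at once, while the whole operation still appears as a single task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Druid's API): delete many segments in parallel
 * inside one kill task, bounded by a user-supplied thread count.
 */
final class ParallelSegmentKiller {

    /**
     * Runs deleteOne on every segment path using at most numThreads workers.
     * Returns the paths whose delete threw, in input order.
     */
    static List<String> deleteAll(List<String> segmentPaths,
                                  int numThreads,
                                  Consumer<String> deleteOne) {
        if (numThreads < 1) {
            throw new IllegalArgumentException("numThreads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Submit one delete per segment; the pool bounds the concurrency.
            List<Future<?>> futures = new ArrayList<>();
            for (String path : segmentPaths) {
                futures.add(pool.submit(() -> deleteOne.accept(path)));
            }
            // Wait for every delete and record the ones that failed.
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get();
                } catch (ExecutionException e) {
                    failed.add(segmentPaths.get(i));
                } catch (InterruptedException e) {
                    // Preserve the interrupt, cancel outstanding work, and
                    // report this and all unchecked segments as failed.
                    Thread.currentThread().interrupt();
                    pool.shutdownNow();
                    failed.addAll(segmentPaths.subList(i, segmentPaths.size()));
                    break;
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }
}
```

With HDFS deep storage, `deleteOne` would wrap whatever single-segment delete call the existing kill path already makes. The batch-size alternative mentioned above would instead split `segmentPaths` into chunks and submit one kill task per chunk, trading the extra per-task process overhead for isolation between batches.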
