kroeders commented on issue #9940: URL: https://github.com/apache/druid/issues/9940#issuecomment-653153635
Hi again @a2l007 @jihoonson @FrankChen021. I got access to our staging cluster, put together a [prototype](https://github.com/apache/druid/compare/master...kroeders:feature-parallel-killtask), and ran some tests, so I wanted to share what I found. The test cluster is reasonably large, with some datasources in the TB range. I tested datasources up to a few hundred GB, segments up to 500 MB, and up to a few thousand segments per datasource, plus scaled-down testing with local storage.

I found that deletes were very fast and, for HDFS, not proportional to segment size, which makes sense since an HDFS delete is a namenode operation. I saw the expected speedup running in parallel, but overall the deletes were so fast (around 10 ms per segment in our cluster) that I had to introduce an artificial delay to see an appreciable difference. I'm still checking with users whether anyone is still experiencing the original issue, but it looks like changes at the HDFS layer have improved the situation dramatically.

Regarding the discussion about resource scheduling, I think segment deletion would be a different task type than the parallel subtasks used for indexing. These deep storage tasks delegate to another service, so I think it makes sense to let them run in parallel, even in the local storage case.

For the kill task specifically, I also saw the discussion about problems with a very large number of segments, aside from accumulated deep storage delays. I noticed that with the parallel indexing tasks we still end up with per-task items in the ingestion UI. It would be pretty straightforward to set a threshold/batch size and reissue independent kill tasks without introducing a new subtask framework. Maybe these are two general "dials" in a general resource management framework: number of segments per task, and number of concurrent deep storage operations?
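To make the two "dials" concrete, here is a minimal standalone sketch (none of these class or method names come from the Druid codebase; `MAX_SEGMENTS_PER_TASK` and `DELETE_PARALLELISM` are hypothetical knobs, and `deleteSegment` is a stand-in for a deep-storage delete):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelKillSketch {
  // Hypothetical dials: segments handled per kill task, and
  // concurrent deep-storage delete operations.
  static final int MAX_SEGMENTS_PER_TASK = 100;
  static final int DELETE_PARALLELISM = 4;

  // Stand-in for a deep-storage delete (e.g. an HDFS namenode op).
  static void deleteSegment(String segmentId, AtomicInteger deleted) {
    deleted.incrementAndGet();
  }

  // Split the segment list into batches, as independently
  // reissued kill tasks would.
  static List<List<String>> batch(List<String> segments, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < segments.size(); i += batchSize) {
      batches.add(segments.subList(i, Math.min(i + batchSize, segments.size())));
    }
    return batches;
  }

  public static void main(String[] args) throws Exception {
    List<String> segments = new ArrayList<>();
    for (int i = 0; i < 250; i++) segments.add("segment-" + i);

    AtomicInteger deleted = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(DELETE_PARALLELISM);
    List<Future<?>> futures = new ArrayList<>();
    for (List<String> b : batch(segments, MAX_SEGMENTS_PER_TASK)) {
      futures.add(pool.submit(() -> {
        for (String s : b) deleteSegment(s, deleted);
      }));
    }
    for (Future<?> f : futures) f.get(); // wait for every batch to finish
    pool.shutdown();
    System.out.println("batches=" + batch(segments, MAX_SEGMENTS_PER_TASK).size()
        + " deleted=" + deleted.get()); // prints "batches=3 deleted=250"
  }
}
```

The point is just that the batch size and the pool size are independent: one bounds metadata/UI load per task, the other bounds concurrent deep-storage operations.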
Jihoon, regarding your question on `druid.coordinator.kill.on`: I asked our users, and they indicated that kill tasks are issued manually primarily for policy reasons. Apparently, under certain situations there are rules governing data retention, so there are regular schedules for pulling and deleting data.
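For context, the coordinator-driven kill path is controlled by properties like these (property names are from the Druid coordinator docs; the values shown are illustrative, not recommendations):

```properties
# Enable automatic kill tasks for unused segments
druid.coordinator.kill.on=true
# How often the coordinator submits kill tasks
druid.coordinator.kill.period=P1D
# Only kill segments unused for longer than this duration
druid.coordinator.kill.durationToRetain=P90D
# Cap on segments per kill task
druid.coordinator.kill.maxSegments=100
```

The policy-driven manual kills described above sit outside this mechanism, which is presumably why users keep issuing them by hand.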
