kroeders commented on issue #9940:
URL: https://github.com/apache/druid/issues/9940#issuecomment-653153635


   Hi again @a2l007 @jihoonson @FrankChen021. I got access to our staging 
cluster, put together a 
[prototype](https://github.com/apache/druid/compare/master...kroeders:feature-parallel-killtask), 
and ran some tests, so I wanted to share what I found.
   
   The test cluster is reasonably large, with some datasources in the TB range. 
I tested datasources up to a few hundred GB, segments up to 500MB, and up to a 
few thousand segments per datasource, plus scaled-down testing with local 
storage. Delete times were very fast, and for HDFS not proportional to segment 
size, which makes sense since an HDFS delete is a namenode operation. Running 
in parallel gave the expected speedup, but overall the deletes were already 
very fast, around 10ms per segment in our cluster.
   
   In cases with more delay per delete I saw the expected speedup, but I had to 
introduce an artificial delay to see an appreciable difference. I'm still 
checking with users whether anyone is experiencing the original issue, but it 
looks like changes at the HDFS layer have improved the situation dramatically. 
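   To make the idea concrete, here is a rough sketch of the parallel-delete 
approach (Python for brevity; the names are hypothetical and this is not the 
prototype's actual code, which is Java). Each deep storage delete is submitted 
to a bounded thread pool, so cheap namenode-style operations overlap instead of 
running serially:

```python
from concurrent.futures import ThreadPoolExecutor

# Hedged sketch, not Druid's actual code: issue deep storage deletes in
# parallel with a bounded pool. delete_fn stands in for a deep storage
# delete call (e.g. an HDFS delete, which is a namenode operation).
def kill_segments(segment_ids, delete_fn, parallelism=4):
    failures = []
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {pool.submit(delete_fn, sid): sid for sid in segment_ids}
        for fut, sid in futures.items():
            try:
                fut.result()  # surface any exception from the delete
            except Exception as exc:
                failures.append((sid, exc))
    return failures

# Example: record each "deleted" segment; all deletes succeed here.
deleted = []
failures = kill_segments(["seg-1", "seg-2", "seg-3"], deleted.append)
```

   With per-delete latencies around 10ms this mostly hides round-trip time; the 
pool size is the "concurrent deep storage operations" dial discussed below.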
   
   Regarding the discussion about resource scheduling, I think segment deletion 
would be a different task type than the parallel subtasks used for indexing. 
These deep storage tasks delegate to another service, so I think it makes sense 
to let them run in parallel, even for local storage. 
   
   For the kill task specifically, I also saw the discussion about problems 
with very large numbers of segments, separate from the accumulated deep storage 
delays. I noticed that with the parallel indexing tasks we still end up with 
task items in the ingestion UI. It would be pretty straightforward to set a 
threshold/batch size and reissue independent kill tasks without introducing a 
new subtask framework. Maybe these are two general "dials" in a resource 
management framework: the number of segments per task and the number of 
concurrent deep storage operations? 
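   The batching idea above is basically just chunking; a minimal sketch (names 
hypothetical, not a Druid API) of splitting a large kill into independent kill 
tasks of at most `batch_size` segments each:

```python
# Hedged sketch: split one large kill into independent per-batch kill
# tasks, capped at batch_size segments each. This is the "segments per
# task" dial; each batch would become its own kill task submission.
def plan_kill_tasks(segment_ids, batch_size):
    return [segment_ids[i:i + batch_size]
            for i in range(0, len(segment_ids), batch_size)]

batches = plan_kill_tasks([f"seg-{n}" for n in range(10)], batch_size=4)
```

   Since each batch is an ordinary kill task, no new subtask framework or task 
type is needed, at the cost of more task entries in the UI.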
   
   Jihoon, regarding your question on druid.coordinator.kill.on: I asked our 
users, and they indicated kill tasks are issued manually, primarily for policy 
reasons. Apparently, in certain situations there are rules regarding data 
retention, so there are regular schedules for pulling and deleting data. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
