kroeders commented on issue #9940: URL: https://github.com/apache/druid/issues/9940#issuecomment-653153635
Hi again @a2l007 @jihoonson @FrankChen021. I got access to our staging cluster, put together a [prototype](https://github.com/apache/druid/compare/master...kroeders:feature-parallel-killtask), and ran some tests, so I wanted to share what I found. The test cluster is reasonably large, with some datasources in the TB range. I tested datasources up to a few hundred GB, segments up to 500 MB, and up to a few thousand segments per datasource, plus scaled-down testing with local storage.

I found that deletes were very fast and, for HDFS, not proportional to segment size, which makes sense since an HDFS delete is a namenode operation. I saw the expected speedup running in parallel, but overall the deletes were so fast (around 10 ms per segment in our cluster) that I had to introduce an artificial delay to see an appreciable difference. I'm still checking with users whether anyone is still experiencing the original issue, but it looks like changes at the HDFS layer have improved the situation dramatically.

Regarding the discussion about resource scheduling, I think segment deletion would be a different task type than the parallel subtasks used for indexing. These deep storage tasks delegate to another service, so I think it makes sense to let them run in parallel, even in the local storage case.

For the kill task specifically, I also saw the discussion about problems with a very large number of segments, aside from accumulated deep storage delays. I noticed that with the parallel indexing tasks we still end up with per-task items in the ingestion UI. It would be pretty straightforward to set a threshold/batch size and reissue independent kill tasks without introducing a new subtask framework. Maybe these are two general "dials" in a general resource management framework: number of segments per task, and number of concurrent deep storage operations?
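To make the two "dials" concrete, here is a minimal standalone sketch (none of these class or method names come from the Druid codebase; `MAX_SEGMENTS_PER_TASK` and `DELETE_PARALLELISM` are hypothetical knobs, and `deleteSegment` is a stand-in for a deep-storage delete):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelKillSketch {
  // Hypothetical dials: segments handled per kill task, and
  // concurrent deep-storage delete operations.
  static final int MAX_SEGMENTS_PER_TASK = 100;
  static final int DELETE_PARALLELISM = 4;

  // Stand-in for a deep-storage delete (e.g. an HDFS namenode op).
  static void deleteSegment(String segmentId, AtomicInteger deleted) {
    deleted.incrementAndGet();
  }

  // Split the segment list into batches, as independently
  // reissued kill tasks would.
  static List<List<String>> batch(List<String> segments, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < segments.size(); i += batchSize) {
      batches.add(segments.subList(i, Math.min(i + batchSize, segments.size())));
    }
    return batches;
  }

  public static void main(String[] args) throws Exception {
    List<String> segments = new ArrayList<>();
    for (int i = 0; i < 250; i++) segments.add("segment-" + i);

    AtomicInteger deleted = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(DELETE_PARALLELISM);
    List<Future<?>> futures = new ArrayList<>();
    for (List<String> b : batch(segments, MAX_SEGMENTS_PER_TASK)) {
      futures.add(pool.submit(() -> {
        for (String s : b) deleteSegment(s, deleted);
      }));
    }
    for (Future<?> f : futures) f.get(); // wait for every batch to finish
    pool.shutdown();
    System.out.println("batches=" + batch(segments, MAX_SEGMENTS_PER_TASK).size()
        + " deleted=" + deleted.get()); // prints "batches=3 deleted=250"
  }
}
```

The point is just that the batch size and the pool size are independent: one bounds metadata/UI load per task, the other bounds concurrent deep-storage operations.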
Jihoon, regarding your question on `druid.coordinator.kill.on`: I asked our users, and they indicated that kill tasks are issued manually primarily for policy reasons. Apparently, under certain situations there are rules governing data retention, so there are regular schedules for pulling and deleting data.
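For context, the coordinator-driven kill path is controlled by properties like these (property names are from the Druid coordinator docs; the values shown are illustrative, not recommendations):

```properties
# Enable automatic kill tasks for unused segments
druid.coordinator.kill.on=true
# How often the coordinator submits kill tasks
druid.coordinator.kill.period=P1D
# Only kill segments unused for longer than this duration
druid.coordinator.kill.durationToRetain=P90D
# Cap on segments per kill task
druid.coordinator.kill.maxSegments=100
```

The policy-driven manual kills described above sit outside this mechanism, which is presumably why users keep issuing them by hand.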
