GWphua commented on PR #19016:
URL: https://github.com/apache/druid/pull/19016#issuecomment-4030260601

   Thanks @kfaraz!
   
   Love the clarification about locks! I know the segment lock is kinda a can 
of worms, and since we are shifting to this new lock mechanism, which sounds 
more polished, I'm down to make changes to my current implementation.
   
   As for compaction being fast, let me share a bit about the use case:
   
   We have not been actively compacting our segments, and have recently 
started applying compaction across datasources that use Kafka ingestion.
   
   These datasources have months' worth of data, with 10k+ segments per 
time chunk (segmentGranularity is hour). A single major compaction takes 
8~10 hours for a single time chunk.
   
   We thought of working with this strategy: compact the newer data, and let 
the older data expire. This strategy does not solve the issue when compaction 
is slower than ingestion (and it also does not work for clusters using 
loadForever). By implementing Concurrent Minor Compactions, we can parallelize 
the workload:
   
   Running multiple minor compactions concurrently parallelizes the work, and 
each minor compaction takes 5 minutes. This reduces what would take an 
estimated 24 months of compaction time down to 18 days. Providing concurrent 
minor compaction will help users who want their segments compacted urgently.
   
   > Supporting multiple concurrent REPLACE tasks on the same interval would 
have to be a future enhancement
   
   The main concern when it comes to compaction is whether compaction speed 
can keep up with our ingestion load. We have already compacted all of our 
historical data, and no longer really have a high compaction speed 
requirement. I would love to test the MSQ implementation of minor compaction, 
and if the results are satisfactory then this can take a back seat. 👍
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

