GWphua commented on PR #19016:
URL: https://github.com/apache/druid/pull/19016#issuecomment-4030260601
Thanks @kfaraz! Love the clarification about locks! I know segment locks are kind of a can of worms, and since we are shifting to this new lock mechanism, which sounds more polished, I'm happy to make changes to my current implementation.

As for compaction being fast, let me share a bit about the use case. We have not been actively compacting our segments, and have recently been trying to apply compaction across datasources that use Kafka ingestion. These datasources have months' worth of data, with 10k+ segments per `segmentGranularity` (hour). A single major compaction of one time chunk takes 8-10 hours.

We considered the following strategy: compact the newer data and let the older data expire. However, this does not solve the problem when compaction is slower than ingestion (and it also does not work for clusters using `loadForever`). By implementing concurrent minor compactions, we can parallelize the workload: each minor compaction takes about 5 minutes, and running several of them concurrently reduces what would otherwise take an estimated 24 months of compaction time down to about 18 days (see the back-of-envelope sketch at the end of this comment). Providing concurrent minor compaction will help users who need their segments compacted urgently.

> Supporting multiple concurrent REPLACE tasks on the same interval would have to be a future enhancement

Our main concern with compaction is whether its speed can keep up with our ingestion load. We have already compacted all of our historical data and no longer have a strong compaction-speed requirement. I would love to test the MSQ implementation of minor compaction, and if the results are satisfactory then this can take a back seat. 👍
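For reference, here is a rough back-of-envelope sketch of the 24-months-vs-18-days estimate above. The per-compaction timings (8-10 hours per major compaction, ~5 minutes per minor compaction) come from our own measurements; the chunk count, the number of minor compactions per chunk, and the concurrency level are illustrative assumptions chosen to reproduce the estimate, not measured values:

```python
# Back-of-envelope estimate: sequential major compaction vs. concurrent minor compaction.
# Only the per-compaction timings are measured; everything marked "assumed" is illustrative.

HOURS_PER_MONTH = 30 * 24

time_chunks = 3 * HOURS_PER_MONTH   # assumed: ~3 months of hourly time chunks = 2,160
major_hours_per_chunk = 8           # measured: a major compaction takes 8-10 h per chunk
minor_minutes = 5                   # measured: a minor compaction takes ~5 min
minors_per_chunk = 12               # assumed: 10k+ segments per chunk, split into batches
concurrency = 5                     # assumed: concurrent minor compaction tasks

# Sequential major compaction: one 8-10 h task per time chunk, run back to back.
sequential_major_days = time_chunks * major_hours_per_chunk / 24

# Concurrent minor compaction: many short tasks per chunk, spread across workers.
concurrent_minor_days = (time_chunks * minors_per_chunk * minor_minutes
                         / concurrency / 60 / 24)

print(f"sequential major compaction: ~{sequential_major_days:.0f} days "
      f"(~{sequential_major_days / 30:.0f} months)")
print(f"concurrent minor compaction: ~{concurrent_minor_days:.0f} days")
# -> ~720 days (~24 months) vs. ~18 days under these assumptions
```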
