itallam opened a new issue #11784:
URL: https://github.com/apache/druid/issues/11784


   ### Description
   
   We are currently working on adding minor compaction to our systems. In 
testing, a single minor compaction task works fine for small data sources. 
However, on one of our larger data sources, with ~500 segments per interval, 
the minor compaction task takes several hours to process. We have seen 
compaction jobs run for about 10 hours. That is far too long to be of value 
to us. We run a major compaction job after about 5 hours. 

For minor compaction to work for us, we need to reduce its runtime 
drastically. 
   
   To achieve that, we are looking to enable parallelism for compaction by 
implementing a parallel compaction task. This task would look similar to 
`index_parallel` in that it would run multiple sub-tasks in parallel. 
Each sub-task would be assigned a small subset of the segments in the 
interval to be compacted. The logic within the sub-tasks would be very 
similar to compaction/IndexTask, where data is compacted and segments are 
generated using OverrideShardSpec. Once the compaction is complete, the 
original segments would be overwritten by the newly created compacted segments.
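
A minimal sketch of the splitting and fan-out step described above, in Python for brevity rather than Druid's Java. Every name here (`split_segments`, `compact_group`, `run_parallel_compaction`, `max_per_subtask`, `max_concurrent_subtasks`) is hypothetical and not part of any Druid API, and `compact_group` is only a placeholder for the real sub-task logic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split_segments(segment_ids: Sequence[str], max_per_subtask: int) -> List[List[str]]:
    """Split one interval's segments into consecutive groups, one group per sub-task."""
    if max_per_subtask <= 0:
        raise ValueError("max_per_subtask must be positive")
    return [
        list(segment_ids[i:i + max_per_subtask])
        for i in range(0, len(segment_ids), max_per_subtask)
    ]


def compact_group(group: List[str]) -> str:
    """Placeholder sub-task (assumption): a real one would read the group's
    segments, merge them, and generate the compacted segments."""
    return f"compacted({group[0]}..{group[-1]}, n={len(group)})"


def run_parallel_compaction(
    segment_ids: Sequence[str],
    max_per_subtask: int,
    max_concurrent_subtasks: int,
    compact: Callable[[List[str]], str] = compact_group,
) -> List[str]:
    """Run one sub-task per group, at most max_concurrent_subtasks at a time."""
    groups = split_segments(segment_ids, max_per_subtask)
    if not groups:
        return []
    with ThreadPoolExecutor(max_workers=max_concurrent_subtasks) as pool:
        # pool.map returns results in the same order as the input groups.
        return list(pool.map(compact, groups))
```

With ~500 segments in an interval and 50 segments per sub-task, this would produce 10 groups. The supervising task would replace the original segments only after all sub-tasks have finished.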

We would greatly appreciate any input. Please let us know of any 
suggestions or existing solutions that we may not be aware of.

   ### Motivation
   
   Please provide the following for the desired feature or change:
   - A detailed description of the intended use case, if applicable
   - Rationale for why the desired feature/change would be beneficial
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


