FrankChen021 commented on PR #18939: URL: https://github.com/apache/druid/pull/18939#issuecomment-3787886505
So, in the future, there will be no "compaction" term?

However, I have a different view. IMO, compaction and re-indexing should be separated from each other; they should serve completely different purposes.

Compaction should only merge small segments, without any schema changes (query granularity, segment granularity). It should run eagerly and aggressively, especially for Kafka ingestion, to reduce the number of segments. There are many problems and limitations around this feature that have not been solved. For example, compaction currently operates on a whole interval; if there are many segments, the job takes a very long time to finish (and sometimes it is not realistic for it to complete at all). This kind of compaction was originally named "major compaction", while "minor compaction" exists to let us compact a given set of segments. But it is buggy now: even if we give it only 2 segments, for example, the task will still fetch all segments in that interval. Another problem is that minor compaction only accepts consecutive segments.

These problems are described in: #9712, #9768, #9571

However, these problems have not been solved, and we have been dealing with a large number of small segments for a long time.

If the re-indexing term is applied to this use case, I think the term does not reflect the feature and only introduces confusion.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
