capistrant commented on PR #18939:
URL: https://github.com/apache/druid/pull/18939#issuecomment-3791352514

   > So, in future, there's no compaction term?
   > 
   > However, I have a different view. IMO, compaction and re-indexing 
should be separated from each other; they should serve completely different 
purposes.
   > 
   > Compaction should only perform the merge of small segments without any 
schema changes (query granularity, segment granularity). Compaction should 
run eagerly and aggressively, especially for Kafka ingestion, to reduce the 
number of segments. There are many problems/limitations around this feature 
that have not been solved. For example, compaction currently runs on a whole 
interval; if there are many segments, it takes a very long time (and sometimes 
it's not realistic to complete) to finish the job. This kind of compaction was 
originally named 'major compaction', while a 'minor compaction' is there to 
allow us to compact given segments, but it's buggy now: even if we give it 2 
segments, for example, the task will still fetch all segments in that interval. 
Another problem is that minor compaction only accepts consecutive segments. 
These problems are stated in #9712, #9768, and #9571. However, these problems 
are not solved, and we have been experiencing a large number of small segments 
for a long time. If we apply the re-indexing term to this use case, I think the 
term itself does not reflect the feature but instead introduces confusion.
   
   I appreciate the thoughts @FrankChen021. In general, this push away from 
using the term compaction for everything that re-processes existing Druid 
segments is long needed. But, I do agree that pure compaction like you spec out 
with minor compactions does still warrant being called "compaction". Whether 
that be as a subset of the "reindexing" space or its own separate concept 
entirely, I guess I don't know. Overall, we already have lots of robust, 
production-ready code for "compaction", so I could not justify re-building 
anything for "reindexing" specifically. That is the genesis of trying to 
generalize the name as I do think it does make more sense to call pure 
compaction, reindexing, than it does to call activity that changes the 
underlying data definition, compaction. I do want to work towards a naming 
scheme and code base that is logical and reasonable though, so I am open to 
considering how we can best navigate to a world where only stuff that is 
legitimately compaction is called compaction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

