yuanlihan commented on issue #9712:
URL: https://github.com/apache/druid/issues/9712#issuecomment-620335351


   @jihoonson 
   > However, the first property `skipSegmentWithSizeBytesGreaterThan` seems 
possible to introduce a couple of issues. For example, minor compaction 
couldn't compact segments if their size shows a pattern of (small segment, 
large segment, small segment, large segment, ...) when large segments are 
greater than `skipSegmentWithSizeBytesGreaterThan`. Another issue is that the 
compaction can be used for splitting big segments as well as merging small 
segments.
   
   In these cases, I would increase `skipSegmentWithSizeBytesGreaterThan` a 
bit, perhaps so that `1.2 * skipSegmentWithSizeBytesGreaterThan ≈ 
inputSegmentSizeBytes`. Then maybe we could select the big segment, or a pair 
of segments (one big and one small), for a further minor compaction with a 
specific `partitionSpec`, which may be a spec for splitting. I'm still not 
sure whether this is feasible 😂.
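   
   To make the alternating pattern concrete, here is a rough sketch of the 
idea, not Druid code. It assumes that minor compaction only merges runs of 
adjacent segments and that a segment larger than the threshold breaks a run. 
The function `compactable_runs`, the parameter `min_segments`, and all sizes 
are made up for illustration.

```python
MB = 1024 * 1024


def compactable_runs(segment_sizes, skip_threshold_bytes, min_segments=2):
    """Group adjacent segments that are not skipped into candidate runs.

    Hypothetical model: a segment larger than skip_threshold_bytes is skipped
    and ends the current run; a run is kept only if it holds at least
    min_segments segments.
    """
    runs, current = [], []
    for size in segment_sizes:
        if size > skip_threshold_bytes:
            # A skipped segment ends the current run of adjacent segments.
            if len(current) >= min_segments:
                runs.append(current)
            current = []
        else:
            current.append(size)
    if len(current) >= min_segments:
        runs.append(current)
    return runs


# (small, large, small, large, small) pattern from the quoted concern.
alternating = [50 * MB, 600 * MB, 40 * MB, 620 * MB, 30 * MB]

# With a 500 MB threshold every small segment is isolated, so nothing merges.
skip_threshold = 500 * MB
print(compactable_runs(alternating, skip_threshold))

# Raising the threshold so that 1.2 * threshold == inputSegmentSizeBytes
# (an assumed 750 MB here) gives 625 MB, and the whole interval becomes one run.
input_segment_size_bytes = 750 * MB
raised_threshold = input_segment_size_bytes * 5 // 6
print(compactable_runs(alternating, raised_threshold))
```

   In this toy model the first call finds no run to compact, while the second 
returns the whole interval as a single candidate, which could then be handed 
to a compaction with a splitting `partitionSpec`.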
   
   
   >Maybe we can add `targetRowsPerSegment`, so that auto compaction can skip 
if a segment has a similar number of rows to it.
   
   Personally, I prefer specifying the threshold as a size in bytes per 
segment because it may be more intuitive for users.
   
   
   > For the second property, is there a use case where you don't want to 
compact segments using minor compaction? Or can we always compact if there are 
2 or more segments?
   
   No special use case here. In my opinion, it's acceptable to skip the 
occasional small segment mixed in among regular-sized segments, especially 
when using minor compaction. This property may make the configuration more 
flexible for users. And I agree that we should consider the use case of 
splitting a big segment, which may be caused by data skew.




