yuanlihan commented on issue #9712: URL: https://github.com/apache/druid/issues/9712#issuecomment-620335351
@jihoonson

> However, the first property `skipSegmentWithSizeBytesGreaterThan` seems possible to introduce a couple of issues. For example, minor compaction couldn't compact segments if their size shows a pattern of (small segment, large segment, small segment, large segment, ...) when large segments are greater than `skipSegmentWithSizeBytesGreaterThan`. Another issue is that the compaction can be used for splitting big segments as well as merging small segments.

In these cases, I would increase `skipSegmentWithSizeBytesGreaterThan` a bit so that it sits closer to `inputSegmentSizeBytes`, perhaps such that `1.2 * skipSegmentWithSizeBytesGreaterThan ≈ inputSegmentSizeBytes`. Then we might be able to pick out the big segment, or a pair of segments (one big and one small), for further minor compaction with a specific `partitionSpec`, possibly one that splits segments. I'm still not sure this is feasible, though 😂.

> Maybe we can add `targetRowsPerSegment`, so that auto compaction can skip if a segment has a similar number of rows to it.

Personally, I prefer specifying the size in bytes per segment, because that may be more obvious to users.

> For the second property, is there a use case where you don't want to compact segments using minor compaction? Or can we always compact if there are 2 or more segments?

No special use case here. In my opinion, it's acceptable to leave out the occasional small segments mixed in among regular-sized segments, especially when using minor compaction. This property would give users more flexibility in configuration. And I agree that we should consider the use case of splitting a big segment, which may be caused by data skew.
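To make the alternating-size problem and the proposed `1.2 *` rule of thumb concrete, here is a minimal Python sketch. It is not Druid's compaction code. It is a toy model under stated assumptions: segments in one interval are an ordered list of sizes, a segment larger than the skip threshold is excluded and breaks adjacency, and a group closes when adding the next segment would exceed `inputSegmentSizeBytes`. The function `plan_minor_compaction` and its parameter names are hypothetical, including `min_segments_to_compact`, which stands in for the unnamed "second property".

```python
from typing import List


def plan_minor_compaction(
    segment_sizes: List[int],
    skip_segment_with_size_bytes_greater_than: int,
    input_segment_size_bytes: int,
    min_segments_to_compact: int = 2,  # hypothetical stand-in for the "second property"
) -> List[List[int]]:
    """Return groups of adjacent segment indices to compact together.

    Toy model (assumption, not Druid's algorithm): a segment above the skip
    threshold is never selected and splits the current run; a run closes when
    the next segment would push it past input_segment_size_bytes; runs shorter
    than min_segments_to_compact are discarded.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0

    def flush() -> None:
        nonlocal current, current_bytes
        if len(current) >= min_segments_to_compact:
            groups.append(current)
        current, current_bytes = [], 0

    for i, size in enumerate(segment_sizes):
        if size > skip_segment_with_size_bytes_greater_than:
            flush()  # large segment: skipped, and it breaks adjacency
            continue
        if current and current_bytes + size > input_segment_size_bytes:
            flush()  # input budget exhausted: start a new group
        current.append(i)
        current_bytes += size
    flush()
    return groups


MB = 1_000_000
sizes = [10 * MB, 500 * MB, 10 * MB, 500 * MB, 10 * MB]  # small, large, small, ...

# Threshold below the large segments: every small segment is isolated.
print(plan_minor_compaction(sizes, 400 * MB, 600 * MB))  # []

# Threshold raised so that 1.2 * 500 MB = 600 MB = inputSegmentSizeBytes.
print(plan_minor_compaction(sizes, 500 * MB, 600 * MB))  # [[0, 1, 2], [3, 4]]
```

With the lower threshold every run contains a single small segment, so nothing qualifies. Raising the threshold to `inputSegmentSizeBytes / 1.2` lets each large segment be merged with its small neighbours while groups still stay within the input budget.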
