mikemccand commented on a change in pull request #266:
URL: https://github.com/apache/lucene/pull/266#discussion_r697495033



##########
File path: lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java
##########
@@ -495,36 +495,36 @@ private MergeSpecification doFindMerges(
       for (int startIdx = 0; startIdx < sortedEligible.size(); startIdx++) {
 
         long totAfterMergeBytes = 0;
-
         final List<SegmentCommitInfo> candidate = new ArrayList<>();
         boolean hitTooLarge = false;
-        long bytesThisMerge = 0;
         for (int idx = startIdx;
             idx < sortedEligible.size()
-                && candidate.size() < mergeFactor
-                && bytesThisMerge < maxMergedSegmentBytes;
+                && candidate.size() < maxMergeAtOnce
+                // We allow merging more that mergeFactor segments together if 
the merged segment

Review comment:
       s/`that`/`than`

##########
File path: lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java
##########
@@ -495,36 +495,36 @@ private MergeSpecification doFindMerges(
       for (int startIdx = 0; startIdx < sortedEligible.size(); startIdx++) {
 
         long totAfterMergeBytes = 0;
-
         final List<SegmentCommitInfo> candidate = new ArrayList<>();
         boolean hitTooLarge = false;
-        long bytesThisMerge = 0;
         for (int idx = startIdx;
             idx < sortedEligible.size()
-                && candidate.size() < mergeFactor
-                && bytesThisMerge < maxMergedSegmentBytes;
+                && candidate.size() < maxMergeAtOnce
+                // We allow merging more that mergeFactor segments together if 
the merged segment
+                // would be less than the floor segment size. This is 
important because segments
+                // below the floor segment size are more aggressively merged 
by this policy, so we
+                // need to grow them as quickly as possible.
+                && (candidate.size() < mergeFactor || totAfterMergeBytes < 
floorSegmentBytes)
+                && totAfterMergeBytes < maxMergedSegmentBytes;
             idx++) {
           final SegmentSizeAndDocs segSizeDocs = sortedEligible.get(idx);
           final long segBytes = segSizeDocs.sizeInBytes;
 
           if (totAfterMergeBytes + segBytes > maxMergedSegmentBytes) {
             hitTooLarge = true;
-            if (candidate.size() == 0) {
-              // We should never have something coming in that _cannot_ be 
merged, so handle
-              // singleton merges
-              candidate.add(segSizeDocs.segInfo);
-              bytesThisMerge += segBytes;
+            // We should never have something coming in that _cannot_ be 
merged, so handle
+            // singleton merges
+            if (candidate.size() > 0) {
+              // NOTE: we continue, so that we can try
+              // "packing" smaller segments into this merge
+              // to see if we can get closer to the max
+              // size; this in general is not perfect since
+              // this is really "bin packing" and we'd have
+              // to try different permutations.
+              continue;
             }
-            // NOTE: we continue, so that we can try
-            // "packing" smaller segments into this merge
-            // to see if we can get closer to the max
-            // size; this in general is not perfect since
-            // this is really "bin packing" and we'd have
-            // to try different permutations.
-            continue;
           }
           candidate.add(segSizeDocs.segInfo);
-          bytesThisMerge += segBytes;

Review comment:
       This (`bytesThisMerge`) was just a dup of `totAfterMergeBytes`?
   
   Edit: oh, I see!  Once you inverted the `if (candidate.size() == 0) ...` 
then `bytesThisMerge` became a dup of `totAfterMergeBytes`?

##########
File path: lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java
##########
@@ -81,8 +81,8 @@
   public static final double DEFAULT_NO_CFS_RATIO = 0.1;
 
   // User-specified maxMergeAtOnce. In practice we always take the min of its
-  // value and segsPerTier to avoid suboptimal merging.
-  private int maxMergeAtOnce = 10;
+  // value and segsPerTier for segments above the floor size to avoid 
suboptimal merging.
+  private int maxMergeAtOnce = 30;

Review comment:
       OK even though we increased `maxMergeAtOnce` to `30`, which means if the 
segments are so tiny that merging them is still under the floor we will allow 
merging up to `30` of such segments, the effective `mergeFactor` for normal 
(big, above floor) merges is still `10` since we take min of `maxMergeAtOnce` 
and `segmentsPerTier` for `mergeFactor`.  Phew, complicated!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to