maytasm commented on a change in pull request #10371:
URL: https://github.com/apache/druid/pull/10371#discussion_r490103518



##########
File path: 
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactionSegmentIterator.java
##########
@@ -19,22 +19,45 @@
 
 package org.apache.druid.server.coordinator.duty;
 
-import it.unimi.dsi.fastutil.objects.Object2LongOpenHashMap;
+import org.apache.druid.server.coordinator.CompactionStatistics;
 import org.apache.druid.timeline.DataSegment;
 
 import java.util.Iterator;
 import java.util.List;
+import java.util.Map;
 
 /**
  * Segments in the lists which are the elements of this iterator are sorted 
according to the natural segment order
  * (see {@link DataSegment#compareTo}).
  */
 public interface CompactionSegmentIterator extends Iterator<List<DataSegment>>

Review comment:
       Actually, I have another idea which I really like and I think works well 
+ clean + simple (I hope you like it too).
   
   Removed `flushAllSegments()` and `totalRemainingStatistics()`  from the 
`CompactionSegmentIterator`. Keep `totalCompactedStatistics()` and 
`totalSkippedStatistics()` in the `CompactionSegmentIterator`. The stats for 
`totalCompactedStatistics()` and `totalSkippedStatistics()`  is maintain and 
stored in the `CompactionSegmentIterator` (`NewestSegmentFirstIterator`). I 
think this makes perfect sense since the "already compacted" and "skipped 
segments" are never returned from the iterator’s `next()` (in fact, it’s not 
even put into the queue that backs the iterator). Hence, as we iterate through 
the already compacted and skipped segments, we should just aggregates the stats 
within the `CompactionSegmentIterator` and keep it within the 
`CompactionSegmentIterator`. The `totalCompactedStatistics()` and 
`totalSkippedStatistics()`  will simply just returns the aggregates up to the 
current point of the iterator being iterated. We will then simply just iterate 
the iterator as before when
  creating the compaction task and the List<DataSegment> that is returned by 
the iterator is then aggregates into the `AutoCompactionSnapshot.Builder`'s 
Compacted stats (this is in `CompactSegments`). Similarly, we will continue to 
iterate the iterator in `makeStats()` (after we run out of task slot), then the 
List<DataSegment> that is returned by the iterator is then aggregates into 
`AutoCompactionSnapshot.Builder`'s Remaining stats (this is in 
`CompactSegments`).
   
   There are Stats maintained in both `CompactSegments`  and 
`CompactionSegmentIterator` . But I think it’s much clearer since we separate 
the “skipped/already compacted” from the “segments returned by the iterator”. 
Also, no complex contracts in the `CompactionSegmentIterator` anymore . Logic 
in `CompactionSegmentIterator`  for `totalCompactedStatistics()` and 
`totalSkippedStatistics()`  is also very simple.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to