maytasm commented on a change in pull request #10371:
URL: https://github.com/apache/druid/pull/10371#discussion_r490103518



##########
File path: 
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactionSegmentIterator.java
##########
@@ -19,22 +19,45 @@
 
 package org.apache.druid.server.coordinator.duty;
 
-import it.unimi.dsi.fastutil.objects.Object2LongOpenHashMap;
+import org.apache.druid.server.coordinator.CompactionStatistics;
 import org.apache.druid.timeline.DataSegment;
 
 import java.util.Iterator;
 import java.util.List;
+import java.util.Map;
 
 /**
  * Segments in the lists which are the elements of this iterator are sorted 
according to the natural segment order
  * (see {@link DataSegment#compareTo}).
  */
 public interface CompactionSegmentIterator extends Iterator<List<DataSegment>>

Review comment:
       Actually, I have another idea which I really like and I think works well 
+ clean + simple (I hope you like it too).
   
   Removed flushAllSegments() and totalRemainingStatistics()  from the 
CompactionSegmentIterator. Keep totalCompactedStatistics() and 
totalSkippedStatistics() in the CompactionSegmentIterator . The stats for 
totalCompactedStatistics() and totalSkippedStatistics()  is maintain and stored 
in the CompactionSegmentIterator  (NewestSegmentFirstIterator). I think this 
makes perfect sense since the already compacted and skipped segments are never 
returned from the iterator’s next() (in fact, it’s never even put into the 
queue that backs the iterator). Hence, as we iterate through the already 
compacted and skipped segments, we should just aggregates the stats within the 
CompactionSegmentIterator . The totalCompactedStatistics() and 
totalSkippedStatistics()  will simply just returns the aggregates up to the 
current point of the iterator being iterated. We will then simply just iterate 
the iterator as before when creating the compaction task. The List<DataSegment> 
that is returned is then 
 aggregates into a map in CompactSegments. Similarly, we will continue to 
iterate the iterator in makeStats after we run out of task slot, then the 
List<DataSegment> that is returned is then aggregates into a different map in 
CompactSegments
   There are Stats map maintained in both CompactSegments  and 
CompactionSegmentIterator . But I think it’s much clearer and we separate the 
“skipped/already compacted” from the “segments returned by the iterator”. Also, 
no complex contracts in the CompactionSegmentIterator . Login in 
CompactionSegmentIterator  for totalCompactedStatistics() and 
totalSkippedStatistics()  is also very simple.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to