maytasm commented on a change in pull request #10371:
URL: https://github.com/apache/druid/pull/10371#discussion_r490103518
##########
File path:
server/src/main/java/org/apache/druid/server/coordinator/duty/CompactionSegmentIterator.java
##########
@@ -19,22 +19,45 @@
package org.apache.druid.server.coordinator.duty;
-import it.unimi.dsi.fastutil.objects.Object2LongOpenHashMap;
+import org.apache.druid.server.coordinator.CompactionStatistics;
import org.apache.druid.timeline.DataSegment;
import java.util.Iterator;
import java.util.List;
+import java.util.Map;
/**
* Segments in the lists which are the elements of this iterator are sorted
according to the natural segment order
* (see {@link DataSegment#compareTo}).
*/
public interface CompactionSegmentIterator extends Iterator<List<DataSegment>>
Review comment:
Actually, I have another idea which I really like and I think works well
+ clean + simple (I hope you like it too).
Removed flushAllSegments() and totalRemainingStatistics() from the
CompactionSegmentIterator. Keep totalCompactedStatistics() and
totalSkippedStatistics() in the CompactionSegmentIterator . The stats for
totalCompactedStatistics() and totalSkippedStatistics() is maintain and stored
in the CompactionSegmentIterator (NewestSegmentFirstIterator). I think this
makes perfect sense since the already compacted and skipped segments are never
returned from the iterator’s next() (in fact, it’s never even put into the
queue that backs the iterator). Hence, as we iterate through the already
compacted and skipped segments, we should just aggregates the stats within the
CompactionSegmentIterator . The totalCompactedStatistics() and
totalSkippedStatistics() will simply just returns the aggregates up to the
current point of the iterator being iterated. We will then simply just iterate
the iterator as before when creating the compaction task. The List<DataSegment>
that is returned is then
aggregates into a map in CompactSegments. Similarly, we will continue to
iterate the iterator in makeStats after we run out of task slot, then the
List<DataSegment> that is returned is then aggregates into a different map in
CompactSegments
There are Stats map maintained in both CompactSegments and
CompactionSegmentIterator . But I think it’s much clearer and we separate the
“skipped/already compacted” from the “segments returned by the iterator”. Also,
no complex contracts in the CompactionSegmentIterator . Login in
CompactionSegmentIterator for totalCompactedStatistics() and
totalSkippedStatistics() is also very simple.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]