tarun11Mavani commented on code in PR #16344:
URL: https://github.com/apache/pinot/pull/16344#discussion_r2285379100


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/converter/stats/RealtimeSegmentStatsContainer.java:
##########
@@ -38,28 +36,59 @@
 public class RealtimeSegmentStatsContainer implements 
SegmentPreIndexStatsContainer {
   private final MutableSegment _mutableSegment;
   private final Map<String, ColumnStatistics> _columnStatisticsMap = new 
HashMap<>();
+  private final int _totalDocCount;
 
-  public RealtimeSegmentStatsContainer(MutableSegment mutableSegment, 
@Nullable int[] sortedDocIds,
+  public RealtimeSegmentStatsContainer(MutableSegment mutableSegment, int[] 
sortedDocIds,
       StatsCollectorConfig statsCollectorConfig) {
+    this(mutableSegment, sortedDocIds, statsCollectorConfig, null);
+  }
+
+  public RealtimeSegmentStatsContainer(MutableSegment mutableSegment, int[] 
sortedDocIds,
+      StatsCollectorConfig statsCollectorConfig, RecordReader recordReader) {
     _mutableSegment = mutableSegment;
 
+    // Determine if we're using compacted reader
+    boolean isUsingCompactedReader = recordReader instanceof 
CompactedPinotSegmentRecordReader;
+
+    // Determine the correct total document count based on whether compaction 
is being used
+    if (isUsingCompactedReader) {
+      // When using CompactedPinotSegmentRecordReader, use the valid document 
count
+      if (mutableSegment.getValidDocIds() != null) {
+        _totalDocCount = 
mutableSegment.getValidDocIds().getMutableRoaringBitmap().getCardinality();
+      } else {
+        _totalDocCount = mutableSegment.getNumDocsIndexed();
+      }
+    } else {
+      // Use the original total document count for non-compacted readers
+      _totalDocCount = mutableSegment.getNumDocsIndexed();
+    }
+
     // Create all column statistics
+    // Determine compaction mode once for all columns
+    boolean useCompactedStatistics = isUsingCompactedReader && 
mutableSegment.getValidDocIds() != null;
+    ThreadSafeMutableRoaringBitmap validDocIds = useCompactedStatistics ? 
mutableSegment.getValidDocIds() : null;
+
     for (String columnName : mutableSegment.getPhysicalColumnNames()) {
       DataSource dataSource = mutableSegment.getDataSource(columnName);
-      if (dataSource instanceof MutableMapDataSource) {
-        ForwardIndexReader reader = dataSource.getForwardIndex();
-        MapColumnPreIndexStatsCollector mapColumnPreIndexStatsCollector =
-            new MapColumnPreIndexStatsCollector(dataSource.getColumnName(), 
statsCollectorConfig);
-        int numDocs = dataSource.getDataSourceMetadata().getNumDocs();
-        ForwardIndexReaderContext readerContext = reader.createContext();
-        for (int row = 0; row < numDocs; row++) {
-          mapColumnPreIndexStatsCollector.collect(reader.getMap(row, 
readerContext));
+
+      if (dataSource.getDictionary() != null) {
+        // Dictionary columns
+        if (useCompactedStatistics) {

Review Comment:
   Wrapping it in another class required adding a new class and then overriding 
each method by calling the method of the actual column statics class inside the 
wrapper class. But I get your point about complicated logic here so I have 
tried to simplify it by moving each type to a separate method. Hope this helps.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to