deepthi912 commented on code in PR #16344:
URL: https://github.com/apache/pinot/pull/16344#discussion_r2304966124
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/CompactedPinotSegmentRecordReader.java:
##########
@@ -40,60 +42,159 @@ public class CompactedPinotSegmentRecordReader implements RecordReader {
   private final String _deleteRecordColumn;
   // Reusable generic row to store the next row to return
   private final GenericRow _nextRow = new GenericRow();
-  // Valid doc ids iterator
+
+  // Iterator approach for valid document IDs
   private PeekableIntIterator _validDocIdsIterator;
+
+  // Index-based approach for sorted valid document IDs
+  private int[] _sortedValidDocIds;
+  private int _currentDocIndex = 0;
+
   // Flag to mark whether we need to fetch another row
   private boolean _nextRowReturned = true;

   public CompactedPinotSegmentRecordReader(RoaringBitmap validDocIds) {
     this(validDocIds, null);
   }

-  public CompactedPinotSegmentRecordReader(RoaringBitmap validDocIds,
-      @Nullable String deleteRecordColumn) {
+  public CompactedPinotSegmentRecordReader(RoaringBitmap validDocIds, @Nullable String deleteRecordColumn) {
     _pinotSegmentRecordReader = new PinotSegmentRecordReader();
     _validDocIdsBitmap = validDocIds;
     _validDocIdsIterator = validDocIds.getIntIterator();
     _deleteRecordColumn = deleteRecordColumn;
   }

+  public CompactedPinotSegmentRecordReader(ThreadSafeMutableRoaringBitmap validDocIds) {
+    this(validDocIds, null);
+  }
+
+  public CompactedPinotSegmentRecordReader(ThreadSafeMutableRoaringBitmap validDocIds,

Review Comment:
   QQ, for my understanding: this will compact the current mutable segment during commit, correct? Since consuming segments keep invalidating records in the already existing segments, will those existing segments remain unaffected by this code change, or are we trying to modify the existing and updated segments as well? If the former, this will only be beneficial when the PKs are invalidated within the same segment, correct?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: commits-h...@pinot.apache.org