Copilot commented on code in PR #18020:
URL: https://github.com/apache/pinot/pull/18020#discussion_r3013582689


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManagerForConsistentDeletes.java:
##########
@@ -564,6 +564,7 @@ protected GenericRow doUpdateRecord(GenericRow record, 
RecordInfo recordInfo) {
               _reusePreviousRow.init(currentSegment, currentDocId);
               _partialUpsertHandler.merge(_reusePreviousRow, record, 
_reuseMergeResultHolder);
               _reuseMergeResultHolder.clear();
+              _reusePreviousRow.clear();

Review Comment:
   Same as in the non-consistent-deletes manager: `_reusePreviousRow.clear()` 
(and `_reuseMergeResultHolder.clear()`) should be in a finally block so the 
reusable `LazyRow` doesn't retain the segment reference/cached values if 
`_partialUpsertHandler.merge(...)` throws.
   ```suggestion
                 try {
                   _partialUpsertHandler.merge(_reusePreviousRow, record, 
_reuseMergeResultHolder);
                 } finally {
                   _reuseMergeResultHolder.clear();
                   _reusePreviousRow.clear();
                 }
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/ConcurrentMapPartitionUpsertMetadataManager.java:
##########
@@ -431,6 +431,7 @@ protected GenericRow doUpdateRecord(GenericRow record, 
RecordInfo recordInfo) {
               _reusePreviousRow.init(currentSegment, currentDocId);
               _partialUpsertHandler.merge(_reusePreviousRow, record, 
_reuseMergeResultHolder);
               _reuseMergeResultHolder.clear();
+              _reusePreviousRow.clear();

Review Comment:
   `_reusePreviousRow.clear()` (and `_reuseMergeResultHolder.clear()`) are only 
executed when `_partialUpsertHandler.merge(...)` returns normally. If merge 
throws, the reusable `LazyRow` can keep holding the previous segment reference 
and cached values, reintroducing the memory retention issue. Wrap the 
merge/clear sequence in a try/finally to guarantee cleanup.
   ```suggestion
                 try {
                   _partialUpsertHandler.merge(_reusePreviousRow, record, 
_reuseMergeResultHolder);
                 } finally {
                   _reuseMergeResultHolder.clear();
                   _reusePreviousRow.clear();
                 }
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -962,24 +966,18 @@ protected void doTakeSnapshot() {
           continue;
         }
         ThreadSafeMutableRoaringBitmap queryableDocIdsBitMap = null;
-        ThreadSafeMutableRoaringBitmap validDocIdsBitMap =
-            immutableSegment.getValidDocIds() == null ? new 
ThreadSafeMutableRoaringBitmap()
-                : immutableSegment.getValidDocIds();
-        
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
-            validDocIdsBitMap);
+        ThreadSafeMutableRoaringBitmap validDocIdsBitMap = 
immutableSegment.getValidDocIds();
+        
immutableSegment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME,
 validDocIdsBitMap);
         if (_deleteRecordColumn != null) {
-          queryableDocIdsBitMap =
-              immutableSegment.getQueryableDocIds() == null ? new 
ThreadSafeMutableRoaringBitmap()
-                  : immutableSegment.getQueryableDocIds();
+          queryableDocIdsBitMap = immutableSegment.getQueryableDocIds();
           
immutableSegment.persistDocIdsSnapshot(V1Constants.QUERYABLE_DOC_IDS_SNAPSHOT_FILE_NAME,
               queryableDocIdsBitMap);
         }

Review Comment:
   `ImmutableSegmentImpl.getValidDocIds()` / `getQueryableDocIds()` are 
annotated @Nullable, but this snapshot path now persists them unconditionally. 
If either bitmap is null (e.g., segment hasn't had upsert enabled or TTL 
handling left it unset), this will NPE inside `persistDocIdsSnapshot()` and/or 
when calling `getCardinality()`. Consider adding explicit null handling here 
(skip snapshot for that bitmap, or fail fast with a clear message) consistent 
with the null checks added in the `segmentsWithoutSnapshot` loop.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/readers/LazyRow.java:
##########
@@ -46,7 +46,6 @@ public LazyRow() {
   }
 
   public void init(IndexSegment segment, int docId) {

Review Comment:
   `init()` no longer clears the cached field/null-value state. Since 
`getValue()` memoizes per-column results, reusing a single `LazyRow` instance 
across different docIds/segments without calling `clear()` first can return 
stale values from the previous row. This also contradicts the class-level 
Javadoc that says reusing is done by re-initializing via `init(...)`. Consider 
restoring `clear()` at the start of `init(...)` (while still keeping the new 
post-merge `clear()` calls for eager segment reference cleanup), or otherwise 
updating callers + Javadoc so `init(...)` cannot be misused.
   ```suggestion
     public void init(IndexSegment segment, int docId) {
       clear();
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/BasePartitionUpsertMetadataManager.java:
##########
@@ -1004,21 +1002,23 @@ protected void doTakeSnapshot() {
         }
         try {
           ThreadSafeMutableRoaringBitmap segmentQueryableDocIdsBitMap = null;
-          ThreadSafeMutableRoaringBitmap segmentValidDocIdsBitMap =
-              segment.getValidDocIds() == null ? new 
ThreadSafeMutableRoaringBitmap() : segment.getValidDocIds();
-          
segment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME, 
segmentValidDocIdsBitMap);
+          ThreadSafeMutableRoaringBitmap segmentValidDocIdsBitMap = 
segment.getValidDocIds();
+          // segment that is out of TTL with no valid docId snapshot can be 
skipped while taking snapshot
+          // i.e such segments will have null bitmap
+          if (segmentValidDocIdsBitMap != null) {
+            
segment.persistDocIdsSnapshot(V1Constants.VALID_DOC_IDS_SNAPSHOT_FILE_NAME, 
segmentValidDocIdsBitMap);
+            numPrimaryKeysInSnapshot += 
segmentValidDocIdsBitMap.getCardinality();
+          }
           if (_deleteRecordColumn != null) {
-            segmentQueryableDocIdsBitMap = segment.getQueryableDocIds() == 
null ? new ThreadSafeMutableRoaringBitmap()
-                : segment.getQueryableDocIds();
-            
segment.persistDocIdsSnapshot(V1Constants.QUERYABLE_DOC_IDS_SNAPSHOT_FILE_NAME,
-                segmentQueryableDocIdsBitMap);
+            segmentQueryableDocIdsBitMap = segment.getQueryableDocIds();
+            if (segmentQueryableDocIdsBitMap != null) {
+              
segment.persistDocIdsSnapshot(V1Constants.QUERYABLE_DOC_IDS_SNAPSHOT_FILE_NAME,
+                  segmentQueryableDocIdsBitMap);
+              numQueryableDocIdsInSnapshot += 
segmentQueryableDocIdsBitMap.getCardinality();
+            }
           }
           _updatedSegmentsSinceLastSnapshot.remove(segment);
           numImmutableSegments++;
-          numPrimaryKeysInSnapshot += 
segmentValidDocIdsBitMap.getMutableRoaringBitmap().getCardinality();
-          if (_deleteRecordColumn != null) {
-            numQueryableDocIdsInSnapshot += 
segmentQueryableDocIdsBitMap.getMutableRoaringBitmap().getCardinality();
-          }
         } catch (Exception e) {

Review Comment:
   In the `segmentsWithoutSnapshot` loop, `numImmutableSegments++` and 
`_updatedSegmentsSinceLastSnapshot.remove(segment)` happen even when 
`segmentValidDocIdsBitMap` / `segmentQueryableDocIdsBitMap` are null and no 
snapshot files are written. This can skew snapshot metrics/logging and also 
drops the segment from the "needs snapshot" tracking, preventing a retry later. 
Consider only counting/removing the segment from 
`_updatedSegmentsSinceLastSnapshot` when at least one snapshot file is actually 
persisted (or explicitly persisting an empty snapshot when that’s the desired 
behavior).



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/immutable/ImmutableSegmentImpl.java:
##########
@@ -160,13 +159,14 @@ public void persistDocIdsSnapshot(String fileName, 
ThreadSafeMutableRoaringBitma
         LOGGER.warn("Previous snapshot was not taken cleanly. Remove tmp file: 
{}", tmpFile);
         FileUtils.deleteQuietly(tmpFile);
       }
-      MutableRoaringBitmap docIdsSnapshot = docIds.getMutableRoaringBitmap();
-      try (DataOutputStream dataOutputStream = new DataOutputStream(new 
FileOutputStream(tmpFile))) {
-        docIdsSnapshot.serialize(dataOutputStream);
+      byte[] bytes = docIds.toBytes();
+      int cardinality = docIds.getCardinality();

Review Comment:
   `bytes` and `cardinality` are fetched via two separate synchronized calls on 
`docIds`, so the logged cardinality can drift from the serialized snapshot if 
the bitmap is mutated between calls. Consider capturing both under a single 
lock (e.g., `synchronized (docIds) { ... }`) or adding a single method that 
returns both snapshot bytes and cardinality atomically.
   ```suggestion
         byte[] bytes;
         int cardinality;
         synchronized (docIds) {
           // Capture snapshot bytes and cardinality from the same bitmap state
           bytes = docIds.toBytes();
           cardinality = docIds.getCardinality();
         }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to