jerry-024 commented on code in PR #7611:
URL: https://github.com/apache/paimon/pull/7611#discussion_r3056688100


##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/globalindex/GenericIndexTopoBuilder.java:
##########
@@ -263,6 +359,15 @@ static List<ShardTask> computeShardTasks(
                     continue;
                 }
 
+                // For incremental builds, advance past already-indexed rows
+                long effectiveStart =
+                        maxIndexedRowId >= 0
+                                ? Math.max(shardStart, maxIndexedRowId + 1)
+                                : shardStart;
+                if (effectiveStart > shardEnd) {

Review Comment:
    - This scenario cannot occur in practice. For unaware-bucket append-only 
tables, row IDs are assigned globally and monotonically, so they are contiguous 
both within and across files. Within a single shard there is never a gap 
between files' row ID ranges, because:
   
        1. scan() only returns live (ADD) manifest entries; deleted files are 
excluded.
        2. Compaction merges contiguous files and preserves their original row ID 
ranges, so it introduces no gaps.
        3. New appends always continue from the next available row ID.
   
    - The hypothetical scenario (two disjoint files [0,49] and [100,149] in the 
same shard with maxIndexedRowId=149) would require a gap at [50,99], which 
the row ID allocation invariant rules out. The existing gap-detection 
logic is a defensive safeguard for file-level grouping, not an indication that 
gaps are expected within a shard's row ID space.
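
To make the argument concrete, here is a minimal standalone sketch of the two checks discussed above. It is not the code in GenericIndexTopoBuilder: the class name `EffectiveStartSketch` and the helpers `effectiveStart` and `hasGap` are hypothetical, and treating `effectiveStart > shardEnd` as "skip this shard" is an assumption, since the diff hunk is cut off at that branch.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; names and the skip behavior are assumptions.
class EffectiveStartSketch {

    /**
     * Returns the first row ID in [shardStart, shardEnd] that still needs indexing,
     * or empty when the shard is already fully covered.
     * A negative maxIndexedRowId is taken to mean "nothing indexed yet" (full build).
     */
    static OptionalLong effectiveStart(long shardStart, long shardEnd, long maxIndexedRowId) {
        long start =
                maxIndexedRowId >= 0
                        ? Math.max(shardStart, maxIndexedRowId + 1)
                        : shardStart;
        // Assumed handling: a start past the shard end means there is nothing left to do.
        return start > shardEnd ? OptionalLong.empty() : OptionalLong.of(start);
    }

    /**
     * Returns true if consecutive [first, last] row ID ranges (sorted by first)
     * leave a hole between them.
     */
    static boolean hasGap(long[][] sortedRanges) {
        for (int i = 1; i < sortedRanges.length; i++) {
            long previousLast = sortedRanges[i - 1][1];
            long currentFirst = sortedRanges[i][0];
            if (currentFirst > previousLast + 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Full build: start at the shard start.
        System.out.println(effectiveStart(0, 99, -1));
        // Incremental build: resume right after the last indexed row.
        System.out.println(effectiveStart(100, 199, 149));
        // Shard already fully indexed: nothing to build.
        System.out.println(effectiveStart(0, 149, 149));

        // The hypothetical layout from the review: it has a hole at [50,99].
        System.out.println(hasGap(new long[][] {{0, 49}, {100, 149}}));
        // The contiguous layout that the allocation invariant implies.
        System.out.println(hasGap(new long[][] {{0, 49}, {50, 149}}));
    }
}
```

Under the contiguity invariant, only the second layout can occur, so skipping everything up to maxIndexedRowId cannot skip rows that were never indexed.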



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
