jerry-024 commented on code in PR #7611:
URL: https://github.com/apache/paimon/pull/7611#discussion_r3056688100
##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/globalindex/GenericIndexTopoBuilder.java:
##########
@@ -263,6 +359,15 @@ static List<ShardTask> computeShardTasks(
continue;
}
+ // For incremental builds, advance past already-indexed rows
+ long effectiveStart =
+ maxIndexedRowId >= 0
+ ? Math.max(shardStart, maxIndexedRowId + 1)
+ : shardStart;
+ if (effectiveStart > shardEnd) {
Review Comment:
- This scenario cannot occur in practice. For unaware-bucket append-only
tables, row IDs are assigned globally and monotonically, so they are contiguous
both within and across files. Within the same shard, there will never be a gap
between files' row ID ranges, because:
1. scan() only returns alive (ADD) manifest entries; deleted files are
excluded
2. Compaction merges contiguous files and preserves original row ID
ranges, so no gaps are introduced
3. New data appends always continue from the next available row ID
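The invariant that points 1 to 3 rely on can be written as a small check. This is a minimal sketch, assuming each file in a shard exposes an inclusive [first, last] row ID range; `RowIdContiguity` and `RowRange` are hypothetical names for illustration, not Paimon classes:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Paimon): checks row ID contiguity within one shard. */
public final class RowIdContiguity {

    /** Inclusive row ID range covered by one data file (hypothetical type). */
    public static final class RowRange {
        public final long first;
        public final long last;

        public RowRange(long first, long last) {
            this.first = first;
            this.last = last;
        }
    }

    /**
     * Returns true if the ranges, once sorted by first row ID, tile a single
     * interval with no gaps and no overlaps. An empty list is trivially contiguous.
     */
    public static boolean isContiguous(List<RowRange> ranges) {
        List<RowRange> sorted = new ArrayList<>(ranges);
        sorted.sort((a, b) -> Long.compare(a.first, b.first));
        for (int i = 1; i < sorted.size(); i++) {
            // Each file must start exactly one past the previous file's last row ID.
            if (sorted.get(i).first != sorted.get(i - 1).last + 1) {
                return false;
            }
        }
        return true;
    }

    private RowIdContiguity() {}
}
```

Sorting a copy first means the check does not depend on the order in which scan() returns the files.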
- The hypothetical scenario (two disjoint files [0,49] and [100,149] in the
same shard with maxIndexedRowId=149) would require a gap at [50,99], which
cannot happen given the row ID allocation invariant. The existing gap-detection
logic is a defensive safeguard for file-level grouping, not an indication that
gaps are expected within a shard's row ID space.
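To make the arithmetic concrete, here is a standalone sketch of the `effectiveStart` expression from the diff. The class name, method name, and the -1 "fully indexed" sentinel are hypothetical; what the real code does after the `effectiveStart > shardEnd` check is not shown in this hunk:

```java
/**
 * Hypothetical standalone copy of the effectiveStart computation in the diff.
 * shardStart, shardEnd and maxIndexedRowId are inclusive row IDs.
 */
public final class IncrementalShardStart {

    /** Returns the first row ID still to index, or -1 if the whole shard is already indexed. */
    public static long effectiveStart(long shardStart, long shardEnd, long maxIndexedRowId) {
        // A negative maxIndexedRowId means nothing has been indexed yet.
        long start = maxIndexedRowId >= 0
                ? Math.max(shardStart, maxIndexedRowId + 1)
                : shardStart;
        // Corresponds to the `effectiveStart > shardEnd` check: nothing left in this shard.
        return start > shardEnd ? -1 : start;
    }

    private IncrementalShardStart() {}
}
```

With contiguous files [0,49], [50,99], [100,149] in shard [0,149] and maxIndexedRowId=99, this returns 100, so only the unindexed tail [100,149] is considered; with maxIndexedRowId=149 it returns -1 and the shard is skipped.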
--
This is an automated message from the Apache Git Service.