leaves12138 commented on code in PR #7611:
URL: https://github.com/apache/paimon/pull/7611#discussion_r3052820652
##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/globalindex/GenericIndexTopoBuilder.java:
##########
@@ -263,6 +359,15 @@ static List<ShardTask> computeShardTasks(
continue;
}
+ // For incremental builds, advance past already-indexed rows
+ long effectiveStart =
+ maxIndexedRowId >= 0
+ ? Math.max(shardStart, maxIndexedRowId + 1)
+ : shardStart;
+ if (effectiveStart > shardEnd) {
Review Comment:
When `maxIndexedRowId` falls in the middle of a shard, groups that end
before `effectiveStart` are still materialized as tasks. For example, with
`rowsPerShard = 200`, `maxIndexedRowId = 149`, and two disjoint files `[0,49]`
and `[100,149]`, `effectiveStart` is 150, yet this loop still creates shard
tasks for those files, and `createShardTask` ends up building inverted ranges
such as `[150,49]`. With assertions enabled, that can fail in
`new Range(from, to)`; with assertions disabled, it still runs useless
zero-row tasks. We should skip groups whose max row id is `< effectiveStart`
before calling `createShardTask`, and add a test for the gapped, intra-shard
incremental case.
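A minimal sketch of the suggested skip, assuming a simplified model rather than Paimon's real code: `IncrementalShardSketch`, `FileGroup`, `Range` and `computeTasks` are hypothetical names, each group is assumed to already lie inside the shard, and this `Range` rejects `from > to` unconditionally to mimic the assertion mentioned above. The `effectiveStart` computation is copied from the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the per-shard task computation.
// Not Paimon's actual classes; assumes each group already lies inside the shard.
final class IncrementalShardSketch {

    /** Inclusive row-id range; rejects from > to to mimic the assertion in Range. */
    static final class Range {
        final long from;
        final long to;

        Range(long from, long to) {
            if (from > to) {
                throw new IllegalArgumentException("invalid range [" + from + "," + to + "]");
            }
            this.from = from;
            this.to = to;
        }
    }

    /** A contiguous run of row ids (e.g. one data file), bounds inclusive. */
    static final class FileGroup {
        final long minRowId;
        final long maxRowId;

        FileGroup(long minRowId, long maxRowId) {
            this.minRowId = minRowId;
            this.maxRowId = maxRowId;
        }
    }

    /** Returns the row ranges of one shard that still need indexing. */
    static List<Range> computeTasks(
            List<FileGroup> groups, long shardStart, long shardEnd, long maxIndexedRowId) {
        // Same effectiveStart computation as in the diff.
        long effectiveStart =
                maxIndexedRowId >= 0 ? Math.max(shardStart, maxIndexedRowId + 1) : shardStart;
        List<Range> tasks = new ArrayList<>();
        if (effectiveStart > shardEnd) {
            return tasks; // the whole shard is already indexed
        }
        for (FileGroup group : groups) {
            // The suggested fix: a group ending before effectiveStart is fully
            // indexed, so skip it instead of building an inverted range.
            if (group.maxRowId < effectiveStart) {
                continue;
            }
            // Trim a group that straddles effectiveStart to its unindexed tail.
            tasks.add(new Range(Math.max(group.minRowId, effectiveStart), group.maxRowId));
        }
        return tasks;
    }
}
```

With shard `[0,199]`, `maxIndexedRowId = 149` and groups `[0,49]` and `[100,149]`, `computeTasks` returns an empty list instead of trying to build `[150,49]`; a group `[120,170]` in the same shard would be trimmed to `[150,170]`. Cases like these are what the requested gapped/intra-shard test would cover.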