JingsongLi commented on code in PR #8186:
URL: https://github.com/apache/paimon/pull/8186#discussion_r3387035775
##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/sink/coordinator/TableWriteCoordinator.java:
##########
@@ -93,6 +100,16 @@ private synchronized void refresh() {
}
this.snapshot = latestSnapshot.get();
this.scan.withSnapshot(snapshot);
+ if (prefetchManifests) {
+ // Eagerly read all data manifests of the current snapshot once to warm the
+ // table's SegmentsCache (the byte-level manifest cache attached to the table
+ // inside the Job Manager). This reuses the same threaded `plan()` read path
+ // that per-task `scan` requests use, so subsequent concurrent requests hit
+ // warm bytes instead of each performing a cold manifest read.
+ scan.withPartitionFilter(PartitionPredicate.ALWAYS_TRUE)
+ .withBucketFilter(Filter.alwaysTrue())
Review Comment:
This prefetch reuses the mutable coordinator `scan`. A normal request calls
`scan.withPartitionBucket(partition, bucket)`, which leaves `specifiedBucket`
set in `AbstractFileStoreScan`/`ManifestsReader`;
`withBucketFilter(Filter.alwaysTrue())` only adds a permissive filter and does
not clear that field. After the next checkpoint refresh, this plan can
therefore skip manifests outside the last requested bucket range instead of
warming all data manifests. Please use a fresh
`table.store().newScan().withSnapshot(snapshot)` for the prefetch, or add an
explicit way to clear the bucket state, and cover the scan-then-checkpoint case
in the test.
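
To illustrate the concern, here is a minimal, self-contained sketch of the sticky-state pattern. `ToyScan` is a hypothetical stand-in written for this comment, not Paimon's `AbstractFileStoreScan` or `ManifestsReader`; it only assumes the behavior described above, namely that `withPartitionBucket` records a specific bucket and `withBucketFilter` installs a filter without clearing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class StickyScanSketch {

    /** Hypothetical toy scan; models only the sticky-bucket behavior, not Paimon's real classes. */
    static class ToyScan {
        private final int numBuckets;
        private Integer specifiedBucket;               // stays set once a request pins a bucket
        private IntPredicate bucketFilter = b -> true; // permissive by default

        ToyScan(int numBuckets) {
            this.numBuckets = numBuckets;
        }

        ToyScan withPartitionBucket(int bucket) {
            this.specifiedBucket = bucket;
            return this;
        }

        ToyScan withBucketFilter(IntPredicate filter) {
            this.bucketFilter = filter; // note: specifiedBucket is left untouched
            return this;
        }

        /** Returns the buckets whose manifests this plan would read. */
        List<Integer> plan() {
            List<Integer> read = new ArrayList<>();
            for (int b = 0; b < numBuckets; b++) {
                if (specifiedBucket != null && specifiedBucket != b) {
                    continue; // pinned bucket wins over the filter
                }
                if (bucketFilter.test(b)) {
                    read.add(b);
                }
            }
            return read;
        }
    }

    /** Prefetch on the shared scan after a normal per-task request has pinned bucket 2. */
    static List<Integer> prefetchWithReusedScan() {
        ToyScan shared = new ToyScan(4);
        shared.withPartitionBucket(2).plan();             // normal request
        return shared.withBucketFilter(b -> true).plan(); // prefetch after refresh
    }

    /** Prefetch on a scan created only for warming the cache. */
    static List<Integer> prefetchWithFreshScan() {
        return new ToyScan(4).withBucketFilter(b -> true).plan();
    }

    public static void main(String[] args) {
        System.out.println("reused scan warms buckets: " + prefetchWithReusedScan());
        System.out.println("fresh scan warms buckets:  " + prefetchWithFreshScan());
    }
}
```

In this toy model the reused scan warms only bucket 2, while the fresh scan covers all four buckets. A test for the real coordinator could follow the same order: issue a scan request, trigger a checkpoint refresh, then check that the prefetch covers every data manifest.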
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.