JingsongLi commented on code in PR #8186:
URL: https://github.com/apache/paimon/pull/8186#discussion_r3387035775


##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/sink/coordinator/TableWriteCoordinator.java:
##########
@@ -93,6 +100,16 @@ private synchronized void refresh() {
         }
         this.snapshot = latestSnapshot.get();
         this.scan.withSnapshot(snapshot);
+        if (prefetchManifests) {
+            // Eagerly read all data manifests of the current snapshot once to 
warm the
+            // table's SegmentsCache (the byte-level manifest cache attached 
to the table
+            // inside the Job Manager). This reuses the same threaded `plan()` 
read path
+            // that per-task `scan` requests use, so subsequent concurrent 
requests hit
+            // warm bytes instead of each performing a cold manifest read.
+            scan.withPartitionFilter(PartitionPredicate.ALWAYS_TRUE)
+                    .withBucketFilter(Filter.alwaysTrue())

Review Comment:
   This prefetch reuses the mutable coordinator `scan`. A normal request calls 
`scan.withPartitionBucket(partition, bucket)`, which leaves `specifiedBucket` 
set in `AbstractFileStoreScan`/`ManifestsReader`; 
`withBucketFilter(Filter.alwaysTrue())` only adds a permissive filter and does 
not clear that field. After the next checkpoint refresh, this plan can 
therefore skip manifests outside the last requested bucket range instead of 
warming all data manifests. Please use a fresh 
`table.store().newScan().withSnapshot(snapshot)` for the prefetch, or add an 
explicit way to clear the bucket state, and cover the scan-then-checkpoint case 
in the test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to