JingsongLi opened a new pull request, #8241:
URL: https://github.com/apache/paimon/pull/8241

   ## Summary
   
   This PR parallelizes snapshot expiration planning and file IO work to reduce the impact of object-store latency when expiring many snapshots. It preserves the ordering of final snapshot and changelog deletion where required, while moving expensive snapshot reads, manifest reads, tag reads, and file deletion planning onto the existing file operation thread pool.
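
   The split between concurrent reads and ordered deletion can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ParallelExpireSketch`, `readSnapshot`, `collect`, the simulated latency, and the pool size are all hypothetical stand-ins for Paimon's snapshot reads and its file operation thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelExpireSketch {

    // Hypothetical stand-in for one slow object-store read of a snapshot file.
    static String readSnapshot(long id) {
        try {
            Thread.sleep(20); // simulated object-store latency (assumption)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return "snapshot-" + id;
    }

    // Reads snapshots in [from, to) concurrently; the result is in ascending id order.
    static List<String> collect(long from, long to, ExecutorService pool) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (long id = from; id < to; id++) {
            final long snapshotId = id;
            futures.add(CompletableFuture.supplyAsync(() -> readSnapshot(snapshotId), pool));
        }
        List<String> result = new ArrayList<>();
        for (CompletableFuture<String> future : futures) {
            // Joining in submission order keeps the output ordered even though
            // the reads themselves may complete in any order.
            result.add(future.join());
        }
        return result;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<String> snapshots = collect(1, 25, pool);
            // The final deletion step stays sequential, oldest snapshot first.
            for (String snapshot : snapshots) {
                System.out.println("delete " + snapshot);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

   In this sketch only the reads overlap; the loop that stands in for deletion runs on the calling thread, which is one simple way to keep ordering where it matters.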
   
   ## Changes
   
   - Collect snapshots concurrently and reuse the collected snapshot list during `snapshot.time-retained` checks.
   - Split data/changelog/manifest cleanup into planning and centralized deletion phases so work can be planned concurrently and deleted with deduplication.
   - Parallelize manifest reads used by file deletion and tag/skipping-set construction with bounded manifest read batching.
   - Read tag files concurrently when `TagManager` is created with `CoreOptions`.
   - Add `SlowFileIO` test support and a slow object-store expiration test that injects 20ms FileIO latency and asserts concurrent IO occurs.
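
   The planning and centralized deletion phases, together with bounded batching, might look something like the sketch below. It is an assumption-laden illustration rather than the PR's implementation: `PlanThenDeleteSketch`, `planFor`, `plan`, the file names, and the batch size are hypothetical, and real planning would read manifests instead of building strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PlanThenDeleteSketch {

    // Hypothetical planner: returns the file paths one snapshot would delete.
    static List<String> planFor(int snapshotId) {
        List<String> files = new ArrayList<>();
        files.add("data-" + snapshotId);
        // Two adjacent snapshots share a manifest, so plans overlap.
        files.add("manifest-" + (snapshotId / 2));
        return files;
    }

    // Plans at most batchSize snapshots at a time, then merges every plan
    // into one deduplicated set that keeps first-seen order.
    static Set<String> plan(List<Integer> snapshotIds, int batchSize, ExecutorService pool)
            throws Exception {
        Set<String> toDelete = new LinkedHashSet<>();
        for (int start = 0; start < snapshotIds.size(); start += batchSize) {
            int end = Math.min(start + batchSize, snapshotIds.size());
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (int id : snapshotIds.subList(start, end)) {
                tasks.add(() -> planFor(id));
            }
            // invokeAll blocks until the whole batch is done, which bounds
            // the number of planning tasks in flight.
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                toDelete.addAll(future.get());
            }
        }
        return toDelete;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Set<String> files = plan(Arrays.asList(0, 1, 2, 3), 2, pool);
            // Centralized deletion: each path is handled at most once.
            for (String file : files) {
                System.out.println("delete " + file);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

   Merging plans into a set before deleting is what avoids issuing duplicate delete calls for files referenced by more than one snapshot in this sketch.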
   
   ## Testing
   
   - `git diff --check`
   - `mvn -pl paimon-core -am -Pfast-build -DskipTests compile`
   - `mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=ExpireSnapshotsTest test`
   - `mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=FileDeletionTest test`
   - `mvn -pl paimon-core -am -Pfast-build -DfailIfNoTests=false -Dtest=TagManagerTest test`
   
   ## Notes
   
   A temporary local benchmark with 20ms artificial FileIO latency and 24 commits measured the median `expire.expire()` time dropping from about 9.66s to about 2.27s in that scenario, roughly a 4.3x speedup. The benchmark was only used locally; the committed coverage is the deterministic slow FileIO regression test.
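
   For readers unfamiliar with latency-injection tests, the idea of checking that IO calls overlap can be sketched as below. This is not the `SlowFileIO` class added by the PR; `SlowIoSketch`, `read`, `run`, and `maxConcurrency` are hypothetical names, and the sketch only counts how many simulated calls are in flight at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SlowIoSketch {

    private final long delayMillis;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxInFlight = new AtomicInteger();

    SlowIoSketch(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Simulates one slow IO call and records the peak number of overlapping calls.
    void read(String path) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(delayMillis); // injected latency (assumption: sleep models the store)
        } finally {
            inFlight.decrementAndGet();
        }
    }

    int maxConcurrency() {
        return maxInFlight.get();
    }

    // Issues `calls` reads on a pool of `threads` workers and returns the peak overlap.
    static int run(int calls, int threads, long delayMillis) throws InterruptedException {
        SlowIoSketch io = new SlowIoSketch(delayMillis);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < calls; i++) {
            final String path = "snapshot-" + i;
            pool.execute(() -> {
                try {
                    io.read(path);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("simulated reads did not finish in time");
        }
        return io.maxConcurrency();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak overlap, 1 thread:  " + run(8, 1, 20));
        System.out.println("peak overlap, 4 threads: " + run(8, 4, 20));
    }
}
```

   A test built this way can assert that the peak overlap is greater than one when the code under test is expected to issue IO concurrently, without depending on wall-clock timings.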
   

