XiaoHongbo-Hope opened a new pull request, #7981:
URL: https://github.com/apache/paimon/pull/7981

   ### Purpose
   
   Two issues introduced in #7699 (block-level local-cache):
   
   **1. Cache silently bypassed when `data-token.enabled=true` (REST+DLF mode)**
   
   `RESTCatalog.file_io_for_data` returns a raw `RESTTokenFileIO` without going
   through `CachingFileIO.wrap_with_caching_if_needed`. A user who configures
   `local-cache.enabled=true` + `local-cache.dir=...` gets no warning, but the
   cache directory stays empty: caching is never applied to *any* read in this
   mode (data, manifest, global-index), because everything goes through
   `table.file_io`, which is the unwrapped `RESTTokenFileIO`.
   
   This affects every DLF-backed deployment, which is the main production
   setup for REST catalog users.
   
   **2. `local-cache.max-size` defaults to unlimited (`2^63-1`)**
   
   When the option is not explicitly set, both the memory and the disk cache
   fall back to a `Long.MAX_VALUE`-equivalent limit, i.e. effectively no limit.
   Long-running jobs accumulate manifest / global-index / data blocks until
   they hit OOM (memory cache) or fill the disk (disk cache).
   
   Mainstream cache libraries (Caffeine, Guava) expose an explicit size bound
   (`maximumSize`) for exactly this reason; pypaimon should ship a safe
   default cap rather than an unbounded one.
   
   ### Fix
   
   - `RESTCatalog.file_io_for_data` now wraps the returned FileIO with
     `CachingFileIO.wrap_with_caching_if_needed` regardless of which branch
     (fuse / data-token / default) produced it. `wrap_with_caching_if_needed`
     is a no-op when cache is disabled or already wrapped, so existing
     non-cache flows are unchanged.
   - `CachingFileIO.create_cache_manager` falls back to **256 MB for memory**
     cache and **10 GB for disk** cache when `local-cache.max-size` is unset.
     Explicit user values are honoured unchanged.
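
The single-exit wrapping described in the first bullet can be sketched roughly as below. This is a simplified model, not pypaimon's actual code: the `FileIO` and `CachingFileIO` classes are stand-ins, the `branch` argument is a hypothetical substitute for the real branch selection, and only the `local-cache.enabled` key and the method name `wrap_with_caching_if_needed` come from this PR.

```python
class FileIO:
    """Stand-in for a FileIO implementation (not pypaimon's real class)."""

    def __init__(self, name):
        self.name = name


class CachingFileIO(FileIO):
    """Hypothetical caching decorator around another FileIO."""

    def __init__(self, delegate):
        super().__init__("caching(" + delegate.name + ")")
        self.delegate = delegate

    @staticmethod
    def wrap_with_caching_if_needed(file_io, options):
        # No-op when the cache is disabled.
        enabled = str(options.get("local-cache.enabled", "false")).lower()
        if enabled != "true":
            return file_io
        # No-op when the FileIO is already wrapped (avoids double wrapping).
        if isinstance(file_io, CachingFileIO):
            return file_io
        return CachingFileIO(file_io)


def file_io_for_data(branch, options):
    """Pick a raw FileIO per branch, then wrap once at the single exit point.

    `branch` is an illustrative parameter; the real method derives the
    branch from catalog and table options.
    """
    if branch == "fuse":
        raw = FileIO("fuse-local")
    elif branch == "data-token":
        raw = FileIO("rest-token")
    else:
        raw = FileIO("default")
    # Every branch passes through the same wrapping call, so none can
    # silently skip the cache.
    return CachingFileIO.wrap_with_caching_if_needed(raw, options)
```

Wrapping at one exit point, rather than inside each branch, means a branch added later cannot forget the cache, and the idempotent check keeps already-wrapped or cache-disabled flows unchanged.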
   
   ### Tests
   
   New reproduction and pinning tests in `caching_file_io_test.py`:
   
   - `test_file_io_for_data_wraps_cache_when_data_token_enabled` — fails on
     current code (`RESTTokenFileIO` returned, not `CachingFileIO`); passes
     after the fix.
   - `test_default_memory_cache_max_size_capped` — pins 256 MB default.
   - `test_default_disk_cache_max_size_capped` — pins 10 GB default.
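
The two pinning tests might take roughly the following shape. This is a sketch under assumptions: `resolve_max_size` is a hypothetical helper standing in for the logic inside `CachingFileIO.create_cache_manager`, the option value is assumed to be a plain byte count, and MB/GB are assumed to be binary units. The tests are written as plain functions for brevity.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Defaults named in this PR; binary units are an assumption.
DEFAULT_MEMORY_MAX_SIZE = 256 * MB
DEFAULT_DISK_MAX_SIZE = 10 * GB


def resolve_max_size(options, cache_type):
    """Hypothetical helper: return the cache cap in bytes.

    An explicit `local-cache.max-size` wins; otherwise fall back to a
    bounded default that depends on the cache type.
    """
    explicit = options.get("local-cache.max-size")
    if explicit is not None:
        return int(explicit)
    if cache_type == "memory":
        return DEFAULT_MEMORY_MAX_SIZE
    return DEFAULT_DISK_MAX_SIZE


def test_default_memory_cache_max_size_capped():
    assert resolve_max_size({}, "memory") == 256 * MB


def test_default_disk_cache_max_size_capped():
    assert resolve_max_size({}, "disk") == 10 * GB


def test_explicit_max_size_is_honoured():
    # Explicit user values pass through unchanged for both cache types.
    options = {"local-cache.max-size": str(64 * MB)}
    assert resolve_max_size(options, "memory") == 64 * MB
    assert resolve_max_size(options, "disk") == 64 * MB
```

Pinning the exact byte values means any later change to the defaults has to be a deliberate edit to the tests rather than a silent regression back to an unbounded cache.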
   
   The existing 31 cache tests and 20 FUSE local-path tests all pass; one mock
   helper in `test_fuse_local_path.py` needed `_cache_manager = None` after
   the wiring change.
   
   ### Follow-up
   
   The Java side has the same issues at `CachingFileIO.java:97-98` and in the
   equivalent REST+DLF code path; this fix should be mirrored there.

