XiaoHongbo-Hope opened a new pull request, #7981:
URL: https://github.com/apache/paimon/pull/7981
### Purpose
This PR fixes two issues introduced in #7699 (block-level local-cache):
**1. Cache silently bypassed when `data-token.enabled=true` (REST+DLF mode)**
`RESTCatalog.file_io_for_data` returns a raw `RESTTokenFileIO` without going
through `CachingFileIO.wrap_with_caching_if_needed`. A user who sets
`local-cache.enabled=true` and `local-cache.dir=...` gets no warning, yet the
cache directory stays empty: caching is never applied to *any* read in this
mode (data, manifest, and global-index reads all go through `table.file_io`,
which is the un-wrapped `RESTTokenFileIO`).
This affects every DLF-backed deployment, which is the main production
setup for REST catalog users.
**2. `local-cache.max-size` defaults to unlimited (`2^63-1`)**
When the option is not set explicitly, both the memory cache and the disk
cache fall back to a `Long.MAX_VALUE`-equivalent limit. Long-running jobs
accumulate manifest / global-index / data blocks until the process runs out
of memory (memory cache) or the disk fills up (disk cache).
An unbounded default is unsafe for long-running processes; pypaimon should
default to a bounded size instead.
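
For a sense of scale, the arithmetic below shows how far the unset default sits above any realistic memory or disk budget. It assumes `local-cache.max-size` is counted in bytes, which the description implies but does not state.

```python
# Assumption: local-cache.max-size is counted in bytes.
UNLIMITED = 2**63 - 1   # same value as Java's Long.MAX_VALUE

EIB = 2**60             # one exbibyte in bytes
TIB = 2**40             # one tebibyte in bytes

# The limit is just under 8 EiB, i.e. roughly 8.4 million TiB,
# so eviction would effectively never trigger.
limit_in_eib = UNLIMITED / EIB
limit_in_tib = UNLIMITED // TIB
```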
### Fix
- `RESTCatalog.file_io_for_data` now wraps the returned FileIO with
`CachingFileIO.wrap_with_caching_if_needed` regardless of which branch
(fuse / data-token / default) produced it. `wrap_with_caching_if_needed`
is a no-op when cache is disabled or already wrapped, so existing
non-cache flows are unchanged.
- `CachingFileIO.create_cache_manager` falls back to **256 MB for memory**
cache and **10 GB for disk** cache when `local-cache.max-size` is unset.
Explicit user values are honoured unchanged.
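
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the actual pypaimon code: `StubFileIO`, `StubCachingFileIO`, `resolve_max_size`, and the `cache_type` parameter are hypothetical stand-ins, and treating MB and GB as binary units is an assumption. Only the option keys and the 256 MB / 10 GB figures come from this PR.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Defaults named in this PR; binary units are an assumption of this sketch.
DEFAULT_MEMORY_MAX_SIZE = 256 * MB
DEFAULT_DISK_MAX_SIZE = 10 * GB


def resolve_max_size(options: dict, cache_type: str) -> int:
    """Hypothetical helper: an explicit value wins, else a bounded default."""
    explicit = options.get("local-cache.max-size")
    if explicit is not None:
        return int(explicit)
    if cache_type == "memory":
        return DEFAULT_MEMORY_MAX_SIZE
    return DEFAULT_DISK_MAX_SIZE


class StubFileIO:
    """Stand-in for RESTTokenFileIO or any other un-wrapped FileIO."""


class StubCachingFileIO:
    """Stand-in for CachingFileIO: keeps the delegate and the resolved cap."""

    def __init__(self, delegate, max_size: int):
        self.delegate = delegate
        self.max_size = max_size


def wrap_with_caching_if_needed(file_io, options: dict, cache_type: str = "disk"):
    """Return file_io untouched when caching is off or it is already wrapped."""
    enabled = str(options.get("local-cache.enabled", "false")).lower() == "true"
    if not enabled or isinstance(file_io, StubCachingFileIO):
        return file_io
    return StubCachingFileIO(file_io, resolve_max_size(options, cache_type))
```

Because the wrapper is idempotent and a no-op when caching is disabled, it can be applied once at the end of `file_io_for_data`, after whichever branch produced the FileIO, without affecting non-cache flows.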
### Tests
New repros and pins in `caching_file_io_test.py`:
- `test_file_io_for_data_wraps_cache_when_data_token_enabled` — fails on
current code (`RESTTokenFileIO` returned, not `CachingFileIO`); passes
after the fix.
- `test_default_memory_cache_max_size_capped` — pins 256 MB default.
- `test_default_disk_cache_max_size_capped` — pins 10 GB default.
The 31 existing cache tests and 20 FUSE local-path tests all pass; one mock
helper in `test_fuse_local_path.py` needed `_cache_manager = None` after
the wiring change.
### Follow-up
The Java side has the same issues at `CachingFileIO.java:97-98` and in the
equivalent REST+DLF code path. This fix should be mirrored in Java.