blcksrx opened a new pull request, #14440:
URL: https://github.com/apache/iceberg/pull/14440
### Allow dual catalog cache expiration policies (expire-after-access and expire-after-write)
---
This change enhances Iceberg’s catalog-level cache by allowing
`expire-after-access` and `expire-after-write` policies to be used
concurrently. This provides more flexible and powerful cache-tuning strategies
by enabling both time-since-last-access and time-since-creation eviction
policies on the same cache.
This is achieved by introducing a new property,
`cache.expiration.expire-after-write-interval-ms`, which works in conjunction
with the existing `cache.expiration-interval-ms` (which controls
`expire-after-access`).
This addresses the pain point described in Issue #14417, where long-running
streaming jobs could prevent cache entries from ever expiring under a pure
access-based policy. By combining both policies, users can ensure periodic data
refreshes while still efficiently caching frequently accessed tables.
---
#### Background
In many scenarios, especially with long-running services, you want to
balance performance with data freshness. For example:
- Performance: Caching is essential to avoid the high cost of reloading
table metadata for every query.
- Freshness: Cached entries must be periodically refreshed to pick up new
snapshots or to prevent issues with expired credentials.
Using `expire-after-access` alone is not sufficient for continuous
workloads, as frequently accessed entries may never be evicted. Using
`expire-after-write` alone can be inefficient if it evicts "hot" entries that
are still in active use.
By using both, you can configure a "best-of-both-worlds" strategy:
- A shorter expire-after-access duration to quickly evict inactive tables.
- A longer expire-after-write duration to act as a safety net, ensuring that
even "hot" tables are refreshed periodically (e.g., before underlying
credentials expire).
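
As a rough illustration of that strategy, the sketch below builds a catalog property map with a 5-minute access interval and a 50-minute write interval. The class and method names are hypothetical, the durations are arbitrary examples, and the write-interval key is copied from the summary above; the property table in this description spells it differently, so treat the exact key as an assumption and check the merged code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class DualExpirationConfig {
  // Existing Iceberg catalog property; per this PR it controls expire-after-access.
  public static final String EXPIRE_AFTER_ACCESS_MS = "cache.expiration-interval-ms";

  // Key as written in this PR's summary. Assumption: the final name may differ.
  public static final String EXPIRE_AFTER_WRITE_MS =
      "cache.expiration.expire-after-write-interval-ms";

  // Hypothetical helper: collects both intervals into a catalog property map.
  public static Map<String, String> catalogProperties(long accessMs, long writeMs) {
    Map<String, String> props = new HashMap<>();
    props.put(EXPIRE_AFTER_ACCESS_MS, Long.toString(accessMs));
    props.put(EXPIRE_AFTER_WRITE_MS, Long.toString(writeMs));
    return props;
  }

  public static void main(String[] args) {
    // Short access interval evicts idle tables; longer write interval caps entry age.
    Map<String, String> props =
        catalogProperties(TimeUnit.MINUTES.toMillis(5), TimeUnit.MINUTES.toMillis(50));
    props.forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```

A map like this would then be passed as the properties when the catalog is initialized.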
---
#### What changed in this PR
##### ✅ New catalog property for dual-policy expiration
Two distinct properties now control cache expiration. An entry is evicted if
either condition is met. If both are set to a non-positive value, caching is
disabled.
| Property | Default | Description |
| --- | --- | --- |
| cache.expire-after-write-ms | 0 | Duration in milliseconds after which a table is expired from the cache, measured from when the entry was created. Tables will not refresh on write. 0 disables this policy; it is disabled by default. |
##### ✅ CachingCatalog supports dual policies
`CachingCatalog.wrap(...)` has been updated to configure the underlying
Caffeine cache with both expiration policies when both properties are provided.
##### ✅ Tests cover dual-policy scenarios
Tests have been added to validate that when both policies are active, an
entry is evicted based on whichever condition is met first, and that each
policy works correctly on its own.
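
The "whichever condition is met first" rule can be modeled without Caffeine. The following is a minimal, library-free sketch of that decision, not the code in this PR: `DualExpirationModel` and `isExpired` are hypothetical names, timestamps are plain millisecond values rather than a real clock, and treating any non-positive interval as "policy inactive" is a simplifying assumption.

```java
public class DualExpirationModel {
  /**
   * Returns true if an entry should be evicted at nowMs.
   * A non-positive interval means that policy is inactive (simplifying assumption).
   */
  public static boolean isExpired(
      long createdMs,
      long lastAccessMs,
      long nowMs,
      long expireAfterAccessMs,
      long expireAfterWriteMs) {
    // Access-based policy: the entry has been idle for at least the access interval.
    boolean idleTooLong =
        expireAfterAccessMs > 0 && nowMs - lastAccessMs >= expireAfterAccessMs;
    // Write-based policy: the entry is at least as old as the write interval.
    boolean tooOld = expireAfterWriteMs > 0 && nowMs - createdMs >= expireAfterWriteMs;
    // Either condition alone is enough to evict.
    return idleTooLong || tooOld;
  }

  public static void main(String[] args) {
    long minute = 60_000L;
    long accessMs = 5 * minute;
    long writeMs = 50 * minute;

    // Hot table, 49 minutes old, accessed 1 minute ago: kept.
    System.out.println(isExpired(0, 48 * minute, 49 * minute, accessMs, writeMs)); // false
    // Hot table reaches 50 minutes of age: evicted by the write-based policy.
    System.out.println(isExpired(0, 49 * minute, 50 * minute, accessMs, writeMs)); // true
    // Young table left idle for 6 minutes: evicted by the access-based policy.
    System.out.println(isExpired(0, 1 * minute, 7 * minute, accessMs, writeMs)); // true
    // Write policy disabled (0): a hot table is never evicted for age alone.
    System.out.println(isExpired(0, 59 * minute, 60 * minute, accessMs, 0)); // false
  }
}
```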