Hi all,

Reviving the cache-policy discussion that Hossein Torabi (@blcksrx) started
in October 2025 [1]. That mailing-list thread received no replies, and PR
#14440 [2] was subsequently auto-closed by the stale bot in March, despite
directional support from @gaborkaszab, @findepi, and @pvary on the PR
itself. I'm starting a fresh thread for broader visibility.

[1] https://lists.apache.org/thread/4hnk0d5bfcw4y5ow5l1n2y4x9m2qgmjh
[2] https://github.com/apache/iceberg/pull/14440

*Problem (recap)*

CachingCatalog uses expireAfterAccess exclusively. In long-running
workloads (most concretely Spark Structured Streaming with a
stream-to-static join against a slowly-changing Iceberg reference table)
every microbatch read resets the TTL. Whenever the microbatch interval is
shorter than the TTL, the entry never expires and the cache never observes
new snapshots. The only documented workaround
(spark.sql.catalog.<name>.cache-enabled=false) forces a full metadata
reload on every microbatch.
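
To make the difference between the two policies concrete, here is a minimal
toy model in Python. It is not Caffeine and not CachingCatalog: ToyCache,
simulate, the 10ms read interval, and the update at t=100ms are all
hypothetical choices for illustration. Only the 200ms TTL and the 300ms read
window mirror the benchmark.

```python
class ToyCache:
    """Toy single-entry-per-key TTL cache (illustrative only)."""

    def __init__(self, ttl_ms, policy, loader):
        assert policy in ("access", "write")
        self.ttl_ms = ttl_ms
        self.policy = policy
        self.loader = loader
        self.entries = {}  # key -> (value, timestamp_ms)

    def get(self, key, now_ms):
        entry = self.entries.get(key)
        if entry is not None:
            value, stamp = entry
            if now_ms - stamp < self.ttl_ms:
                if self.policy == "access":
                    # expire-after-access: every hit restarts the TTL
                    self.entries[key] = (value, now_ms)
                # expire-after-write: the timestamp is left untouched
                return value
        # Miss or expired entry: reload from the source of truth.
        value = self.loader(key)
        self.entries[key] = (value, now_ms)
        return value


def simulate(policy, ttl_ms=200, update_at_ms=100, duration_ms=300, step_ms=10):
    """Read every step_ms; the table gets a new snapshot at update_at_ms."""
    source = {"snapshot": 1}
    cache = ToyCache(ttl_ms, policy, lambda key: source["snapshot"])
    stale_reads = 0
    last_seen = None
    for now_ms in range(0, duration_ms + 1, step_ms):
        if now_ms >= update_at_ms:
            source["snapshot"] = 2
        last_seen = cache.get("db.ref_table", now_ms)
        if last_seen != source["snapshot"]:
            stale_reads += 1
    return last_seen, stale_reads


if __name__ == "__main__":
    for policy in ("access", "write"):
        last_seen, stale = simulate(policy)
        print(f"{policy}: last snapshot seen={last_seen}, stale reads={stale}")
```

In this model the access-policy run still sees snapshot 1 at the end of the
window, because each read pushes the expiry forward. The write-policy run
reloads on the first read at or after t=200ms and sees snapshot 2 from then
on.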

*Bench results*

I cherry-picked PR #14440 onto current main and ran a benchmark to give the
proposal an empirical footing. Full setup, reproduction steps, and results
are in this PR comment: [3]

The headline numbers:

   - The staleness bug is reproducible: with expireAfterAccess(200ms) and
   300ms of continuous reads, the cache returns 5.8M stale results and never
   observes the underlying snapshot update.
   - expireAfterWrite(200ms) under the same load: 800k reads, all correctly
   refreshed after the TTL boundary.
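   - Per-call latency: 0.07us (access-only, today's default) vs 906us
   (cache-disabled workaround) vs 0.44us (write-only, this PR). The write
   policy is about 6x slower than access-only on the hit path but still
   sub-microsecond, roughly 2000x cheaper than the workaround, so in
   absolute terms it costs essentially nothing.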
   - Projected for a 1Hz streaming microbatch over 24h (86,400 reads):
   ~78 sec/day of metadata overhead with the workaround vs. ~0.06 sec/day
   with write-only (TTL=1h). On S3, where each metadata reload costs a
   network round trip, the absolute saving would be on the order of
   hours/day per cached table.

[3] https://github.com/apache/iceberg/pull/14440#issuecomment-4561571918
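
For anyone who wants to sanity-check the 24h projection, the arithmetic can
be sketched as below. The model is my assumption here rather than the exact
method in [3]: one read per second, every read paying the 906us reload cost
when the cache is disabled, and with write-only one reload per TTL period
plus a 0.44us hit for every other read. daily_overhead_seconds is a
hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60
US = 1e-6  # microseconds -> seconds


def daily_overhead_seconds(reads_per_sec, hit_us, reload_us, ttl_sec=None):
    """Metadata overhead per day under a simple hit/reload cost model."""
    reads = reads_per_sec * SECONDS_PER_DAY
    if ttl_sec is None:
        # Cache disabled: every read pays the full reload cost.
        return reads * reload_us * US
    # Assumed: exactly one reload per TTL period, all other reads are hits.
    reloads = SECONDS_PER_DAY // ttl_sec
    hits = reads - reloads
    return (hits * hit_us + reloads * reload_us) * US


workaround = daily_overhead_seconds(1, hit_us=0.0, reload_us=906.0)
write_only = daily_overhead_seconds(1, hit_us=0.44, reload_us=906.0, ttl_sec=3600)

if __name__ == "__main__":
    print(f"workaround: {workaround:.1f} sec/day")   # 78.3
    print(f"write-only: {write_only:.2f} sec/day")   # 0.06
```

Under these assumptions the figures come out at about 78.3 sec/day and
0.06 sec/day, in line with the projection above.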

*Open question for the community*

The dual-policy approach in #14440 (both expireAfterAccess and
expireAfterWrite, independently configurable, default = current behavior)
had positive directional feedback from @gaborkaszab, @findepi, and @pvary
on the PR. Is there a competing design we should consider before
proceeding, for example the pluggable Cache factory suggested in [4]? If
no alternative is raised on this thread, I propose we move forward with
the dual-policy approach.

I'll coordinate the PR-level next steps (reviving #14440 vs. opening a
successor PR with attribution) with @blcksrx directly on #14440.

[4] https://github.com/apache/iceberg/issues/14417#issuecomment-3451805984

Best,
Noritaka Sekiyama
