[
https://issues.apache.org/jira/browse/KAFKA-19970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nandini Singhal updated KAFKA-19970:
------------------------------------
Description:
The remote log index cache (RemoteIndexCache) currently only uses weight-based
eviction. This can cause old, smaller index files to remain cached indefinitely
while newer indices thrash the cache, leading to inefficient cache utilization
and increased remote fetch failures.
The cache is currently built as follows (RemoteIndexCache.java:142-144):
{code:java}
return Caffeine.newBuilder()
    .maximumWeight(maxSize)
    .weigher((Uuid key, Entry entry) -> (int) entry.entrySizeBytes)
    .evictionListener(...)
    .build();
{code}
In environments with:
- Heavy backfill workloads (reading old data once, then moving to newer data)
- Sequential read patterns through tiered storage
- Variable index file sizes
The cache can end up in a state where:
- Old index files from completed backfills remain cached
- Newer index files thrash continuously
- Cache hit rate degrades over time
- Remote storage fetch errors increase due to cache misses
Proposed change: add time-based eviction to the cache configuration using
Caffeine's expireAfterAccess, so that an entry is evicted once it has not been
accessed for the configured duration, even if it remains in a favorable
frequency bucket.
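The proposal could look roughly like the sketch below. This is an illustration under stated assumptions, not the actual patch: the class name, the String key (standing in for Uuid), the simplified Entry class, and the 15-minute idle timeout are all hypothetical, and the injected Ticker and same-thread executor are there only to make the example deterministic. It assumes the Caffeine library is on the classpath.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.Ticker;

import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;

public class ExpiringIndexCacheSketch {

    // Hypothetical stand-in for RemoteIndexCache.Entry; only the size matters here.
    static final class Entry {
        final long entrySizeBytes;

        Entry(long entrySizeBytes) {
            this.entrySizeBytes = entrySizeBytes;
        }
    }

    // Weight-based eviction as in the existing builder, plus idle-time expiry.
    static Cache<String, Entry> build(long maxSizeBytes, Duration idleTimeout, Ticker ticker) {
        return Caffeine.newBuilder()
                .maximumWeight(maxSizeBytes)
                .weigher((String key, Entry entry) -> (int) entry.entrySizeBytes)
                // Expire an entry once it has gone unread and unwritten for idleTimeout.
                .expireAfterAccess(idleTimeout)
                // Injected clock and same-thread executor: for deterministic demos only.
                .ticker(ticker)
                .executor(Runnable::run)
                .build();
    }

    public static void main(String[] args) {
        AtomicLong fakeNanos = new AtomicLong();
        Cache<String, Entry> cache = build(1024, Duration.ofMinutes(15), fakeNanos::get);

        cache.put("backfill-segment", new Entry(64));
        cache.put("hot-segment", new Entry(64));

        // Ten minutes pass; only the hot segment is read again.
        fakeNanos.addAndGet(Duration.ofMinutes(10).toNanos());
        cache.getIfPresent("hot-segment");

        // Ten more minutes pass: the backfill entry has been idle for 20 minutes,
        // the hot entry for 10.
        fakeNanos.addAndGet(Duration.ofMinutes(10).toNanos());
        cache.cleanUp();

        System.out.println("backfill expired: " + (cache.getIfPresent("backfill-segment") == null));
        System.out.println("hot retained: " + (cache.getIfPresent("hot-segment") != null));
    }
}
```

In a real change the timeout would presumably come from broker configuration rather than a constant, and the existing evictionListener would stay in place. Whether it also covers entries removed by expiration would need to be confirmed against the Caffeine documentation.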
was:
The remote log index cache (RemoteIndexCache) currently only uses weight-based
eviction. This can cause old, smaller index files to remain cached indefinitely
while newer indices thrash the cache, leading to inefficient cache utilization
and increased remote fetch failures.
(RemoteIndexCache.java:142-144):
{code:java}
return Caffeine.newBuilder()
.maximumWeight(maxSize)
.weigher((Uuid key, Entry entry) -> (int) entry.entrySizeBytes)
.evictionListener(...)
.build();{code}
In environments with:
- Heavy backfill workloads (reading old data once, then moving to newer data)
- Sequential read patterns through tiered storage
- Variable index file sizes
The cache can end up in a state where:
- Old index files from completed backfills remain cached (small, low
frequency)
- Newer index files thrash continuously (larger, similar frequency)
- Cache hit rate degrades over time
- Increased remote storage fetch errors due to cache misses
Add time-based eviction using Caffeine's expireAfterAccess to the cache
configuration so that even if an entry remains in a favorable frequency bucket,
it will be evicted after not being accessed for the configured duration.
> Add time-based eviction to tiered storage index cache to prevent stale
> entries from accumulating
> ------------------------------------------------------------------------------------------------
>
> Key: KAFKA-19970
> URL: https://issues.apache.org/jira/browse/KAFKA-19970
> Project: Kafka
> Issue Type: Improvement
> Components: Tiered-Storage
> Reporter: Nandini Singhal
> Assignee: Nandini Singhal
> Priority: Major