Fetch Request performing remote KIP-405 reads gotcha

Stanislav Kozlovski Wed, 26 Mar 2025 13:27:58 -0700

Hey all,

I was doing a deep dive on the internals of KIP-405's read path and I was
surprised to learn that the broker only fetches remote data for ONE
partition in a given FetchRequest. In other words, if a consumer sends a
FetchRequest for 50 topic-partitions and none of the requested offsets is
stored locally, the broker will fetch and respond with just one
partition's worth of data from the remote store, and the rest will be
empty.


I found this very unintuitive (shocking, really), given that our defaults
are 50 MiB for the total fetch response and 1 MiB per partition. In
essence, this means that a fetch response may be 50x smaller than it ought
to be and become the bottleneck for throughput when performing remote
(historical) reads.
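
To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch. It is a simplified model of the behavior described above, not
broker code: the function name and parameters are made up for
illustration, and it assumes every partition has at least 1 MiB of data
available and that the consumer defaults (fetch.max.bytes = 50 MiB,
max.partition.fetch.bytes = 1 MiB) are in effect.

```python
MIB = 1024 * 1024


def max_response_bytes(num_partitions, all_remote,
                       fetch_max_bytes=50 * MIB,
                       max_partition_fetch_bytes=1 * MIB):
    """Upper bound on the payload of one fetch response (simplified model).

    Hypothetical helper: assumes that when every requested offset lives in
    remote storage, only ONE partition is served per request.
    """
    if num_partitions <= 0:
        return 0
    # Remote reads: a single partition per request; local reads: all of them.
    partitions_served = 1 if all_remote else num_partitions
    return min(fetch_max_bytes, partitions_served * max_partition_fetch_bytes)


local = max_response_bytes(50, all_remote=False)
remote = max_response_bytes(50, all_remote=True)
print(local // MIB, remote // MIB, local // remote)
```

Under these assumptions the local case can fill the whole 50 MiB response,
while the all-remote case tops out at 1 MiB per request, which is where
the 50x figure comes from.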

I synced very briefly with Satish offline and realized there is a JIRA
tracking this (KAFKA-14915
<https://issues.apache.org/jira/browse/KAFKA-14915> I believe), but I
figured it's better to raise the discussion with the community than
continue async.

I see a few negatives with this behavior. In order of priority:
1. it is unintuitive and not documented
2. it is a potential performance bottleneck
3. it somewhat obsoletes great features like read caching and prefetching
that have been implemented in popular KIP-405 plugins (the Aiven one
supporting all 3 clouds in particular). The goal of these features, as I
understand them, is to increase throughput and reduce latency, but the
plugin may very well NOT be given a chance to serve data from cache since
it'll be called for only one partition per request.

I acknowledge the proper implementation isn't straightforward, so
I understand why a version with this behavior was shipped. I am not sure if
I would have marked the feature GA though.

In any case, I particularly want to begin this discussion by focusing on
1), the lack of documentation, since it is the easiest to fix.

I didn't find this information in KIP-405
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage>
nor in the documentation of the fetch.max.bytes
<https://kafka.apache.org/documentation/#consumerconfigs_fetch.max.bytes>
config.
I couldn't find it through googling. I even asked all popular commercial
LLMs.

How should we best document this behavior? My default would be to add it
to the documentation of the fetch.max.bytes config.

A short note on KIP-405 would be useful too, but that document is too
verbose for instructing users in my opinion. We had Tiered Storage Early
Access Release Notes
<http://splay/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes> (it
wasn't mentioned there either)... maybe we could create a similar one
marking current limitations and link it (as one of the first things) from
the KIP?
