Hi Thomas,

I went over KIP-1254
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-1254%3A+Kafka+Consumer+Support+for+Remote+Tiered+Storage+Fetch>,
which describes the changes required on the consumer side. A few comments:

1. To enable this feature, non-Java clients have to reimplement the remote
fetch logic themselves.
2. The client becomes heavyweight, since it has to pull in all the remote
storage dependencies.
3. It may not be fully compatible with all the existing / proposed client
APIs.
4. Did you explore having lightweight brokers in the cluster that serve
only remote traffic, similar to reading from a preferred replica? KIP-1255
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399281539>
proposes
<https://docs.google.com/presentation/d/10ZZeJ_8RPc-gPXFxQe1KC7VLSTPg0Rb8I4taZr0RktM/edit?slide=id.g282ec88ea22_1_5#slide=id.g282ec88ea22_1_5>
the same; it is in the draft stage. These brokers may not need much disk or
memory, can be kept in the same AZ as the consumers, can solely serve FETCH
requests for remote storage, and can be scaled quickly.
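For reference, the "preferred replica" behavior mentioned above is KIP-392's
rack-aware fetch-from-follower, which ships with Kafka since 2.4. A minimal
sketch of the configs involved (the rack names are illustrative):

```properties
# Broker side: place the broker in the consumers' AZ and enable the
# rack-aware replica selector so fetches can be served locally.
broker.rack=us-east-1a
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector

# Consumer side: advertise the consumer's rack so FETCH requests are
# routed to a replica in the same AZ.
client.rack=us-east-1a
```

A dedicated remote-serving broker could plausibly be targeted the same way,
e.g. with a replica selector that prefers it for tiered data; that selector
is hypothetical and would be part of the KIP-1255 design.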

Thanks,
Kamal



On Wed, Dec 10, 2025 at 6:04 PM Thomas Thornton via dev <
[email protected]> wrote:

> Hi Stan,
>
> Thanks for the detailed feedback! We've now published KIP-1254 [1] which is
> the consumer-side companion to KIP-1248 and addresses your questions in
> detail.
>
> To highlight a few points:
>
> On storage format coupling: We've added wording to the Version
> Compatibility section [2]. This design intentionally shifts segment parsing
> from broker to consumer to reduce broker load. While this couples consumers
> to the on-disk format, SupportedStorageFormatVersions ensures graceful
> fallback when formats evolve. Consumers remain decoupled from storage
> backends (S3/GCS/Azure) via the RemoteStorageFetcher plugin interface. For
> proprietary Kafka implementations with different storage formats, this
> mechanism allows them to participate - if the client supports their format,
> direct fetch works; otherwise it falls back gracefully.
>
> Optional feature: Yes, opt-in via fetch.remote.enabled=false (default). See
> Consumer Configs [3].
>
> Forward-compatibility: Covered in Version Compatibility [2]. The client
> sends a list of format versions it supports (e.g., ApacheKafkaV1). If a 6.x
> broker uses a new format not in the 4.x client's list, the broker falls
> back to traditional fetch.
>
> For the specific consumer-side questions:
>
> 1. Plugin system: We introduce RemoteStorageFetcher [4], a read-only
> interface similar to RemoteStorageManager on the broker side. Plugin
> matching is handled implicitly via SupportedStorageFormatVersions - if
> format versions don't align, the broker falls back to traditional fetch.
>
> 2. Cost & performance: The broker provides byte position hints derived from
> the OffsetIndex, and consumers request only the needed range via
> startPosition/endPosition in RemoteStorageFetcher.fetchLogSegment(). This
> enables byte-range GETs for S3-compatible systems. For backends that don't
> support range requests, the plugin implementation would handle buffering -
> this is implementation-specific and outside the KIP scope.
>
> 3. Fallbacks: Yes, covered in the Fallback section [5]. The consumer falls
> back to broker-mediated fetch on: timeout, connection failure, auth
> failure, or if RemoteStorageFetcher is not configured.
>
> Let us know if you'd like more detail on any of these.
>
> Thanks,
> Tom
>
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1254%3A+Kafka+Consumer+Support+for+Remote+Tiered+Storage+Fetch
> [2]
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-VersionCompatibility
>
> [3]
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-ConsumerConfigs
>
> [4]
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-RemoteStorageFetcher
> [5]
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-Fallback
>
>
> On Tue, Dec 2, 2025 at 12:03 PM Stanislav Kozlovski <
> [email protected]> wrote:
>
> > Hey Henry, thanks for the KIP! I'm excited to see this proposal as I've
> > heard it be discussed privately before too.
> >
> > Can we have some wording that talks about the trade-offs of coupling
> > clients to the underlying storage format? Today, the underlying segment
> > format is decoupled from the clients, since the broker handles conversion
> > of log messages to what the protocol expects. I'm sure certain
> > proprietary Kafka implementations use different formats for their
> > underlying storage - it's an interesting question how they would handle
> > this (to be explicit, I'm not proposing we should cater our design to
> > those systems though, simply calling it out as a potential contention
> > point).
> >
> > Things I'm thinking about:
> > - Would this be an optional feature?
> > - What would forward-compatibility look like?
> >
> > E.g., if we ever want to switch the underlying storage format? To
> > bullet-proof ourselves, do we want to introduce some version matching
> > which could then help us understand non-compatibility and throw errors?
> > (e.g., we change the storage format in 6.x, and a 4.x client tries to
> > read from a 6.x broker/storage format)
> >
> > Can we also have some wording on what this feature would look like on
> > the consumer side? The proposal right now suggests we handle this in a
> > follow-up KIP, which makes sense for the details - but what about a
> > high-level overview and motivation?
> >
> > 1. We would likely need a similar plugin system for consumers, like the
> > one brokers have for KIP-405. Getting that interface right would be
> > important. Ensuring the plugin configured on the consumer matches the
> > plugin configured on the broker would be useful from a UX point of view
> > too.
> >
> > 2. From a cost and performance perspective, how do we envision this being
> > used/configured on the consumer side?
> >
> > A single segment could be GBs in size. It's unlikely a consumer would
> > want to download the whole thing at once.
> >
> > For tiered backends that are S3-compatible cloud object storage systems,
> > we could likely use byte-range GETs, thus avoiding reading too much data
> > that'll get discarded. Are there concerns with other systems? A few words
> > on this topic would help imo.
> >
> > 3. Should we have fall-backs to the current behavior?
> >
> > Best,
> > Stan
> >
> > On 2025/12/02 11:04:13 Kamal Chandraprakash wrote:
> > > Hi Haiying,
> > >
> > > Thanks for the KIP!
> > >
> > > 1. Do you plan to add support for transactional consumers? Currently,
> > > the consumer doesn't return the aborted transaction records to the
> > > handler.
> > > 2. To access the remote storage directly, the client might need
> > > additional certificates / keys. How do you plan to expose those
> > > configs on the client?
> > > 3. Will it support the Queues for Kafka feature KIP-932
> > > <https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-932*3A*Queues*for*Kafka__;JSsrKw!!DCbAVzZNrAf4!FnaTZ-RleISfnxHyS-2F1lvhDvglTHhW5Yg-cFch2FgGCd0lw2nUJ3gJtd1AqiwMlghMiwLQ7a6aD9KQlvax_GzYJG2eqtg$>?
> > > And so on.
> > >
> > > --
> > > Kamal
> > >
> > > On Tue, Dec 2, 2025 at 10:29 AM Haiying Cai via dev <
> > > [email protected]> wrote:
> > >
> > > > For some reason, the KIP link was truncated in the original email.
> > > > Here is the link again:
> > > >
> > > > KIP:
> > > > https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/KAFKA/KIP-1248*3A*Allow*consumer*to*fetch*from*remote*tiered*storage__;JSsrKysrKysr!!DCbAVzZNrAf4!FnaTZ-RleISfnxHyS-2F1lvhDvglTHhW5Yg-cFch2FgGCd0lw2nUJ3gJtd1AqiwMlghMiwLQ7a6aD9KQlvax_GzYdpe2QXU$
> > > >
> > > > Henry Haiying Cai
> > > >
> > > > On 2025/12/02 04:34:39 Henry Haiying Cai via dev wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I would like to start discussion on KIP-1248: Allow consumer to
> > > > > fetch from remote tiered storage
> > > > >
> > > > > KIP link: KIP-1248: Allow consumer to fetch from remote tiered
> > > > > storage - Apache Kafka - Apache Software Foundation
> > > > >
> > > > > The KIP proposes to allow consumer clients to fetch from remote
> > > > > tiered storage directly, to avoid hitting the broker's network
> > > > > capacity limits and degrading its cache performance. This is very
> > > > > useful for serving large backfill requests from a new or
> > > > > far-behind consumer.
> > > > >
> > > > > Any feedback is appreciated.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Henry
> > >
> >
>
