Hi Thomas,

I went over KIP-1254 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1254%3A+Kafka+Consumer+Support+for+Remote+Tiered+Storage+Fetch>, which describes the changes required on the consumer side. A few comments:
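As background for the preferred-replica analogy in point 4 below: with KIP-392, reading from a same-AZ replica is enabled purely through configuration, with no storage dependencies pulled into the client - the kind of footprint a remote-serving broker would preserve. A minimal sketch (the property names are the real KIP-392 ones; the rack IDs are illustrative):

```properties
# broker.properties - RackAwareReplicaSelector ships with Apache Kafka
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
broker.rack=us-east-1a

# consumer.properties - the client advertises its rack so the broker
# can direct it to a same-AZ (preferred) read replica
client.rack=us-east-1a
```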
1. To enable this feature, clients have to reimplement the logic if they are not using the Java client.
2. The client becomes heavy and requires all the remote storage dependencies.
3. It may not be fully compatible with all the existing / proposed client APIs.
4. Did you explore having a lightweight broker in the cluster that serves only remote traffic, similar to reading from a preferred replica? KIP-1255 <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399281539> proposes <https://docs.google.com/presentation/d/10ZZeJ_8RPc-gPXFxQe1KC7VLSTPg0Rb8I4taZr0RktM/edit?slide=id.g282ec88ea22_1_5#slide=id.g282ec88ea22_1_5> the same; it is in the draft stage. These brokers may not need much disk / memory, can be kept in the same AZ as the consumers, can solely serve FETCH requests from remote storage, and can be scaled quickly.

Thanks,
Kamal

On Wed, Dec 10, 2025 at 6:04 PM Thomas Thornton via dev <[email protected]> wrote:

> Hi Stan,
>
> Thanks for the detailed feedback! We've now published KIP-1254 [1], which is the consumer-side companion to KIP-1248 and addresses your questions in detail.
>
> To highlight a few points:
>
> On storage format coupling: We've added wording to the Version Compatibility section [2]. This design intentionally shifts segment parsing from the broker to the consumer to reduce broker load. While this couples consumers to the on-disk format, SupportedStorageFormatVersions ensures graceful fallback when formats evolve. Consumers remain decoupled from storage backends (S3/GCS/Azure) via the RemoteStorageFetcher plugin interface. For proprietary Kafka implementations with different storage formats, this mechanism allows them to participate: if the client supports their format, direct fetch works; otherwise it falls back gracefully.
>
> Optional feature: Yes, it is opt-in via fetch.remote.enabled (default false). See Consumer Configs [3].
>
> Forward compatibility: Covered in Version Compatibility [2]. The client sends a list of the format versions it supports (e.g., ApacheKafkaV1). If a 6.x broker uses a new format that is not in the 4.x client's list, the broker falls back to traditional fetch.
>
> For the specific consumer-side questions:
>
> 1. Plugin system: We introduce RemoteStorageFetcher [4], a read-only interface similar to RemoteStorageManager on the broker side. Plugin matching is handled implicitly via SupportedStorageFormatVersions - if format versions don't align, the broker falls back to traditional fetch.
>
> 2. Cost & performance: The broker provides byte position hints derived from the OffsetIndex, and consumers request only the needed range via startPosition/endPosition in RemoteStorageFetcher.fetchLogSegment(). This enables byte-range GETs for S3-compatible systems. For backends that don't support range requests, the plugin implementation would handle buffering - this is implementation-specific and outside the KIP's scope.
>
> 3. Fallbacks: Yes, covered in the Fallback section [5]. The consumer falls back to broker-mediated fetch on: timeout, connection failure, auth failure, or if RemoteStorageFetcher is not configured.
>
> Let us know if you'd like more detail on any of these.
>
> Thanks,
> Tom
>
> [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1254%3A+Kafka+Consumer+Support+for+Remote+Tiered+Storage+Fetch
> [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-VersionCompatibility
> [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-ConsumerConfigs
> [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-RemoteStorageFetcher
> [5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=399279678#KIP1254:KafkaConsumerSupportforRemoteTieredStorageFetch-Fallback
>
> On Tue, Dec 2, 2025 at 12:03 PM Stanislav Kozlovski <[email protected]> wrote:
>
> > Hey Henry, thanks for the KIP! I'm excited to see this proposal, as I've heard it discussed privately before too.
> >
> > Can we have some wording that talks about the trade-offs of coupling clients to the underlying storage format? Today, the underlying segment format is decoupled from the clients, since the broker handles conversion of log messages to what the protocol expects. I'm sure certain proprietary Kafka implementations use different formats for their underlying storage - it's an interesting question how they would handle this (to be explicit, I'm not proposing we should cater our design to those systems; I'm simply calling it out as a potential contention point).
> >
> > Things I'm thinking about:
> > - Would this be an optional feature?
> > - What would forward compatibility look like?
> >
> > E.g., what if we ever want to switch the underlying storage format? To bullet-proof ourselves, do we want to introduce some version matching, which could then help us detect incompatibility and throw errors? (E.g., we change the storage format in 6.x, and a 4.x client tries to read from a 6.x broker / storage format.)
> >
> > Can we also have some wording on how this feature would look on the consumer side? The proposal right now suggests we handle this in a follow-up KIP, which makes sense for the details - but what about a high-level overview and motivation?
> >
> > 1. We would likely need a similar plugin system for consumers like brokers have for KIP-405. Getting that interface right would be important. Ensuring the plugin configured on the consumer matches the plugin configured on the broker would be useful from a UX point of view too.
> >
> > 2. From a cost and performance perspective, how do we envision this being used/configured on the consumer side?
> >
> > A single segment could be GBs in size. It's unlikely a consumer would want to download the whole thing at once.
> >
> > For tiered backends that are S3-compatible cloud object storage systems, we could likely use byte-range GETs, thus avoiding reading too much data that'll get discarded. Are there concerns with other systems? A few words on this topic would help, imo.
> >
> > 3. Should we have fallbacks to the current behavior?
> >
> > Best,
> > Stan
> >
> > On 2025/12/02 11:04:13 Kamal Chandraprakash wrote:
> > > Hi Haiying,
> > >
> > > Thanks for the KIP!
> > >
> > > 1. Do you plan to add support for transactional consumers? Currently, the consumer doesn't return the aborted transaction records to the handler.
> > > 2. To access the remote storage directly, the client might need additional certificates / keys. How do you plan to expose those configs on the client?
> > > 3. Will it support the Queues for Kafka feature, KIP-932 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka>?
> > >
> > > And so on.
> > >
> > > --
> > > Kamal
> > >
> > > On Tue, Dec 2, 2025 at 10:29 AM Haiying Cai via dev <[email protected]> wrote:
> > >
> > > > For some reason, the KIP link was truncated in the original email. Here is the link again:
> > > >
> > > > KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-1248%3A+Allow+consumer+to+fetch+from+remote+tiered+storage
> > > >
> > > > Henry Haiying Cai
> > > >
> > > > On 2025/12/02 04:34:39 Henry Haiying Cai via dev wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I would like to start discussion on KIP-1248: Allow consumer to fetch from remote tiered storage.
> > > > >
> > > > > KIP link: KIP-1248: Allow consumer to fetch from remote tiered storage - Apache Kafka - Apache Software Foundation
> > > > >
> > > > > The KIP proposes to allow consumer clients to fetch from remote tiered storage directly, to avoid hitting the broker's network capacity limits and degrading its cache performance. This is very useful for serving large backfill requests from a new or fallen-behind consumer.
> > > > >
> > > > > Any feedback is appreciated.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Henry
