Hey Henry, thanks for the KIP! I'm excited to see this proposal, as I've heard 
it discussed privately before too.

Can we have some wording on the trade-offs of coupling clients to the 
underlying storage format? Today, the segment format is decoupled from the 
clients, since the broker handles converting log messages into what the 
protocol expects. I'm sure certain proprietary Kafka implementations use 
different formats for their underlying storage - it's an interesting question 
how they would handle this (to be explicit, I'm not proposing we tailor our 
design to those systems; I'm simply calling it out as a potential point of 
contention).

Things I'm thinking about:
- Would this be an optional feature?
- What would forward compatibility look like?

e.g. what if we ever want to switch the underlying storage format? To 
bullet-proof ourselves, do we want to introduce some version matching that 
would help us detect incompatibility and throw errors? (e.g. we change the 
storage format in 6.x, and a 4.x client tries to read from a 6.x 
broker/storage format)
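
To make that concrete, I'm imagining a guard along these lines on the 
client's read path (a rough sketch only; the class name and version numbers 
are hypothetical, not from the KIP):

    // Rough sketch of a client-side format-version guard. All names and
    // version numbers here are hypothetical, purely to illustrate the idea.
    public final class SegmentFormatGuard {
        // Highest segment format version this client knows how to decode.
        private static final int MAX_SUPPORTED_FORMAT_VERSION = 2;

        public static void ensureReadable(int segmentFormatVersion) {
            if (segmentFormatVersion > MAX_SUPPORTED_FORMAT_VERSION) {
                throw new UnsupportedOperationException(
                    "Remote segment uses format v" + segmentFormatVersion
                    + ", but this client only supports up to v"
                    + MAX_SUPPORTED_FORMAT_VERSION);
            }
        }
    }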

Can we also have some wording on what this feature would look like on the 
consumer side? The proposal currently suggests handling that in a follow-up 
KIP, which makes sense for the details - but what about a high-level overview 
and motivation?

1. We would likely need a plugin system for consumers similar to the one 
brokers have with KIP-405. Getting that interface right would be important. 
Ensuring the plugin configured on the consumer matches the plugin configured 
on the broker would be useful from a UX point of view too.
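
To illustrate the rough shape I have in mind (entirely hypothetical, loosely 
mirroring the broker-side RemoteStorageManager from KIP-405; none of these 
names are in the KIP):

    import java.io.Closeable;
    import java.io.IOException;
    import java.io.InputStream;

    // Hypothetical consumer-side plugin interface for reading tiered
    // segments directly, one byte range at a time.
    public interface ConsumerRemoteFetcher extends Closeable {
        // Placeholder identifier for a tiered segment.
        record RemoteSegmentId(String topic, int partition, long baseOffset) {}

        // Open a stream over bytes [startByte, endByte) of a remote
        // segment, so the consumer never has to pull the whole file.
        InputStream fetchSegmentRange(RemoteSegmentId segmentId,
                                      long startByte,
                                      long endByte) throws IOException;
    }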

2. From a cost and performance perspective, how do we envision this being 
used/configured on the consumer side?

A single segment could be GBs in size. It's unlikely a consumer would want to 
download the whole thing at once.

For tiered backends that are S3-compatible cloud object stores, we could 
likely use byte-range GETs, avoiding reads of data that would just get 
discarded. Are there concerns with other systems? A few words on this topic 
would help imo.
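
For the S3 case specifically, I'm thinking of something like this (AWS SDK 
v2; the bucket and key are made up), which bounds a fetch to a slice of the 
segment:

    import software.amazon.awssdk.core.ResponseInputStream;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.model.GetObjectResponse;

    public class RangeFetchSketch {
        public static void main(String[] args) throws Exception {
            try (S3Client s3 = S3Client.create()) {
                // Ask for only the first 1 MiB of the segment object rather
                // than the whole multi-GB file (standard HTTP Range semantics).
                GetObjectRequest request = GetObjectRequest.builder()
                    .bucket("example-tiered-storage-bucket")
                    .key("topic-0/00000000000000000000.log")
                    .range("bytes=0-1048575")
                    .build();
                try (ResponseInputStream<GetObjectResponse> in =
                         s3.getObject(request)) {
                    // ... decode record batches from the bounded stream ...
                }
            }
        }
    }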

3. Should we have fallbacks to the current behavior?
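
e.g. I could imagine the consumer exposing something like the following 
(the config names are made up, just to show the shape), where a failed direct 
remote fetch transparently falls back to today's broker fetch path:

    import java.util.Properties;

    public class FallbackConfigSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Hypothetical configs: opt in to direct remote fetch, but fall
            // back to the normal broker fetch path if a remote read fails.
            props.put("remote.fetch.enable", "true");
            props.put("remote.fetch.fallback.to.broker", "true");
        }
    }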

Best,
Stan

On 2025/12/02 11:04:13 Kamal Chandraprakash wrote:
> Hi Haiying,
> 
> Thanks for the KIP!
> 
> 1. Do you plan to add support for transactional consumers? Currently, the
> consumer doesn't return the aborted transaction records to the handler.
> 2. To access the remote storage directly, the client might need additional
> certificates / keys. How do you plan to expose those configs on the client?
> 3. Will it support the Queues for Kafka feature KIP-932
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka>?
> And so on.
> 
> --
> Kamal
> 
> On Tue, Dec 2, 2025 at 10:29 AM Haiying Cai via dev <[email protected]>
> wrote:
> 
> > For some reason, the KIP link was truncated in the original email.  Here
> > is the link again:
> >
> > KIP:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1248%3A+Allow+consumer+to+fetch+from+remote+tiered+storage
> >
> > Henry Haiying Cai
> >
> > On 2025/12/02 04:34:39 Henry Haiying Cai via dev wrote:
> > >
> > > Hi all,
> > >
> > > I would like to start discussion on KIP-1248: Allow consumer to fetch
> > from remote tiered storage
> > >
> > > KIP link: KIP-1248: Allow consumer to fetch from remote tiered storage -
> > Apache Kafka - Apache Software Foundation
> > >
> > > The KIP proposes to allow consumer clients to fetch from remote tiered
> > storage directly to avoid hitting broker's network capacity and cache
> > performance.  This is very useful to serve large backfill requests from a
> > new or fallen-off consumer.
> > >
> > > Any feedback is appreciated.
> > >
> > > Best regards,
> > >
> > > Henry
> 
