Hi Josep,

Thanks for the KIP.

I think there's a bit of confusion in the motivation and naming here. As Jun 
said, what's being proposed here is not truly "diskless" -- we're still storing 
a fair amount of metadata on local disks.

The proposal lists "Unification/Relationship with Tiered Storage: Identifying 
a long-term vision for Diskless and Tiered Storage plugins" as "future work." 
But when we add a new feature, we should consider how it interacts with 
existing features before it lands, not after it's already in place.

To that end, it's useful to compare this KIP against KIP-1176: Tiered Storage 
for Active Log Segment. In their current forms, both KIP-1176 and KIP-1150 
require small disks on each broker. Traditional Kafka tiered storage 
essentially lets us treat S3 (or another blobstore) as cold storage for older 
data. KIP-1176 refines that model so that we can tier the active log segments 
as well.

As it stands, the big advantage of KIP-1150 over traditional tiered storage 
is that with KIP-1150, you don't have to send most of your data through 
normal Kafka replication. This, in turn, is mainly about saving costs on 
clouds where cross-AZ replication traffic is expensive.
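
As a back-of-envelope illustration (the per-GB and per-request prices below 
are placeholders I made up, not real provider pricing):

    public class ReplicationCostSketch {
        public static void main(String[] args) {
            // Hypothetical, illustrative prices -- substitute real numbers.
            double crossAzPerGb = 0.02;   // assumed $/GB inter-AZ transfer
            double s3PutPer1000 = 0.005;  // assumed $/1000 PUT requests

            double gbPerDay = 1000.0;     // 1 TB/day of produced data
            int replicationFactor = 3;    // leader + 2 followers in other AZs
            double replicatedGb = gbPerDay * (replicationFactor - 1);
            double replicationCost = replicatedGb * crossAzPerGb;

            double segmentMb = 8.0;       // assumed size of each uploaded object
            double puts = (gbPerDay * 1024.0) / segmentMb;
            double putCost = (puts / 1000.0) * s3PutPer1000;

            System.out.printf("cross-AZ replication: $%.2f/day, PUTs: $%.2f/day%n",
                    replicationCost, putCost);
        }
    }

With those (made-up) numbers, replication transfer dominates object-store 
request costs by a wide margin, which is presumably the cost case KIP-1150 is 
targeting.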

When I read KIP-1163, I see the following:

> 1. Producers send Produce requests to any broker.
> 2. The broker accumulates Produce requests in a buffer until exceeding some 
> size or time limit.
> 3. When enough data accumulates or the timeout elapses, the Broker creates a 
> shared log segment and batch coordinates for all of the buffered batches.
> 4. The shared log segment is uploaded to object storage and is written 
> durably.
> 5. The broker commits the batch coordinates with the Batch Coordinator 
> (described in detail in KIP-1164).
> 6. The Batch Coordinator assigns offsets to the written batches, persists 
> the batch coordinates, and responds to the Broker.
> 7. The broker sends responses to all Produce requests that are associated 
> with the committed object.
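
To make sure I'm reading it correctly, here's a minimal sketch of that flow; 
every type and method name below is hypothetical, invented for illustration, 
and nothing here is actual Kafka code:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical stand-ins for the KIP's concepts.
    interface ObjectStore { String put(byte[] segment); }                  // step 4
    interface BatchCoordinator { long commit(String objId, int batches); } // steps 5-6

    public class DisklessProduceSketch {
        private final List<byte[]> buffered = new ArrayList<>();
        private long bufferedBytes = 0;
        private final long maxBytes;
        private final ObjectStore store;
        private final BatchCoordinator coordinator;

        public DisklessProduceSketch(long maxBytes, ObjectStore store,
                                     BatchCoordinator coordinator) {
            this.maxBytes = maxBytes;
            this.store = store;
            this.coordinator = coordinator;
        }

        // Steps 1-2: accumulate produce payloads until a size limit trips
        // (a real implementation would also flush on a timer).
        public synchronized void onProduce(byte[] batch) {
            buffered.add(batch);
            bufferedBytes += batch.length;
            if (bufferedBytes >= maxBytes) {
                flush();
            }
        }

        // Steps 3-7: build one shared object, upload it, commit its
        // coordinates, and only then acknowledge every buffered request.
        private void flush() {
            String objectId = store.put(concatenate(buffered));
            long baseOffset = coordinator.commit(objectId, buffered.size());
            ackAll(baseOffset);
            buffered.clear();
            bufferedBytes = 0;
        }

        private static byte[] concatenate(List<byte[]> batches) {
            int total = batches.stream().mapToInt(b -> b.length).sum();
            byte[] out = new byte[total];
            int pos = 0;
            for (byte[] b : batches) {
                System.arraycopy(b, 0, out, pos, b.length);
                pos += b.length;
            }
            return out;
        }

        private void ackAll(long baseOffset) {
            // In a real broker this would complete the pending Produce responses.
            System.out.println("acked " + buffered.size()
                    + " batches at base offset " + baseOffset);
        }
    }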

To me this raises a few questions:

A. What kind of latencies should we expect here? It seems like we're both 
buffering lots of produce requests and waiting until they're durably written 
to S3.
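
As a rough illustration (none of these numbers are measurements, just 
assumptions for the sake of argument):

    public class DisklessLatencySketch {
        public static void main(String[] args) {
            // All numbers are assumptions, not measurements.
            double bufferFillMs = 250;   // waiting for the size/time limit to trip
            double s3PutMs = 100;        // typical-ish object-store PUT latency
            double coordinatorMs = 20;   // batch coordinator commit round trip
            System.out.printf("best-case produce ack: ~%.0f ms%n",
                    bufferFillMs + s3PutMs + coordinatorMs);
        }
    }

Even in the best case, that's an order of magnitude above what acks=all 
replication typically gives within a region.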

B. Could we do something similar with KIP-1176 by not ack'ing the 
ProduceRequest until tiering had caught up to what we produced? This would 
have higher latency, but maybe not higher than KIP-1150 (see point A). If we 
could do that, then maybe the cost advantage of KIP-1150 disappears, since I 
could put all the replicas of my topic in one AZ and ensure durability by 
waiting for S3.
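
Here's a sketch of the kind of gate I have in mind; KIP-1176 defines no such 
hook today, so all names here are hypothetical:

    // Hypothetical sketch: hold the Produce ack until tiering has caught up.
    // 'onTiered' and 'awaitDurable' are invented names, not KIP-1176 APIs.
    class TieredAckGate {
        private long lastTieredOffset = -1;

        // Called by the tiering path after a prefix of the active segment
        // has been durably uploaded to the object store.
        synchronized void onTiered(long upToOffset) {
            lastTieredOffset = upToOffset;
            notifyAll();
        }

        // Called on the produce path: block the ack until the produced
        // offset is covered by a durable upload.
        synchronized void awaitDurable(long producedOffset)
                throws InterruptedException {
            while (lastTieredOffset < producedOffset) {
                wait();
            }
        }
    }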

Another piece of feedback I would give is that I do not think the batch 
coordinator should be pluggable. Since this is a central part of the system, 
we should focus our efforts on designing a single good one, rather than 
having lots of pluggable ones. Making it pluggable will also make it 
difficult to evolve the system in the future. We should present a compelling 
use-case for pluggability before introducing it. (In the case of supporting 
all the different blobstores, the need for pluggability is obvious, of 
course.)

For "Compatibility, Deprecation, and Migration Plan," we just have some text 
saying that this feature didn't exist before, and now it will. But this isn't 
very helpful. Instead, we should try to spell out what parts of the system will 
come with compatibility guarantees. For example, will the format in which we 
write data to s3 (or other blobstore) be stable and documented, so that 3rd 
party tools can work with it? Or will we keep it internal and unstable?

best,
Colin


On Wed, Apr 16, 2025, at 04:58, Josep Prat wrote:
> Hi Kafka Devs!
>
> We want to start a new KIP discussion about introducing a new type of
> topics that would make use of Object Storage as the primary source of
> storage. However, as this KIP is big we decided to split it into multiple
> related KIPs.
> We have the motivational KIP-1150 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics)
> that aims to discuss if Apache Kafka should aim to have this type of
> feature at all. This KIP doesn't go into details on how to implement it.
> This follows the same approach used when we discussed KRaft.
>
> But as we know that it is sometimes really hard to discuss on that meta
> level, we also created several sub-kips (linked in KIP-1150) that offer an
> implementation of this feature.
>
> We kindly ask you to use the proper DISCUSS threads for each type of
> concern and keep this one to discuss whether Apache Kafka wants to have
> this feature or not.
>
> Thanks in advance on behalf of all the authors of this KIP.
>
> ------------------
> Josep Prat
> Open Source Engineering Director, Aiven
> josep.p...@aiven.io   |   +491715557497 | aiven.io
> Aiven Deutschland GmbH
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa, Hannu Valtonen,
> Anna Richardson, Kenneth Chen
> Amtsgericht Charlottenburg, HRB 209739 B
