Hi everyone,

I’m opening this thread to help us get on the same page regarding the
recent "cloud-native" KIPs.

While advancing *KIP-1267 (Tiered Storage Cost Attribution Metrics)*, I’ve
noticed that several initiatives (diskless topics, remote fetching, and
better metrics) are moving in parallel. They all target the same problem:
making Kafka cheaper and more elastic in the cloud. However, they are
currently being discussed independently of one another.

To ensure we build a cohesive platform rather than just a collection of
features, I propose we group these discussions into three main areas:

*1. Cost Tracking (The Foundation)* We can't optimize what we can't
measure. *KIP-1267* (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1267%3A+Tiered+Storage+Cost+Attribution+Metrics)
gives us the granular metrics we need to actually bill users for storage
and API calls. This builds on the operational metrics from *KIP-963* (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Additional+metrics+in+Tiered+Storage).
Without this layer, we cannot safely run the multi-tenant models we are
designing.

*2. The Storage Decision* We need to decide which path to take for storage
disaggregation. Do we pursue the evolutionary path of *KIP-1176* (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1176%3A+Tiered+Storage+for+Active+Log+Segment),
which keeps local disks for performance? Or do we go with the revolutionary
path of *KIP-1150* (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics),
which removes disks entirely? This decision dictates our future
infrastructure and shouldn't be made in isolation.

*3. Efficiency & Multi-Tenancy* We also have critical work happening on
remote tiered storage fetch, spanning the broker and the consumer, in
*KIP-1248* (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1248%3A+Broker+support+for+remote+tiered+storage+fetch+from+consumer)
and *KIP-1254* (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1254%3A+Kafka+Consumer+Support+for+Remote+Tiered+Storage+Fetch).
As we look toward *Virtual Clusters (KIP-1134)* (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1134%3A+Multi-tenancy+in+Kafka%3A+Virtual+Clusters)
and *Dynamic Controllers (KIP-853)* (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes),
the need for the rigorous cost tracking I’ve outlined in KIP-1267 becomes
even more urgent.

I suggest we treat these KIPs as a single "Cloud-Native" capability set.
I’d like to discuss how *KIP-1267* can serve as the standard way to track
costs for these new architectures.

Regards,
Viquar khan
https://www.linkedin.com/in/vaquar-khan-b695577/
