Hi everyone,

I’m opening this thread to help us get on the same page regarding the recent "cloud-native" KIPs.
In advancing *KIP-1267 (Tiered Storage Cost Attribution Metrics)*, I’ve noticed we have several different initiatives (such as diskless topics, remote fetching, and better metrics) that are all moving in parallel. They are all trying to solve the same problem: making Kafka cheaper and more elastic in the cloud. However, they are currently disconnected.

To ensure we build a cohesive platform rather than just a collection of features, I propose we group these discussions into three main areas:

*1. Cost Tracking (The Foundation)*

We can't optimize what we can't measure. *KIP-1267* (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1267%3A+Tiered+Storage+Cost+Attribution+Metrics) gives us the granular metrics we need to actually bill users for storage and API calls. This builds on the operational metrics from *KIP-963* (https://cwiki.apache.org/confluence/display/KAFKA/KIP-963%3A+Additional+metrics+in+Tiered+Storage). Without this layer, we cannot safely run the multi-tenant models we are designing.

*2. The Storage Decision*

We need to decide which path to take for storage disaggregation. Do we pursue the evolutionary path of *KIP-1176* (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1176%3A+Tiered+Storage+for+Active+Log+Segment), which keeps local disks for performance? Or do we go with the revolutionary path of *KIP-1150* (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics), which removes disks entirely? This decision dictates our future infrastructure and shouldn't be made in isolation.

*3. Efficiency & Multi-Tenancy*

We also have critical work happening on the consumer side with *KIP-1248* (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1248%3A+Broker+support+for+remote+tiered+storage+fetch+from+consumer) and *KIP-1254* (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1254%3A+Kafka+Consumer+Support+for+Remote+Tiered+Storage+Fetch). As we look toward *Virtual Clusters (KIP-1134)* (https://cwiki.apache.org/confluence/display/KAFKA/KIP-1134%3A+Multi-tenancy+in+Kafka%3A+Virtual+Clusters) and *Dynamic Controllers (KIP-853)* (https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes), the need for the rigorous cost tracking I’ve outlined in KIP-1267 becomes even more urgent.

I suggest we treat these KIPs as a single "Cloud-Native" capability set. I’d like to discuss how *KIP-1267* can serve as the standard way to track costs for these new architectures.

Regards,
Viquar khan
https://www.linkedin.com/in/vaquar-khan-b695577/
