Hi Pritam,

Thanks for the KIP!

I'm a little unsure of the motivation here and would appreciate some more context from your experience.

1. The KIP states "Broker resource utilization is expected to decrease by approximately 20%, primarily due to reduced partition count and metadata overhead." Can you share your cluster/connector topology and the testing method that arrived at this statistic? A Connect cluster's internal topics are amortized among all of the connectors within that cluster, and in a typical deployment, these connectors should be handling at least 10-100x the number of data partitions/bytes as are present in internal topics. So I would not expect the overhead for internal topics to persistently consume 20% of a Kafka cluster's resources.

2. The KIP states "Every new cluster requires three new topics, leading to an exponential increase in topic creation." Where are you seeing "exponential" topic creation? It should be linear in the number of Connect clusters, so I'm wondering if this is an unfortunate wording or an exaggeration.

3. The KIP states "Cross-team dependencies slow down provisioning, delaying deployments." In my experience provisioning Kafka topics is generally a lightweight operation, and the Connect cluster does it automatically on your behalf on first startup. If you are in an environment where additional processes are in place that make topic creation a nuisance, I think that reflects on your environment more than on Connect, and therefore it seems a bit odd for Connect to implement a workaround.

I am also interested in whether other users experience the same nuisance around topic provisioning, and how generally useful this feature is.

I think you should also mention some of the technical tradeoffs of this feature, such as:

* Read amplification: Connect workers need to consume irrelevant data from other clusters and discard it after incurring the costs of transferring and deserializing the data, both during startup and ongoing operations.
* Security concerns: Sharing credentials for internal topics among multiple services violates the Principle of Least Privilege and makes compromise of one Connect cluster more impactful.
* Correlated failures: The unavailability/corruption of one topic now affects multiple Connect clusters instead of just one.
* Complications with Exactly Once mode: transactional writes to internal topics may cause unavailability in other clusters from hanging transactions.

Thanks!
Greg

On Mon, Apr 28, 2025 at 8:01 AM pritam kumar <kumarpritamm...@gmail.com> wrote:

> Hi Kafka Community,
>
> I'd like to start a discussion on KIP-1173: Connect Storage Topics Sharing
> Across Clusters
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1173%3A+Connect+Storage+Topics+Sharing+Across+Clusters>.
>
> The primary motivation for writing this KIP and proposing this enhancement
> came from the operational overhead associated with the creation of
> *three storage topics every time when spinning up a new Kafka Connect
> Cluster.* While each cluster only requires *three topics*, their cumulative
> impact grows significantly as more kafka connect clusters are deployed
> not only operationally but also from the management, monitoring
> and cleaning perspective.
>
> This also makes it very hard to provision the Kafka Connect Clusters on
> demand even if operating on the same Kafka Cluster.
>
> But as these topics have very light traffic and are compacted, instead of
> provisioning dedicated topics for every cluster, Kafka Connect
> clusters can *share internal topics* across multiple deployments.
> This brings *immediate benefits*:
>
> - *Drastically Reduces Topic Proliferation* – Eliminates unnecessary
>   topic creation.
> - *Faster Kafka Connect Cluster Deployment* – No waiting for new topic
>   provisioning.
> - *Large Enterprises with Multiple Teams Using Kafka Connect*
>   - *Scenario:* In large organisations, multiple teams manage
>     different *Kafka Connect clusters* for various data pipelines.
>   - *Benefit:* Instead of waiting for new *internal topics* to be
>     provisioned each time a new cluster is deployed, teams can
>     *immediately start* using pre-existing shared topics, reducing
>     lead time and improving efficiency.
> - *Cloud-Native & Kubernetes-Based Deployments*
>   - *Scenario:* Many organisations deploy Kafka Connect in
>     *containerised environments* (e.g., Kubernetes), where clusters are
>     frequently *scaled up/down* or *recreated* dynamically.
>   - *Benefit:* Since internal topics are already available, new
>     clusters can *spin up instantly*, without waiting for *topic
>     provisioning* or *Kafka ACL approvals*.
> - How this will help different organisations:
>   - *Lower Operational Load* – Reduces disk-intensive cleanup operations.
>     - Broker resource utilization is expected to decrease by
>       approximately 20%, primarily due to reduced partition count and
>       metadata overhead. This optimization can enable further cluster
>       downscaling, contributing directly to lower infrastructure costs
>       (e.g., fewer brokers, reduced EBS storage footprint, and lower
>       I/O throughput).
>     - Administrative overhead and monitoring complexity are projected to
>       reduce by 30%, due to:
>       - Fewer topics to configure, monitor, and apply
>         retention/compaction policies to.
>       - Reduced rebalancing operations during cluster scale-in or
>         scale-out events.
>   - *Simplified Management* – Less overhead in monitoring and
>     maintaining internal topics.
>
> More details on this can be found inside this KIP.
>
> KIP LINK ->
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1173%3A+Connect+Storage+Topics+Sharing+Across+Clusters
>
> Thanks,
> Pritam