Hi Pritam,

Thanks for the KIP!

I'm a little unsure of the motivation here and would appreciate some more
context from your experience.

1. The KIP states "Broker resource utilization is expected to decrease by
approximately 20%, primarily due to reduced partition count and metadata
overhead." Can you share your cluster/connector topology and the testing
method that produced this figure?
A Connect cluster's internal topics are amortized among all of the
connectors within that cluster, and in a typical deployment, these
connectors should be handling at least 10-100x as many data
partitions/bytes as are present in the internal topics. So I would not
expect the overhead for internal topics to persistently consume 20% of a
Kafka cluster's resources.
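
As a rough illustration of the amortization argument (a back-of-envelope
sketch, not a measurement): if data partitions outnumber internal-topic
partitions by a ratio r, the internal topics make up 1/(1+r) of all
partitions. The function name and the ratios below are illustrative
assumptions, and partition count is only a crude proxy for broker load.

```python
def internal_share(data_to_internal_ratio: float) -> float:
    """Fraction of all partitions that belong to internal topics,
    given how many data partitions exist per internal-topic partition."""
    if data_to_internal_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + data_to_internal_ratio)


if __name__ == "__main__":
    # Assumed ratios taken from the "10-100x" range mentioned above.
    for ratio in (10, 100):
        share = internal_share(ratio)
        print(f"{ratio}x data partitions -> internal share {share:.1%}")
```

Under these assumptions the internal share is roughly 9% at 10x and
roughly 1% at 100x, which is why a persistent 20% figure looks surprising.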

2. The KIP states "Every new cluster requires three new topics, leading to
an exponential increase in topic creation."
Where are you seeing "exponential" topic creation? It should be linear in
the number of Connect clusters, so I'm wondering if this is unfortunate
wording or an exaggeration.

3. The KIP states "Cross-team dependencies slow down provisioning, delaying
deployments."
In my experience, provisioning Kafka topics is generally a lightweight
operation, and the Connect cluster does it automatically on your behalf on
first startup. If you are in an environment where additional processes are
in place that make topic creation a nuisance, I think that reflects on
your environment more than on Connect, and therefore it seems a bit odd for
Connect to implement a workaround.
I am also interested in whether other users experience the same nuisance
around topic provisioning, and how generally useful this feature is.

I think you should also mention some of the technical tradeoffs of this
feature, such as:
* Read amplification: Connect workers need to consume irrelevant data from
other clusters and discard it after incurring the costs of transferring and
deserializing the data, both during startup and ongoing operations.
* Security concerns: Sharing credentials for internal topics among multiple
services violates the Principle of Least Privilege and makes compromise of
one Connect cluster more impactful.
* Correlated failures: The unavailability/corruption of one topic now
affects multiple Connect clusters instead of just one.
* Complications with Exactly Once mode: Transactional writes to internal
topics may cause unavailability in other clusters from hanging transactions.
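
To make the read amplification point concrete, here is a minimal
simulation, assuming each record in a shared topic is tagged with the id
of the Connect cluster that wrote it. That tagging scheme, the function
name, and the cluster ids are hypothetical and are not taken from the KIP.

```python
from collections import Counter


def read_amplification(records, my_cluster_id):
    """Return (kept, discarded, amplification) for one worker reading a shared topic.

    `records` is an iterable of (cluster_id, payload) pairs. Tagging records
    with a cluster id is a hypothetical scheme used only for illustration.
    """
    counts = Counter()
    for cluster_id, _payload in records:
        # By the time the worker can inspect the cluster id, it has already
        # paid for fetching and deserializing the record.
        counts["kept" if cluster_id == my_cluster_id else "discarded"] += 1
    total = counts["kept"] + counts["discarded"]
    amplification = total / counts["kept"] if counts["kept"] else float("inf")
    return counts["kept"], counts["discarded"], amplification


if __name__ == "__main__":
    # Four hypothetical clusters, each writing 100 records to one shared topic.
    shared_topic = [(f"cluster-{c}", b"...") for c in range(4) for _ in range(100)]
    kept, discarded, factor = read_amplification(shared_topic, "cluster-0")
    print(f"kept={kept} discarded={discarded} amplification={factor}x")
```

With N equally busy clusters sharing a topic, each worker reads about N
times the data it needs, and it pays that cost again on every restart
when it replays the topic from the beginning.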

Thanks!
Greg

On Mon, Apr 28, 2025 at 8:01 AM pritam kumar <kumarpritamm...@gmail.com>
wrote:

> Hi Kafka Community,
>
> I'd like to start a discussion on KIP-1173: Connect Storage Topics Sharing
> Across Clusters
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1173%3A+Connect+Storage+Topics+Sharing+Across+Clusters
> >
> .
>
> The primary motivation for writing this KIP and proposing this enhancement
> came from the operational overhead associated with the creation of
>
> *three storage topics every time when spinning up a new Kafka Connect
> Cluster. *While each cluster only requires *three topics*, their cumulative
> impact grows significantly as more kafka connect clusters are deployed
> not only operationally but also from the management, monitoring
> and cleaning perspective.
>
> This also makes it very hard to provision the Kafka Connect Clusters on
> demand even if operating on the same Kafka Cluster.
>
> But as these topics have very light traffic and are compacted, instead of
> provisioning dedicated topics for every cluster, Kafka Connect
> clusters can *share
> internal topics* across multiple deployments. This brings *immediate
> benefits*:
>
>    - *Drastically Reduces Topic Proliferation* – Eliminates unnecessary
>    topic creation.
>    - *Faster Kafka Connect Cluster Deployment* – No waiting for new topic
>    provisioning.
>       - *Large Enterprises with Multiple Teams Using Kafka Connect*
>          - *Scenario:* In large organisations, multiple teams manage
>          different *Kafka Connect clusters* for various data pipelines.
>          - *Benefit:* Instead of waiting for new *internal topics* to be
>          provisioned each time a new cluster is deployed, teams can
> *immediately
>          start* using pre-existing shared topics, reducing lead time and
>          improving efficiency.
>       - *Cloud-Native & Kubernetes-Based Deployments*
>          - *Scenario:* Many organisations deploy Kafka Connect in
> *containerised
>          environments* (e.g., Kubernetes), where clusters are
> frequently *scaled
>          up/down* or *recreated* dynamically.
>          - *Benefit:* Since internal topics are already available, new
>          clusters can *spin up instantly*, without waiting for *topic
>          provisioning* or *Kafka ACL approvals*.
>       - How this will help different organisations:
>    - *Lower Operational Load* – Reduces disk-intensive cleanup operations.
>       - Broker resource utilization is expected to decrease by
>       approximately 20%, primarily due to reduced partition count and
> metadata
>       overhead. This optimization can enable further cluster downscaling,
>       contributing directly to lower infrastructure costs (e.g., fewer
> brokers,
>       reduced EBS storage footprint, and lower I/O throughput).
>       - Administrative overhead and monitoring complexity are projected to
>       reduce by 30%, due to:
>          - Fewer topics to configure, monitor, and apply
>          retention/compaction policies to.
>          - Reduced rebalancing operations during cluster scale-in or
>          scale-out events.
>       - *Simplified Management* – Less overhead in monitoring and
>       maintaining internal topics.
>
> More details on this can be found inside this KIP.
>
> KIP LINK ->
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1173%3A+Connect+Storage+Topics+Sharing+Across+Clusters
>
> Thanks,
> Pritam
>
