[
https://issues.apache.org/jira/browse/KAFKA-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825454#comment-17825454
]
Greg Harris commented on KAFKA-15841:
-------------------------------------
[~henriquemota] Okay, I think I understand better what you're trying to achieve.
> ... one topic per table...
> We have a JDBC Sink for each table.
Okay, you're using scenario (1), one connector per-topic, which should come to
at most 90 * 100 = 9000 connectors per Connect cluster. That is certainly too
many to fit on a single machine, and certainly needs a cluster to distribute
the work.
In this scenario, Connect should be able to distribute approximately 9000/M
connectors and 9000/M tasks to each of the M workers in a distributed cluster,
barring any practical limits/timeouts that I'm not aware of, so check the
worker logs for ERROR messages.
> We tried to change the 'topics' property in the configurations using the
> 'taskConfigs(int maxTasks)' method, but Kafka Connect ignores this property
> when it is returned by 'taskConfigs(int maxTasks)'.
The reason for this is that the `topics` property is passed to the consumers to
have them subscribe to the input topics, and the Consumer/Connect processing
model requires this subscription to be the same for all consumers in the group.
This doesn't mean that every consumer is consuming every topic, however. Having
a uniform subscription across all of the consumers in a group tells the
consumers to assign the work among themselves, assigning the topic-partitions
to each of the consumers according to the configured assignor.
As an example, say your connector config had `topics=a,b`, each of these two
topics had 2 partitions, and `tasks.max=2`.
The `topics` configs for both task-0 and task-1 would both be `a,b`, but the 4
partitions could be distributed like this by the consumer partition assignor:
task-0: a-0, b-0
task-1: a-1, b-1
Or any permutation. This is where the assignor I mentioned is important; the
RangeAssignor can generate some pretty unbalanced assignments:
[https://kafka.apache.org/37/javadoc/org/apache/kafka/clients/consumer/RangeAssignor.html]
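To make the imbalance concrete, here is a minimal, self-contained sketch (not Kafka's actual assignor code) that simulates range-style vs round-robin-style assignment for your exact situation: many single-partition tenant topics shared by a few tasks. The topic and consumer names are made up for illustration.

```java
import java.util.*;

public class AssignorDemo {
    // Range-style: each topic's partitions are split across consumers
    // independently. With one partition per topic, the first consumer
    // always receives it, so one task ends up with every topic.
    static Map<String, List<String>> rangeAssign(
            List<String> topics, int partitionsPerTopic, List<String> consumers) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        for (String t : topics) {
            int per = partitionsPerTopic / consumers.size();
            int extra = partitionsPerTopic % consumers.size();
            int p = 0;
            for (int i = 0; i < consumers.size(); i++) {
                int n = per + (i < extra ? 1 : 0);
                for (int k = 0; k < n; k++) {
                    out.get(consumers.get(i)).add(t + "-" + p++);
                }
            }
        }
        return out;
    }

    // Round-robin-style: flatten all topic-partitions and deal them out
    // one at a time, which spreads single-partition topics evenly.
    static Map<String, List<String>> roundRobinAssign(
            List<String> topics, int partitionsPerTopic, List<String> consumers) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        int i = 0;
        for (String t : topics) {
            for (int p = 0; p < partitionsPerTopic; p++) {
                out.get(consumers.get(i++ % consumers.size())).add(t + "-" + p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> topics = List.of("tenant-a", "tenant-b", "tenant-c");
        List<String> tasks = List.of("task-0", "task-1");
        // Range: task-0 gets all three partitions, task-1 gets none.
        System.out.println("range:       " + rangeAssign(topics, 1, tasks));
        // Round-robin: the three partitions alternate between the tasks.
        System.out.println("round-robin: " + roundRobinAssign(topics, 1, tasks));
    }
}
```

With one partition per topic, the range simulation piles every topic onto task-0 while the round-robin simulation splits them 2/1, which is why the choice of assignor matters so much for the one-topic-per-tenant layout.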
If you choose a different assignor (RoundRobin, Sticky, etc.), then you can
switch to scenario (2), with one connector per client and a tasks.max of
around 10. This would give you ~90 connectors with ~900 tasks, each task
working on roughly 10 topics.
You can tune tasks.max up and down if you need more throughput or want less
consumer/task overhead.
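As a sketch of what a scenario (2) sink connector config could look like, assuming your worker's connector.client.config.override.policy permits consumer overrides (the connector class, topic names, and connection details here are placeholders):

```json
{
  "name": "jdbc-sink-group-01",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "10",
    "topics": "tenant-01,tenant-02,tenant-03,tenant-04,tenant-05,tenant-06,tenant-07,tenant-08,tenant-09,tenant-10",
    "consumer.override.partition.assignment.strategy": "org.apache.kafka.clients.consumer.RoundRobinAssignor"
  }
}
```

The `consumer.override.` prefix lets a sink connector override its consumers' `partition.assignment.strategy` per connector, so the single-partition tenant topics get spread across the tasks instead of collapsing onto one.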
> Add Support for Topic-Level Partitioning in Kafka Connect
> ---------------------------------------------------------
>
> Key: KAFKA-15841
> URL: https://issues.apache.org/jira/browse/KAFKA-15841
> Project: Kafka
> Issue Type: Improvement
> Components: connect
> Reporter: Henrique Mota
> Priority: Trivial
> Attachments: image-2024-02-19-13-48-55-875.png
>
>
> In our organization, we utilize JDBC sink connectors to consume data from
> various topics, where each topic is dedicated to a specific tenant with a
> single partition. Recently, we developed a custom sink based on the standard
> JDBC sink, enabling us to pause consumption of a topic when encountering
> problematic records.
> However, we face limitations within Kafka Connect, as it doesn't allow for
> appropriate partitioning of topics among workers. We attempted a workaround
> by breaking down the topics list within the 'topics' parameter.
> Unfortunately, Kafka Connect overrides this parameter after invoking the
> {{taskConfigs(int maxTasks)}} method from the
> {{org.apache.kafka.connect.connector.Connector}} class.
> We request the addition of support in Kafka Connect to enable the
> partitioning of topics among workers without requiring a fork. This
> enhancement would facilitate better load distribution and allow for more
> flexible configurations, particularly in scenarios where topics are dedicated
> to different tenants.