[ 
https://issues.apache.org/jira/browse/KAFKA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17109753#comment-17109753
 ] 

Chris Egerton commented on KAFKA-8177:
--------------------------------------

As of 
[KIP-458|https://cwiki.apache.org/confluence/display/KAFKA/KIP-458%3A+Connector+Client+Config+Override+Policy]
 this is now possible by explicitly setting the {{consumer.override.group.id}} 
property in your connector configuration.

Note that this only takes effect if the connector client override policy for 
the worker allows it; otherwise, the configuration will be rejected by the 
worker.

[~pgwhalen] I know this isn't an optimal solution (seems like it'd be easier if 
we just included the cluster ID in the consumer group ID from the beginning but 
it's too late to do that now without breaking backwards compatibility for 
existing sink connectors), but it should address this gap in functionality in 
the Connect framework. Do you mind if we mark this ticket resolved?

> Allow for separate connect instances to have sink connectors with the same 
> name
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-8177
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8177
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>            Reporter: Paul Whalen
>            Priority: Minor
>              Labels: connect
>
> If you have multiple Connect instances (either a single standalone or 
> distributed group of workers) running against the same Kafka cluster, the 
> connect instances cannot each have a sink connector with the same name and 
> still operate independently. This is because the consumer group ID used 
> internally for reading from the source topic(s) is entirely derived from the 
> connector's name: 
> [https://github.com/apache/kafka/blob/d0e436c471ba4122ddcc0f7a1624546f97c4a517/connect/runtime/src/main/java/org/apache/kafka/connect/util/SinkUtils.java#L24]
> The documentation of Connect implies to me that it supports "multi-tenancy," 
> that is, as long as...
>  * In standalone mode, the {{offset.storage.file.filename}} is not shared 
> between instances
>  * In distributed mode, {{group.id}} and {{config.storage.topic}}, 
> {{offset.storage.topic}}, and {{status.storage.topic}} are not the same 
> between instances
> ... then the connect instances can operate completely independently without 
> fear of conflict.  But the sink connector consumer group naming policy makes 
> this untrue. Obviously this can be achieved by uniquely naming connectors 
> across instances, but in some environments that could be a bit of a nuisance, 
> or a challenging policy to enforce. For instance, imagine a large group of 
> developers or data analysts all running their own standalone Connect to load 
> into a SQL database for their own analysis, or replicating to mirroring to 
> their own local cluster for testing.
> The obvious solution is allow supplying config that gives a Connect instance 
> some notion of identity, and to use that when creating the sink task consumer 
> group. Distributed mode already has this obviously ({{group.id}}), but it 
> would need to be added for standalone mode. Maybe {{instance.id}}? Given that 
> solution it seems like this would need a small KIP.
> I could also imagine this solving this problem through better documentation 
> ("ensure your connector names are unique!"), but having that subtlety doesn't 
> seem worth it to me. (Optionally) assigning identity to every Connect 
> instance seems strictly more clear, without any downside.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to