[jira] [Commented] (KAFKA-5061) client.id should be set for Connect producers/consumers

Ewen Cheslack-Postava (JIRA) Sun, 24 Sep 2017 16:28:18 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-5061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178405#comment-16178405
 ]


Ewen Cheslack-Postava commented on KAFKA-5061:
----------------------------------------------

Hmm, I'm not sure I did suggest a default? There are at least two options 
there, and unfortunately there is a tradeoff between:

a) uniquely identifying the clients in metrics
b) enforcing quotas

For (a), Connect is relatively unusual. Many applications using Kafka won't 
need more than one producer/consumer per process, so metric name conflicts 
aren't an issue for global per-process registries like JMX. Connect, however, 
has common cases where using just the task name for the client ID could cause 
conflicts, since it will be common to have 2 tasks from the same connector 
running in the same worker process. (Note that KIP-196 is adding per-task 
metrics, some of which might replace the need for the lower-level 
producer/consumer metrics, so some of these arguments may not be as critical; 
however, that info is still useful for debugging, so it's not ideal if it is 
just lost.)
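
To make the conflict in (a) concrete, here is a minimal sketch of what happens 
when two clients in one JVM share a client ID. The class and method names are 
hypothetical, and the MBean name pattern is an assumption modelled on the style 
the Java producer uses for its metrics; it is an illustration, not Connect code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper, not part of Kafka or Connect.
final class MetricNameCollision {
    // Builds an MBean name keyed by client ID. The exact pattern is an
    // assumption made for this illustration.
    static ObjectName producerMetricsName(String clientId) {
        try {
            return new ObjectName(
                "kafka.producer:type=producer-metrics,client-id=" + clientId);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(
                "client.id not usable in an MBean name: " + clientId, e);
        }
    }

    // Counts the distinct MBean names a group of clients in one JVM would use.
    // Fewer names than clients means some clients collide in the registry.
    static int distinctNames(List<String> clientIds) {
        Set<ObjectName> names = new HashSet<>();
        for (String id : clientIds) {
            names.add(producerMetricsName(id));
        }
        return names.size();
    }
}
```

Two tasks that both use a client ID such as "jdbc-source" map to a single name, 
while "jdbc-source-0" and "jdbc-source-1" map to two.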

For (b), the tradeoff depends a lot on whether you are working with a secured 
cluster (i.e. whether your quotas can include your auth principal or if you're 
working only with client.id), and of course, whether you even care about quotas 
in your environment. If you're only working with client.id, then using the full 
task ID isn't great since you probably want to apply quotas to the logical 
group of all tasks for a given connector (i.e. in that case you really want to 
define client.id == connector ID). And of course, I can also imagine just 
wanting to apply quotas across the entire Connect cluster, in which case you 
want your client IDs == Connect cluster ID.
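
The three granularities discussed for (b) could be sketched roughly as follows. 
Everything here is hypothetical (the `Scope` enum, the method name, and the 
"-" separator are assumptions for illustration); Connect has no such API.

```java
// Hypothetical sketch of the quota-related choices; not an existing Connect API.
final class ClientIdScopes {
    enum Scope { TASK, CONNECTOR, CLUSTER }

    // Derives a client.id at the requested granularity.
    // TASK gives unique per-task metrics; CONNECTOR and CLUSTER group clients
    // so a client.id-based quota applies to the whole group.
    static String clientId(Scope scope, String clusterId, String connector, int task) {
        switch (scope) {
            case TASK:
                return clusterId + "-" + connector + "-" + task;
            case CONNECTOR:
                return clusterId + "-" + connector;
            case CLUSTER:
                return clusterId;
            default:
                throw new IllegalArgumentException("Unknown scope: " + scope);
        }
    }
}
```

Only the TASK scope keeps per-task metrics distinct; the other two trade that 
away for simpler quota grouping, which is the tradeoff described above.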

Finally, I think there is also an issue wrt allowing some sort of prefix/custom 
formatting on top of the basic task ID -- if you're running multiple Connect 
clusters, you may need to be able to differentiate their tasks. This probably 
relates to KAFKA-4028, and it might mean that having a sane default template 
while letting the user override it is the option that provides the most 
flexibility (assuming we think that flexibility is really needed). And of 
course, such a config would need a KIP, as it would be new user-facing API.
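
As a rough idea of what a default-template-with-override could look like, here 
is a minimal sketch. The placeholder syntax (`${cluster}`, `${connector}`, 
`${task}`), the default value, and the class itself are all assumptions made 
for illustration; no such config exists, and the real design would come out of 
a KIP.

```java
import java.util.Map;

// Hypothetical template renderer; the placeholder names are assumptions.
final class ClientIdTemplate {
    // A possible default that is unique per task across Connect clusters.
    static final String DEFAULT_TEMPLATE = "${cluster}-${connector}-${task}";

    // Replaces each ${key} in the template with its value from the map.
    // Placeholders without a matching key are left untouched.
    static String render(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }
}
```

A user who only cares about per-connector quotas could then override the 
template with something like "${cluster}-${connector}" and drop the task part.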

So I'm not sure what the right answer here is exactly -- I can see a number of 
use cases with different requirements, but I'm not sure which ones people will 
be trying to use in practice and would prefer to keep things less complicated 
if possible.

> client.id should be set for Connect producers/consumers
> -------------------------------------------------------
>
>                 Key: KAFKA-5061
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5061
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 0.10.2.1
>            Reporter: Ewen Cheslack-Postava
>              Labels: needs-kip, newbie++
>
> In order to properly monitor individual tasks using the producer and consumer 
> metrics, we need to have the framework disambiguate them. Currently when we 
> create producers 
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L362)
>  and create consumers 
> (https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L371-L394)
>  the client ID is not being set. You can override it for the entire worker 
> via worker-level producer/consumer overrides, but you can't get per-task 
> metrics.
> There are a couple of things we might want to consider doing here:
> 1. Provide default client IDs based on the worker group ID + task ID 
> (providing uniqueness for multiple connect clusters up to the scope of the 
> Kafka cluster they are operating on). This seems ideal since it's a good 
> default; however it is a public-facing change and may need a KIP. Normally I 
> would be less worried about this, but some folks may be relying on picking up 
> metrics without this being set, in which case such a change would break their 
> monitoring.
> 2. Allow overriding client.id on a per-connector basis. I'm not sure if this 
> will really be useful or not -- it lets you differentiate between metrics for 
> different connectors' tasks, but within a connector, all metrics would go to 
> a single client.id. On the other hand, this makes the tasks act as a single 
> group from the perspective of broker handling of client IDs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
