Hi Paul,

Thanks for the KIP. A few comments.

To me, biggest question here is if we can fix this behavior without adding
a config. In particular, today, we don't even set the client.id for the
producer and consumer at all, right? The *only* way it is set is if you
include an override in the worker config, but in that case you need to be
explicitly opting in with a `producer.` or `consumer.` prefix, i.e. the
settings are `producer.client.id` and `consumer.client.id`. Otherwise, I
think we're getting the default behavior where we generate unique,
per-process IDs, i.e. via this logic
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L662-L664

If that's the case, would it maybe be possible to compatibly change the
default to use task IDs in the client ID, but only if we don't see an
existing override from the worker config? This would only change the
behavior when someone is using the default, but since the default would
just use what is effectively a random ID that is useless for monitoring
metrics, presumably this wouldn't affect any existing users. I think that
would avoid having to introduce the config, give better out of the box
behavior, and still be a safe, compatible change to make.


Other than that, just two minor comments. On the config naming, not sure
about a better name, but I think the config name could be a bit clearer if
we need to have it. Maybe something including "task" like
"task.based.client.ids" or something like that (or change the type to be an
enum and make it something like task.client.ids=[default|task] and leave it
open for extension in the future if needed).

Finally, you have this:

*"Allow overriding client.id <http://client.id/> on a per-connector basis"*
>
> This is a much more complex change, and would require individual
> connectors to be updated to support the change. In contrast, the proposed
> approach would immediately allow detailed consumer/producer monitoring for
> all existing connectors.
>

I don't think this is quite accurate. I think the reason to reject is that
for your particular requirement for metrics, it simply doesn't give enough
granularity (there's only one value per entire connector), but it doesn't
require any changes to connectors. The framework allocates all of these and
there are already framework-defined config values that all connectors share
(some for only sinks or sources), so the framework can handle all of this
without changes to connectors. Further, with connector-specific overrides,
you could get task-specific values if interpolation were supported in the
config value (as we now do with managed secrets). For example, it could
support something like client.id=connector-${taskId} and the task ID would
be substituted automatically into the string.

I don't necessarily like that solution (seems complicated and not a great
user experience), but it could work.

-Ewen




On Thu, Dec 20, 2018 at 5:05 PM Paul Davidson <pdavid...@salesforce.com>
wrote:

> Hi everyone,
>
> I would like to start a discussion around the following KIP:
> *
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-411%3A+Add+option+to+make+Kafka+Connect+task+client+ID+values+unique
> >*
>
> This proposes a small change to allow Kafka Connect the option to
> auto-generate unique client IDs for each task. This enables granular
> monitoring of the producer / consumer client in each task.
>
> Feedback is appreciated, thanks in advance!
>
> Paul
>

Reply via email to