Ewen Cheslack-Postava commented on KAFKA-6433:

This needs a lot of thought around upgrades, compatibility, and debuggability. 
There are all sorts of weird issues you can get into with something like this.

I agree that the general goal of checking that important configs are aligned is 
absolutely the right thing to do. Today, unless I'm forgetting something, we 
basically only check that the group and config offset match. Lots of other 
things could potentially mismatch and cause problems.

But things "matching" can be tricky. Topic names are pretty straightforward and 
we can validate easily. Validating anything like "the same set of connectors" 
is tricky given both versioning and upgrading a cluster with a *new* connector. 
Same for converters and transformations. We'd need to define clear rules for 
what "compatibility" means here and when a node is allowed to run a 
connector/task. And who is the source of truth? Who defines what's new?
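
To make that asymmetry concrete, here is a rough Python sketch. It is not Connect code: the worker descriptions, the `compare_workers` helper, and the `com.example.*` plugin names are made up for illustration. Only the three internal topic config keys are real worker settings. Topic names reduce to an equality check, while comparing plugin sets only yields differences that still need a policy to interpret.

```python
# Hypothetical sketch, not Kafka Connect code. The dict layout, helper name,
# and plugin class names are assumptions made for illustration only.
INTERNAL_TOPIC_KEYS = (
    "config.storage.topic",
    "offset.storage.topic",
    "status.storage.topic",
)


def compare_workers(leader, joiner):
    """Return (topic_mismatches, missing_plugins, extra_plugins) for a joining worker."""
    # Internal topics: plain equality, so a mismatch is unambiguous.
    topic_mismatches = {
        key: (leader["config"].get(key), joiner["config"].get(key))
        for key in INTERNAL_TOPIC_KEYS
        if leader["config"].get(key) != joiner["config"].get(key)
    }
    # Plugins: set differences are easy to compute but hard to judge.
    leader_plugins = set(leader["plugins"])
    joiner_plugins = set(joiner["plugins"])
    missing = leader_plugins - joiner_plugins  # leader has it, joiner does not
    extra = joiner_plugins - leader_plugins    # joiner has it, leader does not
    return topic_mismatches, missing, extra


leader = {
    "config": {
        "config.storage.topic": "connect-configs",
        "offset.storage.topic": "connect-offsets",
        "status.storage.topic": "connect-status",
    },
    "plugins": ["com.example.FileSourceConnector"],
}

# A worker restarted during a rolling upgrade that ships one *new* connector.
upgraded = {
    "config": dict(leader["config"]),
    "plugins": ["com.example.FileSourceConnector", "com.example.NewSinkConnector"],
}

topic_mismatches, missing, extra = compare_workers(leader, upgraded)
```

Here the topics agree and nothing is missing, yet a strict "same set of plugins" rule would still reject the upgraded worker because of the extra connector. Deciding whether that should count as incompatible is exactly the rule that would have to be defined.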

Personally, I'd argue it's actually clearer to have a log message saying 
"couldn't start connector X because class not found" from node Y than have to 
determine why all connectors/tasks are running on node Z because node W wasn't 
allowed to join worker group N for some mismatch of connectors. It might fail 
later, but it tells you exactly what the problem is and leads to a clear 



> Connect distributed workers should fail if their config is "incompatible" 
> with leader's
> ---------------------------------------------------------------------------------------
>                 Key: KAFKA-6433
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6433
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Randall Hauch
>            Priority: Major
>              Labels: needs-kip
> Currently, each distributed worker config must have the same `group.id` and 
> must use the same internal topics for configs, offsets, and status. 
> Additionally, each worker must be configured to have the same connectors, 
> SMTs, and converters; confusing error messages will result when some workers 
> are able to deploy connector tasks with SMTs while others fail when they are 
> missing plugins the other workers do have.
> Ideally, a Connect worker would only be allowed to join the cluster if it 
> were "compatible" with the existing cluster, where "compatible" perhaps 
> includes using the same internal topics and having the same set of plugins.

This message was sent by Atlassian JIRA
