H Chris,

Thanks for taking a look.

1. Yes updating the description can potentially trigger a rebalance. I
don't expect users to frequently update the description so I thought
this was acceptable. I've added a note to the KIP to mention it.

2. The tags model you described could be interesting but it looks
pretty involved with multiple new endpoints and brand new concepts.

With this KIP, I really took the most basic approach and proposed the
simplest set of changes that could get in the next release and, I
think, immediately bring benefits. However sometimes this is not the
best approach as a "quick fix" could end up problematic to support in
the future. The drawbacks (new connector config field + causing a
rebalance on update) look relatively benign to me so I thought this
could be an acceptable proposal.

Thanks,
Mickael



On Thu, Feb 23, 2023 at 2:45 PM Chris Egerton <chr...@aiven.io.invalid> wrote:
>
> Actually, I misspoke--a rebalance isn't triggered when an existing
> connector's config is updated. Assuming the set of workers remains stable,
> a rebalance is only necessary when a new connector is created, an existing
> one is deleted, or a new set of task configs is generated.
>
> This weakens the point about unnecessary rebalances when a connector's
> description is updated, but doesn't entirely address it. Spurious
> rebalances may still be triggered if a new set of task configs is
> generated, which for reasons outlined above, is fairly likely.
>
> On Thu, Feb 23, 2023 at 7:41 AM Chris Egerton <chr...@aiven.io> wrote:
>
> > Hi Mickael,
> >
> > Thanks for the KIP!
> >
> > While it's tempting to add this field to the out-of-the-box connector
> > config def, I'm a little hesitant, for two reasons.
> >
> > 1. Adding this directly to the connector config could have unintended
> > consequences on the actual data processing by the connector. Any time a
> > connector's config is modified, the Connector object running for it is
> > restarted with that new config. In most cases this is a trivial operation
> > since we have incremental rebalancing enabled by default, the connector can
> > (and probably should!) generate task configs that are functionally
> > identical to the ones it last generated, and most (though not all)
> > Connector classes are fairly lightweight and leave the real work to their
> > Task class. However, due to KAFKA-9228 [1], it's not just common practice
> > for connectors to perform transparent passthrough of most of their configs
> > when generating task configs, it's actually necessary to work around a bug
> > in the runtime. As a result, tweaking the description of a connector would
> > be fairly likely to result in a full restart of all of its tasks, in
> > addition to triggering two rebalances (which may not be so lightweight if
> > users are still running with eager rebalancing... which, sadly, I've heard
> > is still happening today).
> >
> > 2. The motivation section mentions some information that might go into the
> > description field, such as the team that owns the connector and emergency
> > contact info. It seems like this info might benefit from a little more
> > structure if we're trying to design for programmatic access by GUIs and
> > CLIs (which I'm assuming is the case, since I can't imagine a human being
> > getting much use out of the raw output from the GET
> > /connector-plugins/{name}/config and PUT
> > /connector-plugins/{name}/config/validate endpoints). This might also make
> > it easier to add custom validation logic around what kind of information is
> > present via REST extension.
> >
> >
> > With these thoughts in mind, what do you think about adding a new generic
> > "tags" object to connectors that can support arbitrary user-provided
> > key/value pairs? If using the POST /connectors endpoint, it might look
> > something like this:
> >
> > {
> >   "name": "my-connector",
> >   "config": {
> >     "connector.class": "MirrorSource",
> >     "tasks.max": "908"
> >   },
> >   "tags": {
> >     "team": "data-infra",
> >     "phone": "12345678901",
> >     "email": "din...@example.org"
> >   }
> > }
> >
> > And, to allow users to modify connector tags after one has been created,
> > we might introduce a /connectors/{name}/tags endpoint with PUT/PATCH/DELETE
> > verbs that writes tag info for a connector to the config topic, but without
> > altering the actual connector config (allowing us to skip a rebalance
> > altogether).
> >
> > One other thing we could consider is allowing cluster administrators to
> > require that connectors are created with a certain set of tags, or even add
> > per-tag regex validation. They might specify something like
> > "connector.required.tags = team, phone, email" or
> > "connector.tags.phone.regex = [0-9]{11}" in a worker config file. But this
> > is probably overboard for now, especially since it's already possible to
> > accomplish via a REST extension.
> >
> > Finally, I'm also wondering if, in pursuit of expanding Connect's
> > out-of-the-box support for dealing with connector metadata, we might want
> > to expose created/last-updated times for connector configurations. We
> > definitely don't have to do that in this KIP but if you agree that this
> > would be useful, we should probably keep it in mind to avoid painting
> > ourselves into a corner. This is why I'm thinking of using "tags" instead
> > of something a little more generic like "metadata", BTW.
> >
> > [1] -
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> >
> > Cheers,
> >
> > Chris
> >
> > On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <mickael.mai...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I created a very small KIP to add a description field to connectors:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> >>
> >> Let me know if you have any feedback.
> >>
> >> Thanks,
> >> Mickael
> >>
> >

Reply via email to