Hi Mickael,

I do sympathize with the desire for a "quick fix". I think your point about
these being problematic to support sums up my hesitation here pretty well,
both with respect to the potential footgun of unintended rebalances (should
users try to do more with this field than we expect), and with making other
similar improvements (i.e., expanded metadata support in Kafka Connect)
more difficult to design correctly for us and to use effectively for users
(especially if we were to deprecate/remove the description field at a later
date).

It's also worth noting that users can already add a "description" field to
the JSON config for any connector they create; the benefit provided by this
KIP is that they're automatically prompted to do so if they're using a
UI/CLI/etc. that builds on top of the various /connector-plugins endpoints
to find the set of configuration properties that a user can/should populate
before creating a connector.

I'd welcome thoughts from you and others on whether the tradeoffs here are
worth it; right now I'm still not sure about this one.

Cheers,

Chris

On Tue, Mar 7, 2023 at 6:42 AM Mickael Maison <mickael.mai...@gmail.com>
wrote:

> H Chris,
>
> Thanks for taking a look.
>
> 1. Yes updating the description can potentially trigger a rebalance. I
> don't expect users to frequently update the description so I thought
> this was acceptable. I've added a note to the KIP to mention it.
>
> 2. The tags model you described could be interesting but it looks
> pretty involved with multiple new endpoints and brand new concepts.
>
> With this KIP, I really took the most basic approach and proposed the
> simplest set of changes that could get in the next release and, I
> think, immediately bring benefits. However sometimes this is not the
> best approach as a "quick fix" could end up problematic to support in
> the future. The drawbacks (new connector config field + causing a
> rebalance on update) look relatively benign to me so I thought this
> could be an acceptable proposal.
>
> Thanks,
> Mickael
>
>
>
> On Thu, Feb 23, 2023 at 2:45 PM Chris Egerton <chr...@aiven.io.invalid>
> wrote:
> >
> > Actually, I misspoke--a rebalance isn't triggered when an existing
> > connector's config is updated. Assuming the set of workers remains
> stable,
> > a rebalance is only necessary when a new connector is created, an
> existing
> > one is deleted, or a new set of task configs is generated.
> >
> > This weakens the point about unnecessary rebalances when a connector's
> > description is updated, but doesn't entirely address it. Spurious
> > rebalances may still be triggered if a new set of task configs is
> > generated, which for reasons outlined above, is fairly likely.
> >
> > On Thu, Feb 23, 2023 at 7:41 AM Chris Egerton <chr...@aiven.io> wrote:
> >
> > > Hi Mickael,
> > >
> > > Thanks for the KIP!
> > >
> > > While it's tempting to add this field to the out-of-the-box connector
> > > config def, I'm a little hesitant, for two reasons.
> > >
> > > 1. Adding this directly to the connector config could have unintended
> > > consequences on the actual data processing by the connector. Any time a
> > > connector's config is modified, the Connector object running for it is
> > > restarted with that new config. In most cases this is a trivial
> operation
> > > since we have incremental rebalancing enabled by default, the
> connector can
> > > (and probably should!) generate task configs that are functionally
> > > identical to the ones it last generated, and most (though not all)
> > > Connector classes are fairly lightweight and leave the real work to
> their
> > > Task class. However, due to KAFKA-9228 [1], it's not just common
> practice
> > > for connectors to perform transparent passthrough of most of their
> configs
> > > when generating task configs, it's actually necessary to work around a
> bug
> > > in the runtime. As a result, tweaking the description of a connector
> would
> > > be fairly likely to result in a full restart of all of its tasks, in
> > > addition to triggering two rebalances (which may not be so lightweight
> if
> > > users are still running with eager rebalancing... which, sadly, I've
> heard
> > > is still happening today).
> > >
> > > 2. The motivation section mentions some information that might go into
> the
> > > description field, such as the team that owns the connector and
> emergency
> > > contact info. It seems like this info might benefit from a little more
> > > structure if we're trying to design for programmatic access by GUIs and
> > > CLIs (which I'm assuming is the case, since I can't imagine a human
> being
> > > getting much use out of the raw output from the GET
> > > /connector-plugins/{name}/config and PUT
> > > /connector-plugins/{name}/config/validate endpoints). This might also
> make
> > > it easier to add custom validation logic around what kind of
> information is
> > > present via REST extension.
> > >
> > >
> > > With these thoughts in mind, what do you think about adding a new
> generic
> > > "tags" object to connectors that can support arbitrary user-provided
> > > key/value pairs? If using the POST /connectors endpoint, it might look
> > > something like this:
> > >
> > > {
> > >   "name": "my-connector",
> > >   "config": {
> > >     "connector.class": "MirrorSource",
> > >     "tasks.max": "908"
> > >   },
> > >   "tags": {
> > >     "team": "data-infra",
> > >     "phone": "12345678901",
> > >     "email": "din...@example.org"
> > >   }
> > > }
> > >
> > > And, to allow users to modify connector tags after one has been
> created,
> > > we might introduce a /connectors/{name}/tags endpoint with
> PUT/PATCH/DELETE
> > > verbs that writes tag info for a connector to the config topic, but
> without
> > > altering the actual connector config (allowing us to skip a rebalance
> > > altogether).
> > >
> > > One other thing we could consider is allowing cluster administrators to
> > > require that connectors are created with a certain set of tags, or
> even add
> > > per-tag regex validation. They might specify something like
> > > "connector.required.tags = team, phone, email" or
> > > "connector.tags.phone.regex = [0-9]{11}" in a worker config file. But
> this
> > > is probably overboard for now, especially since it's already possible
> to
> > > accomplish via a REST extension.
> > >
> > > Finally, I'm also wondering if, in pursuit of expanding Connect's
> > > out-of-the-box support for dealing with connector metadata, we might
> want
> > > to expose created/last-updated times for connector configurations. We
> > > definitely don't have to do that in this KIP but if you agree that this
> > > would be useful, we should probably keep it in mind to avoid painting
> > > ourselves into a corner. This is why I'm thinking of using "tags"
> instead
> > > of something a little more generic like "metadata", BTW.
> > >
> > > [1] -
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> > >
> > > Cheers,
> > >
> > > Chris
> > >
> > > On Thu, Feb 23, 2023 at 4:52 AM Mickael Maison <
> mickael.mai...@gmail.com>
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> I created a very small KIP to add a description field to connectors:
> > >>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-908%3A+Add+description+field+to+connector+configuration
> > >>
> > >> Let me know if you have any feedback.
> > >>
> > >> Thanks,
> > >> Mickael
> > >>
> > >
>

Reply via email to