Re: [DISCUSS] KIP-882: Make Kafka Connect REST API request timeouts configurable

Yash Mayya Sun, 19 Mar 2023 02:08:16 -0700

Hi Greg,

> Applying this to a hypothetical extension of timeouts
> to other endpoints: my concern was primarily about
> whether this design applied everywhere felt natural or
> awkward. In my opinion, a request parameter would
> draw a lot of outsized attention in documentation, and
> be the first parameter for a lot of the endpoints, while a
> header could be specified as a more obscure global
> modifier to all requests.


That's a fair point, I do agree that the choice of a request header here
seems more appropriate now.

> My chief concern was about the clarity of the API in the
> future when both configurable timeouts and asynchronous
>  APIs exist. Will the configurable timeouts still make sense,
>  and will they be a feature that people will use?

I do believe that there's a place for both of these to co-exist - there
will always be use-cases that require config validation errors to be
reported synchronously so that connector configurations can be corrected or
iterated upon. I don't really see there being valid use cases for having a
timeout of 20 or 60 minutes as you suggested, but I do think that the
genuine edge cases that currently exist requiring a config validation
timeout of greater than the current 90 seconds (2-3 minutes perhaps)
deserve a way to be handled until we have an asynchronous config validation
API (if we ever do go down that path).

I've updated the KIP to reflect the change to use a request header rather
than a query parameter to configure the timeout value.

Thanks,
Yash

On Tue, Mar 14, 2023 at 2:46 AM Greg Harris <[email protected]>
wrote:

> Yash,
>
> 1. I think that Request-Timeout is a fine choice. I did not see any
> standard header for a use-case like this, so we are free to choose a name
> for ourselves.
>
> 2. While such proposals do not exist, I don't think that means that we
> should exclude them from consideration. If we do, then we run the risk of
> making future implementations more complicated, impossible, or awkward to
> use. This is a balance of course, as we cannot let future features paralyze
> us from making improvements today. We cannot hold ourselves responsible for
> handling unknown unknowns, but we can at least try to plan around known
> unknowns.
> Applying this to a hypothetical extension of timeouts to other endpoints:
> my concern was primarily about whether this design applied everywhere felt
> natural or awkward. In my opinion, a request parameter would draw a lot of
> outsized attention in documentation, and be the first parameter for a lot
> of the endpoints, while a header could be specified as a more obscure
> global modifier to all requests.
>
> 3. 4. Applying the same standard to a hypothetical asynchronous I don't
> believe it makes async validation substantially more complicated, except
> the "request timeout interrupts connector creation" interaction. I also do
> not think that it would make it impossible to implement an async validation
> scheme in the future, since the synchronization is already in place, the
> timeout is just different.
> My chief concern was about the clarity of the API in the future when both
> configurable timeouts and asynchronous APIs exist. Will the configurable
> timeouts still make sense, and will they be a feature that people will use?
> Are we expecting users to hold a single REST request open for 5, 20, or 60
> minutes to accommodate a connector with an excessively long validation
> step? Successfully validating a request requires the caller and the network
> to remain stable for the entire duration of the request to have a chance of
> success, and consumes (some small) amount of resources during that time.
> And what if they decide they no longer want the validation result, and wish
> to cancel the operation to free resources (both connect and external
> system)? There is no standard mechanism to discard or cancel an HTTP
> request (not even a connection loss).
>
> You have said that the default timeout for other endpoints is more than
> satisfactory for the typical user, and I agree. If other operations
> (deleting, restarting, pause/unpause, etc) had a substantial synchronous
> component that would benefit from a configurable timeout, I would be more
> inclined to support configurable timeouts (all at once, or validation first
> and other operations later).
> But as it stands, I don't think I see the value-add of configurable
> timeouts once there is another solution which targets the validate call
> duration specifically, and I'm concerned that configurable timeouts would
> be made irrelevant in that case.
>
> Thanks,
> Greg
>
> On Wed, Mar 8, 2023 at 7:21 AM Yash Mayya <[email protected]> wrote:
>
> > Hi Greg,
> >
> > Thanks for the response!
> >
> > 1. Hm that's a fair point, and while there aren't any strict guidelines
> or
> > rules governing whether something should be a query parameter versus a
> > request header, I agree that it might be more idiomatic for a request
> > timeout value to be accepted via a custom request header. What do you
> > think about using a header name like "X-Request-Timeout", or maybe simply
> > "Request-Timeout" if we want to take into account
> > https://www.rfc-editor.org/rfc/rfc6648?
> >
> > 2. 3. 4. Adding asynchronous validations via new endpoints / new request
> > parameters on existing endpoints is definitely an interesting idea but
> I'm
> > not sure whether this KIP should take into consideration a hypothetical
> > future proposal? If there were already such a concrete proposal, I would
> > definitely agree that this KIP should take that into consideration - but
> > that isn't the case today. Furthermore, I'm not sure I understand why the
> > addition of request timeout configurability to the existing endpoints
> would
> > preclude the introduction of an asynchronous validation API in the
> future?
> >
> > Thanks,
> > Yash
> >
> > On Sat, Mar 4, 2023 at 1:13 AM Greg Harris <[email protected]
> >
> > wrote:
> >
> > > Yash,
> > >
> > > 1.
> > > Currently the request parameters in the REST API are individual and
> > pertain
> > > to just one endpoint.
> > > They also change the content of the query result, or change the action
> > > taken on the cluster.
> > > I think that a request timeout is a property of the HTTP request more
> > than
> > > it is a property of the cluster query or cluster action.
> > > The two solutions have very similar tradeoffs, but I'm interested in
> > > whether one is more idiomatic and more obvious to users.
> > >
> > > 2.
> > > I understand that only these three endpoints are in need of increased
> > > timeouts at this time due to long connector validations.
> > > From another perspective, this change is making the API more irregular
> > and
> > > someone in the future might choose to make it more regular by
> > standardizing
> > > the configurable timeout functionality.
> > > I wouldn't (in this KIP) dismiss someone's desire to configure other
> > > timeouts in the future (in another KIP), and design them into a corner.
> > > It is acceptable to limit the scope of this change to just the three
> > > endpoints due to practical reasons, but I don't think that should
> prevent
> > > us from trying to ensure that this design fits in the "end goal" state
> of
> > > the Connect service.
> > >
> > > 3. 4.
> > > I am not suggesting an incompatible change, as the current synchronous
> > > behavior is still a useful API for certain situations. I think that it
> is
> > > possible to add asynchronous validations in a backwards compatible way,
> > > using new endpoints or other new request parameters.
> > > The interface could be designed such that users with connectors that
> > exceed
> > > the synchronous timeouts can utilize the asynchronous API. Tooling can
> > use
> > > the asynchronous API when it is available, and fall back to the
> > synchronous
> > > API when it is not.
> > > I think that it also may be more in-line with the design of the rest of
> > the
> > > REST API, where nearly every other request is asynchronous. That's why
> > > you're only targeting these three endpoints, they're the only ones
> with a
> > > synchronicity constraint.
> > > Again, I'm not necessarily saying that you must implement such an
> > > asynchronous validation scheme in this KIP, but we should consider if
> > that
> > > is a more extensible solution. If we decided to implement configurable
> > > synchronous timeouts now, how would that complement an asynchronous API
> > in
> > > the future?
> > >
> > > On Thu, Mar 2, 2023 at 10:00 PM Yash Mayya <[email protected]>
> wrote:
> > >
> > > > Hi Greg,
> > > >
> > > > Thanks for taking a look!
> > > >
> > > > 1. I believe Chris suggested the use of a query parameter above as we
> > > > already have precedents for using query parameters to configure per
> > > request
> > > > behavior in Kafka Connect (the restart connectors API and the get
> > > > connectors API for instance). Also, the limited choice of endpoints
> > > > targeted is intentional (see my reply to the next point).
> > > >
> > > > 2. I intentionally targeted just the three listed endpoints where
> > > > synchronous connector config validations come into the picture. This
> is
> > > > because of the legitimate cases where config validation for specific
> > > > connector plugins might exceed the default request timeout in edge
> case
> > > > scenarios (outlined in the KIP's motivation section). Other Connect
> > REST
> > > > endpoints shouldn't be taking longer than the default 90 second
> request
> > > > timeout; if they do so, it would either be indicative of a bug in the
> > > > Connect framework or a cluster health issue - neither of which should
> > be
> > > > covered up by manually setting a longer request timeout.
> > > >
> > > > 3. 4. I think changing the config validation behavior would be a
> > backward
> > > > incompatible change and I wanted to avoid that in this particular
> KIP.
> > > > There are multiple programmatic UIs out there which rely on the
> current
> > > > synchronous config validation behavior and breaking the existing
> > contract
> > > > would definitely require a larger discussion.
> > > >
> > > > Thanks,
> > > > Yash
> > > >
> > > > On Fri, Mar 3, 2023 at 12:04 AM Greg Harris
> > <[email protected]
> > > >
> > > > wrote:
> > > >
> > > > > Hey Yash,
> > > > >
> > > > > Thanks for the KIP, and sorry for the late review.
> > > > >
> > > > > 1. Have you considered a HTTP header to provide the
> > client-configurable
> > > > > timeout? A header might more naturally extend to all of the other
> > > > endpoints
> > > > > in the future, rather than duplicating the query parameter across
> > > > > endpoints.
> > > > >
> > > > > 2. I understand that this change is targeted at just long duration
> > > > > Connector::validation calls, is that due to voluntary scope
> > > constraints?
> > > > > Implementing configurable timeouts for all endpoints in a uniform
> way
> > > > could
> > > > > be desirable, even if the default timeout will work for nearly all
> of
> > > the
> > > > > other calls.
> > > > >
> > > > > 3. Did you consider adding asynchronous validation as a user-facing
> > > > > feature? I think that relying on the synchronous validation results
> > in
> > > a
> > > > > single HTTP request is a bit of an anti-pattern, and one that we've
> > > > > inherited from the original REST design. It seems useful when using
> > the
> > > > > REST API by hand, but seems to be a liability when used in
> > environments
> > > > > with an external management layer.
> > > > >
> > > > > 4. Perhaps rather than allowing synchronous calls to
> > Connector:validate
> > > > to
> > > > > increase in duration, we should provide a way for connectors to
> > surface
> > > > > validation problems later in their lifecycle. Currently there is
> the
> > > > > ConnectorContext::raiseError that transitions the task to FAILED,
> > > > perhaps a
> > > > > similar API could asynchronously emit re-validation results. We've
> > also
> > > > had
> > > > > a problem with long start() duration for the same reasons as
> > > validate().
> > > > >
> > > > > I understand if you want to keep this KIP as tightly focused as
> > > possible,
> > > > > but i'm worried that it is addressing the symptom and not the
> > problem.
> > > I
> > > > > want to make sure that this change is impactful and isn't obsoleted
> > by
> > > a
> > > > > later improvement.
> > > > >
> > > > > Thanks,
> > > > > Greg
> > > > >
> > > > >
> > > > > On Wed, Mar 1, 2023 at 5:07 AM Yash Mayya <[email protected]>
> > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Thanks for all the feedback and discussion. I've renamed the KIP
> > > title
> > > > to
> > > > > > "Kafka Connect REST API timeout improvements" since we're
> > > introducing a
> > > > > > couple of improvements (cancelling create / update connector
> > requests
> > > > > when
> > > > > > config validations timeout and avoiding double config validations
> > in
> > > > > > distributed mode) along with making the request timeouts
> > > configurable.
> > > > > The
> > > > > > new KIP link is
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-882%3A+Kafka+Connect+REST+API+timeout+improvements
> > > > > > and I've called for a vote for the KIP here -
> > > > > > https://lists.apache.org/thread/mgx8lczx2f57pk7x3vh0nqk00s79grgp
> .
> > > > > >
> > > > > > Thanks,
> > > > > > Yash
> > > > > >
> > > > > > On Sat, Nov 5, 2022 at 11:42 PM Sagar <[email protected]
> >
> > > > wrote:
> > > > > >
> > > > > > > Hey Yash,
> > > > > > >
> > > > > > > Thanks for the explanation. I think it should be fine to
> delegate
> > > the
> > > > > > > validation directly to the leader.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Sagar.
> > > > > > >
> > > > > > > On Sat, Nov 5, 2022 at 10:42 AM Yash Mayya <
> [email protected]
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > Hi Sagar,
> > > > > > > >
> > > > > > > > Thanks for chiming in!
> > > > > > > >
> > > > > > > > > Having said that, why does the worker forward to the
> > > > > > > > > leader? I am thinking if the worker can perform the
> > validation
> > > on
> > > > > > it's
> > > > > > > > own,
> > > > > > > > > we could let it do the validation instead of forwarding
> > > > everything
> > > > > to
> > > > > > > the
> > > > > > > > > leader
> > > > > > > >
> > > > > > > > Only the leader is allowed to perform writes to the config
> > topic,
> > > > so
> > > > > > any
> > > > > > > > request that requires a config topic write ends up being
> > > forwarded
> > > > to
> > > > > > the
> > > > > > > > leader. The `POST /connectors` and `PUT
> > > > > /connectors/{connector}/config`
> > > > > > > > endpoints call `Herder::putConnectorConfig` which internally
> > > does a
> > > > > > > config
> > > > > > > > validation first before initiating another herder request to
> > > write
> > > > to
> > > > > > the
> > > > > > > > config topic (in distributed mode). If we want to allow the
> > first
> > > > > > worker
> > > > > > > to
> > > > > > > > do the config validation, and then forward the request to the
> > > > leader
> > > > > > just
> > > > > > > > for the write to the config topic, we'd either need something
> > > like
> > > > a
> > > > > > skip
> > > > > > > > validations query parameter on the endpoint like Chris talks
> > > about
> > > > > > above
> > > > > > > or
> > > > > > > > else a new internal only endpoint that just does a write to
> the
> > > > > config
> > > > > > > > topic without any prior config validation. However, we agreed
> > > that
> > > > > this
> > > > > > > > optimization doesn't really seem necessary for now and can be
> > > done
> > > > > > later
> > > > > > > if
> > > > > > > > deemed useful. I'd be happy to hear differing thoughts if
> any,
> > > > > however.
> > > > > > > >
> > > > > > > > > I think a bound is certainly needed but IMO it shouldn't go
> > > > beyond
> > > > > > > > > 10 mins considering this is just validation
> > > > > > > >
> > > > > > > > Yeah, I agree that this seems like a fair value - I've
> elected
> > to
> > > > go
> > > > > > > with a
> > > > > > > > default value of 10 minutes for the proposed worker
> > configuration
> > > > > that
> > > > > > > sets
> > > > > > > > an upper bound for the timeout query parameter.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Yash
> > > > > > > >
> > > > > > > > On Sat, Nov 5, 2022 at 10:30 AM Yash Mayya <
> > [email protected]
> > > >
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Chris,
> > > > > > > > >
> > > > > > > > > Thanks again for your feedback. I think a worker
> > configuration
> > > > for
> > > > > > the
> > > > > > > > > upper bound makes sense - I initially thought we could
> > hardcode
> > > > it
> > > > > > > (just
> > > > > > > > > like the current request timeout is), but there's no reason
> > to
> > > > set
> > > > > > > > another
> > > > > > > > > artificial bound that isn't user configurable which is
> > exactly
> > > > what
> > > > > > > we're
> > > > > > > > > trying to change here in the first place. I've updated the
> > KIP
> > > > > based
> > > > > > on
> > > > > > > > all
> > > > > > > > > our discussion above.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Yash
> > > > > > > > >
> > > > > > > > > On Thu, Nov 3, 2022 at 11:01 PM Sagar <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> Hey Yash,
> > > > > > > > >>
> > > > > > > > >> Thanks for the KIP! This looks like a useful feature.
> > > > > > > > >>
> > > > > > > > >> I think the discussion thread already has some great
> points
> > by
> > > > > > Chris.
> > > > > > > > Just
> > > > > > > > >> a couple of points/clarifications=>
> > > > > > > > >>
> > > > > > > > >> Regarding, pt#2 , I guess it might be better to forward to
> > the
> > > > > > leader
> > > > > > > as
> > > > > > > > >> suggested by Yash. Having said that, why does the worker
> > > forward
> > > > > to
> > > > > > > the
> > > > > > > > >> leader? I am thinking if the worker can perform the
> > validation
> > > > on
> > > > > > it's
> > > > > > > > >> own,
> > > > > > > > >> we could let it do the validation instead of forwarding
> > > > everything
> > > > > > to
> > > > > > > > the
> > > > > > > > >> leader(even though it might be cheap to forward all
> requests
> > > to
> > > > > the
> > > > > > > > >> leader).
> > > > > > > > >>
> > > > > > > > >> Pt#3 => I think a bound is certainly needed but IMO it
> > > shouldn't
> > > > > go
> > > > > > > > beyond
> > > > > > > > >> 10 mins considering this is just validation. We shouldn't
> > end
> > > up
> > > > > in
> > > > > > a
> > > > > > > > >> situation where a few faulty connectors end up blocking a
> > lot
> > > of
> > > > > > > request
> > > > > > > > >> processing threads, so while increasing the config is
> > > certainly
> > > > > > > helpful,
> > > > > > > > >> we
> > > > > > > > >> shouldn't set too high a value IMO. Of course I am also
> open
> > > to
> > > > > > > > >> suggestions
> > > > > > > > >> here.
> > > > > > > > >>
> > > > > > > > >> Thanks!
> > > > > > > > >> Sagar.
> > > > > > > > >>
> > > > > > > > >> On Thu, Nov 3, 2022 at 9:01 PM Chris Egerton
> > > > > > <[email protected]
> > > > > > > >
> > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > >> > Hi Yash,
> > > > > > > > >> >
> > > > > > > > >> > RE 2: That's a great point about validations already
> being
> > > > > > performed
> > > > > > > > by
> > > > > > > > >> the
> > > > > > > > >> > leader. For completeness's sake, I'd like to note that
> > this
> > > > only
> > > > > > > holds
> > > > > > > > >> for
> > > > > > > > >> > valid configurations; invalid ones are caught right now
> > > before
> > > > > > being
> > > > > > > > >> > forwarded to the leader. Still, I think it's fine to
> > forward
> > > > to
> > > > > > the
> > > > > > > > >> leader
> > > > > > > > >> > for now and optimize further in the future if necessary.
> > If
> > > > > > frequent
> > > > > > > > >> > validations are taking place they should be conducted
> via
> > > the
> > > > > `PUT
> > > > > > > > >> > /connector-plugins/{pluginName}/config/validate`
> endpoint,
> > > > which
> > > > > > > won't
> > > > > > > > >> do
> > > > > > > > >> > any forwarding at all.
> > > > > > > > >> >
> > > > > > > > >> > RE 3: Yes, those endpoints LGTM. And yes, bounds on the
> > > > timeout
> > > > > > also
> > > > > > > > >> seem
> > > > > > > > >> > reasonable... maybe a low-importance worker property
> could
> > > > work
> > > > > > for
> > > > > > > > >> that?
> > > > > > > > >> > Not sure what would make sense for a default; probably
> > > > somewhere
> > > > > > in
> > > > > > > > the
> > > > > > > > >> > 10-60 minute range but would be interested in others'
> > > > thoughts.
> > > > > > > > >> >
> > > > > > > > >> > Thanks for the clarification on the zombie fencing
> logic.
> > I
> > > > > think
> > > > > > we
> > > > > > > > >> might
> > > > > > > > >> > want to have some more subtle logic around the
> interaction
> > > > > between
> > > > > > > > >> calls to
> > > > > > > > >> > Admin::fenceProducers and a worker-level timeout
> property
> > if
> > > > we
> > > > > go
> > > > > > > > that
> > > > > > > > >> > route, but we can cross that particular bridge if we get
> > > back
> > > > to
> > > > > > it.
> > > > > > > > >> >
> > > > > > > > >> > Cheers,
> > > > > > > > >> >
> > > > > > > > >> > Chris
> > > > > > > > >> >
> > > > > > > > >> > On Wed, Nov 2, 2022 at 1:48 PM Yash Mayya <
> > > > [email protected]
> > > > > >
> > > > > > > > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > Hi Chris,
> > > > > > > > >> > >
> > > > > > > > >> > > Thanks a lot for the super quick response and the
> great
> > > > > > feedback!
> > > > > > > > >> > >
> > > > > > > > >> > > 1. I think that makes a lot of sense, and I'd be happy
> > to
> > > > > update
> > > > > > > the
> > > > > > > > >> KIP
> > > > > > > > >> > to
> > > > > > > > >> > > include this change in the scope. The current behavior
> > > where
> > > > > the
> > > > > > > API
> > > > > > > > >> > > response indicates a time out but the connector is
> > > > > > created/updated
> > > > > > > > >> > > eventually anyway can be pretty confusing and is
> > generally
> > > > > not a
> > > > > > > > good
> > > > > > > > >> > user
> > > > > > > > >> > > experience IMO.
> > > > > > > > >> > >
> > > > > > > > >> > > 2. Wow, thanks for pointing this out - it's a really
> > good
> > > > > catch
> > > > > > > and
> > > > > > > > >> > > something I hadn't noticed was happening with the
> > current
> > > > > > > > >> implementation.
> > > > > > > > >> > > While I do like the idea of having a query parameter
> > that
> > > > > > > determines
> > > > > > > > >> > > whether validations can be skipped, I'm wondering if
> it
> > > > might
> > > > > > not
> > > > > > > be
> > > > > > > > >> > easier
> > > > > > > > >> > > and cleaner to just do the leader check earlier and
> > avoid
> > > > > doing
> > > > > > > the
> > > > > > > > >> > > unnecessary config validation on the first worker?
> Since
> > > > each
> > > > > > > config
> > > > > > > > >> > > validation happens on its own thread, I'm not so sure
> > > about
> > > > > the
> > > > > > > > >> concern
> > > > > > > > >> > of
> > > > > > > > >> > > overloading the leader even on larger clusters,
> > especially
> > > > > since
> > > > > > > > >> > > validations aren't typically long running operations.
> > > > > > Furthermore,
> > > > > > > > >> even
> > > > > > > > >> > > with the current implementation, the leader will
> always
> > be
> > > > > > doing a
> > > > > > > > >> config
> > > > > > > > >> > > validation for connector create / update REST API
> > requests
> > > > on
> > > > > > any
> > > > > > > > >> worker.
> > > > > > > > >> > >
> > > > > > > > >> > > 3. That's a good point, and this way we can also
> > restrict
> > > > the
> > > > > > APIs
> > > > > > > > >> whose
> > > > > > > > >> > > timeouts are configurable - I'm thinking `PUT
> > > > > > > > >> > > /connector-plugins/{pluginName}/config/validate`,
> `POST
> > > > > > > /connectors`
> > > > > > > > >> and
> > > > > > > > >> > > `PUT /connectors/{connector}/config` are the ones
> where
> > > > such a
> > > > > > > > timeout
> > > > > > > > >> > > parameter could be useful. Also, do you think we
> should
> > > > > enforce
> > > > > > > some
> > > > > > > > >> > > reasonable bounds for the timeout config?
> > > > > > > > >> > >
> > > > > > > > >> > > On the zombie fencing point, the implication was that
> > the
> > > > new
> > > > > > > worker
> > > > > > > > >> > > property would not control the timeout used for the
> call
> > > to
> > > > > > > > >> > > Admin::fenceProducers. However, if we go with a
> timeout
> > > > query
> > > > > > > > >> parameter
> > > > > > > > >> > > approach, even the timeout for the `PUT
> > > > > > > > /connectors/{connector}/fence'
> > > > > > > > >> > > endpoint will remain unaffected.
> > > > > > > > >> > >
> > > > > > > > >> > > Thanks,
> > > > > > > > >> > > Yash
> > > > > > > > >> > >
> > > > > > > > >> > > On Wed, Nov 2, 2022 at 8:13 PM Chris Egerton
> > > > > > > > <[email protected]
> > > > > > > > >> >
> > > > > > > > >> > > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > > Hi Yash,
> > > > > > > > >> > > >
> > > > > > > > >> > > > Thanks for the KIP. It's a nice, focused change.
> > > > Initially I
> > > > > > was
> > > > > > > > >> > hesitant
> > > > > > > > >> > > > to support cases where connector validation takes
> this
> > > > long,
> > > > > > but
> > > > > > > > >> > > > considering the alternative is that we give users a
> > 500
> > > > > error
> > > > > > > > >> response
> > > > > > > > >> > > but
> > > > > > > > >> > > > leave the request to create/modify the connector
> > queued
> > > up
> > > > > in
> > > > > > > the
> > > > > > > > >> > > herder, I
> > > > > > > > >> > > > think I can get behind the motivation here. There's
> > also
> > > > an
> > > > > > > > >> argument to
> > > > > > > > >> > > be
> > > > > > > > >> > > > made about keeping Kafka Connect available even when
> > the
> > > > > > systems
> > > > > > > > >> that
> > > > > > > > >> > it
> > > > > > > > >> > > > connects to are in a degraded state.
> > > > > > > > >> > > >
> > > > > > > > >> > > > I have a few alternatives I'd be interested in your
> > > > thoughts
> > > > > > on:
> > > > > > > > >> > > >
> > > > > > > > >> > > > 1. Since the primary concern here seems to be that
> > > custom
> > > > > > > > connector
> > > > > > > > >> > > > validation logic can take too long, do we have any
> > > > thoughts
> > > > > on
> > > > > > > > >> adding
> > > > > > > > >> > > logic
> > > > > > > > >> > > > to check for request timeout after validation has
> > > > completed
> > > > > > and,
> > > > > > > > if
> > > > > > > > >> it
> > > > > > > > >> > > has,
> > > > > > > > >> > > > aborting the attempt to create/modify the connector?
> > > > > > > > >> > > >
> > > > > > > > >> > > > 2. Right now it's possible that we'll perform two
> > > > connector
> > > > > > > config
> > > > > > > > >> > > > validations per create/modify request; once on the
> > > worker
> > > > > that
> > > > > > > > >> > initially
> > > > > > > > >> > > > receives the request, and then again if that worker
> is
> > > not
> > > > > the
> > > > > > > > >> leader
> > > > > > > > >> > of
> > > > > > > > >> > > > the cluster and has to forward the request to the
> > > leader.
> > > > > Any
> > > > > > > > >> thoughts
> > > > > > > > >> > on
> > > > > > > > >> > > > optimizing this to only require a single validation
> > per
> > > > > > request?
> > > > > > > > We
> > > > > > > > >> > > > probably wouldn't want to force all validations to
> > take
> > > > > place
> > > > > > on
> > > > > > > > the
> > > > > > > > >> > > leader
> > > > > > > > >> > > > (could lead to overloading it pretty quickly in
> large
> > > > > > clusters),
> > > > > > > > >> but we
> > > > > > > > >> > > > could add an internal-only query parameter to skip
> > > > > validation
> > > > > > > and
> > > > > > > > >> then
> > > > > > > > >> > > use
> > > > > > > > >> > > > that parameter when forwarding requests from
> followers
> > > to
> > > > > the
> > > > > > > > >> leader.
> > > > > > > > >> > > >
> > > > > > > > >> > > > 3. A worker property is pretty coarse-grained, and
> > > > difficult
> > > > > > to
> > > > > > > > >> change.
> > > > > > > > >> > > We
> > > > > > > > >> > > > might allow per-request toggling of the timeout by
> > > adding
> > > > a
> > > > > > URL
> > > > > > > > >> query
> > > > > > > > >> > > > parameter like '?timeout=90s' to the REST API to
> allow
> > > > > > tweaking
> > > > > > > of
> > > > > > > > >> the
> > > > > > > > >> > > > timeout on a more granular basis, and without having
> > to
> > > > > > perform
> > > > > > > a
> > > > > > > > >> > worker
> > > > > > > > >> > > > restart.
> > > > > > > > >> > > >
> > > > > > > > >> > > > I'd also like to clarify a point about the rejected
> > > > > > alternative
> > > > > > > > >> "Allow
> > > > > > > > >> > > > configuring producer zombie fencing admin request
> > > > > timeout"--is
> > > > > > > the
> > > > > > > > >> > > > implication here that the "
> > rest.api.request.timeout.ms"
> > > > > > > property
> > > > > > > > >> will
> > > > > > > > >> > > not
> > > > > > > > >> > > > control the REST timeout for requests to the 'PUT
> > > > > > > > >> > > > /connectors/{connector}/fence' endpoint, or just
> that
> > it
> > > > > won't
> > > > > > > > >> control
> > > > > > > > >> > > the
> > > > > > > > >> > > > timeout that we use for the call to
> > > Admin::fenceProducers?
> > > > > > > > >> > > >
> > > > > > > > >> > > > Cheers,
> > > > > > > > >> > > >
> > > > > > > > >> > > > Chris
> > > > > > > > >> > > >
> > > > > > > > >> > > > On Wed, Nov 2, 2022 at 10:07 AM Yash Mayya <
> > > > > > > [email protected]>
> > > > > > > > >> > wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > > > Hi all,
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > I'd like to start a discussion thread on this
> small
> > > KIP
> > > > -
> > > > > > > > >> > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://eu01.z.antigena.com/l/8mqVuSEhdZF05LKGk7J1Cb6esUWht6ps4-73ocd196fgyxC~lGhTLqBEt9BPJy1MBx6RzKVi7qRYYpJwCHRixlmdcayj7YMv~-0fIZJ2f8EjEhp-Qck3~O3Ga1JBTgaY4tTAlwh0VjF6217iBaGVG2PNa-2TO5z8zC5s0Nb28kcG2mUg1tcEfzsAtmWEXWRECoeD4cdUXyPvWeFtYL-ccVv-PmQydgSIYkdeR7~XItAXezft9Mfs5pzjejXSK~MhWW
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > It proposes the addition of a new Kafka Connect
> > worker
> > > > > > > > >> configuration
> > > > > > > > >> > to
> > > > > > > > >> > > > > allow configuring REST API request timeouts.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > Thanks,
> > > > > > > > >> > > > > Yash
> > > > > > > > >> > > > >
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-882: Make Kafka Connect REST API request timeouts configurable

Reply via email to