Re: [DISCUSS] KIP-745: Connect API to restart connector and tasks

Konstantine Karantasis Mon, 07 Jun 2021 12:31:53 -0700

Thanks Randall. I reply inline as well.


On Sat, Jun 5, 2021 at 1:57 PM Randall Hauch <rha...@apache.org> wrote:

> Thanks for the review and feedback, Konstantine. I think I've addressed all
> of your concerns below.
>
> On Fri, Jun 4, 2021 at 7:55 PM Konstantine Karantasis
> <konstant...@confluent.io.invalid> wrote:
>
> > Thanks for the KIP Randall!
> > Really happy to see this feature coming along finally. Will indeed
> resolve
> > an unintuitive aspect of Connect's REST API.
> >
> > I'm in favor of the current approach overall. Just a few comments below
> > that could use a brief discussion:
> >
> > 1) Is there a specific reason why you choose the config topic to persist
> > the restart requests?
> > I know it doesn't really matter which internal topic is used for
> backwards
> > compatibility, because the worker ignores records that does not
> understand.
> >
>
> I think there's value in the restart request record being ordered along
> with the other config topic records, including configuration records,
> target state change records, and session key updates. This also aligns with
> how the distributed herder will process the restart requests within its
> `tick()` method, again making sure there are no ongoing starts and stops
> from rebalances or new connectors while the restarts are taking place. Plus
> there is already a built-in config topic listener mechanism in the
> distributed herder.
>
> I think those are good reasons to use the config topic. But there also is
> not a viable alternative: the status topic has no existing listener
> mechanism in the StatusBackingStore, and the offset topic is clearly not
> applicable.
>

These are good points that in practice make the config topic preferrable.
Even though some sound more like implementation details, I think saying
explicitly why the config topic gives us a good tradeoff (you mostly say
it) and why the status topic is not preferred (some info is missing here,
especially regarding the listener) is valuable.
Also a bit of a nit but mentioning the "tick()" method might not be clear
for the uninitiated with the internals of the Connect framework.
Calling this the main thread loop of the workers might be simpler.


>
> > But are there any implications upon worker restart? I read you
> specifically
> > mention that a worker will read restart requests upon its own restart.
> But
> > is this really desirable?
> >
>
> It will read them, just like all of the config records, target state change
> records, and session key updates. However, upon worker restart the herder
> will ignore any restart requests, just like it ignores session key update
> and task state change records.
>
>
> > The config topic offers a way to persist configurations of the latest
> > running connectors and upon cluster restart to bring these connectors
> back
> > up.
> > However, I'm not sure we want to persist recent restart attempts, the
> same
> > way we persist configurations. Additionally configurations might get
> > additional backups in order to allow for the config topic to be recovered
> > for disaster recovery reasons. To that respect, I'm not sure that the
> > restart requests require the same guarantees.
> >
>
> This is not the first or only non-config record written to this topic, so
> there's already precedent.
>

I acknowledge the precedent but we might need to draw a line at some point
and make sure we maintain more permanent information in the config topic
and more transient in the status topic, modulo any other concerns or
trade-offs.
As discussed above, I agree that the config topic is a better option here.


>
> > Given the above, would the status topic offer a better alternative in
> order
> > to broadcast the restart requests across the workers? Haven't looked
> > closely what would be the implications of writing these new records to
> the
> > status topic, but this topic tends to be more transient.
> > Or is it that you've preferred the config topic for the workers to be
> aware
> > of the interleaving between restart requests and requests for
> > reconfigurations? If that's the case an explicit call out might be
> > worthwhile in the KIP itself.
> >
>
> I guess I thought I already tried to do that. But I can add a bit more
> detail to the KIP to call this out even more.
>

+1. Indeed there's some explanation on that already. Maybe a subsection
header and a juxtaposition with the status topic could help to highlight
the reason behind this selection in a more clear way.


>
>
> >
> > 2) You mention in the KIP the status STOPPED. But this is not currently
> > part of the task or connector states that get persisted to the status
> > topic, as these are defined in AbstractStatus#State enum. Should we
> remove
> > any reference to a STOPPED state to avoid confusion? The only new state
> > seems to be the RESTARTING state.
> >
>
> Good catch. Removed references of the STOPPED state.
>
> And a few minor comments.
> > a) The sentence "We need to keep the existing behavior of the method to
> > just restart the Connector object must be retained for backward
> > compatibility," doesn't read well for me.
> >
>
> Reworded.
>
>
> > b) Subsection title: "Use REST API for Worker-to-Worker Communication
> and"
> > is possibly an unfinished sentence.
> >
>
> Ah, that was just a typo. Fixed.
>
>
> > c) In the paragraph right below, the last sentence "especially when the
> > number of workers " might also be missing something.
> >
>
> Fixed.
>


Thanks for the replies.
+1 overall.

Konstantine





> >
> > Best,
> > Konstantine
> >
> > On Thu, Jun 3, 2021 at 8:42 AM Kalpesh Patel <kpa...@confluent.io.invalid
> >
> > wrote:
> >
> > > Agreed, this would be a great enhancement.
> > >
> > > On Thu, Jun 3, 2021 at 9:26 AM Nikita Kretov <kretov...@gmail.com>
> > wrote:
> > >
> > > > Hello! It's really not intuitive that you need to restart connect and
> > > > tasks separately! I'd like to see this KIP landed in 3.0.0 release!
> > > >
> > > > On 6/2/21 7:40 PM, Randall Hauch wrote:
> > > > > Hello all,
> > > > >
> > > > > Many users struggle with the connector restart REST API only
> > restarting
> > > > the
> > > > > Connector instance rather than everything they associated with a
> > > "named"
> > > > > connector. I've created a KIP that attempts to solve this problem
> via
> > > new
> > > > > query parameters on an existing REST API method, though by default
> it
> > > > > remains backward compatible with older behavior:
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks
> > > > >
> > > > > Please take a look and respond here with questions. I'd love to get
> > > this
> > > > > into AK 3.0.0, and the KIP freeze is coming up soon.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Randall
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-745: Connect API to restart connector and tasks

Reply via email to