Thanks for the KIP, Chris. This will be a nice improvement on top of the
work you're already doing in
https://issues.apache.org/jira/browse/KAFKA-9374

Overall I like the direction this is going, but I do have some
comments/questions/suggestions, all of which are pretty minor except for
the last three that deal with the proposed new worker config property:

1. Can you clarify in the last sentence of the Background section which
resources may not be cleaned up. IIUC, these are connector-specific
resources and are not worker resources. Is that true?
2. The KIP proposes adding two new MBean names for the connector-specific
metrics (e.g.,
"kafka.connect:type=abandoned-connector-metrics,connector=([-.\w]+)") and
task-specific metrics (e.g.,
"kafka.connect:type=abandoned-task-metrics,connector=([-.\w]+),task=([\d]+)").
Did you consider adding these to the existing MBean names
"kafka.connect:type=connector-metrics,connector=..." and
"kafka.connect:type=connector-task-metrics,connector=...", respectively?
What is the benefit of having these in separate MBeans? In fact, I wonder
if having them in the existing MBeans and naming the metrics similarly to
the worker's new metrics would help correlate them.
3. The new metrics often include the phrase "instances of this
connector/task that the worker has abandoned". Are these really connector
or task instances, or are they requests? Perhaps they are effectively the
same thing, though might it ever be the case that Connect attempts to clean
up connector and task instances, in which case the subsequent
`Connector.stop()` for example might also block.
4. The KIP proposes adding the new configuration property "
connector.shutdown.graceful.timeout.ms", but doesn't say what the
importance is. I presume it would be LOW just like the existing task
timeout, but it'd be good to specify this. It might also be worthwhile to
mention that the default will match the current default of the task timeout.
5. The section about the new configuration property says: "Additionally,
the task.shutdown.graceful.timeout.ms property will be deprecated and
removed in a later release." We will deprecate it in some future release
and then possibly remove it in a subsequent major release. Also, it is
possible that we never actually remove the old configuration, so perhaps
this sentence could be a bit more precise: "Additionally, the
task.shutdown.graceful.timeout.ms property will be deprecated in a later
release."
6. I presume the "connector.shutdown.graceful.timeout.ms" property is being
proposed because the existing "task.shutdown.graceful.timeout.ms" property
is targeted only to tasks. While coopting the existing task-based property
to use for connectors may not be desirable, wouldn't using the new
connector-based property and deprecating the old task-based property cause
a similar mismatch? If so, why go through the hassle of the config changes
in the first place? (Even if you disagree that there is no mismatch and
it's worth the switch, using the existing task-based timeout property for
task and connectors should be a rejected alternative.)
7. Now, I actually think we will prefer to *not* remove many deprecated
methods/configs/classes because doing so will artificially limit the number
of Connect runtime versions into which a particular connector version could
be installed. Given that, and given #6 above, does that change the calculus
in the second rejected alternative about keeping both the task- and
connector-based timeout property? I actually think it might.
8. How much value does the sequence number in the logging context bring? Is
this really worth the risk given the backward compatibility concerns?

I've added this to the AK 2.6.0 release plan (
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=152113430).
If we can't wrap up this KIP and approve/merge a corresponding PR within
the 2.6 timeframe, we can move this KIP to the "postponed" section of the
2.6.

Best regards,

Randall

On Fri, May 8, 2020 at 1:58 PM Christopher Egerton <[email protected]>
wrote:

> Hi all,
>
> I've introduced a KIP to improve how the Connect framework handles
> abandoned connectors and tasks:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-611%3A+Improved+Handling+of+Abandoned+Connectors+and+Tasks
>
> The improvements currently being proposed include new JMX metrics, a slight
> shuffling of worker configuration properties, and some logging additions.
>
> Would like to hear your thoughts on this and see if we can possibly squeeze
> it in in time for the upcoming 2.6 release :)
>
> Cheers,
>
> Chris
>

Reply via email to