[
https://issues.apache.org/jira/browse/KAFKA-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Egerton updated KAFKA-15090:
----------------------------------
Description:
Before [https://github.com/apache/kafka/pull/9669], in distributed mode, the
{{SourceTask::stop}} method would be invoked on the herder tick thread, which
is a separate thread from the dedicated thread which was responsible for
polling data from the task and producing it to Kafka.
This aligned with the Javadocs for {{{}SourceTask:poll{}}}, which state:
{quote}The task will be stopped on a separate thread, and when that happens
this method is expected to unblock, quickly finish up any remaining processing,
and return.
{quote}
However, it came with the downside that the herder's tick thread would be
blocked until the invocation of {{SourceTask::stop}} completed, which could
result in major parts of the worker's REST API becoming unavailable and even
the worker falling out of the cluster.
As a result, in [https://github.com/apache/kafka/pull/9669,] we changed the
logic for task shutdown to cause {{SourceTask::stop}} to be invoked on the
dedicated thread for the task (i.e., the one responsible for polling data from
it and producing that data to Kafka).
This altered the semantics for {{SourceTask:poll}} and {{SourceTask::stop}} and
may have broken connectors that block during {{poll}} with the expectation that
{{stop}} can and will be invoked concurrently as a signal that any ongoing
polls should be interrupted immediately.
Although reverting the fix is likely not a viable option (blocking the herder
thread on interactions with user-written plugins is high-risk and we have tried
to eliminate all instances of this where feasible), we may try to restore the
expected contract by spinning up a separate thread exclusively for invoking
{{SourceTask::stop}} separately from the dedicated thread for the task and the
herder's thread.
was:
Before [https://github.com/apache/kafka/pull/9669,] in distributed mode, the
{{SourceTask::stop}} method would be invoked on the herder tick thread, which
is a separate thread from the dedicated thread which was responsible for
polling data from the task and producing it to Kafka.
This aligned with the Javadocs for {{{}SourceTask:poll{}}}, which state:
{quote}The task will be stopped on a separate thread, and when that happens
this method is expected to unblock, quickly finish up any remaining processing,
and return.
{quote}
However, it came with the downside that the herder's tick thread would be
blocked until the invocation of {{SourceTask::stop}} completed, which could
result in major parts of the worker's REST API becoming unavailable and even
the worker falling out of the cluster.
As a result, in [https://github.com/apache/kafka/pull/9669,] we changed the
logic for task shutdown to cause {{SourceTask::stop}} to be invoked on the
dedicated thread for the task (i.e., the one responsible for polling data from
it and producing that data to Kafka).
This altered the semantics for {{SourceTask:poll}} and {{SourceTask::stop}} and
may have broken connectors that block during {{poll}} with the expectation that
{{stop}} can and will be invoked concurrently as a signal that any ongoing
polls should be interrupted immediately.
Although reverting the fix is likely not a viable option (blocking the herder
thread on interactions with user-written plugins is high-risk and we have tried
to eliminate all instances of this where feasible), we may try to restore the
expected contract by spinning up a separate thread exclusively for invoking
{{SourceTask::stop}} separately from the dedicated thread for the task and the
herder's thread.
> Source tasks are no longer stopped on a separate thread
> -------------------------------------------------------
>
> Key: KAFKA-15090
> URL: https://issues.apache.org/jira/browse/KAFKA-15090
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 3.1.0, 3.0.0, 3.0.1, 3.2.0, 3.1.1, 3.3.0, 3.0.2, 3.1.2,
> 3.2.1, 3.4.0, 3.2.2, 3.2.3, 3.3.1, 3.3.2, 3.2.4, 3.1.3, 3.0.3, 3.5.0, 3.4.1,
> 3.3.3, 3.6.0, 3.5.1
> Reporter: Chris Egerton
> Assignee: Chris Egerton
> Priority: Major
>
> Before [https://github.com/apache/kafka/pull/9669], in distributed mode, the
> {{SourceTask::stop}} method would be invoked on the herder tick thread, which
> is a separate thread from the dedicated thread which was responsible for
> polling data from the task and producing it to Kafka.
> This aligned with the Javadocs for {{{}SourceTask:poll{}}}, which state:
> {quote}The task will be stopped on a separate thread, and when that happens
> this method is expected to unblock, quickly finish up any remaining
> processing, and return.
> {quote}
> However, it came with the downside that the herder's tick thread would be
> blocked until the invocation of {{SourceTask::stop}} completed, which could
> result in major parts of the worker's REST API becoming unavailable and even
> the worker falling out of the cluster.
> As a result, in [https://github.com/apache/kafka/pull/9669,] we changed the
> logic for task shutdown to cause {{SourceTask::stop}} to be invoked on the
> dedicated thread for the task (i.e., the one responsible for polling data
> from it and producing that data to Kafka).
> This altered the semantics for {{SourceTask:poll}} and {{SourceTask::stop}}
> and may have broken connectors that block during {{poll}} with the
> expectation that {{stop}} can and will be invoked concurrently as a signal
> that any ongoing polls should be interrupted immediately.
> Although reverting the fix is likely not a viable option (blocking the herder
> thread on interactions with user-written plugins is high-risk and we have
> tried to eliminate all instances of this where feasible), we may try to
> restore the expected contract by spinning up a separate thread exclusively
> for invoking {{SourceTask::stop}} separately from the dedicated thread for
> the task and the herder's thread.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)