[ 
https://issues.apache.org/jira/browse/KAFKA-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton updated KAFKA-15090:
----------------------------------
    Description: 
Before [https://github.com/apache/kafka/pull/9669], in distributed mode, the 
{{SourceTask::stop}} method would be invoked on the herder tick thread, which 
is a separate thread from the dedicated thread which was responsible for 
polling data from the task and producing it to Kafka.

This aligned with the Javadocs for {{{}SourceTask:poll{}}}, which state:
{quote}The task will be stopped on a separate thread, and when that happens 
this method is expected to unblock, quickly finish up any remaining processing, 
and return.
{quote}
However, it came with the downside that the herder's tick thread would be 
blocked until the invocation of {{SourceTask::stop}} completed, which could 
result in major parts of the worker's REST API becoming unavailable and even 
the worker falling out of the cluster.

As a result, in 
[https://github.com/apache/kafka/pull/9669|https://github.com/apache/kafka/pull/9669],
 we changed the logic for task shutdown to cause {{SourceTask::stop}} to be 
invoked on the dedicated thread for the task (i.e., the one responsible for 
polling data from it and producing that data to Kafka).

This altered the semantics for {{SourceTask:poll}} and {{SourceTask::stop}} and 
may have broken connectors that block during {{poll}} with the expectation that 
{{stop}} can and will be invoked concurrently as a signal that any ongoing 
polls should be interrupted immediately.

Although reverting the fix is likely not a viable option (blocking the herder 
thread on interactions with user-written plugins is high-risk and we have tried 
to eliminate all instances of this where feasible), we may try to restore the 
expected contract by spinning up a separate thread exclusively for invoking 
{{SourceTask::stop}} separately from the dedicated thread for the task and the 
herder's thread.

  was:
Before [https://github.com/apache/kafka/pull/9669], in distributed mode, the 
{{SourceTask::stop}} method would be invoked on the herder tick thread, which 
is a separate thread from the dedicated thread which was responsible for 
polling data from the task and producing it to Kafka.

This aligned with the Javadocs for {{{}SourceTask:poll{}}}, which state:
{quote}The task will be stopped on a separate thread, and when that happens 
this method is expected to unblock, quickly finish up any remaining processing, 
and return.
{quote}
However, it came with the downside that the herder's tick thread would be 
blocked until the invocation of {{SourceTask::stop}} completed, which could 
result in major parts of the worker's REST API becoming unavailable and even 
the worker falling out of the cluster.

As a result, in 
[https://github.com/apache/kafka/pull/9669|https://github.com/apache/kafka/pull/9669,],
 we changed the logic for task shutdown to cause {{SourceTask::stop}} to be 
invoked on the dedicated thread for the task (i.e., the one responsible for 
polling data from it and producing that data to Kafka).

This altered the semantics for {{SourceTask:poll}} and {{SourceTask::stop}} and 
may have broken connectors that block during {{poll}} with the expectation that 
{{stop}} can and will be invoked concurrently as a signal that any ongoing 
polls should be interrupted immediately.

Although reverting the fix is likely not a viable option (blocking the herder 
thread on interactions with user-written plugins is high-risk and we have tried 
to eliminate all instances of this where feasible), we may try to restore the 
expected contract by spinning up a separate thread exclusively for invoking 
{{SourceTask::stop}} separately from the dedicated thread for the task and the 
herder's thread.


> Source tasks are no longer stopped on a separate thread
> -------------------------------------------------------
>
>                 Key: KAFKA-15090
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15090
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 3.1.0, 3.0.0, 3.0.1, 3.2.0, 3.1.1, 3.3.0, 3.0.2, 3.1.2, 
> 3.2.1, 3.4.0, 3.2.2, 3.2.3, 3.3.1, 3.3.2, 3.2.4, 3.1.3, 3.0.3, 3.5.0, 3.4.1, 
> 3.3.3, 3.6.0, 3.5.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>
> Before [https://github.com/apache/kafka/pull/9669], in distributed mode, the 
> {{SourceTask::stop}} method would be invoked on the herder tick thread, which 
> is a separate thread from the dedicated thread which was responsible for 
> polling data from the task and producing it to Kafka.
> This aligned with the Javadocs for {{{}SourceTask:poll{}}}, which state:
> {quote}The task will be stopped on a separate thread, and when that happens 
> this method is expected to unblock, quickly finish up any remaining 
> processing, and return.
> {quote}
> However, it came with the downside that the herder's tick thread would be 
> blocked until the invocation of {{SourceTask::stop}} completed, which could 
> result in major parts of the worker's REST API becoming unavailable and even 
> the worker falling out of the cluster.
> As a result, in 
> [https://github.com/apache/kafka/pull/9669|https://github.com/apache/kafka/pull/9669],
>  we changed the logic for task shutdown to cause {{SourceTask::stop}} to be 
> invoked on the dedicated thread for the task (i.e., the one responsible for 
> polling data from it and producing that data to Kafka).
> This altered the semantics for {{SourceTask:poll}} and {{SourceTask::stop}} 
> and may have broken connectors that block during {{poll}} with the 
> expectation that {{stop}} can and will be invoked concurrently as a signal 
> that any ongoing polls should be interrupted immediately.
> Although reverting the fix is likely not a viable option (blocking the herder 
> thread on interactions with user-written plugins is high-risk and we have 
> tried to eliminate all instances of this where feasible), we may try to 
> restore the expected contract by spinning up a separate thread exclusively 
> for invoking {{SourceTask::stop}} separately from the dedicated thread for 
> the task and the herder's thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to