[ https://issues.apache.org/jira/browse/KAFKA-15090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Egerton updated KAFKA-15090:
----------------------------------
    Description: 
Before [https://github.com/apache/kafka/pull/9669], in distributed mode, the {{SourceTask::stop}} method was invoked on the herder tick thread, which is separate from the dedicated thread responsible for polling data from the task and producing it to Kafka.

This aligned with the Javadocs for {{SourceTask::poll}}, which state:
{quote}The task will be stopped on a separate thread, and when that happens this method is expected to unblock, quickly finish up any remaining processing, and return.
{quote}
However, it came with the downside that the herder's tick thread would be blocked until the invocation of {{SourceTask::stop}} completed, which could result in major parts of the worker's REST API becoming unavailable and even the worker falling out of the cluster.

As a result, in [https://github.com/apache/kafka/pull/9669], we changed the logic for task shutdown so that {{SourceTask::stop}} is invoked on the dedicated thread for the task (i.e., the one responsible for polling data from it and producing that data to Kafka).

This altered the semantics of {{SourceTask::poll}} and {{SourceTask::stop}} and may have broken connectors that block during {{poll}} with the expectation that {{stop}} can and will be invoked concurrently as a signal that any ongoing polls should be interrupted immediately.

Although reverting the fix is likely not a viable option (blocking the herder thread on interactions with user-written plugins is high-risk, and we have tried to eliminate all instances of this where feasible), we may try to restore the expected contract by spinning up a thread used exclusively for invoking {{SourceTask::stop}}, separate from both the dedicated thread for the task and the herder's thread.
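The threading contract described above can be illustrated with a minimal, dependency-free Java sketch. It is a simplified model, not the Connect runtime's code: {{BlockingTask}}, {{StopThreadDemo}}, {{pollUnblocks}} and the thread names are hypothetical, and {{BlockingTask}} deliberately does not extend the real {{SourceTask}} class. The sketch contrasts a task whose {{stop}} is called from a dedicated thread with one whose {{stop}} can only run on the task's own thread after {{poll}} has returned.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a source task; NOT the real Kafka Connect SourceTask. */
class BlockingTask {
    private static final String STOP_SIGNAL = "__stop__";
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Blocks until data or the stop signal arrives; returns null once stopped. */
    List<String> poll() throws InterruptedException {
        String next = queue.take();
        return STOP_SIGNAL.equals(next) ? null : Collections.singletonList(next);
    }

    /** Relies on being called from a thread other than the one blocked in poll(). */
    void stop() {
        queue.offer(STOP_SIGNAL);
    }
}

class StopThreadDemo {
    /**
     * Returns true if poll() returned within timeoutMs.
     * dedicatedStopThread == true  models stop() being invoked on its own thread.
     * dedicatedStopThread == false models stop() running on the task thread,
     * where it can only be reached after poll() has already returned.
     */
    static boolean pollUnblocks(boolean dedicatedStopThread, long timeoutMs) {
        BlockingTask task = new BlockingTask();
        CountDownLatch pollReturned = new CountDownLatch(1);

        Thread taskThread = new Thread(() -> {
            try {
                task.poll();
                pollReturned.countDown();
                if (!dedicatedStopThread) {
                    task.stop(); // too late to unblock the poll() call above
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "task-thread");
        taskThread.setDaemon(true);
        taskThread.start();

        if (dedicatedStopThread) {
            // Neither the task thread nor the caller has to run stop() itself.
            new Thread(task::stop, "task-stop-thread").start();
        }

        try {
            return pollReturned.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            taskThread.interrupt(); // release the task thread if it is still blocked
        }
    }

    public static void main(String[] args) {
        System.out.println("dedicated stop thread: " + pollUnblocks(true, 2000));
        System.out.println("stop on task thread:   " + pollUnblocks(false, 200));
    }
}
```

In this model the first call reports that {{poll}} returned, while the second times out because nothing ever delivers the stop signal to the blocked {{poll}}. The caller in the first case plays the role of the herder thread and hands {{stop}} off rather than running it directly, which is the property the proposed fix aims to preserve.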
> Source tasks are no longer stopped on a separate thread
> -------------------------------------------------------
>
>                 Key: KAFKA-15090
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15090
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 3.1.0, 3.0.0, 3.0.1, 3.2.0, 3.1.1, 3.3.0, 3.0.2, 3.1.2,
>                      3.2.1, 3.4.0, 3.2.2, 3.2.3, 3.3.1, 3.3.2, 3.2.4, 3.1.3,
>                      3.0.3, 3.5.0, 3.4.1, 3.3.3, 3.6.0, 3.5.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major