[
https://issues.apache.org/jira/browse/KAFKA-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978074#comment-16978074
]
ASF GitHub Bot commented on KAFKA-9051:
---------------------------------------
rhauch commented on pull request #7532: KAFKA-9051: Prematurely complete source
offset read requests for stopped tasks
URL: https://github.com/apache/kafka/pull/7532
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Source task source offset reads can block graceful shutdown
> -----------------------------------------------------------
>
> Key: KAFKA-9051
> URL: https://issues.apache.org/jira/browse/KAFKA-9051
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 1.0.2, 1.1.1, 2.0.1, 2.1.1, 2.3.0, 2.2.1, 2.4.0, 2.5.0
> Reporter: Chris Egerton
> Assignee: Chris Egerton
> Priority: Major
>
> When source tasks request source offsets from the framework, this results in
> a call to
> [Future.get()|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/storage/OffsetStorageReaderImpl.java#L79]
> with no timeout. In distributed workers, the future is blocked on a
> successful [read to the
> end|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaOffsetBackingStore.java#L136]
> of the source offsets topic, which in turn will [poll that topic
> indefinitely|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L287]
> until the latest messages for every partition of that topic have been
> consumed.
> This normally completes in a reasonable amount of time. However, if the
> connectivity between the Connect worker and the Kafka cluster is degraded or
> dropped in the middle of one of these reads, it will block until connectivity
> is restored and the request completes successfully.
> If a task is stopped (due to a manual restart via the REST API, a rebalance,
> worker shutdown, etc.) while blocked on a read of source offsets during its
> {{start}} method, not only will it fail to gracefully stop, but the framework
> [will not even invoke its stop
> method|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L183]
> until its {{start}} method (and, as a result, the source offset read
> request) [has
> completed|https://github.com/apache/kafka/blob/8966d066bd2f80c6d8f270423e7e9982097f97b9/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java#L202-L206].
> This prevents the task from being able to clean up any resources it has
> allocated and can lead to OOM errors, excessive thread creation, and other
> problems.
>
> I've confirmed that this affects every release of Connect back through 1.0 at
> least; I've tagged the most recent bug fix release of every major/minor
> version from then on in the {{Affects Version/s}} field to avoid just putting
> every version in that field.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)