[
https://issues.apache.org/jira/browse/KAFKA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Egerton updated KAFKA-9374:
---------------------------------
Description:
If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}},
{{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}}
methods, the worker will be disabled for some types of requests thereafter,
including connector creation, connector reconfiguration, and connector deletion.
-This only occurs in distributed mode and is due to the threading model used
by the
[DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
class.- This affects both distributed and standalone mode. Distributed herders
perform some connector work synchronously in their {{tick}} thread, which also
handles group membership and some REST requests. The majority of the herder
methods for the standalone herder are {{synchronized}}, including those for
creating, updating, and deleting connectors; as long as one of those methods
blocks, all subsequent calls to any of these methods will also be blocked.
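
To illustrate the standalone case, here is a minimal sketch of how one hung call inside a {{synchronized}} method stalls every other {{synchronized}} method on the same object. It is not Kafka Connect code: {{ToyHerder}}, its methods, and the 200 ms probe timeout are hypothetical stand-ins chosen only for the demonstration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SynchronizedHerderSketch {

    // Hypothetical stand-in for a herder whose public methods share one monitor.
    static class ToyHerder {
        synchronized void createConnector(Runnable connectorStart) {
            connectorStart.run(); // connector code runs while the lock is held
        }

        synchronized String deleteConnector(String name) {
            return "deleted " + name;
        }
    }

    /** Returns true if deleteConnector could not proceed while a connector start was hung. */
    static boolean deleteBlockedWhileStartHangs() throws Exception {
        ToyHerder herder = new ToyHerder();
        CountDownLatch startEntered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // First request: the connector's start logic hangs until released.
            Future<?> create = pool.submit(() -> herder.createConnector(() -> {
                startEntered.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
            startEntered.await(); // the herder lock is now held by the first request

            // Second, unrelated request: it needs the same lock.
            Future<String> delete = pool.submit(() -> herder.deleteConnector("other"));
            boolean blocked;
            try {
                delete.get(200, TimeUnit.MILLISECONDS);
                blocked = false;
            } catch (TimeoutException e) {
                blocked = true;
            }

            release.countDown(); // un-hang the connector so both requests can finish
            create.get(5, TimeUnit.SECONDS);
            delete.get(5, TimeUnit.SECONDS);
            return blocked;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("delete blocked while start hangs: " + deleteBlockedWhileStartHangs());
    }
}
```

In this sketch the delete request only completes once the hung start call is released. A connector that never returns would keep it blocked indefinitely.
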
One potential solution could be to treat connectors that fail to start, stop,
etc. in time similarly to tasks that fail to stop within the [task graceful
shutdown timeout
period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]
by handling all connector interactions on a separate thread, waiting for them
to complete within a timeout, and abandoning the thread (and transitioning the
connector to the {{FAILED}} state, if it has been created at all) if that
timeout expires.
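
A rough sketch of that idea follows, assuming a plain {{ExecutorService}} per connector interaction. The class name, the {{State}} enum, and {{startWithTimeout}} are hypothetical and do not correspond to existing Kafka Connect APIs. The sketch only shows the shape of "run on a separate thread, wait with a timeout, abandon on expiry".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConnectorTimeoutSketch {

    // Hypothetical, simplified connector states for this sketch only.
    enum State { RUNNING, FAILED }

    /**
     * Runs a connector interaction on its own thread and waits up to timeoutMs.
     * On timeout or error the connector is reported as FAILED and the thread is abandoned.
     */
    static State startWithTimeout(Runnable connectorStart, long timeoutMs)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connector-sketch");
            t.setDaemon(true); // an abandoned thread must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(connectorStart);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return State.RUNNING;
        } catch (TimeoutException e) {
            future.cancel(true); // best-effort interrupt; truly stuck code may ignore it
            return State.FAILED;
        } catch (ExecutionException e) {
            return State.FAILED; // the connector threw instead of hanging
        } finally {
            executor.shutdownNow(); // the caller's thread never waits past the timeout
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable healthy = () -> { };
        Runnable hung = () -> {
            try {
                new CountDownLatch(1).await(); // never counted down: simulates a hang
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("healthy connector: " + startWithTimeout(healthy, 1000));
        System.out.println("hung connector:    " + startWithTimeout(hung, 200));
    }
}
```

The calling thread is released after at most the timeout, so other requests can still be served. A connector that ignores interruption would leave its thread behind, which is why the sketch uses a daemon thread.
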
was:
If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}},
{{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}}
methods, the worker will be disabled for some types of requests thereafter,
including connector creation, connector reconfiguration, and connector deletion.
This only occurs in distributed mode and is due to the threading model used by
the
[DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
class.
One potential solution could be to treat connectors that fail to start, stop,
etc. in time similarly to tasks that fail to stop within the [task graceful
shutdown timeout
period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]
by handling all connector interactions on a separate thread, waiting for them
to complete within a timeout, and abandoning the thread (and transitioning the
connector to the {{FAILED}} state, if it has been created at all) if that
timeout expires.
> Worker can be disabled by blocked connectors
> --------------------------------------------
>
> Key: KAFKA-9374
> URL: https://issues.apache.org/jira/browse/KAFKA-9374
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0,
> 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1
> Reporter: Chris Egerton
> Assignee: Chris Egerton
> Priority: Major
>
> If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}},
> {{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}}
> methods, the worker will be disabled for some types of requests thereafter,
> including connector creation, connector reconfiguration, and connector
> deletion.
> -This only occurs in distributed mode and is due to the threading model used
> by the
> [DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
> class.- This affects both distributed and standalone mode. Distributed
> herders perform some connector work synchronously in their {{tick}} thread,
> which also handles group membership and some REST requests. The majority of
> the herder methods for the standalone herder are {{synchronized}}, including
> those for creating, updating, and deleting connectors; as long as one of
> those methods blocks, all subsequent calls to any of these methods will also
> be blocked.
>
> One potential solution could be to treat connectors that fail to start, stop,
> etc. in time similarly to tasks that fail to stop within the [task graceful
> shutdown timeout
> period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]
> by handling all connector interactions on a separate thread, waiting for
> them to complete within a timeout, and abandoning the thread (and
> transitioning the connector to the {{FAILED}} state, if it has been created
> at all) if that timeout expires.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)