[ 
https://issues.apache.org/jira/browse/KAFKA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton updated KAFKA-9374:
---------------------------------
    Description: 
If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}}, 
{{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}} 
methods, the worker will be disabled for some types of requests thereafter, 
including connector creation, connector reconfiguration, and connector deletion.
 -This only occurs in distributed mode and is due to the threading model used 
by the 
[DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
 class.- This affects both distributed and standalone modes. Distributed herders 
perform some connector work synchronously in their {{tick}} thread, which also 
handles group membership and some REST requests. Most of the standalone 
herder's methods are {{synchronized}}, including those for creating, updating, 
and deleting connectors; if one of those methods blocks, every subsequent call 
to any of them blocks as well.
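To illustrate the standalone case, here is a toy Java model (not Kafka's actual {{StandaloneHerder}}) in which two herder methods are {{synchronized}} on the same object. The class and method names are hypothetical, and a connector whose {{start}} hangs is simulated with a {{CountDownLatch}}. While the hung "create" call holds the monitor, an unrelated "delete" call cannot proceed:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical herder whose mutating methods share one monitor, like the
// standalone herder's synchronized methods. A toy model, not Kafka's code.
class ToySynchronizedHerder {
    // Simulates creating a connector whose start() hangs until `release` fires.
    synchronized void createConnector(String name, CountDownLatch entered, CountDownLatch release)
            throws InterruptedException {
        entered.countDown(); // signal that this thread now holds the herder's monitor
        release.await();     // the "connector" hangs while the monitor is still held
    }

    // Deleting an unrelated connector needs the same monitor.
    synchronized String deleteConnector(String name) {
        return "deleted " + name;
    }
}

class HerderBlockingDemo {
    /** True if deleteConnector() returns within waitMillis while createConnector() hangs. */
    static boolean deleteCompletesWhileCreateHangs(long waitMillis) {
        ToySynchronizedHerder herder = new ToySynchronizedHerder();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread creator = new Thread(() -> {
            try {
                herder.createConnector("hanging-connector", entered, release);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread deleter = new Thread(() -> herder.deleteConnector("unrelated-connector"));
        try {
            creator.start();
            entered.await();          // creator is now inside createConnector()
            deleter.start();
            deleter.join(waitMillis); // give the delete request a bounded amount of time
            boolean completed = !deleter.isAlive();
            release.countDown();      // un-hang the connector so both threads can exit
            creator.join();
            deleter.join();
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            release.countDown();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling {{HerderBlockingDemo.deleteCompletesWhileCreateHangs(200)}} returns {{false}}: after 200 ms the delete thread is still waiting for the monitor, and it only completes once the simulated connector is released.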

 

One potential solution is to treat connectors that fail to start, stop, etc. in 
time the same way as tasks that fail to stop within the [task graceful 
shutdown timeout 
period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]: 
handle all connector interactions on a separate thread, wait for them to 
complete within a timeout, and, if that timeout expires, abandon the thread and 
transition the connector to the {{FAILED}} state (if it has been created at all).
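A minimal sketch of the proposed approach, assuming a one-thread-per-call design: each connector call runs on its own daemon thread, the caller waits with {{Future.get}} and a timeout, and on timeout the thread is interrupted and abandoned and the connector is reported as {{FAILED}}. {{TimeBoundedConnectorRunner}} and {{ToyConnectorState}} are hypothetical names, not Kafka Connect APIs, and only {{start}} is shown; {{stop}}, {{taskConfigs}}, and the other connector methods could be wrapped the same way.

```java
import java.util.concurrent.*;

// Hypothetical lifecycle states; Kafka Connect's real status model is richer.
enum ToyConnectorState { RUNNING, FAILED }

// Hypothetical sketch: every connector call runs on a dedicated thread, and the
// caller (e.g. a herder's tick thread) waits at most timeoutMs for it.
class TimeBoundedConnectorRunner {
    private final long timeoutMs;

    TimeBoundedConnectorRunner(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    ToyConnectorState start(String connectorName, Runnable connectorStart) {
        // Daemon thread, so an abandoned, permanently hung call cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connector-" + connectorName);
            t.setDaemon(true);
            return t;
        });
        Future<?> call = executor.submit(connectorStart);
        try {
            call.get(timeoutMs, TimeUnit.MILLISECONDS);
            return ToyConnectorState.RUNNING;
        } catch (TimeoutException e) {
            // Abandon the thread: interrupt it as a courtesy, but never wait for it.
            call.cancel(true);
            return ToyConnectorState.FAILED;
        } catch (ExecutionException e) {
            // The connector's start() threw an exception.
            return ToyConnectorState.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return ToyConnectorState.FAILED;
        } finally {
            // Accept no new work; a hung task keeps its thread until it returns.
            executor.shutdown();
        }
    }
}
```

With a 100 ms timeout, a connector whose {{start}} sleeps for a minute yields {{FAILED}} after roughly 100 ms instead of blocking the caller. {{cancel(true)}} only interrupts the thread; a connector that ignores interrupts keeps its thread alive, which is why the thread is a daemon and is abandoned rather than joined.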

  was:
If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}}, 
{{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}} 
methods, the worker will be disabled for some types of requests thereafter, 
including connector creation, connector reconfiguration, and connector deletion.
 This only occurs in distributed mode and is due to the threading model used by 
the 
[DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
 class.

 

One potential solution could be to treat connectors that fail to start, stop, 
etc. in time similarly to tasks that fail to stop within the [task graceful 
shutdown timeout 
period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]
 by handling all connector interactions on a separate thread, waiting for them 
to complete within a timeout, and abandoning the thread (and transitioning the 
connector to the {{FAILED}} state, if it has been created at all) if that 
timeout expires.


> Worker can be disabled by blocked connectors
> --------------------------------------------
>
>                 Key: KAFKA-9374
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9374
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0, 
> 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1
>            Reporter: Chris Egerton
>            Assignee: Chris Egerton
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
