Chris Egerton created KAFKA-14858:
-------------------------------------

             Summary: Standalone herder does not handle exceptions thrown from 
connector taskConfigs method
                 Key: KAFKA-14858
                 URL: https://issues.apache.org/jira/browse/KAFKA-14858
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
            Reporter: Chris Egerton


In distributed mode, if a connector throws an exception from its 
{{taskConfigs}} method (invoked by the herder, through the {{Worker}} class, 
[here|https://github.com/apache/kafka/blob/f3e4dd922933bf28b2c091e846cbc4e5255dd1d5/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1960]),
 we wait for an exponential backoff period (see KAFKA-14732) and then [retry 
the 
operation|https://github.com/apache/kafka/blob/f3e4dd922933bf28b2c091e846cbc4e5255dd1d5/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1907-L1911].
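
For readers unfamiliar with the pattern, the retry-with-backoff behavior described above can be sketched roughly as follows. This is a standalone illustration, not the {{DistributedHerder}} code: the class {{BackoffRetrySketch}}, its methods, the bounded attempt count, and the delay values used in the test are all hypothetical.

```java
import java.util.concurrent.Callable;

public class BackoffRetrySketch {

    /**
     * Delay before the retry that follows failed attempt number {@code attempt}
     * (0-based): initialMs doubled once per prior attempt, capped at maxMs.
     * All parameters are illustrative assumptions.
     */
    public static long backoffMs(int attempt, long initialMs, long maxMs) {
        long delay = initialMs;
        for (int i = 0; i < attempt; i++) {
            // Doubling would exceed the cap (and could overflow), so stop here.
            if (delay > maxMs / 2) {
                return maxMs;
            }
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    /**
     * Runs {@code op}, sleeping for an exponentially growing delay between
     * failed attempts. Bounding the number of attempts is an assumption made
     * to keep the sketch finite.
     */
    public static <T> T retry(Callable<T> op, int maxAttempts, long initialMs, long maxMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                Thread.sleep(backoffMs(attempt, initialMs, maxMs));
            }
        }
    }
}
```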

However, in standalone mode, not only do we not retry the operation, we do not 
even log the exception. In addition, when REST calls are made that require 
generating new task configs for a connector (which include creating and 
reconfiguring a connector), if the connector's {{taskConfigs}} method throws an 
exception, those requests will time out since the 
[callback|https://github.com/apache/kafka/blob/f3e4dd922933bf28b2c091e846cbc4e5255dd1d5/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/standalone/StandaloneHerder.java#L183]
 we use to respond to those requests never gets invoked.

At a bare minimum, we should:
 * Log any exceptions thrown from the {{taskConfigs}} method at {{ERROR}} level
 * Invoke any callbacks passed in to the relevant {{StandaloneHerder}} methods 
with any exceptions thrown by the {{taskConfigs}} method
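
A minimal sketch of those two steps, under stated assumptions: the class {{TaskConfigErrorHandlingSketch}}, its nested {{Callback}} interface, and the {{updateTaskConfigs}} method are hypothetical stand-ins rather than the actual {{StandaloneHerder}} code, and {{java.util.logging}} is used only to keep the example self-contained (its {{SEVERE}} level plays the role of {{ERROR}} here).

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TaskConfigErrorHandlingSketch {
    private static final Logger LOG =
            Logger.getLogger(TaskConfigErrorHandlingSketch.class.getName());

    /** Hypothetical completion callback: error is null on success, result is null on failure. */
    public interface Callback<V> {
        void onCompletion(Throwable error, V result);
    }

    /**
     * Generates task configs via the supplied function (a stand-in for the
     * connector's taskConfigs method). If it throws, the exception is logged
     * and passed to the callback, so the caller is never left waiting.
     */
    public static void updateTaskConfigs(
            String connectorName,
            Supplier<List<Map<String, String>>> taskConfigs,
            Callback<List<Map<String, String>>> callback) {
        List<Map<String, String>> configs;
        try {
            configs = taskConfigs.get();
        } catch (Exception e) {
            // Step 1: log the failure at the highest severity available.
            LOG.log(Level.SEVERE,
                    "Failed to generate task configs for connector " + connectorName, e);
            // Step 2: complete the callback exceptionally instead of dropping it.
            callback.onCompletion(e, null);
            return;
        }
        callback.onCompletion(null, configs);
    }
}
```

With this shape, a REST request waiting on the callback would receive the connector's exception right away instead of timing out.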

We might also consider introducing the same kind of exponential backoff retry 
logic used by distributed mode, but this can be addressed separately since it 
would be a much larger change in behavior and may break existing users' 
deployments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
