C0urante commented on code in PR #13185: URL: https://github.com/apache/kafka/pull/13185#discussion_r1123467255
########## connect/runtime/src/main/java/org/apache/kafka/connect/runtime/standalone/StandaloneHerder.java: ########## @@ -243,7 +248,11 @@ public synchronized void requestTaskReconfiguration(String connName) { log.error("Task that requested reconfiguration does not exist: {}", connName); return; } - updateConnectorTasks(connName); + try { + updateConnectorTasks(connName); + } catch (Exception e) { + log.error("Unable to generate task configs for {}", connName, e); + } Review Comment: Okay, a lot to unpack here! The more I think about it, the more I like the existing behavior for handling failures in task config generation. We automatically retry in distributed mode in order to absorb the risk of writing to the config topic or issuing a REST request to the leader, but since neither of those take place in standalone mode, it's fine to just throw the exception back to the caller (either a connector invoking `ConnectorContext::requestTaskReconfiguration`, or a REST API call to restart the connector) since the likeliest cause is a failed call to `Connector::taskConfigs` and automatic retries are less likely to be useful. I think we should basically just preserve existing behavior here, with the one exception of fixing how we handle failed calls to `requestTaskReconfiguration` that occur during a call to `restartConnector`. Right now we don't handle any of those and, IIUC, just cause the REST request to time out after 90 seconds. Instead of timing out, we should return a 500 response in that case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org