vamossagar12 commented on code in PR #12802: URL: https://github.com/apache/kafka/pull/12802#discussion_r1045259866
########## connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java: ########## @@ -1645,6 +1646,8 @@ private void startAndStop(Collection<Callable<Void>> callables) { startAndStopExecutor.invokeAll(callables); } catch (InterruptedException e) { // ignore + } catch (RejectedExecutionException re) { + log.error("startAndStopExecutor already shutdown or full. Not invoking explicit connector/task shutdown"); Review Comment: @C0urante , 20s is a very rough estimate on my part. I looked at the other timeouts in this class, the max seemed to be 10s and chose a value which is still higher than that. TBH, I don't know what's a good value for this config because IMO there's no one size fits all value here. I see that for each connector/task callable, the graceful shutdown is set to 5s and the `startAndStopExecutor` has a pool size of 8. It could depend upon factors like how many connectors/tasks are running on the worker for example. And this is when the control enters `halt` method. But let's say for some reason, the connector wants to stop when it just enters the tick method upon startup. At that point, it might need to read through config topic and other things. I am not suggesting that reading the config topic is a time consuming operation but the point I am trying to make is that it's not deterministic and depends upon the cluster where it's running. I can bump it upto a minute but that's still a guesstimate. That's why, I still feel we should have the logic to handle `RejectedException` thrown upon stopping along with this high timeout value set. That ways, we could cover all scenarios- the most common ones taken care by the higher timeout AND for the outliers, we still give the worker the opportunity to stop the remaining tasks which couldn't be stopped. There's no guarantee that it would still successfully be able to shut everything down, but still gets another chance. Also, your point about user configuring than `task.graceful.shutdown.timeout.ms` would be ignored by what I did makes sense. I have set the `awaitTermination` timeout to the max between the 2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org