vamossagar12 commented on code in PR #12802:
URL: https://github.com/apache/kafka/pull/12802#discussion_r1045259866


##########
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java:
##########
@@ -1645,6 +1646,8 @@ private void startAndStop(Collection<Callable<Void>> callables) {
             startAndStopExecutor.invokeAll(callables);
         } catch (InterruptedException e) {
             // ignore
+        } catch (RejectedExecutionException re) {
+            log.error("startAndStopExecutor already shutdown or full. Not invoking explicit connector/task shutdown");

Review Comment:
   @C0urante, 20s is a very rough estimate on my part. I looked at the other timeouts in this class; the max seemed to be 10s, so I chose a value higher than that.
   
   TBH, I don't know what a good value for this config is, because IMO there's no one-size-fits-all value here. I see that the graceful shutdown timeout for each connector/task callable is set to 5s and that the `startAndStopExecutor` has a pool size of 8, so the right value could depend on factors like how many connectors/tasks are running on the worker. And that is the case when control enters the `halt` method. But let's say that, for some reason, the worker wants to stop just as it enters the `tick` method on startup. At that point, it might still need to read through the config topic and do other work. I'm not suggesting that reading the config topic is a time-consuming operation, but the point I'm trying to make is that the time needed isn't deterministic and depends on the cluster where the worker is running. I can bump the timeout up to a minute, but that's still a guesstimate.
   That's why I still feel we should keep the logic to handle the `RejectedExecutionException` thrown upon stopping, along with this higher timeout value. That way we cover all scenarios: the most common ones are taken care of by the higher timeout, and for the outliers we still give the worker the opportunity to stop the remaining tasks that couldn't be stopped. There's no guarantee that it would still be able to shut everything down successfully, but it gets another chance.
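   To make that fallback concrete, here's a rough sketch of the shape I have in mind for `startAndStop` (only a sketch: running the rejected callables inline on the current thread is one possible fallback, and the exact handling and log wording are up for discussion):
   ```java
   private void startAndStop(Collection<Callable<Void>> callables) {
       try {
           startAndStopExecutor.invokeAll(callables);
       } catch (InterruptedException e) {
           // ignore
       } catch (RejectedExecutionException e) {
           // The executor has already been shut down (e.g. the herder is halting),
           // so give the remaining connectors/tasks another chance by running the
           // callables on the current thread instead of dropping them.
           log.warn("startAndStopExecutor already shut down; running {} start/stop callables inline", callables.size());
           for (Callable<Void> callable : callables) {
               try {
                   callable.call();
               } catch (Throwable t) {
                   log.error("Failed to run start/stop callable after executor shutdown", t);
               }
           }
       }
   }
   ```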
   
   Also, your point that a user-configured `task.shutdown.graceful.timeout.ms` would be ignored by what I did makes sense. I have set the `awaitTermination` timeout to the max of the two.
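   Roughly something like this (the `DEFAULT_EXECUTOR_SHUTDOWN_TIMEOUT_MS` constant name is just a placeholder for the 20s/1-minute default we're discussing; `awaitTermination` and the `WorkerConfig` lookup are the existing APIs):
   ```java
   // Wait for in-flight start/stop callables using the larger of the default
   // discussed above and the user-configured graceful task shutdown timeout.
   long waitMs = Math.max(
           DEFAULT_EXECUTOR_SHUTDOWN_TIMEOUT_MS,
           config.getLong(WorkerConfig.TASK_SHUTDOWN_GRACEFUL_TIMEOUT_MS_CONFIG));
   startAndStopExecutor.shutdown();
   try {
       if (!startAndStopExecutor.awaitTermination(waitMs, TimeUnit.MILLISECONDS)) {
           log.warn("Timed out waiting for connector/task start/stop callables to finish");
       }
   } catch (InterruptedException e) {
       Thread.currentThread().interrupt();
   }
   ```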
   
   
   


