kfaraz commented on code in PR #14545:
URL: https://github.com/apache/druid/pull/14545#discussion_r1255641455


##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java:
##########
@@ -1401,29 +1401,31 @@ public Collection<Worker> markWorkersLazy(Predicate<ImmutableWorkerInfo> isLazyW
   {
     // skip the lock and bail early if we should not mark any workers lazy (e.g. number
     // of current workers is at or below the minNumWorkers of autoscaler config)
-    if (maxLazyWorkers < 1) {
-      return Collections.emptyList();
+    if (lazyWorkers.size() >= maxLazyWorkers) {
+      return getLazyWorkers();
Review Comment:
   Hmm, I see. I guess it makes sense to kill it then. 
   
   There is, however, `HttpRemoteTaskRunner.syncMonitoring`, which resets workers that seem to be misbehaving. That flow does remove items from the `lazy` list so that syncing to the worker can be retried afresh. In the absence of that reset logic, the lazy flow seems a little self-fulfilling and would lead to eager terminations: if we mark a worker as lazy and never assign it anything, it will always be lazy (going by the logic in `ProvisioningUtil.createLazyWorkerPredicate(config)` as an example).
   
   Would it be better to have a timeout after which we retry submitting tasks to the worker, and only mark the worker for termination after a few repeated retries fail?
   That way, `lazy` becomes more of a temporary state, and a repeatedly lazy worker finally gets blacklisted.
   
   (`HttpRemoteTaskRunner` also already has a list of `blacklistedWorkers` 
which seems to be doing its own thing. 😅).
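   To make the suggestion concrete, here is a rough standalone sketch (not actual Druid code; the class, fields, and config knobs are all made up for illustration) of how "lazy" could become a temporary state with a timeout and a retry budget, so that only repeat offenders are terminated:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: track how long a worker has been lazy and how many
// resubmission attempts it has survived, so "lazy" is temporary and only
// repeatedly lazy workers get marked for termination.
public class LazyWorkerTracker
{
  private static class LazyState
  {
    final long firstMarkedMillis;
    int retryCount;

    LazyState(long firstMarkedMillis)
    {
      this.firstMarkedMillis = firstMarkedMillis;
    }
  }

  private final Map<String, LazyState> lazyStates = new ConcurrentHashMap<>();
  private final long retryTimeoutMillis; // assumed config: wait before retrying submission
  private final int maxRetries;          // assumed config: retries before termination

  public LazyWorkerTracker(long retryTimeoutMillis, int maxRetries)
  {
    this.retryTimeoutMillis = retryTimeoutMillis;
    this.maxRetries = maxRetries;
  }

  /** Mark a worker lazy; no-op if it is already being tracked. */
  public void markLazy(String workerHost, long nowMillis)
  {
    lazyStates.computeIfAbsent(workerHost, host -> new LazyState(nowMillis));
  }

  /** The worker accepted a task again, so it is no longer lazy. */
  public void onTaskAssigned(String workerHost)
  {
    lazyStates.remove(workerHost);
  }

  /** True if the timeout has elapsed and the retry budget is not yet exhausted. */
  public boolean shouldRetry(String workerHost, long nowMillis)
  {
    LazyState state = lazyStates.get(workerHost);
    if (state == null || nowMillis - state.firstMarkedMillis < retryTimeoutMillis) {
      return false;
    }
    if (state.retryCount >= maxRetries) {
      return false;
    }
    state.retryCount++;
    return true;
  }

  /** True only once all retries have been used up. */
  public boolean shouldTerminate(String workerHost)
  {
    LazyState state = lazyStates.get(workerHost);
    return state != null && state.retryCount >= maxRetries;
  }

  public static void main(String[] args)
  {
    LazyWorkerTracker tracker = new LazyWorkerTracker(1000L, 2);
    tracker.markLazy("worker-1", 0L);
    System.out.println(tracker.shouldRetry("worker-1", 500L));   // false: timeout not elapsed
    System.out.println(tracker.shouldRetry("worker-1", 2000L));  // true: first retry
    System.out.println(tracker.shouldRetry("worker-1", 3000L));  // true: second retry
    System.out.println(tracker.shouldRetry("worker-1", 4000L));  // false: budget exhausted
    System.out.println(tracker.shouldTerminate("worker-1"));     // true
  }
}
```

   A call to `onTaskAssigned` at any point would clear the state, which is roughly the reset behaviour `syncMonitoring` provides today.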



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
