imply-cheddar commented on code in PR #14239:
URL: https://github.com/apache/druid/pull/14239#discussion_r1192105597
##########
indexing-service/src/main/java/org/apache/druid/indexing/overlord/ForkingTaskRunner.java:
##########
@@ -477,6 +429,8 @@ public TaskStatus call()
portFinder.markPortUnused(tlsChildPort);
}
+ getTracker().returnStorageSlot(storageSlot);
Review Comment:
Hrm, actually, I think I got the order wrong in BaseRestorableTaskRunner.
If it gets not-returned, because the files are still there, then eventually the
MM will just fill up with zombie tasks and then be unschedulable, it'll be
really hard to figure out that it got in that state. But if it returns even if
it cannot delete, then we will maybe, possibly, get a disk full error telling
us that there are disk usage issues rather than zombies and jobs not running.
Fwiw, here in ForkingTaskRunner, I picked the location to be after the ports
are returned (which is what is currently used as an indication of whether there
are tasks available, and those ports are returned before the task dir is
deleted)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]