meetjain74 commented on issue #8708: URL: https://github.com/apache/storm/issues/8708#issuecomment-4533046821
I can see this in the first commit for IsolationScheduler https://github.com/apache/storm/commit/d44970afeb3f6166b6c93b6ce39e5cb49b61511 by Nathan Marz on Dec 20, 2012, `getAssignableSlots` is used in `host-assignable-slots` ```clj ;; returns list of list of slots, reverse sorted by number of slots (defn- host-assignable-slots [^Cluster cluster] (-<> cluster .getAssignableSlots (group-by #(.getHost cluster (.getNodeId ^WorkerSlot %)) <>) (dissoc <> nil) (sort-by #(-> % second count -) <>) (LinkedList. <>) )) ``` The only valid explanation that seems to me is this: `getAssignableSlots` is intentional because the IsolationScheduler is an eviction-by-design scheduler. When it picks a host for an isolated topology, the very next thing it does is forcibly free every used slot on that host. So "currently free slots" is not a feasibility constraint — what matters is the total non-blacklisted capacity of the host. In IsolationScheduler.schedule(...), after the first pass marks "already correctly-isolated" hosts as blacklisted, it picks a host and then evicts everything on it before assigning isolated workers: ```java if (slot != null && slot.size() >= num) { hss.poll(); cluster.freeSlots(hostUsedSlots.get(hostSlots.getHostName())); for (WorkerSlot tmpSlot : slot.subList(0, num)) { Set<ExecutorDetails> executor = removeElemFromExecutorsSet(executorSet); cluster.assign(tmpSlot, topologyId, executor); } cluster.blacklistHost(hostSlots.getHostName()); } ``` That cluster.freeSlots(hostUsedSlots.get(...)) call is the key: any non-isolated worker on the chosen host is killed to make room. Once we know that's the policy, sorting by currently free slots would actually be misleading — we'd skip a 16-slot host with 2 non-isolated workers in favor of a sparse 8-slot host, even though the 16-slot host is the strictly better placement for an isolated topology that needs 10 workers. > The sort maximizes a single iso-topology landing on as few large machines as possible (which is the whole point of the isolation distribution computed in machineDistribution). But I am still in favor of adding a secondary check on basis of `getAvailableSlots` so as to have less number of evictions on machines with same number of slots. Adding `getAvailableSlots` as a secondary key would deterministically prefer hosts that need fewer evictions, without changing the isolation contract — large hosts still win primarily, we just break ties in favor of less disruption. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
