meetjain74 commented on issue #8708:
URL: https://github.com/apache/storm/issues/8708#issuecomment-4533046821

   I can see this in the first commit for IsolationScheduler 
https://github.com/apache/storm/commit/d44970afeb3f6166b6c93b6ce39e5cb49b61511 
by Nathan Marz on Dec 20, 2012, `getAssignableSlots` is used in 
`host-assignable-slots`
   
   ```clj
   ;; returns list of list of slots, reverse sorted by number of slots
   (defn- host-assignable-slots [^Cluster cluster]
     (-<> cluster
          .getAssignableSlots
          (group-by #(.getHost cluster (.getNodeId ^WorkerSlot %)) <>)
          (dissoc <> nil)
          (sort-by #(-> % second count -) <>)
          (LinkedList. <>)
          ))
   ```
   
   The only valid explanation that seems to me is this:
   
    
   `getAssignableSlots` is intentional because the IsolationScheduler is an 
eviction-by-design scheduler. When it picks a host for an isolated topology, 
the very next thing it does is forcibly free every used slot on that host. So 
"currently free slots" is not a feasibility constraint — what matters is the 
total non-blacklisted capacity of the host.
   
   In IsolationScheduler.schedule(...), after the first pass marks "already 
correctly-isolated" hosts as blacklisted, it picks a host and then evicts 
everything on it before assigning isolated workers:
   ```java
   if (slot != null && slot.size() >= num) {
       hss.poll();
       cluster.freeSlots(hostUsedSlots.get(hostSlots.getHostName()));
       for (WorkerSlot tmpSlot : slot.subList(0, num)) {
           Set<ExecutorDetails> executor = 
removeElemFromExecutorsSet(executorSet);
           cluster.assign(tmpSlot, topologyId, executor);
       }
       cluster.blacklistHost(hostSlots.getHostName());
   }
   ```
   That cluster.freeSlots(hostUsedSlots.get(...)) call is the key: any 
non-isolated worker on the chosen host is killed to make room. Once we know 
that's the policy, sorting by currently free slots would actually be misleading 
— we'd skip a 16-slot host with 2 non-isolated workers in favor of a sparse 
8-slot host, even though the 16-slot host is the strictly better placement for 
an isolated topology that needs 10 workers.
   
   > The sort maximizes a single iso-topology landing on as few large machines 
as possible (which is the whole point of the isolation distribution computed in 
machineDistribution).
   
   But I am still in favor of adding a secondary check on basis of 
`getAvailableSlots` so as to have less number of evictions on machines with 
same number of slots. 
   Adding `getAvailableSlots` as a secondary key would deterministically prefer 
hosts that need fewer evictions, without changing the isolation contract — 
large hosts still win primarily, we just break ties in favor of less disruption.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to