[
https://issues.apache.org/jira/browse/STORM-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095787#comment-14095787
]
ASF GitHub Bot commented on STORM-132:
--------------------------------------
Github user d2r commented on the pull request:
https://github.com/apache/incubator-storm/pull/36#issuecomment-52083754
> That's the reason the test passed without any code change. I just
> changed the assertion order in the test case, and it failed without any
> code change.
OK, I think what is going on here is that the sorting is not stable, and so
the test may pass or fail depending on things like which version of Java is in
use (I know there was some change from 7 -> 8 with un-/stable sorting).
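For example (a hypothetical sketch, not the actual test): if the comparator
only looks at how many slots a slot's host already uses, two slots on equally
loaded hosts compare as equal, and an assertion on their exact output order is
really an assertion on tie-breaking rather than on the scheduler:

    import java.util.*;

    public class TieOrderExample {
        public static void main(String[] args) {
            // Hypothetical slots as {"host:port", used-slot count on that host}.
            List<String[]> slots = new ArrayList<>(Arrays.asList(
                    new String[]{"host2:6700", "1"},
                    new String[]{"host1:6700", "1"},   // ties with host2:6700
                    new String[]{"host3:6700", "2"}));

            // The comparator only considers the used-slot count, so the two "1"
            // entries compare as equal; whether host2:6700 stays ahead of
            // host1:6700 depends on the sort being stable and on input order.
            slots.sort(Comparator.comparingInt(s -> Integer.parseInt(s[1])));

            for (String[] s : slots) {
                System.out.println(s[0] + " used=" + s[1]);
            }
        }
    }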
I will make comments in-line.
> I tried running the tests on master and I also did not see any tests fail.
> However, I did find a failure in the test report file
> storm-core/target/test-reports/backtype.storm.scheduler-test.xml. Is there
> anything wrong with the test code?
This could happen if the test report is left over from a previous run. But
I will double-check it.
> Default Storm scheduler should evenly use resources when managing multiple
> topologies
> -------------------------------------------------------------------------------------
>
> Key: STORM-132
> URL: https://issues.apache.org/jira/browse/STORM-132
> Project: Apache Storm (Incubating)
> Issue Type: Improvement
> Reporter: James Xu
>
> https://github.com/nathanmarz/storm/issues/359
> Currently, a single topology is evenly spread across the cluster, but this is
> not the case for multiple topologies (it targets one node first, then the
> rest). The default scheduler should order the hosts for scheduling so that
> the hosts with the fewest used slots come first.
> ----------
> lyogavin: To confirm: we want to first look at the number of used slots on
> each host and try to balance the used-slot count across all the hosts. For
> the assignment inside each host, we also try to evenly balance the usage of
> each slot. Am I understanding correctly?
> When I started coding, I realized it's actually a little complicated. It
> looks like there are several policies we want to consider here:
> 1. Evenness of resource usage. (Do we want to evaluate evenness by the number
> of slots used on each host or by the number of executors? Maybe the number of
> executors is better, but it also makes things much more complicated.)
> 2. Least rescheduling. We probably also want to keep the assignment change
> as small as possible. This looks like the reason DefaultScheduler.bad-slots
> is written the way it is.
> 3. Number of workers.
> Then the question is how we prioritize those policies; sometimes they
> conflict with each other. For example, the most even distribution may
> require the most reassignment.
> Any thoughts?
> ----------
> xumingming: @lyogavin I think you might have over-thought this. As
> @nathanmarz already confirmed, you just need to update
> EvenScheduler.sort-slots so that the slots on the least-used node appear
> first in the available-slots list.
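> As a rough illustration of that ordering (a plain-Java sketch with a made-up
> data model, not the actual Clojure sort-slots):
>
>     import java.util.*;
>
>     public class SortByHostUsage {
>         public static void main(String[] args) {
>             // Hypothetical free slots as "host:port", plus the number of
>             // slots each host already uses.
>             List<String> freeSlots = new ArrayList<>(Arrays.asList(
>                     "host2:6702", "host1:6701", "host2:6703", "host1:6702"));
>             Map<String, Integer> used = new HashMap<>();
>             used.put("host1", 1);
>             used.put("host2", 2);
>
>             // Least-used hosts first, so their free slots are handed out
>             // before slots on busier hosts.
>             freeSlots.sort(Comparator.comparingInt(s -> used.get(s.split(":")[0])));
>             System.out.println(freeSlots);   // host1's slots come before host2's
>         }
>     }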
> ----------
> lyogavin: Thanks James, I got what you mean. So I'll not worry about executor
> usage and will only consider the balance of slot usage.
> But I think simply changing sort-slots to sort the slots based on slot usage
> may still not work too well. For example, let's say there are 2 hosts with 10
> slots each; the 1st one has used 1 slot, the 2nd one has used 2 slots, and we
> want to assign another topology with 8 workers. If we simply sort the slots
> based on usage, we end up with the list [0 0 0 0 0 0 0 0 1 1], and the new
> assignment would come entirely from the 1st host. Not balanced.
> So in the above pull request, I implemented a solution similar to a watershed
> algorithm. It first picks slots from the least-used host until that host uses
> the same number of slots as the second least-used host. Then it evenly picks
> slots from those 2 least-used hosts until they reach the 3rd one. Iterating
> this way, we can get the most balanced assignment.
> What do you think?
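> A minimal Java sketch of this watershed-style ordering (the data model and
> names here are hypothetical, not the code from the pull request): keep handing
> out a free slot from whichever host currently has the fewest used slots,
> counting the slots already handed out:
>
>     import java.util.*;
>
>     public class WatershedOrder {
>         // used: slots already used per host; freePorts: free slot ports per host.
>         static List<String> orderSlots(Map<String, Integer> used,
>                                        Map<String, List<Integer>> freePorts) {
>             Map<String, Integer> load = new HashMap<>(used);
>             Map<String, Deque<Integer>> remaining = new HashMap<>();
>             for (Map.Entry<String, List<Integer>> e : freePorts.entrySet()) {
>                 remaining.put(e.getKey(), new ArrayDeque<>(e.getValue()));
>             }
>
>             // Hosts ordered by how many slots they currently use.
>             PriorityQueue<String> byLoad = new PriorityQueue<String>(
>                     (a, b) -> Integer.compare(load.getOrDefault(a, 0),
>                                               load.getOrDefault(b, 0)));
>             byLoad.addAll(freePorts.keySet());
>
>             List<String> ordered = new ArrayList<>();
>             while (!byLoad.isEmpty()) {
>                 String host = byLoad.poll();            // least-used host right now
>                 Deque<Integer> ports = remaining.get(host);
>                 if (ports.isEmpty()) continue;
>                 ordered.add(host + ":" + ports.poll()); // hand out one free slot
>                 load.merge(host, 1, Integer::sum);      // now counts as used
>                 if (!ports.isEmpty()) byLoad.add(host); // re-queue with new load
>             }
>             return ordered;
>         }
>
>         public static void main(String[] args) {
>             // The example above: 2 hosts with 10 slots each,
>             // host1 has used 1 slot and host2 has used 2.
>             Map<String, Integer> used = new HashMap<>();
>             used.put("host1", 1);
>             used.put("host2", 2);
>             Map<String, List<Integer>> free = new HashMap<>();
>             free.put("host1", Arrays.asList(6701, 6702, 6703, 6704,
>                     6705, 6706, 6707, 6708, 6709));
>             free.put("host2", Arrays.asList(6702, 6703, 6704, 6705,
>                     6706, 6707, 6708, 6709));
>             // The first 8 slots of this ordering are split across both hosts
>             // (roughly 5 from host1 and 3 from host2, depending on tie-breaking)
>             // instead of all 8 landing on host1.
>             System.out.println(orderSlots(used, free));
>         }
>     }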
--
This message was sent by Atlassian JIRA
(v6.2#6252)