[ https://issues.apache.org/jira/browse/STORM-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041698#comment-15041698 ]
ASF GitHub Bot commented on STORM-1370:
---------------------------------------

GitHub user jerrypeng opened a pull request:

    https://github.com/apache/storm/pull/923

    [STORM-1370] - Bug fixes for MultitenantScheduler

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerrypeng/storm STORM-1370

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #923

----
commit 82f9d969446898bc6bdbdb03f2b927a55174f97c
Author: Boyang Jerry Peng <jerryp...@yahoo-inc.com>
Date:   2015-12-04T16:23:10Z

    [STORM-1370] - Bug fixes for MultitenantScheduler

----

> Bug fixes for MultitenantScheduler
> ----------------------------------
>
>                 Key: STORM-1370
>                 URL: https://issues.apache.org/jira/browse/STORM-1370
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Boyang Jerry Peng
>            Assignee: Boyang Jerry Peng
>
> Bug 1:
> Sort nodes by slots used when scheduling isolated topologies.
> Because nimbus removes "dead" slots (slots whose workers have not yet
> sent a heartbeat) before schedule is called, we cannot rely on the
> number of free slots on a node. Relying on it breaks clusters whose
> nodes have a heterogeneous number of slots configured.
> Derive the effective number of hosts by taking the minimum of the
> config's value and the number of executors in the topology.
> If the user requests that the topology be scheduled across a given
> number of hosts, retry scheduling when the effective number does not
> match the scheduled number.
>
> Bug 2:
> Nimbus crashes when the multitenant scheduler throws an exception while
> trying to assign executors from an isolated topology to a node that is
> full.
> Error in nimbus.log:
> java.lang.IllegalStateException: Trying to assign to a full node xxxxxxxxxxxxx
>     at backtype.storm.scheduler.multitenant.Node.assign(Node.java:232) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>     at backtype.storm.scheduler.multitenant.NodePool$RoundRobinSlotScheduler.assignSlotTo(NodePool.java:171) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>     at backtype.storm.scheduler.multitenant.IsolatedPool.scheduleAsNeeded(IsolatedPool.java:164) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>     at backtype.storm.scheduler.multitenant.MultitenantScheduler.schedule(MultitenantScheduler.java:96) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>     at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) ~[?:?]
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_40]
>     at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_40]
>     at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
>     at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.nimbus$compute_new_scheduler_assignments.invoke(nimbus.clj:750) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>     at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:806) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>     at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.6.0.jar:?]
>     at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn6020$fn_6021.invoke(nimbus.clj:1245) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>     at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn_6020.invoke(nimbus.clj:1244) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>     at backtype.storm.timer$schedule_recurring$this__4635.invoke(timer.clj:105) ~[storm-core-0.10.1.y.jar:0.10.1.y]
>     at backtype.storm.timer$mk_timer$fn_4618$fn_4619.invoke(timer.clj:50) [storm-core-0.10.1.y.jar:0.10.1.y]
>     at backtype.storm.timer$mk_timer$fn__4618.invoke(timer.clj:42) [storm-core-0.10.1.y.jar:0.10.1.y]
>     at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
>     at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
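
The Bug 1 rules in the quoted issue description (order nodes by slots used instead of free slots, take the effective host count as the minimum of the configured value and the executor count, and retry when the scheduled count differs) can be sketched roughly as below. This is only an illustration and not the code from pull request #923: the class `IsolationSketch`, the `NodeInfo` type, and all method names are hypothetical, and the ascending sort order is an assumption, since the description does not state a direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IsolationSketch {

    /** Hypothetical stand-in for a cluster node; this is not Storm's Node class. */
    static final class NodeInfo {
        final String host;
        final int usedSlots;

        NodeInfo(String host, int usedSlots) {
            this.host = host;
            this.usedSlots = usedSlots;
        }
    }

    /** Effective host count: the minimum of the configured value and the executor count. */
    static int effectiveHosts(int requestedHosts, int numExecutors) {
        return Math.min(requestedHosts, numExecutors);
    }

    /**
     * Orders nodes by slots in use rather than by free slots, because the free
     * count can be skewed when dead slots are removed before scheduling.
     * Ascending order is an assumption made for this sketch.
     */
    static List<NodeInfo> sortBySlotsUsed(List<NodeInfo> nodes) {
        List<NodeInfo> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt((NodeInfo n) -> n.usedSlots));
        return sorted;
    }

    /** A retry is needed when the scheduled host count differs from the effective one. */
    static boolean needsRetry(int requestedHosts, int numExecutors, int scheduledHosts) {
        return effectiveHosts(requestedHosts, numExecutors) != scheduledHosts;
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = Arrays.asList(
                new NodeInfo("node-a", 4),
                new NodeInfo("node-b", 0),
                new NodeInfo("node-c", 2));
        for (NodeInfo n : sortBySlotsUsed(nodes)) {
            System.out.println(n.host + " used=" + n.usedSlots);
        }
        System.out.println("effective hosts: " + effectiveHosts(5, 3));
        System.out.println("retry needed: " + needsRetry(5, 3, 2));
    }
}
```

With the `min` rule, a topology that asks for five isolated hosts but has only three executors is treated as wanting three hosts, so a schedule that lands on three hosts would not trigger a retry in this sketch.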