[ 
https://issues.apache.org/jira/browse/STORM-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041698#comment-15041698
 ] 

ASF GitHub Bot commented on STORM-1370:
---------------------------------------

GitHub user jerrypeng opened a pull request:

    https://github.com/apache/storm/pull/923

    [STORM-1370] - Bug fixes for MultitenantScheduler

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerrypeng/storm STORM-1370

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/923.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #923
    
----
commit 82f9d969446898bc6bdbdb03f2b927a55174f97c
Author: Boyang Jerry Peng <jerryp...@yahoo-inc.com>
Date:   2015-12-04T16:23:10Z

    [STORM-1370] - Bug fixes for MultitenantScheduler

----


> Bug fixes for MultitenantScheduler
> ----------------------------------
>
>                 Key: STORM-1370
>                 URL: https://issues.apache.org/jira/browse/STORM-1370
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Boyang Jerry Peng
>            Assignee: Boyang Jerry Peng
>
> Bug 1:
> Sort nodes by slots used when scheduling isolated topologies.
> Because nimbus removes "dead" slots (slots whose workers have
> not yet sent a heartbeat) before schedule is called, we cannot rely on
> the number of free slots on a node.  Relying on it breaks on clusters whose
> nodes have a heterogeneous number of slots configured.
> Derive the effective number of hosts by taking the minimum of the
> config's value and the number of executors in the topology.
> If the user requests that the topology be scheduled across a number of hosts,
> then retry scheduling when the effective number does not match the
> scheduled number.
> Bug 2:
> Nimbus crashes when the multitenant scheduler throws an exception while
> trying to assign executors from an isolated topology to a node that is full.
> Error in nimbus.log:
> java.lang.IllegalStateException: Trying to assign to a full node xxxxxxxxxxxxx
> at backtype.storm.scheduler.multitenant.Node.assign(Node.java:232) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.scheduler.multitenant.NodePool$RoundRobinSlotScheduler.assignSlotTo(NodePool.java:171) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.scheduler.multitenant.IsolatedPool.scheduleAsNeeded(IsolatedPool.java:164) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.scheduler.multitenant.MultitenantScheduler.schedule(MultitenantScheduler.java:96) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) ~[?:?]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_40]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_40]
> at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
> at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
> at backtype.storm.daemon.nimbus$compute_new_scheduler_assignments.invoke(nimbus.clj:750) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:806) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.6.0.jar:?]
> at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn6020$fn_6021.invoke(nimbus.clj:1245) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn_6020.invoke(nimbus.clj:1244) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.timer$schedule_recurring$this__4635.invoke(timer.clj:105) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.timer$mk_timer$fn_4618$fn_4619.invoke(timer.clj:50) [storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.timer$mk_timer$fn__4618.invoke(timer.clj:42) [storm-core-0.10.1.y.jar:0.10.1.y]
> at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
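
For illustration only, the two points in the description can be sketched in
plain Java: deriving the effective host count as the minimum of the configured
value and the executor count, ordering nodes by slots used rather than by free
slots, and checking whether a node is full before assigning to it. This is not
the code from the pull request. The class IsolatedSchedulingSketch, the
SketchNode type, and all method names are hypothetical, the ascending sort
order is an assumption (the description does not state a direction), and the
full-node check is one possible guard rather than the actual fix.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class IsolatedSchedulingSketch {
    /** Hypothetical stand-in for a cluster node; not Storm's Node class. */
    static final class SketchNode {
        final String host;
        final int totalSlots;
        int usedSlots;

        SketchNode(String host, int totalSlots, int usedSlots) {
            this.host = host;
            this.totalSlots = totalSlots;
            this.usedSlots = usedSlots;
        }

        boolean isFull() {
            return usedSlots >= totalSlots;
        }
    }

    /** Effective host count: the smaller of the configured value and the executor count. */
    static int effectiveHosts(int configuredHosts, int numExecutors) {
        return Math.min(configuredHosts, numExecutors);
    }

    /** Returns a copy ordered by slots used (ascending is an assumption), ignoring free-slot counts. */
    static List<SketchNode> sortBySlotsUsed(List<SketchNode> nodes) {
        List<SketchNode> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt((SketchNode n) -> n.usedSlots));
        return sorted;
    }

    /** Takes one slot on the first node with room; returns null instead of throwing when all are full. */
    static SketchNode assignToFirstNonFull(List<SketchNode> nodes) {
        for (SketchNode n : nodes) {
            if (!n.isFull()) {
                n.usedSlots++;
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<SketchNode> nodes = new ArrayList<>();
        nodes.add(new SketchNode("host-a", 4, 3));
        nodes.add(new SketchNode("host-b", 2, 2)); // already full
        nodes.add(new SketchNode("host-c", 8, 1));

        System.out.println("effective hosts: " + effectiveHosts(5, 3));
        for (SketchNode n : sortBySlotsUsed(nodes)) {
            System.out.println(n.host + " used=" + n.usedSlots + " full=" + n.isFull());
        }
        SketchNode chosen = assignToFirstNonFull(sortBySlotsUsed(nodes));
        System.out.println("assigned to: " + (chosen == null ? "none" : chosen.host));
    }
}
```

Ordering by slots used keeps the comparison meaningful when nodes have
different slot counts, and returning null for a fully occupied pool lets the
caller retry or skip instead of hitting an IllegalStateException.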



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
