What could the problem be? Also, when I reduce the number of workers from 150 to 60, it works. However, storm-netty-client then throws some negative-timeout exceptions, which is probably the same problem discussed in the thread "netty errors, chain reactions, topology breaks down" and in a new pull request:
https://github.com/apache/incubator-storm/pull/41

2014-03-05 11:41 GMT+08:00 Andrew Feng <[email protected]>:

> Please create a Jira ticket. We will submit a pull request with a fix.
>
> Andy Feng
>
> Sent from my iPhone
>
> > On Mar 4, 2014, at 6:32 PM, "李家宏" <[email protected]> wrote:
> >
> > Hi Andy Feng,
> > there are 150 workers and 450 executors in my topology.
> >
> > Thanks for your reply
> >
> > 2014-03-04 23:13 GMT+08:00 Andrew Feng <[email protected]>:
> >
> >> How many workers do you have in your topology?
> >>
> >> Andy Feng
> >>
> >> Sent from my iPhone
> >>
> >>> On Mar 4, 2014, at 5:21 AM, "李家宏" <[email protected]> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> When I submit a topology to a Storm 0.9.0.1 cluster, the following error occurs:
> >>> ----------------------------------------------------------------------
> >>> [INFO] Starting
> >>> 2014-03-04 20:24:13 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=10.207.52.82:2181,10.207.52.83:2181,10.207.52.84:2181 sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@796cefa8
> >>> 2014-03-04 20:24:13 o.a.z.ClientCnxn [INFO] Opening socket connection to server /10.207.52.83:2181
> >>> 2014-03-04 20:24:13 o.a.z.ClientCnxn [INFO] Socket connection established to storm010207052083.cm3.tbsite.net/10.207.52.83:2181, initiating session
> >>> 2014-03-04 20:24:13 o.a.z.ClientCnxn [INFO] Session establishment complete on server storm010207052083.cm3.tbsite.net/10.207.52.83:2181, sessionid = 0x2423f964207c973, negotiated timeout = 20000
> >>> 2014-03-04 20:24:13 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
> >>> 2014-03-04 20:24:13 o.a.z.ZooKeeper [INFO] Session: 0x2423f964207c973 closed
> >>> 2014-03-04 20:24:13 o.a.z.ClientCnxn [INFO] EventThread shut down
> >>> 2014-03-04 20:24:13 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> >>> 2014-03-04 20:24:13 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=10.207.52.82:2181,10.207.52.83:2181,10.207.52.84:2181/tmp/storm-0.9.0.1 sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@58f41393
> >>> 2014-03-04 20:24:13 o.a.z.ClientCnxn [INFO] Opening socket connection to server /10.207.52.82:2181
> >>> 2014-03-04 20:24:13 o.a.z.ClientCnxn [INFO] Socket connection established to storm010207052082.cm3.tbsite.net/10.207.52.82:2181, initiating session
> >>> 2014-03-04 20:24:13 o.a.z.ClientCnxn [INFO] Session establishment complete on server storm010207052082.cm3.tbsite.net/10.207.52.82:2181, sessionid = 0x1423f964209c65f, negotiated timeout = 20000
> >>> 2014-03-04 20:24:14 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [2]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
> >>> 2014-03-04 20:24:14 b.s.d.worker [ERROR] Error on initialization of server mk-worker
> >>> org.jboss.netty.channel.ChannelException: Failed to create a selector.
> >>>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:337) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:95) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:51) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:99) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:69) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:152) ~[netty-3.6.3.Final.jar:na]
> >>>     at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:134) ~[netty-3.6.3.Final.jar:na]
> >>>     at backtype.storm.messaging.netty.Client.<init>(Client.java:54) ~[storm-netty-0.9.0.1.jar:na]
> >>>     at backtype.storm.messaging.netty.Context.connect(Context.java:36) ~[storm-netty-0.9.0.1.jar:na]
> >>>     at backtype.storm.daemon.worker$mk_refresh_connections$this__5827$iter__5834__5838$fn__5839.invoke(worker.clj:250) ~[storm-core-0.9.0.1.jar:na]
> >>>     at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.4.0.jar:na]
> >>>     at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.4.0.jar:na]
> >>>     at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.4.0.jar:na]
> >>>     at clojure.lang.RT.next(RT.java:587) ~[clojure-1.4.0.jar:na]
> >>>     at clojure.core$next.invoke(core.clj:64) ~[clojure-1.4.0.jar:na]
> >>>     at clojure.core$dorun.invoke(core.clj:2726) ~[clojure-1.4.0.jar:na]
> >>>     at clojure.core$doall.invoke(core.clj:2741) ~[clojure-1.4.0.jar:na]
> >>>     at backtype.storm.daemon.worker$mk_refresh_connections$this__5827.invoke(worker.clj:244) ~[storm-core-0.9.0.1.jar:na]
> >>>     at backtype.storm.daemon.worker$fn__5882$exec_fn__1229__auto____5883.invoke(worker.clj:357) ~[storm-core-0.9.0.1.jar:na]
> >>>     at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.4.0.jar:na]
> >>>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
> >>>     at clojure.core$apply.invoke(core.clj:601) ~[clojure-1.4.0.jar:na]
> >>>     at backtype.storm.daemon.worker$fn__5882$mk_worker__5938.doInvoke(worker.clj:329) [storm-core-0.9.0.1.jar:na]
> >>>     at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.4.0.jar:na]
> >>>     at backtype.storm.daemon.worker$_main.invoke(worker.clj:439) [storm-core-0.9.0.1.jar:na]
> >>>     at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.4.0.jar:na]
> >>>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
> >>>     at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.0.1.jar:na]
> >>> Caused by: java.io.IOException: Too many open files
> >>>     at sun.nio.ch.IOUtil.initPipe(Native Method) ~[na:1.6.0_38]
> >>>     at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49) ~[na:1.6.0_38]
> >>>     at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18) ~[na:1.6.0_38]
> >>>     at java.nio.channels.Selector.open(Selector.java:209) ~[na:1.6.0_38]
> >>>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:335) ~[netty-3.6.3.Final.jar:na]
> >>>     ... 32 common frames omitted
> >>> 2014-03-04 20:24:14 b.s.util [INFO] Halting process: ("Error on initialization")
> >>> ----------------------------------------------------------------------
> >>>
> >>> This topology works fine with a Storm 0.8.0 cluster.
> >>> And:
> >>> ulimit -n => 131072;
> >>> sudo lsof | grep java | wc -l => 5000
> >>> so it seems the open fds are not reaching the limit.
> >>>
> >>> What's the problem?
> >>>
> >>> Regards
> >>>
> >>> --
> >>> ======================================================
> >>> Gvain
> >>> Email: [email protected]
> >
> > --
> > ======================================================
> > Gvain
> > Email: [email protected]

--
======================================================
Gvain
Email: [email protected]
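
P.S. One thing that may be worth checking: `ulimit -n` in a login shell is not necessarily the limit the worker JVM inherits, and a system-wide lsof count is not a per-process count. Below is a minimal sketch for reading both numbers from inside a JVM. The class name `FdUsage` is hypothetical and not part of Storm. It assumes a HotSpot/OpenJDK JVM on a Unix-like OS, where the platform MXBean implements `com.sun.management.UnixOperatingSystemMXBean`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper (not part of Storm): reports the file descriptor
// usage and limit as seen by the JVM it runs in.
public class FdUsage {

    // Number of file descriptors currently open in this JVM, or -1 if the
    // platform MXBean does not expose it (e.g. on non-Unix systems).
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1;
    }

    // Maximum number of file descriptors this JVM may open, or -1 if unknown.
    static long maxFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFds() + ", limit: " + maxFds());
    }
}
```

Running this as the same user and from the same launcher that starts the workers should show whether the limit really is 131072 for those processes. It only reports on the JVM it runs in, so it does not measure an already running worker.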
