Yes and no.  Storm establishes connections based on the compiled
topology, so even though in theory the ports would be exhausted after
about 360 workers, in practice it is a bit harder to hit.  That said, it
is still possible.  For example, if you had a topology with 800 workers,
400 spouts, 400 bolts, and a shuffle grouping between the two, you would
probably run into this problem.  The only real way to avoid it is to
keep your topology from creating a fully connected graph.  We could try
to make Netty really lazy about establishing the actual connection, and
add the option of tearing down unused connections, but that would only
help groupings with a skewed access pattern; shuffle tries very hard to
make the distribution even.  It could also slow down the topology a lot.
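To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch (my own illustration, not Storm code; the 16 workers-per-machine figure is an assumed example):

```python
# Back-of-the-envelope estimate (NOT Storm code) of TCP port usage for a
# fully connected worker-to-worker topology, as in the 800-worker example.

def connections_per_worker(num_workers):
    # With a shuffle grouping spread across all workers, every worker may
    # need an outbound connection to every other worker.
    return num_workers - 1

def ports_per_machine(num_workers, workers_per_machine):
    # Each worker holds roughly one client socket per peer plus one
    # accepted server-side socket per peer, i.e. about 2 * (n - 1) ports,
    # multiplied by the workers co-located on one machine.
    return workers_per_machine * 2 * (num_workers - 1)

print(connections_per_worker(800))    # 799 outbound connections per worker
print(ports_per_machine(800, 16))     # 25568 ports on one 16-worker machine
```

With numbers like these, a few machines hosting many workers each can approach the ~65535 port ceiling the thread below discusses.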

If this is an issue you are running into there are things we can try to
look at.

—Bobby


On 4/16/14, 11:58 PM, "李家宏" <[email protected]> wrote:

>Hi, Evans
>
>I tried out the latest version of Storm. It uses a shared, non-blocking
>threadpool for every netty-client, which greatly reduces the number of
>threads, as well as pipes. So far, the "too many open files" exception
>has not been thrown.
>
>One more thing:
> To my knowledge, as the worker count increases, the number of TCP ports
>used per worker grows quickly; the maximum TCP port usage per worker is
>twice the number of workers. What's more, since one machine will host
>several workers, the total TCP port usage per machine is multiplied
>accordingly, and can thus exhaust the machine's TCP ports (fewer than
>65536).
>
>Thanks for your advice.
>
>
>2014-04-16 10:36 GMT+08:00 李家宏 <[email protected]>:
>
>> Although you reduced the Selector instances, Netty still leaks open
>> file descriptors. As the topology grows much larger, the "too many
>> open files" exception will inevitably be thrown.
>>
>>
>> 2014-04-16 0:17 GMT+08:00 Bobby Evans <[email protected]>:
>>
>> I am rather stumped here. The code is blowing up creating a pipe as part
>>> of an nio EpollSelector for netty to use.  My best advice right now
>>> is to try and upgrade to the latest version of storm.  We have merged
>>> in two fixes, one that relates to closing config files, and one that
>>> relates to netty.  The fix makes it so that it uses fewer threads,
>>> and as a part of that I believe the number of Selector instances will
>>> be smaller too, although this stack trace is for the client side, not
>>> the server side.
>>>
>>> —Bobby
>>>
>>> On 4/14/14, 10:38 PM, "李家宏" <[email protected]> wrote:
>>>
>>> >Hi, all
>>> >I'm running a topology on a Storm 0.9.0.1 cluster with netty as the
>>> >transport layer, and this error occurs:
>>> >The Netty client failed to create a selector due to a *too many open
>>> >files* exception, and the worker continuously halts with an
>>> >initialization error.
>>> >
>>> >I checked ulimit -n (> 130000), which is much bigger than the number
>>> >of currently open fds (sudo lsof | grep java | wc -l), about 6000 at
>>> >most.
>>> >
>>> >By the way, this topology works fine on a Storm 0.8.0 cluster.
>>> >
>>> >What's the problem?
>>> >
>>> >here is the stack trace:
>>> >-------------------------------------------------------------
>>> >2014-03-04 20:24:14 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [2]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>>> >2014-03-04 20:24:14 b.s.d.worker [ERROR] Error on initialization of server mk-worker
>>> >org.jboss.netty.channel.ChannelException: Failed to create a selector.
>>> >  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:337) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:95) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:51) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:99) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:69) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:152) ~[netty-3.6.3.Final.jar:na]
>>> >  at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:134) ~[netty-3.6.3.Final.jar:na]
>>> >  at backtype.storm.messaging.netty.Client.<init>(Client.java:54) ~[storm-netty-0.9.0.1.jar:na]
>>> >  at backtype.storm.messaging.netty.Context.connect(Context.java:36) ~[storm-netty-0.9.0.1.jar:na]
>>> >  at backtype.storm.daemon.worker$mk_refresh_connections$this__5827$iter__5834__5838$fn__5839.invoke(worker.clj:250) ~[storm-core-0.9.0.1.jar:na]
>>> >  at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.4.0.jar:na]
>>> >  at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.4.0.jar:na]
>>> >  at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.4.0.jar:na]
>>> >  at clojure.lang.RT.next(RT.java:587) ~[clojure-1.4.0.jar:na]
>>> >  at clojure.core$next.invoke(core.clj:64) ~[clojure-1.4.0.jar:na]
>>> >  at clojure.core$dorun.invoke(core.clj:2726) ~[clojure-1.4.0.jar:na]
>>> >  at clojure.core$doall.invoke(core.clj:2741) ~[clojure-1.4.0.jar:na]
>>> >  at backtype.storm.daemon.worker$mk_refresh_connections$this__5827.invoke(worker.clj:244) ~[storm-core-0.9.0.1.jar:na]
>>> >  at backtype.storm.daemon.worker$fn__5882$exec_fn__1229__auto____5883.invoke(worker.clj:357) ~[storm-core-0.9.0.1.jar:na]
>>> >  at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.4.0.jar:na]
>>> >  at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
>>> >  at clojure.core$apply.invoke(core.clj:601) ~[clojure-1.4.0.jar:na]
>>> >  at backtype.storm.daemon.worker$fn__5882$mk_worker__5938.doInvoke(worker.clj:329) [storm-core-0.9.0.1.jar:na]
>>> >  at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.4.0.jar:na]
>>> >  at backtype.storm.daemon.worker$_main.invoke(worker.clj:439) [storm-core-0.9.0.1.jar:na]
>>> >  at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.4.0.jar:na]
>>> >  at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
>>> >  at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.0.1.jar:na]
>>> >*Caused by: java.io.IOException: Too many open files*
>>> >  at sun.nio.ch.IOUtil.initPipe(Native Method) ~[na:1.6.0_38]
>>> >  at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49) ~[na:1.6.0_38]
>>> >  at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18) ~[na:1.6.0_38]
>>> >  at java.nio.channels.Selector.open(Selector.java:209) ~[na:1.6.0_38]
>>> >  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:335) ~[netty-3.6.3.Final.jar:na]
>>> >  ... 32 common frames omitted
>>> >2014-03-04 20:24:14 b.s.util [INFO] Halting process: ("Error on initialization")
>>> >-------------------------------------------------------------
>>> >
>>> >Thanks
>>> >
>>> >--
>>> >
>>> >======================================================
>>> >
>>> >Gvain
>>> >
>>> >Email: [email protected]
>>>
>>>
>>
>>
>
>
>
