Hi Evans,

I tried out the latest version of Storm. It uses a shared, non-blocking
thread pool for all Netty clients, which greatly reduces the number of
threads as well as pipes. So far, the "too many open files" exception
has not been thrown again.
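
To illustrate why fewer selectors means fewer open descriptors, here is a
minimal Java sketch. It is not Storm's or Netty's actual code. It assumes
Linux with /proc/self/fd available, and the names SelectorFdSketch,
openFdCount, openSelectors and closeAll are made up for illustration. On
Linux each java.nio Selector typically holds several file descriptors (the
stack trace in this thread shows it creating a pipe), so a separate selector
pool per Netty client adds up quickly.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

class SelectorFdSketch {
    // Counts this JVM's open descriptors by listing /proc/self/fd (Linux only).
    // Returns -1 where that directory is not available.
    static int openFdCount() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }

    // Opens `count` selectors, roughly what `count` separate NIO worker
    // threads would each hold.
    static List<Selector> openSelectors(int count) {
        List<Selector> selectors = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                selectors.add(Selector.open());
            }
        } catch (IOException e) {
            // This is where "Too many open files" would surface.
            throw new UncheckedIOException(e);
        }
        return selectors;
    }

    // Closes every selector so its descriptors are released.
    static void closeAll(List<Selector> selectors) {
        for (Selector s : selectors) {
            try {
                s.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        int before = openFdCount();
        // 100 is an arbitrary illustrative count, not a value from Storm.
        List<Selector> selectors = openSelectors(100);
        int after = openFdCount();
        System.out.println("open fds before: " + before
                + ", after 100 selectors: " + after);
        closeAll(selectors);
    }
}
```

Comparing the two counts on a worker host gives a rough idea of the
per-selector cost on that JVM and kernel.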

One more thing:
To my knowledge, as the number of workers increases, the number of TCP ports
used per worker grows considerably, and the maximum TCP port usage per worker
is twice the number of workers. What's more, since one machine hosts several
workers, the total TCP port usage per machine is multiplied accordingly and
could exhaust the machine's TCP ports (fewer than 65536).
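
The estimate above can be written out as a small Java sketch. It takes the
"twice the number of workers" figure at face value and counts every
connection as consuming a port, which is a pessimistic upper bound. The class
name PortEstimate and the worker counts in main are hypothetical, not
measurements from a real cluster.

```java
class PortEstimate {
    // Port numbers are 16-bit, so a machine has fewer than 65536 usable ports.
    static final long PORT_SPACE = 65536L;

    // Upper bound from the message: each worker uses up to 2 * totalWorkers ports.
    static long maxPortsPerWorker(int totalWorkers) {
        return 2L * totalWorkers;
    }

    // A machine hosting several workers multiplies the per-worker bound.
    static long maxPortsPerMachine(int totalWorkers, int workersPerMachine) {
        return maxPortsPerWorker(totalWorkers) * workersPerMachine;
    }

    // True when the estimated usage reaches or exceeds the port space.
    static boolean exhaustsPorts(int totalWorkers, int workersPerMachine) {
        return maxPortsPerMachine(totalWorkers, workersPerMachine) >= PORT_SPACE;
    }

    public static void main(String[] args) {
        int totalWorkers = 2000;     // hypothetical topology size
        int workersPerMachine = 20;  // hypothetical workers per host
        System.out.println("estimated ports per machine: "
                + maxPortsPerMachine(totalWorkers, workersPerMachine));
        System.out.println("exhausts port space: "
                + exhaustsPorts(totalWorkers, workersPerMachine));
    }
}
```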

Thanks for your advice.


2014-04-16 10:36 GMT+08:00 李家宏 <[email protected]>:

> Although you reduced the number of Selector instances, Netty still leaks
> open file descriptors. As the topology grows much larger, the "too many
> open files" exception will inevitably be thrown.
>
>
> 2014-04-16 0:17 GMT+08:00 Bobby Evans <[email protected]>:
>
>> I am rather stumped here. The code is blowing up while creating a pipe as
>> part of an NIO EpollSelector for Netty to use.  My best advice right now is
>> to try upgrading to the latest version of Storm.  We have merged in two
>> fixes, one that relates to closing config files, and one that relates to
>> Netty.  The fix makes it use fewer threads, and as a part of
>> that I believe that the number of Selector instances will be smaller too,
>> although this stack trace is for the client side, not the server side.
>>
>> ―Bobby
>>
>> On 4/14/14, 10:38 PM, "李家宏" <[email protected]> wrote:
>>
>> >Hi, all
>> >I'm running a topology on a Storm 0.9.0.1 cluster with Netty as the
>> >transport layer, and this error occurs:
>> >the Netty client fails to create a selector due to a "too many open files"
>> >exception, and the worker keeps halting with an initialization error.
>> >
>> >I checked ulimit -n (> 130000), which is much larger than the number of
>> >currently open fds (sudo lsof | grep java | wc -l), about 6000 at most.
>> >
>> >By the way, this topology works fine on a Storm 0.8.0 cluster.
>> >
>> >What's the problem?
>> >
>> >Here is the stack trace:
>> >-------------------------------------------------------------
>> >2014-03-04 20:24:14 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [2]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.m.n.Client [INFO] Reconnect ... [1]
>> >   2014-03-04 20:24:14 b.s.d.worker [ERROR] Error on initialization of server mk-worker
>> >   org.jboss.netty.channel.ChannelException: Failed to create a selector.
>> >   at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:337) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:95) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:51) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:99) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:69) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:33) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:152) ~[netty-3.6.3.Final.jar:na]
>> >   at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.<init>(NioClientSocketChannelFactory.java:134) ~[netty-3.6.3.Final.jar:na]
>> >   at backtype.storm.messaging.netty.Client.<init>(Client.java:54) ~[storm-netty-0.9.0.1.jar:na]
>> >   at backtype.storm.messaging.netty.Context.connect(Context.java:36) ~[storm-netty-0.9.0.1.jar:na]
>> >   at backtype.storm.daemon.worker$mk_refresh_connections$this__5827$iter__5834__5838$fn__5839.invoke(worker.clj:250) ~[storm-core-0.9.0.1.jar:na]
>> >   at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.4.0.jar:na]
>> >   at clojure.lang.LazySeq.seq(LazySeq.java:60) ~[clojure-1.4.0.jar:na]
>> >   at clojure.lang.Cons.next(Cons.java:39) ~[clojure-1.4.0.jar:na]
>> >   at clojure.lang.RT.next(RT.java:587) ~[clojure-1.4.0.jar:na]
>> >   at clojure.core$next.invoke(core.clj:64) ~[clojure-1.4.0.jar:na]
>> >   at clojure.core$dorun.invoke(core.clj:2726) ~[clojure-1.4.0.jar:na]
>> >   at clojure.core$doall.invoke(core.clj:2741) ~[clojure-1.4.0.jar:na]
>> >   at backtype.storm.daemon.worker$mk_refresh_connections$this__5827.invoke(worker.clj:244) ~[storm-core-0.9.0.1.jar:na]
>> >   at backtype.storm.daemon.worker$fn__5882$exec_fn__1229__auto____5883.invoke(worker.clj:357) ~[storm-core-0.9.0.1.jar:na]
>> >   at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.4.0.jar:na]
>> >   at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
>> >   at clojure.core$apply.invoke(core.clj:601) ~[clojure-1.4.0.jar:na]
>> >   at backtype.storm.daemon.worker$fn__5882$mk_worker__5938.doInvoke(worker.clj:329) [storm-core-0.9.0.1.jar:na]
>> >   at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.4.0.jar:na]
>> >   at backtype.storm.daemon.worker$_main.invoke(worker.clj:439) [storm-core-0.9.0.1.jar:na]
>> >   at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.4.0.jar:na]
>> >   at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.4.0.jar:na]
>> >   at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.0.1.jar:na]
>> >   Caused by: java.io.IOException: Too many open files
>> >   at sun.nio.ch.IOUtil.initPipe(Native Method) ~[na:1.6.0_38]
>> >   at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:49) ~[na:1.6.0_38]
>> >   at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18) ~[na:1.6.0_38]
>> >   at java.nio.channels.Selector.open(Selector.java:209) ~[na:1.6.0_38]
>> >   at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:335) ~[netty-3.6.3.Final.jar:na]
>> >   ... 32 common frames omitted
>> >   2014-03-04 20:24:14 b.s.util [INFO] Halting process: ("Error on initialization")
>> >-------------------------------------------------------------
>> >
>> >Thanks
>> >
>> >--
>> >
>> >======================================================
>> >
>> >Gvain
>> >
>> >Email: [email protected]
>>
>>
>
>



-- 

======================================================

Gvain

Email: [email protected]
