Supervisors don't open the ports. The workers do. The supervisors *launch* the workers.
- Erik

On Fri, Jul 29, 2016 at 8:46 AM, Arjun Rao <[email protected]> wrote:

> The behavior is similar across any set of port ranges assigned as the
> supervisor slot ports. I tried with 6700-6703 and it's the same issue of
> address in use. I might be wrong, but is it possible that the supervisor is
> not opening the ports, as opposed to some other process using that port?
>
> Sent from my iPhone
>
> > On Jul 28, 2016, at 8:56 PM, Erik Weathers <[email protected]> wrote:
> >
> > I'm a bit confused as to what all of those cmds are showing / proving.
> >
> > But one thing I will point out is that you probably shouldn't be using
> > ports between 32768-61000 for your workers, because those ports are for
> > ephemeral usage, so could be used by another process randomly. (That's the
> > default on Linux at least.)
> >
> > - Erik
> >
> >> On Thu, Jul 28, 2016 at 5:47 PM, Arjun Rao <[email protected]> wrote:
> >>
> >> Thanks for the reply, Erik. I ran nc -l 59027 on the supervisor host, but
> >> I think it is able to connect successfully. I ran the strace in any case
> >> and the output is attached in the file. I ran a couple of other commands as
> >> well and this is what I found.
> >>
> >> *With the supervisor running*
> >>
> >> nc -v devctsl001 59027
> >>
> >> nc: connect to devctsl001 port 59027 (tcp) failed: Connection refused
> >>
> >> telnet devctsl001 59027
> >>
> >> Trying 45.32.96.34...
> >>
> >> telnet: connect to address xx.xx.xx.xx: Connection refused
> >>
> >> nc -l 59027
> >>
> >> {No address already in use error. Connection seems to be open}
> >>
> >> *With the UI running (the Storm UI runs on 59031. The UI comes up
> >> successfully without any issues)*
> >>
> >> nc -v devctsl001 59031
> >>
> >> Connection to devctsl001 59031 port [tcp/*] succeeded!
> >>
> >> telnet devctsl001 59031
> >>
> >> Trying xx.xx.xx.xx...
> >>
> >> Connected to devctsl001.
> >>
> >> Escape character is '^]'.
> >>
> >> nc -l 59031
> >>
> >> nc: Address already in use
> >>
> >> Might be a red herring, but thought I'd share what I have done so far.
> >>
> >> Best,
> >>
> >> Arjun
> >>
> >> On Thu, Jul 28, 2016 at 7:35 PM, Erik Weathers <[email protected]> wrote:
> >>
> >>> Somehow the OS is denying your application's request to create a socket.
> >>> Either the port really is bound to another process despite your netstat
> >>> cmd not revealing that, or you are hitting some other limit. The thread you
> >>> linked doesn't seem useful towards determining what your problem's root
> >>> cause is.
> >>>
> >>> I would run `nc -l 59027` in order to see if anything can bind to that
> >>> port. Assuming it fails, then follow that up with an `strace nc -l 59027`
> >>> to see if there's any other evidence of why it's failing to bind.
> >>>
> >>> - Erik
> >>>
> >>> On Thu, Jul 28, 2016 at 3:46 PM, Arjun Rao <[email protected]> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> We are active users of storm in production. One of our pre-prod clusters,
> >>>> however, is not functional at the moment. The storm daemons (nimbus, ui,
> >>>> logviewer, supervisor) start up fine, but the storm workers are not getting
> >>>> instantiated when we submit topologies. We see the following error in the
> >>>> worker logs:
> >>>>
> >>>> 2016-07-28 18:33:59 [main] b.s.d.worker [INFO] Reading Assignments.
> >>>> 2016-07-28 18:34:00 [main] b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
> >>>> 2016-07-28 18:34:00 [main] b.s.d.worker [INFO] Launching receive-thread for b4560ed4-d257-4151-9764-633707282a1f:59027
> >>>> 2016-07-28 18:34:00 [main] b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-59027, buffer_size: 5242880, maxWorkers: 1
> >>>> 2016-07-28 18:34:00 [main] b.s.d.worker [ERROR] Error on initialization of server mk-worker
> >>>> org.apache.storm.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:59027
> >>>>     at org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at backtype.storm.messaging.netty.Server.<init>(Server.java:130) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at backtype.storm.messaging.netty.Context.bind(Context.java:73) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at clojure.lang.RestFn.invoke(RestFn.java:668) [clojure-1.5.1.jar:na]
> >>>>     at backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:380) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at backtype.storm.daemon.worker$fn__4629$exec_fn__1104__auto____4630.invoke(worker.clj:415) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
> >>>>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
> >>>>     at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
> >>>>     at backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393) [storm-core-0.9.6.jar:0.9.6]
> >>>>     at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
> >>>>     at backtype.storm.daemon.worker$_main.invoke(worker.clj:504) [storm-core-0.9.6.jar:0.9.6]
> >>>>     at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
> >>>>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
> >>>>     at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
> >>>> java.net.BindException: Address already in use
> >>>>     at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_45]
> >>>>     at sun.nio.ch.Net.bind(Net.java:437) ~[na:1.8.0_45]
> >>>>     at sun.nio.ch.Net.bind(Net.java:429) ~[na:1.8.0_45]
> >>>>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) ~[na:1.8.0_45]
> >>>>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[na:1.8.0_45]
> >>>>     at org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-0.9.6.jar:0.9.6]
> >>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_45]
> >>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_45]
> >>>>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
> >>>> 2016-07-28 18:34:00 [main] b.s.util [ERROR] Halting process: ("Error on initialization")
> >>>> java.lang.RuntimeException: ("Error on initialization")
> >>>>     at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.6.jar:0.9.6]
> >>>>     at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
> >>>>     at backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393) [storm-core-0.9.6.jar:0.9.6]
> >>>>     at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
> >>>>     at backtype.storm.daemon.worker$_main.invoke(worker.clj:504) [storm-core-0.9.6.jar:0.9.6]
> >>>>     at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
> >>>>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
> >>>>     at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
> >>>>
> >>>> We are running storm 0.9.6. The ports that we have assigned for the
> >>>> supervisor are 59027, 59028, 59029 and 59030. When I run commands to check
> >>>> if anything is running on those ports (e.g. netstat -an | grep 59027), I
> >>>> do not get back any results, so it looks like there is nothing running on
> >>>> those ports. (Based on this:
> >>>> http://grokbase.com/t/gg/storm-user/137h7hr7f0/hi-when-i-run-storm-ui-i-get-address-is-already-in-use-error)
> >>>> It almost seems that the storm supervisor on that box is not able to open
> >>>> up those ports for the workers to be started on. Does anyone know how this
> >>>> problem can be solved/debugged? This cluster was working without any issues,
> >>>> and then we started hitting the “Address already in use” errors and have
> >>>> been unable to get around them. If you need any more information about the
> >>>> nature of our setup, please let me know.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Best,
> >>>> Arjun
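Erik's suggestion to test a slot port with `nc -l 59027` can also be scripted, which makes it easy to check every slot port at once and to see the exact errno the kernel returns. The following is a minimal sketch, assuming Python 3 on the supervisor host; `try_bind` is a hypothetical helper, not part of Storm. It binds 0.0.0.0, the same address the worker's Netty server fails to bind in the log above, and releases the port immediately.

```python
import errno
import socket


# Hypothetical helper for illustration; not part of Storm.
def try_bind(port, host="0.0.0.0"):
    """Try to bind and listen on a TCP port, then release it.

    Returns None if the bind succeeded, otherwise the errno name
    (e.g. "EADDRINUSE" for "Address already in use").
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return None
    except OSError as exc:
        # Map the numeric errno to its symbolic name when one is known.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()  # free the port right away so Storm can use it


if __name__ == "__main__":
    # Slot ports from the original report; replace with your own values.
    for port in (59027, 59028, 59029, 59030):
        result = try_bind(port)
        print(port, "bindable" if result is None else result)
```

The sketch deliberately does not set SO_REUSEADDR, so its result may differ from a tool that does. The ports in `__main__` should be replaced with the values of `supervisor.slots.ports` on the host being checked.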

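Erik's point about the ephemeral range can be checked against the running kernel rather than the remembered default: on Linux the range is exposed in `/proc/sys/net/ipv4/ip_local_port_range`, and its exact bounds vary with kernel version and local tuning. A minimal sketch, assuming a Linux host and Python 3; `parse_port_range`, `ephemeral_range` and `overlapping_ports` are hypothetical helper names, not part of Storm.

```python
# Hypothetical helpers for illustration; not part of Storm.

def parse_port_range(text):
    """Parse the two integers in ip_local_port_range, e.g. "32768\t60999\n"."""
    low, high = (int(field) for field in text.split())
    return low, high


def ephemeral_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the kernel's ephemeral (local) port range on Linux."""
    with open(path) as f:
        return parse_port_range(f.read())


def overlapping_ports(slot_ports, port_range):
    """Return the slot ports that fall inside the inclusive ephemeral range."""
    low, high = port_range
    return [p for p in slot_ports if low <= p <= high]


if __name__ == "__main__":
    slots = [59027, 59028, 59029, 59030]  # slot ports from the original report
    try:
        rng = ephemeral_range()
    except FileNotFoundError:
        print("ip_local_port_range not found (not a Linux host?)")
    else:
        print("ephemeral range:", rng)
        print("slot ports inside it:", overlapping_ports(slots, rng))
```

An overlap only means the kernel may hand one of those ports to another process as the local port of an outgoing connection. The thread also reports the same error with 6700-6703, which lies outside the range Erik quotes, so an overlap is not necessarily the whole explanation.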