The behavior is the same for any port range assigned as the supervisor slots ports. I tried 6700-6703 and hit the same "Address already in use" error. I might be wrong, but is it possible that the supervisor is failing to open the ports, as opposed to some other process already using them?
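To separate "something else holds the port" from "the bind itself is being refused", a small bind probe run on the supervisor host may help. This is only a rough sketch: `probe_bind` is a name made up for illustration, it does a plain TCP bind the way `nc -l` would, and it is not how Storm or Netty opens its sockets.

```python
import errno
import socket


def probe_bind(port, host="0.0.0.0"):
    """Try to bind and listen on (host, port).

    Returns None if the bind succeeds, otherwise the symbolic errno name
    (for example "EADDRINUSE" or "EACCES").
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Plain bind without SO_REUSEADDR, so lingering sockets on the
        # port may also show up as a failure here.
        sock.bind((host, port))
        sock.listen(1)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()


if __name__ == "__main__":
    # Ports taken from this thread; adjust to the slots ports in use.
    for port in (6700, 6701, 6702, 6703):
        result = probe_bind(port)
        print(port, "bindable" if result is None else "bind failed: " + result)
```

If every port reports as bindable while the supervisor is running and no worker is up, the port is probably free at that moment and the conflict only appears once the worker starts.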
Sent from my iPhone

> On Jul 28, 2016, at 8:56 PM, Erik Weathers <[email protected]> wrote:
>
> I'm a bit confused as to what all of those cmds are showing / proving.
>
> But one thing I will point out is that you probably shouldn't be using
> ports between 32768-61000 for your workers, because those ports are for
> ephemeral usage, so could be used by another process randomly. (That's the
> default on linux at least.)
>
> - Erik
>
>> On Thu, Jul 28, 2016 at 5:47 PM, Arjun Rao <[email protected]> wrote:
>>
>> Thanks for the reply Erik. I ran nc -l 59027 on the supervisor host, but
>> I think it is able to connect successfully. I ran the strace in any case
>> and the output is attached in the file. I ran a couple of other commands as
>> well and this is what I found.
>>
>> *With the supervisor running*
>>
>> nc -v devctsl001 59027
>> nc: connect to devctsl001 port 59027 (tcp) failed: Connection refused
>>
>> telnet devctsl001 59027
>> Trying 45.32.96.34...
>> telnet: connect to address xx.xx.xx.xx: Connection refused
>>
>> nc -l 59027
>> {No address already in use error. Connection seems to be open}
>>
>> *With the UI running ( the storm ui connects on 59031. The UI comes up
>> successfully without any issues)*
>>
>> nc -v devctsl001 59031
>> Connection to devctsl001 59031 port [tcp/*] succeeded!
>>
>> telnet devctsl001 59031
>> Trying xx.xx.xx.xx...
>> Connected to devctsl001.
>> Escape character is '^]'.
>>
>> nc -l 59031
>> nc: Address already in use
>>
>> Might be a red herring, but thought I'd share what I have done so far.
>>
>> Best,
>> Arjun
>>
>> On Thu, Jul 28, 2016 at 7:35 PM, Erik Weathers <[email protected]> wrote:
>>
>>> Somehow the OS is denying your application's request to create a socket.
>>> Either the port really is bound to another process despite your netstat
>>> cmd not revealing that, or you are hitting some other limit. The thread
>>> you linked doesn't seem useful towards determining what your problem's
>>> root cause is.
>>>
>>> I would run: `nc -l 59027` in order to see if anything can bind to that
>>> port. Assuming it fails, then follow that up with an `strace nc -l 59027`
>>> to see if there's any other evidence of why it's failing to bind.
>>>
>>> - Erik
>>>
>>> On Thu, Jul 28, 2016 at 3:46 PM, Arjun Rao <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are active users of storm in production. One of our pre-prod clusters
>>>> however, is not functional at the moment. The storm daemons ( nimbus, ui,
>>>> logviewer, supervisor ) start up fine, but the storm workers are not
>>>> getting instantiated when we submit topologies. We see the following
>>>> error in the worker logs:
>>>>
>>>> 2016-07-28 18:33:59 [main] b.s.d.worker [INFO] Reading Assignments.
>>>> 2016-07-28 18:34:00 [main] b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
>>>> 2016-07-28 18:34:00 [main] b.s.d.worker [INFO] Launching receive-thread for b4560ed4-d257-4151-9764-633707282a1f:59027
>>>> 2016-07-28 18:34:00 [main] b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-59027, buffer_size: 5242880, maxWorkers: 1
>>>> 2016-07-28 18:34:00 [main] b.s.d.worker [ERROR] Error on initialization of server mk-worker
>>>> org.apache.storm.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:59027
>>>>     at org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at backtype.storm.messaging.netty.Server.<init>(Server.java:130) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at backtype.storm.messaging.netty.Context.bind(Context.java:73) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at clojure.lang.RestFn.invoke(RestFn.java:668) [clojure-1.5.1.jar:na]
>>>>     at backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:380) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at backtype.storm.daemon.worker$fn__4629$exec_fn__1104__auto____4630.invoke(worker.clj:415) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
>>>>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>>>>     at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
>>>>     at backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393) [storm-core-0.9.6.jar:0.9.6]
>>>>     at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
>>>>     at backtype.storm.daemon.worker$_main.invoke(worker.clj:504) [storm-core-0.9.6.jar:0.9.6]
>>>>     at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
>>>>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>>>>     at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
>>>> java.net.BindException: Address already in use
>>>>     at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_45]
>>>>     at sun.nio.ch.Net.bind(Net.java:437) ~[na:1.8.0_45]
>>>>     at sun.nio.ch.Net.bind(Net.java:429) ~[na:1.8.0_45]
>>>>     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) ~[na:1.8.0_45]
>>>>     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[na:1.8.0_45]
>>>>     at org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-0.9.6.jar:0.9.6]
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_45]
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_45]
>>>>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
>>>> 2016-07-28 18:34:00 [main] b.s.util [ERROR] Halting process: ("Error on initialization")
>>>> java.lang.RuntimeException: ("Error on initialization")
>>>>     at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.6.jar:0.9.6]
>>>>     at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>>>>     at backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393) [storm-core-0.9.6.jar:0.9.6]
>>>>     at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
>>>>     at backtype.storm.daemon.worker$_main.invoke(worker.clj:504) [storm-core-0.9.6.jar:0.9.6]
>>>>     at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
>>>>     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>>>>     at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
>>>>
>>>> We are running storm 0.9.6. The ports that we have assigned for the
>>>> supervisor are 59027, 59028, 59029, 59030. When I run commands to check
>>>> if anything is running on those ports ( e.g. netstat -an | grep 59027 ),
>>>> I do not get back any results. So it looks like there is nothing running
>>>> on those ports. (Based on this :
>>>> http://grokbase.com/t/gg/storm-user/137h7hr7f0/hi-when-i-run-storm-ui-i-get-address-is-already-in-use-error
>>>> )
>>>> It almost seems the storm supervisor on that box is not able to open up
>>>> those ports for the workers to be started on. Does anyone know how this
>>>> problem can be solved/debugged? This cluster was working without any
>>>> issues and then we started hitting the “Address already in use” errors
>>>> and have been unable to get around it. If you need any more information
>>>> about the nature of our setup, please let me know.
>>>>
>>>> Thanks!
>>>>
>>>> Best,
>>>> Arjun
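On the ephemeral-range point raised in the thread: one way to check whether the configured slots ports overlap the range the kernel hands out for outgoing connections is to read it from `/proc/sys/net/ipv4/ip_local_port_range` on Linux. The sketch below assumes that file is present; the helper names are made up for illustration, and the port list is the one from the original message.

```python
from pathlib import Path

# Linux-only location of the ephemeral (local) port range.
RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text):
    """Parse the two whitespace-separated integers in the range file."""
    low, high = text.split()
    return int(low), int(high)


def ports_in_ephemeral_range(ports, low, high):
    """Return the ports that fall inside the inclusive range [low, high]."""
    return [port for port in ports if low <= port <= high]


if __name__ == "__main__":
    slots_ports = [59027, 59028, 59029, 59030]  # from the original message
    if RANGE_FILE.exists():
        low, high = parse_port_range(RANGE_FILE.read_text())
        overlap = ports_in_ephemeral_range(slots_ports, low, high)
        print("ephemeral range:", low, "-", high)
        print("slots ports inside it:", overlap)
    else:
        print("no", RANGE_FILE, "on this host")
```

Any port listed as inside the range could be picked by the kernel as the source port of an unrelated outgoing connection, which is the collision described in the reply about ephemeral usage.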
