I'm a bit confused as to what all of those commands are showing or proving. But one thing I will point out: you probably shouldn't be using ports in the 32768-61000 range for your workers, because that range is reserved for ephemeral ports, so any of them could be grabbed at random by another process making an outbound connection. (That's the default range on Linux, at least.)
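For what it's worth, here's a rough sketch of how to check that, assuming a fairly standard Linux box where sysctl and lsof are available (they weren't mentioned earlier in this thread; exact flags and output vary a bit by distro):

    # Show the ephemeral (auto-assigned) port range the kernel hands out
    sysctl net.ipv4.ip_local_port_range
    #   net.ipv4.ip_local_port_range = 32768   61000

    # Show ALL sockets touching a worker port, not just listeners -- an
    # outbound connection can be using 59027 as its local ephemeral port.
    # Run as root so -p can name processes owned by other users.
    sudo netstat -anp | grep 59027
    sudo lsof -i :59027

If the ephemeral range does overlap your worker ports, the simplest fix is probably to move supervisor.slots.ports in storm.yaml to values outside that range (the Storm defaults of 6700-6703 already are).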
- Erik

On Thu, Jul 28, 2016 at 5:47 PM, Arjun Rao <[email protected]> wrote:

> Thanks for the reply, Erik. I ran nc -l 59027 on the supervisor host, but
> I think it is able to connect successfully. I ran the strace in any case
> and the output is attached in the file. I ran a couple of other commands
> as well, and this is what I found.
>
> *With the supervisor running*
>
> nc -v devctsl001 59027
> nc: connect to devctsl001 port 59027 (tcp) failed: Connection refused
>
> telnet devctsl001 59027
> Trying 45.32.96.34...
> telnet: connect to address xx.xx.xx.xx: Connection refused
>
> nc -l 59027
> {No "address already in use" error. Connection seems to be open}
>
> *With the UI running (the Storm UI connects on 59031. The UI comes up
> successfully without any issues)*
>
> nc -v devctsl001 59031
> Connection to devctsl001 59031 port [tcp/*] succeeded!
>
> telnet devctsl001 59031
> Trying xx.xx.xx.xx...
> Connected to devctsl001.
> Escape character is '^]'.
>
> nc -l 59031
> nc: Address already in use
>
> Might be a red herring, but I thought I'd share what I have done so far.
>
> Best,
> Arjun
>
> On Thu, Jul 28, 2016 at 7:35 PM, Erik Weathers <[email protected]> wrote:
>
>> Somehow the OS is denying your application's request to create a socket.
>> Either the port really is bound to another process despite your netstat
>> cmd not revealing that, or you are hitting some other limit. The thread
>> you linked doesn't seem useful towards determining your problem's root
>> cause.
>>
>> I would run `nc -l 59027` in order to see if anything can bind to that
>> port. Assuming it fails, follow that up with `strace nc -l 59027` to see
>> if there's any other evidence of why it's failing to bind.
>>
>> - Erik
>>
>> On Thu, Jul 28, 2016 at 3:46 PM, Arjun Rao <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > We are active users of Storm in production. One of our pre-prod
>> > clusters, however, is not functional at the moment. The Storm daemons
>> > (nimbus, ui, logviewer, supervisor) start up fine, but the Storm
>> > workers do not get instantiated when we submit topologies. We see the
>> > following error in the worker logs:
>> >
>> > 2016-07-28 18:33:59 [main] b.s.d.worker [INFO] Reading Assignments.
>> > 2016-07-28 18:34:00 [main] b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
>> > 2016-07-28 18:34:00 [main] b.s.d.worker [INFO] Launching receive-thread for b4560ed4-d257-4151-9764-633707282a1f:59027
>> > 2016-07-28 18:34:00 [main] b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-59027, buffer_size: 5242880, maxWorkers: 1
>> > 2016-07-28 18:34:00 [main] b.s.d.worker [ERROR] Error on initialization of server mk-worker
>> > org.apache.storm.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:59027
>> >     at org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at backtype.storm.messaging.netty.Server.<init>(Server.java:130) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at backtype.storm.messaging.netty.Context.bind(Context.java:73) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at clojure.lang.RestFn.invoke(RestFn.java:668) [clojure-1.5.1.jar:na]
>> >     at backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:380) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at backtype.storm.daemon.worker$fn__4629$exec_fn__1104__auto____4630.invoke(worker.clj:415) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
>> >     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>> >     at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
>> >     at backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393) [storm-core-0.9.6.jar:0.9.6]
>> >     at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
>> >     at backtype.storm.daemon.worker$_main.invoke(worker.clj:504) [storm-core-0.9.6.jar:0.9.6]
>> >     at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
>> >     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>> >     at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
>> > java.net.BindException: Address already in use
>> >     at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_45]
>> >     at sun.nio.ch.Net.bind(Net.java:437) ~[na:1.8.0_45]
>> >     at sun.nio.ch.Net.bind(Net.java:429) ~[na:1.8.0_45]
>> >     at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) ~[na:1.8.0_45]
>> >     at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[na:1.8.0_45]
>> >     at org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-0.9.6.jar:0.9.6]
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_45]
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_45]
>> >     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
>> > 2016-07-28 18:34:00 [main] b.s.util [ERROR] Halting process: ("Error on initialization")
>> > java.lang.RuntimeException: ("Error on initialization")
>> >     at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.6.jar:0.9.6]
>> >     at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>> >     at backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393) [storm-core-0.9.6.jar:0.9.6]
>> >     at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
>> >     at backtype.storm.daemon.worker$_main.invoke(worker.clj:504) [storm-core-0.9.6.jar:0.9.6]
>> >     at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
>> >     at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>> >     at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
>> >
>> > We are running Storm 0.9.6. The ports that we have assigned for the
>> > supervisor are 59027, 59028, 59029, 59030. When I run commands to check
>> > if anything is running on those ports (e.g. netstat -an | grep 59027),
>> > I do not get back any results, so it looks like there is nothing
>> > running on those ports. (Based on this:
>> > http://grokbase.com/t/gg/storm-user/137h7hr7f0/hi-when-i-run-storm-ui-i-get-address-is-already-in-use-error)
>> > It almost seems the Storm supervisor on that box is not able to open up
>> > those ports for the workers to be started on. Does anyone know how this
>> > problem can be solved/debugged? This cluster was working without any
>> > issues, and then we started hitting the "Address already in use" errors
>> > and have been unable to get around it. If you need any more information
>> > about the nature of our setup, please let me know.
>> >
>> > Thanks!
>> >
>> > Best,
>> > Arjun
