I'm a bit confused as to what all of those cmds are showing / proving.

But one thing I will point out is that you probably shouldn't be using
ports between 32768 and 61000 for your workers. Those ports are reserved for
ephemeral use, so another process could grab them at random. (That's the
default ephemeral range on Linux, at least.)
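A minimal sketch for checking this on a given host. It assumes Linux: it reads /proc/sys/net/ipv4/ip_local_port_range, which is the same value `sysctl net.ipv4.ip_local_port_range` prints. It uses the four worker ports from this thread as example input, and the function names are illustrative. It only reports overlap with the configured range. It does not tell you whether a port is currently taken.

```python
from pathlib import Path

# Linux exposes the ephemeral port range as two whitespace-separated integers.
RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text):
    """Parse the file contents (e.g. "32768\t61000\n") into a (low, high) tuple."""
    low, high = (int(field) for field in text.split())
    return low, high


def ports_in_ephemeral_range(ports, port_range):
    """Return the ports the kernel may also hand out as ephemeral (client) ports."""
    low, high = port_range
    return [p for p in ports if low <= p <= high]


if __name__ == "__main__" and RANGE_FILE.exists():
    # Example input: the supervisor slot ports mentioned in this thread.
    worker_ports = [59027, 59028, 59029, 59030]
    port_range = parse_port_range(RANGE_FILE.read_text())
    clashes = ports_in_ephemeral_range(worker_ports, port_range)
    print(f"ephemeral range: {port_range[0]}-{port_range[1]}")
    print("worker ports inside it:", clashes or "none")
```

Check the actual range on the host rather than relying on the default, since it is a tunable sysctl.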

- Erik

On Thu, Jul 28, 2016 at 5:47 PM, Arjun Rao <[email protected]> wrote:

> Thanks for the reply, Erik. I ran nc -l 59027 on the supervisor host, but
> I think it is able to listen successfully. I ran the strace in any case,
> and the output is attached. I ran a couple of other commands as
> well, and this is what I found.
>
> *With the supervisor running*
>
> nc -v devctsl001 59027
> nc: connect to devctsl001 port 59027 (tcp) failed: Connection refused
>
> telnet devctsl001 59027
> Trying 45.32.96.34...
> telnet: connect to address xx.xx.xx.xx: Connection refused
>
> nc -l 59027
> {No "Address already in use" error. The port seems to be free to listen on}
>
> *With the UI running (the Storm UI listens on 59031 and comes up
> without any issues)*
>
> nc -v devctsl001 59031
> Connection to devctsl001 59031 port [tcp/*] succeeded!
>
> telnet devctsl001 59031
> Trying xx.xx.xx.xx...
> Connected to devctsl001.
> Escape character is '^]'.
>
> nc -l 59031
> nc: Address already in use
>
>
> It might be a red herring, but I thought I'd share what I have done so far.
>
>
> Best,
>
> Arjun
>
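`nc -v host port` and `telnet host port` only test whether something is already listening. "Connection refused" means nothing is listening, which is expected for a worker port whose worker never started. Below is a rough Python equivalent of that connect check, offered as a sketch. The helper name is made up for illustration.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port.

    This is the question `nc -v host port` and `telnet host port` answer.
    "Connection refused" only means no process is listening there right now;
    it says nothing about whether a new process may bind that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False
```

For example, while the UI is up, `can_connect("devctsl001", 59031)` should return True. `can_connect("devctsl001", 59027)` returning False is consistent with the worker simply never having bound that port.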
> On Thu, Jul 28, 2016 at 7:35 PM, Erik Weathers <[email protected]> wrote:
>
>> Somehow the OS is denying your application's request to bind a socket to
>> that port. Either the port really is bound to another process despite your
>> netstat cmd not revealing that, or you are hitting some other limit. The
>> thread you linked doesn't seem useful for determining the root cause of
>> your problem.
>>
>> I would run `nc -l 59027` to see whether anything can bind to that port.
>> Assuming it fails, follow that up with `strace nc -l 59027` to see
>> if there's any other evidence of why it's failing to bind.
>>
>> - Erik
>>
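Below is a rough Python equivalent of the `nc -l 59027` bind test, offered as a sketch. It binds and listens on 0.0.0.0, the address in the worker's "Failed to bind to" line. On failure it returns the errno name, which is the same information the failing bind() call would show under strace. The helper name is hypothetical. It does not set SO_REUSEADDR, so it may be slightly stricter than the worker's own server socket.

```python
import errno
import socket


def try_bind(port, host="0.0.0.0"):
    """Bind and listen on host:port, then release it.

    Returns "ok" on success, otherwise the errno name of the failure
    (e.g. "EADDRINUSE" or "EACCES"), i.e. what the failing bind() call
    would show under strace.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # No SO_REUSEADDR here, so this may be stricter than a server that sets it.
        sock.bind((host, port))
        sock.listen(1)
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
```

For example, run `try_bind(59027)` on the supervisor host while no worker is running. "ok" there, alongside the worker's BindException, would suggest the port is taken only intermittently rather than permanently.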
>> On Thu, Jul 28, 2016 at 3:46 PM, Arjun Rao <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > We are active users of Storm in production. One of our pre-prod clusters,
>> > however, is not functional at the moment. The Storm daemons (nimbus, ui,
>> > logviewer, supervisor) start up fine, but the Storm workers do not get
>> > instantiated when we submit topologies. We see the following error in the
>> > worker logs:
>> >
>> > 2016-07-28 18:33:59 [main] b.s.d.worker [INFO] Reading Assignments.
>> > 2016-07-28 18:34:00 [main] b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context
>> > 2016-07-28 18:34:00 [main] b.s.d.worker [INFO] Launching receive-thread for b4560ed4-d257-4151-9764-633707282a1f:59027
>> > 2016-07-28 18:34:00 [main] b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-59027, buffer_size: 5242880, maxWorkers: 1
>> > 2016-07-28 18:34:00 [main] b.s.d.worker [ERROR] Error on initialization of server mk-worker
>> > org.apache.storm.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:59027
>> >         at org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at backtype.storm.messaging.netty.Server.<init>(Server.java:130) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at backtype.storm.messaging.netty.Context.bind(Context.java:73) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at clojure.lang.RestFn.invoke(RestFn.java:668) [clojure-1.5.1.jar:na]
>> >         at backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:380) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at backtype.storm.daemon.worker$fn__4629$exec_fn__1104__auto____4630.invoke(worker.clj:415) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
>> >         at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>> >         at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
>> >         at backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393) [storm-core-0.9.6.jar:0.9.6]
>> >         at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
>> >         at backtype.storm.daemon.worker$_main.invoke(worker.clj:504) [storm-core-0.9.6.jar:0.9.6]
>> >         at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
>> >         at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>> >         at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
>> > java.net.BindException: Address already in use
>> >         at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_45]
>> >         at sun.nio.ch.Net.bind(Net.java:437) ~[na:1.8.0_45]
>> >         at sun.nio.ch.Net.bind(Net.java:429) ~[na:1.8.0_45]
>> >         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) ~[na:1.8.0_45]
>> >         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) ~[na:1.8.0_45]
>> >         at org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[storm-core-0.9.6.jar:0.9.6]
>> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_45]
>> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_45]
>> >         at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
>> > 2016-07-28 18:34:00 [main] b.s.util [ERROR] Halting process: ("Error on initialization")
>> > java.lang.RuntimeException: ("Error on initialization")
>> >         at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.6.jar:0.9.6]
>> >         at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>> >         at backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393) [storm-core-0.9.6.jar:0.9.6]
>> >         at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
>> >         at backtype.storm.daemon.worker$_main.invoke(worker.clj:504) [storm-core-0.9.6.jar:0.9.6]
>> >         at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
>> >         at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
>> >         at backtype.storm.daemon.worker.main(Unknown Source) [storm-core-0.9.6.jar:0.9.6]
>> >
>> >
>> > We are running Storm 0.9.6. The ports that we have assigned to the
>> > supervisor are 59027, 59028, 59029, and 59030. When I run commands to check
>> > whether anything is running on those ports (e.g. netstat -an | grep 59027),
>> > I do not get back any results, so it looks like nothing is running on
>> > those ports. (Based on this:
>> > http://grokbase.com/t/gg/storm-user/137h7hr7f0/hi-when-i-run-storm-ui-i-get-address-is-already-in-use-error
>> > )
>> > It almost seems as if the Storm supervisor on that box is not able to open
>> > those ports for the workers to start on. Does anyone know how this
>> > problem can be solved or debugged? This cluster was working without any
>> > issues, and then we started hitting the “Address already in use” errors and
>> > have been unable to get around them. If you need any more information about
>> > the nature of our setup, please let me know.
>> >
>> > Thanks!
>> >
>> > Best,
>> > Arjun
>>
>
>
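On Linux, net-tools netstat reads this information from /proc/net/tcp and /proc/net/tcp6, as far as I know. The sketch below scans those files directly. It matches only the local-port column, whereas a plain `grep 59027` also matches remote ports. It reports each socket's state, including TIME_WAIT. It is Linux-only, and the function names are illustrative. Like netstat, it only sees sockets in the caller's network namespace.

```python
from pathlib import Path

# TCP state codes as printed in /proc/net/tcp (hex), per the kernel's tcp_states.h.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def sockets_on_port(lines, port):
    """Return (local_address, state) for entries whose *local* port is `port`."""
    matches = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or fields[0] == "sl":  # skip header and blank lines
            continue
        local, state = fields[1], fields[3]
        port_hex = local.rsplit(":", 1)[1]  # local_address is "HEXADDR:HEXPORT"
        if int(port_hex, 16) == port:
            matches.append((local, TCP_STATES.get(state, state)))
    return matches


def scan(port):
    """Scan the IPv4 and IPv6 TCP tables of the current network namespace."""
    found = []
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            found.extend(sockets_on_port(path.read_text().splitlines(), port))
    return found
```

`scan(59027)` returns an empty list if no socket in this namespace has 59027 as its local port. That matches what the netstat check above reported.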
