Hi all,
We are active users of Storm in production. One of our pre-prod clusters,
however, is not functional at the moment. The Storm daemons (nimbus, ui,
logviewer, supervisor) start up fine, but the Storm workers are not getting
instantiated when we submit topologies. We see the following error in the
worker logs:
2016-07-28 18:33:59 [main] b.s.d.worker [INFO] Reading Assignments.
2016-07-28 18:34:00 [main] b.s.m.TransportFactory [INFO] Storm peer transport
plugin:backtype.storm.messaging.netty.Context
2016-07-28 18:34:00 [main] b.s.d.worker [INFO] Launching receive-thread for
b4560ed4-d257-4151-9764-633707282a1f:59027
2016-07-28 18:34:00 [main] b.s.m.n.Server [INFO] Create Netty Server
Netty-server-localhost-59027, buffer_size: 5242880, maxWorkers: 1
2016-07-28 18:34:00 [main] b.s.d.worker [ERROR] Error on initialization of
server mk-worker
org.apache.storm.netty.channel.ChannelException: Failed to bind to:
0.0.0.0/0.0.0.0:59027
at
org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
~[storm-core-0.9.6.jar:0.9.6]
at backtype.storm.messaging.netty.Server.<init>(Server.java:130)
~[storm-core-0.9.6.jar:0.9.6]
at backtype.storm.messaging.netty.Context.bind(Context.java:73)
~[storm-core-0.9.6.jar:0.9.6]
at
backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68)
~[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.RestFn.invoke(RestFn.java:668) [clojure-1.5.1.jar:na]
at
backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:380)
~[storm-core-0.9.6.jar:0.9.6]
at
backtype.storm.daemon.worker$fn__4629$exec_fn__1104__auto____4630.invoke(worker.clj:415)
~[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
at
backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393)
[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker$_main.invoke(worker.clj:504)
[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker.main(Unknown Source)
[storm-core-0.9.6.jar:0.9.6]
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_45]
at sun.nio.ch.Net.bind(Net.java:437) ~[na:1.8.0_45]
at sun.nio.ch.Net.bind(Net.java:429) ~[na:1.8.0_45]
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
~[na:1.8.0_45]
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
~[na:1.8.0_45]
at
org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
~[storm-core-0.9.6.jar:0.9.6]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_45]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
~[na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
2016-07-28 18:34:00 [main] b.s.util [ERROR] Halting process: ("Error on
initialization")
java.lang.RuntimeException: ("Error on initialization")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
at
backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393)
[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.RestFn.invoke(RestFn.java:512) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker$_main.invoke(worker.clj:504)
[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.AFn.applyToHelper(AFn.java:172) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker.main(Unknown Source)
[storm-core-0.9.6.jar:0.9.6]
We are running Storm 0.9.6. The ports that we have assigned to the supervisor
are 59027, 59028, 59029 and 59030. When I run commands to check whether
anything is running on those ports (e.g. netstat -an | grep 59027), I do not
get back any results, so it looks like nothing is using those ports. (I based
this check on the following thread:
http://grokbase.com/t/gg/storm-user/137h7hr7f0/hi-when-i-run-storm-ui-i-get-address-is-already-in-use-error)
It almost seems as if the Storm supervisor on that box is not able to open
those ports for the workers to start on. Does anyone know how this problem can
be solved or debugged? This cluster was working without any issues until we
started hitting the “Address already in use” errors, and we have been unable to
get around them. If you need any more information about the nature of our
setup, please let me know.
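
In case it helps anyone reproduce this outside Storm, here is a minimal sketch of a bind probe that could be run on the supervisor host. It is not part of Storm: the function name probe_ports is made up for illustration, and it assumes Python 3 is available on the box. It tries to bind a plain TCP socket to 0.0.0.0 on each slot port, the same address the worker fails on, and reports the OS error if the bind is refused.

```python
import errno
import socket

# Supervisor slot ports from the message above.
SLOT_PORTS = [59027, 59028, 59029, 59030]


def probe_ports(ports, host="0.0.0.0"):
    """Try to bind a TCP socket to each port.

    Returns a dict mapping port -> None if the bind succeeded,
    or a short error string (e.g. "EADDRINUSE: ...") if it failed.
    SO_REUSEADDR is deliberately not set, so this is a strict probe.
    """
    results = {}
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            results[port] = None
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, "UNKNOWN")
            results[port] = "%s: %s" % (name, exc)
        finally:
            # Always release the socket so the probe itself never holds a port.
            sock.close()
    return results


if __name__ == "__main__":
    for port, error in probe_ports(SLOT_PORTS).items():
        status = "bind OK" if error is None else "bind FAILED (%s)" % error
        print(port, status)
```

If the probe binds cleanly while the worker still fails, the conflict is probably transient or specific to how the worker is launched. If the probe also fails, the error name narrows down what the OS is objecting to.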
Thanks!
Best,
Arjun