Hi guys,

We have an issue with adding a Graylog server to an existing Graylog cluster. Basically, we need to add another server to scale out our logging capabilities. We're doing this right now on a test cluster just to understand what we have to do in production.

*Environment*

We have an existing Graylog cluster with the following machines:

- Graylog web interface box
- Graylog server
- 3 Elasticsearch nodes

On the existing Graylog server (svr1), ismaster=true in its server.conf and its rest_listen_ip is set to the internal network IP of 10.x.y.z. The new Graylog server (svr2) that I am adding to the cluster has ismaster=false and, like the primary Graylog server, its rest_listen_ip is set to its internal network IP.

*Errors*

The existing cluster was operating as expected until we added the second server, svr2. When we start svr2, the Graylog web interface server appears to see it, and when I log in to the Graylog admin page I can see it registered under System > Nodes. However, after a few minutes, svr2 drops off the Nodes page. On checking the logs of the Graylog web interface box, I see multiple API call failures for svr2 and a few for svr1:

2015-11-16 13:57:03,273 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0 API call failed to execute.
java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: /SVR2_IP_ADDRESS:12900 to http://SVR2_IP_ADDRESS:12900/system/cluster/node
    at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:342) ~[com.ning.async-http-client-1.8.14.jar:na]
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108) ~[com.ning.async-http-client-1.8.14.jar:na]
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:431) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:422) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:384) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[io.netty.netty-3.9.3.Final.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_11]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_11]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_11]
Caused by: java.net.ConnectException: Connection refused: /SVR2_IP_ADDRESS:12900 to http://SVR2_IP_ADDRESS:12900/system/cluster/node
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104) ~[com.ning.async-http-client-1.8.14.jar:na]
    ... 12 common frames omitted
Caused by: java.net.ConnectException: Connection refused: /SVR2_IP_ADDRESS:12900
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_11]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712) ~[na:1.8.0_11]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) ~[io.netty.netty-3.9.3.Final.jar:na]
    ... 8 common frames omitted

2015-11-16 13:57:03,276 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0 API call failed to execute.
java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: /SVR1_IP_ADDRESS:12900 to http://SVR1_IP_ADDRESS:12900/system/cluster/node
    at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:342) ~[com.ning.async-http-client-1.8.14.jar:na]
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108) ~[com.ning.async-http-client-1.8.14.jar:na]
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:431) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:422) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:384) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[io.netty.netty-3.9.3.Final.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_11]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_11]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_11]
Caused by: java.net.ConnectException: Connection refused: /SVR1_IP_ADDRESS:12900 to http://SVR1_IP_ADDRESS:12900/system/cluster/node
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104) ~[com.ning.async-http-client-1.8.14.jar:na]
    ... 12 common frames omitted
Caused by: java.net.ConnectException: Connection refused: /SVR1_IP_ADDRESS:12900
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_11]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712) ~[na:1.8.0_11]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) ~[io.netty.netty-3.9.3.Final.jar:na]
    ... 8 common frames omitted

2015-11-16 13:57:08,282 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0 API call failed to execute.
java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused: /SVR2_IP_ADDRESS:12900 to http://SVR2_IP_ADDRESS:12900/system/cluster/node
    at com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:342) ~[com.ning.async-http-client-1.8.14.jar:na]
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108) ~[com.ning.async-http-client-1.8.14.jar:na]
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:431) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:422) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:384) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) ~[io.netty.netty-3.9.3.Final.jar:na]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_11]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_11]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_11]
Caused by: java.net.ConnectException: Connection refused: /SVR2_IP_ADDRESS:12900 to http://SVR2_IP_ADDRESS:12900/system/cluster/node
    at com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104) ~[com.ning.async-http-client-1.8.14.jar:na]
    ... 12 common frames omitted
Caused by: java.net.ConnectException: Connection refused: /SVR2_IP_ADDRESS:12900
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_11]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712) ~[na:1.8.0_11]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) ~[io.netty.netty-3.9.3.Final.jar:na]
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) ~[io.netty.netty-3.9.3.Final.jar:na]
    ... 8 common frames omitted

2015-11-16 13:57:52,182 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in play-akka.actor.default-dispatcher-10 API call failed to execute.
java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: No response received after 5000
    at com.ning.http.client.providers.netty.NettyResponseFuture.get(NettyResponseFuture.java:266) ~[com.ning.async-http-client-1.8.14.jar:na]
    at org.graylog2.restclient.lib.ApiClientImpl$ApiRequestBuilder.executeOnAll(ApiClientImpl.java:558) ~[org.graylog2.graylog2-rest-client-1.0.1.jar:na]
    at org.graylog2.restclient.models.ClusterService.getClusterJvmStats(ClusterService.java:157) [org.graylog2.graylog2-rest-client-1.0.1.jar:na]
    at controllers.NodesController.nodes(NodesController.java:61) [graylog-web-interface.graylog-web-interface-1.0.1.jar:1.0.1]
    at Routes$$anonfun$routes$1$$anonfun$applyOrElse$44$$anonfun$apply$496.apply(routes_routing.scala:1691) [graylog-web-interface.graylog-web-interface-1.0.1.jar:na]
    at Routes$$anonfun$routes$1$$anonfun$applyOrElse$44$$anonfun$apply$496.apply(routes_routing.scala:1691) [graylog-web-interface.graylog-web-interface-1.0.1.jar:na]
    at play.core.Router$HandlerInvokerFactory$$anon$4.resultCall(Router.scala:264) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.core.Router$HandlerInvokerFactory$JavaActionInvokerFactory$$anon$15$$anon$1.invocation(Router.scala:255) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.core.j.JavaAction$$anon$1.call(JavaAction.scala:55) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.GlobalSettings$1.call(GlobalSettings.java:67) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.mvc.Security$AuthenticatedAction.call(Security.java:44) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.core.j.JavaAction$$anonfun$11.apply(JavaAction.scala:82) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.core.j.JavaAction$$anonfun$11.apply(JavaAction.scala:82) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) [org.scala-lang.scala-library-2.10.4.jar:na]
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) [org.scala-lang.scala-library-2.10.4.jar:na]
    at play.core.j.HttpExecutionContext$$anon$2.run(HttpExecutionContext.scala:40) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.api.libs.iteratee.Execution$trampoline$.execute(Execution.scala:46) [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6]
    at play.core.j.HttpExecutionContext.execute(HttpExecutionContext.scala:32) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at scala.concurrent.impl.Future$.apply(Future.scala:31) [org.scala-lang.scala-library-2.10.4.jar:na]
    at scala.concurrent.Future$.apply(Future.scala:485) [org.scala-lang.scala-library-2.10.4.jar:na]
    at play.core.j.JavaAction$class.apply(JavaAction.scala:82) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.core.Router$HandlerInvokerFactory$JavaActionInvokerFactory$$anon$15$$anon$1.apply(Router.scala:252) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4$$anonfun$apply$5.apply(Action.scala:130) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4$$anonfun$apply$5.apply(Action.scala:130) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.utils.Threads$.withContextClassLoader(Threads.scala:21) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4.apply(Action.scala:129) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4.apply(Action.scala:128) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at scala.Option.map(Option.scala:145) [org.scala-lang.scala-library-2.10.4.jar:na]
    at play.api.mvc.Action$$anonfun$apply$1.apply(Action.scala:128) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.api.mvc.Action$$anonfun$apply$1.apply(Action.scala:121) [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6]
    at play.api.libs.iteratee.Iteratee$$anonfun$mapM$1.apply(Iteratee.scala:483) [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6]
    at play.api.libs.iteratee.Iteratee$$anonfun$mapM$1.apply(Iteratee.scala:483) [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6]
    at play.api.libs.iteratee.Iteratee$$anonfun$flatMapM$1.apply(Iteratee.scala:519) [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6]
    at play.api.libs.iteratee.Iteratee$$anonfun$flatMapM$1.apply(Iteratee.scala:519) [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6]
    at play.api.libs.iteratee.Iteratee$$anonfun$flatMap$1$$anonfun$apply$14.apply(Iteratee.scala:496) [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6]
    at play.api.libs.iteratee.Iteratee$$anonfun$flatMap$1$$anonfun$apply$14.apply(Iteratee.scala:496) [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6]
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) [org.scala-lang.scala-library-2.10.4.jar:na]
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) [org.scala-lang.scala-library-2.10.4.jar:na]
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) [com.typesafe.akka.akka-actor_2.10-2.3.4.jar:na]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) [com.typesafe.akka.akka-actor_2.10-2.3.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [org.scala-lang.scala-library-2.10.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [org.scala-lang.scala-library-2.10.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [org.scala-lang.scala-library-2.10.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [org.scala-lang.scala-library-2.10.4.jar:na]
Caused by: java.util.concurrent.TimeoutException: No response received after 5000
    at com.ning.http.client.providers.netty.NettyResponseFuture.get(NettyResponseFuture.java:260) ~[com.ning.async-http-client-1.8.14.jar:na]
    ... 43 common frames omitted

It seems like timeouts or API failures are causing node communication to fall over on the cluster. We tried increasing the heap size on the web interface, but no luck. This cluster is not under any load; it is still being built out, so message throughput should not be an issue.

If you have any questions or need more info, let me know. Any help you can give would be appreciated!

Thanks,
Alexia
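The logs above show two different failure modes against port 12900: "Connection refused" and "No response received after 5000". One way to tell them apart outside of Graylog is to probe the REST port on each server node directly from the web interface box. The following is only a minimal sketch of such a check using the Python standard library; the function name `probe` is hypothetical, and the 5-second default timeout is an assumption chosen to mirror the 5000 in the log message.

```python
import socket


def probe(host: str, port: int = 12900, timeout: float = 5.0) -> str:
    """Classify one TCP connect attempt to host:port.

    Returns "open" if the connection is accepted, "refused" if the
    remote end actively rejects it (nothing listening on that address),
    "timeout" if no answer arrives within `timeout` seconds, or
    "error: ..." for anything else (for example, an unresolvable name).
    """
    try:
        # create_connection performs the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

Calling `probe("SVR2_IP_ADDRESS")` from the web interface box, with the placeholder replaced by the real internal address, would report "refused" when nothing is listening on that address and port, and "timeout" when packets are being dropped or the node is too slow to answer. This only tests TCP reachability; it says nothing about whether the REST API itself responds correctly.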
