Hi Alexia! This should work exactly as you configured it: the web interface will connect to both servers in a round-robin fashion. The default heap sizes should not cause long GC pauses under moderate load, so I doubt that this is the cause of these problems.
A couple of things to check:

1) Are there any errors in the server logs?
2) Can you reach the servers via curl from the web interface machine? Run this for both servers:
   curl -HAccept:application/json http://SVR2_IP_ADDRESS:12900/system/cluster/node
2a) Can you access the API browser of either server? It is served under /api-browser on each Graylog server node.
3) Are there any firewalls or other packet filters between the nodes?
4) Can you check the node id files on both servers to verify that they have different UUIDs?

cheers,
Kay

On Monday, 16 November 2015 15:41:58 UTC+1, Alexia Golez wrote:
> Hi guys
>
> We have an issue with adding a Graylog server to an existing Graylog cluster. Basically, we need to add another server to scale out our logging capabilities.
> We're doing this right now on a test cluster just to understand what we have to do in production.
>
> *Environment*
> We have an existing Graylog cluster with the following machines:
>
> - Graylog web interface box
> - Graylog server
> - 3 Elasticsearch nodes
>
> On the existing Graylog server [or svr1], ismaster=true in its server.conf and its rest_listen_ip is set to the internal network IP of 10.x.y.z.
> The new Graylog server [svr2] that I am adding into the cluster has ismaster=false and, just like the primary Graylog server, its rest_listen_ip is set to its internal network IP.
>
> *Errors*
> The existing cluster was operating as expected until we added the second server - svr2.
>
> When we start up svr2, the Graylog web interface server appears to see it, and on login to the Graylog admin page I can see it registered under System>Nodes. However, after a few minutes, svr2 drops off the Nodes page.
> On checking the logs of the Graylog web interface box, I see multiple API call failures for svr2 and a few for svr1:
>
> 2015-11-16 13:57:03,273 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
> API call failed to execute.
> java.util.concurrent.ExecutionException: java.net.ConnectException: > Connection refused: /SVR2_IP_ADDRESS:12900 to > http://SVR2_IP_ADDRESS:12900/system/cluster/node > at > com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:342) > > ~[com.ning.async-http-client-1.8.14.jar:na] > at > com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108) > > ~[com.ning.async-http-client-1.8.14.jar:na] > at > org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:431) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:422) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:384) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > [na:1.8.0_11] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > > [na:1.8.0_11] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_11] > Caused by: java.net.ConnectException: 
Connection refused: > /SVR2_IP_ADDRESS:12900 to http://SVR2_IP_ADDRESS:12900/system/cluster/node > at > com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104) > > ~[com.ning.async-http-client-1.8.14.jar:na] > ... 12 common frames omitted > Caused by: java.net.ConnectException: Connection refused: > /SVR2_IP_ADDRESS:12900 > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > ~[na:1.8.0_11] > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712) > ~[na:1.8.0_11] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) > > ~[io.netty.netty-3.9.3.Final.jar:na] > ... 8 common frames omitted > > 2015-11-16 13:57:03,276 - [ERROR] - from > org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0 > API call failed to execute. > java.util.concurrent.ExecutionException: java.net.ConnectException: > Connection refused: /SVR1_IP_ADDRESS:12900 to > http://SVR1_IP_ADDRESS:12900/system/cluster/node > at > com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:342) > > ~[com.ning.async-http-client-1.8.14.jar:na] > at > com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108) > > ~[com.ning.async-http-client-1.8.14.jar:na] > at > org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:431) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:422) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:384) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109) > > 
~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > [na:1.8.0_11] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > > [na:1.8.0_11] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_11] > Caused by: java.net.ConnectException: Connection refused: > /SVR1_IP_ADDRESS:12900 to http://SVR1_IP_ADDRESS:12900/system/cluster/node > at > com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104) > > ~[com.ning.async-http-client-1.8.14.jar:na] > ... 12 common frames omitted > Caused by: java.net.ConnectException: Connection refused: > /SVR1_IP_ADDRESS:12900 > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > ~[na:1.8.0_11] > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712) > ~[na:1.8.0_11] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) > > ~[io.netty.netty-3.9.3.Final.jar:na] > ... 8 common frames omitted > > 2015-11-16 13:57:08,282 - [ERROR] - from > org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0 > API call failed to execute. 
> java.util.concurrent.ExecutionException: java.net.ConnectException: > Connection refused: /SVR2_IP_ADDRESS:12900 to > http://SVR2_IP_ADDRESS:12900/system/cluster/node > at > com.ning.http.client.providers.netty.NettyResponseFuture.abort(NettyResponseFuture.java:342) > > ~[com.ning.async-http-client-1.8.14.jar:na] > at > com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:108) > > ~[com.ning.async-http-client-1.8.14.jar:na] > at > org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:431) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:422) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:384) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:109) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42) > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > [na:1.8.0_11] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > > [na:1.8.0_11] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_11] > Caused by: java.net.ConnectException: 
Connection refused: > /SVR2_IP_ADDRESS:12900 to http://SVR2_IP_ADDRESS:12900/system/cluster/node > at > com.ning.http.client.providers.netty.NettyConnectListener.operationComplete(NettyConnectListener.java:104) > > ~[com.ning.async-http-client-1.8.14.jar:na] > ... 12 common frames omitted > Caused by: java.net.ConnectException: Connection refused: > /SVR2_IP_ADDRESS:12900 > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > ~[na:1.8.0_11] > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712) > ~[na:1.8.0_11] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152) > > ~[io.netty.netty-3.9.3.Final.jar:na] > at > org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105) > > ~[io.netty.netty-3.9.3.Final.jar:na] > ... 8 common frames omitted > > 2015-11-16 13:57:52,182 - [ERROR] - from > org.graylog2.restclient.lib.ApiClient in > play-akka.actor.default-dispatcher-10 > API call failed to execute. 
> java.util.concurrent.ExecutionException: > java.util.concurrent.TimeoutException: No response received after 5000 > at > com.ning.http.client.providers.netty.NettyResponseFuture.get(NettyResponseFuture.java:266) > > ~[com.ning.async-http-client-1.8.14.jar:na] > at > org.graylog2.restclient.lib.ApiClientImpl$ApiRequestBuilder.executeOnAll(ApiClientImpl.java:558) > > ~[org.graylog2.graylog2-rest-client-1.0.1.jar:na] > at > org.graylog2.restclient.models.ClusterService.getClusterJvmStats(ClusterService.java:157) > > [org.graylog2.graylog2-rest-client-1.0.1.jar:na] > at controllers.NodesController.nodes(NodesController.java:61) > [graylog-web-interface.graylog-web-interface-1.0.1.jar:1.0.1] > at > Routes$$anonfun$routes$1$$anonfun$applyOrElse$44$$anonfun$apply$496.apply(routes_routing.scala:1691) > > [graylog-web-interface.graylog-web-interface-1.0.1.jar:na] > at > Routes$$anonfun$routes$1$$anonfun$applyOrElse$44$$anonfun$apply$496.apply(routes_routing.scala:1691) > > [graylog-web-interface.graylog-web-interface-1.0.1.jar:na] > at > play.core.Router$HandlerInvokerFactory$$anon$4.resultCall(Router.scala:264) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at > play.core.Router$HandlerInvokerFactory$JavaActionInvokerFactory$$anon$15$$anon$1.invocation(Router.scala:255) > > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at play.core.j.JavaAction$$anon$1.call(JavaAction.scala:55) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at play.GlobalSettings$1.call(GlobalSettings.java:67) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at play.mvc.Security$AuthenticatedAction.call(Security.java:44) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at play.core.j.JavaAction$$anonfun$11.apply(JavaAction.scala:82) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at play.core.j.JavaAction$$anonfun$11.apply(JavaAction.scala:82) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > > 
[org.scala-lang.scala-library-2.10.4.jar:na] > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > [org.scala-lang.scala-library-2.10.4.jar:na] > at > play.core.j.HttpExecutionContext$$anon$2.run(HttpExecutionContext.scala:40) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at > play.api.libs.iteratee.Execution$trampoline$.execute(Execution.scala:46) > [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6] > at > play.core.j.HttpExecutionContext.execute(HttpExecutionContext.scala:32) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at scala.concurrent.impl.Future$.apply(Future.scala:31) > [org.scala-lang.scala-library-2.10.4.jar:na] > at scala.concurrent.Future$.apply(Future.scala:485) > [org.scala-lang.scala-library-2.10.4.jar:na] > at play.core.j.JavaAction$class.apply(JavaAction.scala:82) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at > play.core.Router$HandlerInvokerFactory$JavaActionInvokerFactory$$anon$15$$anon$1.apply(Router.scala:252) > > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at > play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4$$anonfun$apply$5.apply(Action.scala:130) > > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at > play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4$$anonfun$apply$5.apply(Action.scala:130) > > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at play.utils.Threads$.withContextClassLoader(Threads.scala:21) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at > play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4.apply(Action.scala:129) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at > play.api.mvc.Action$$anonfun$apply$1$$anonfun$apply$4.apply(Action.scala:128) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at scala.Option.map(Option.scala:145) > [org.scala-lang.scala-library-2.10.4.jar:na] > at play.api.mvc.Action$$anonfun$apply$1.apply(Action.scala:128) > [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at play.api.mvc.Action$$anonfun$apply$1.apply(Action.scala:121) 
> [com.typesafe.play.play_2.10-2.3.6.jar:2.3.6] > at > play.api.libs.iteratee.Iteratee$$anonfun$mapM$1.apply(Iteratee.scala:483) > [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6] > at > play.api.libs.iteratee.Iteratee$$anonfun$mapM$1.apply(Iteratee.scala:483) > [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6] > at > play.api.libs.iteratee.Iteratee$$anonfun$flatMapM$1.apply(Iteratee.scala:519) > [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6] > at > play.api.libs.iteratee.Iteratee$$anonfun$flatMapM$1.apply(Iteratee.scala:519) > [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6] > at > play.api.libs.iteratee.Iteratee$$anonfun$flatMap$1$$anonfun$apply$14.apply(Iteratee.scala:496) > > [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6] > at > play.api.libs.iteratee.Iteratee$$anonfun$flatMap$1$$anonfun$apply$14.apply(Iteratee.scala:496) > > [com.typesafe.play.play-iteratees_2.10-2.3.6.jar:2.3.6] > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > > [org.scala-lang.scala-library-2.10.4.jar:na] > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > [org.scala-lang.scala-library-2.10.4.jar:na] > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > [com.typesafe.akka.akka-actor_2.10-2.3.4.jar:na] > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) > > [com.typesafe.akka.akka-actor_2.10-2.3.4.jar:na] > at > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > [org.scala-lang.scala-library-2.10.4.jar:na] > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > [org.scala-lang.scala-library-2.10.4.jar:na] > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > [org.scala-lang.scala-library-2.10.4.jar:na] > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > 
[org.scala-lang.scala-library-2.10.4.jar:na]
> Caused by: java.util.concurrent.TimeoutException: No response received after 5000
>     at com.ning.http.client.providers.netty.NettyResponseFuture.get(NettyResponseFuture.java:260) ~[com.ning.async-http-client-1.8.14.jar:na]
>     ... 43 common frames omitted
>
> It seems like timeouts or API failures are causing node comms to fall over on the cluster. We tried upping the heap size on the web interface but no luck.
> This cluster is not under any load - basically it's under build-out right now, so message throughput should not be an issue.
>
> Any qs for more info, let me know!
> Any help you can give would be appreciated!
>
> Thanks
> Alexia
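
Checks 2) and 4) from the reply at the top of this thread could be scripted roughly as follows. This is only a sketch in Python, not part of any Graylog tooling: the function names are made up for illustration, port 12900 is taken from the log excerpts above, and the node ids are assumed to be copied by hand from each server's node id file.

```python
import urllib.error
import urllib.request
from collections import defaultdict

REST_PORT = 12900  # port seen in the log excerpts; adjust if your setup differs


def node_url(host, port=REST_PORT):
    """Build the URL that the web interface polls on each server."""
    return f"http://{host}:{port}/system/cluster/node"


def check_node(host, timeout=5.0):
    """Mirror the curl check for one server and describe the outcome."""
    request = urllib.request.Request(
        node_url(host), headers={"Accept": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"reachable, HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # An HTTP error (for example 401) still proves the port is open.
        return f"reachable, HTTP {exc.code}"
    except OSError as exc:
        # Covers "Connection refused", timeouts and name resolution failures.
        return f"unreachable: {exc}"


def duplicate_node_ids(ids_by_host):
    """Return {node_id: [hosts]} for every id used by more than one host.

    ids_by_host is a hypothetical mapping you fill in by hand, for example
    from the contents of the node id file on each server.
    """
    hosts_by_id = defaultdict(list)
    for host, node_id in ids_by_host.items():
        hosts_by_id[node_id.strip()].append(host)
    return {nid: hosts for nid, hosts in hosts_by_id.items() if len(hosts) > 1}
```

Run from the web interface machine, calling `check_node("SVR1_IP_ADDRESS")` and `check_node("SVR2_IP_ADDRESS")` should show whether "Connection refused" can be reproduced outside the web interface. An empty result from `duplicate_node_ids` means the two servers have distinct node ids.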
