Hi,
Unsure if this is the right place to mention this, but we've been using
Graylog since pre-1.0 and are now on version 1.2.1 as supplied recently.
Something started happening to our Graylog instance in our production
environment (our test instance on the same version is fine) a couple of
versions back: when we view the Streams page, the throughput for the
various streams is shown but intermittently fails with "Throughput
unavailable" messages. This normally happens every few seconds, and our
Graylog server logs fill up with these warnings/errors:
2015-09-25 15:02:07,399 WARN : org.graylog2.jersey.container.netty.ChunkedRequestAssembler - Error while assembling HTTP request chunks
java.lang.NullPointerException
    at org.graylog2.jersey.container.netty.ChunkedRequestAssembler.assemble(ChunkedRequestAssembler.java:40)
    at org.graylog2.jersey.container.netty.NettyContainer.messageReceived(NettyContainer.java:278)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43)
    at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at org.jboss.netty.handler.execution.MemoryAwareThreadPoolExecutor$MemoryAwareRunnable.run(MemoryAwareThreadPoolExecutor.java:606)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
2015-09-25 15:04:07,437 ERROR: org.graylog2.jersey.container.netty.ChunkedRequestAssembler - Chunks for channel [id: 0xfa9a6d5a, /172.23.72.191:55890 => /172.23.72.191:12900] couldn't be found, skipping chunk.
2015-09-25 15:05:27,566 ERROR: org.graylog2.jersey.container.netty.ChunkedRequestAssembler - Chunks for channel [id: 0x91cd1818, /172.23.72.191:55986 => /172.23.72.191:12900] couldn't be found, skipping chunk.
2015-09-25 15:06:07,478 ERROR: org.graylog2.jersey.container.netty.ChunkedRequestAssembler - Chunks for channel [id: 0x4a3c1ff8, /172.23.72.191:56037 => /172.23.72.191:12900] couldn't be found, skipping chunk.
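In case it's useful, here's the quick-and-dirty script we've been using to count how often those two symptoms show up in the server log (the log path is an assumption from our install — pass your own path as the first argument):

```shell
#!/bin/sh
# Count the two ChunkedRequestAssembler symptoms in the Graylog server log.
# The default path below is an assumption from our RHEL install -- pass
# your own log file as the first argument if yours lives elsewhere.
LOG="${1:-/var/log/graylog-server/server.log}"

printf 'NPE while assembling chunks: '
grep -c 'Error while assembling HTTP request chunks' "$LOG"

printf 'Skipped chunks: '
grep -c "couldn't be found, skipping chunk" "$LOG"
```

Running it every few minutes gives a rough idea of whether the error rate tracks load on the box.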
We are also seeing this in our Graylog web logs (although it may be a
separate issue, I'd suspect the two are related):
2015-09-25 15:22:47,787 [pool-80-thread-1] ERROR o.graylog2.restclient.lib.ApiClient - API call timed out
java.util.concurrent.TimeoutException: null
    at com.ning.http.client.providers.netty.future.NettyResponseFuture.get(NettyResponseFuture.java:159) ~[com.ning.async-http-client-1.9.31.jar:na]
    at org.graylog2.restclient.lib.ApiClientImpl$ApiRequestBuilder.executeOnAll(ApiClientImpl.java:608) ~[org.graylog2.graylog2-rest-client--1.2.1-1.2.1.jar:na]
    at controllers.api.MetricsController$PollingJob.run(MetricsController.java:117) [graylog-web-interface.graylog-web-interface-1.2.1.jar:1.2.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_60]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_60]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
2015-09-25 15:22:47,787 [pool-80-thread-1] INFO org.graylog2.restclient.models.Node - Node {'ef3fbd92-6ff1-49e4-8fa2-dbd6acce3820', http://graylogsrv02prd.vcint.com:12900, inactive, failed: 390 times} failed, marking as inactive.
This last error occurs quite a lot.
In production we run 4 physical servers: 2 Elasticsearch nodes and 2
Graylog frontend nodes, each frontend running one Graylog server and one
Graylog web instance. We run F5 load balancers in front of the Graylog
frontend servers to provide the web and load-balanced input services.
Networking on each frontend server is simply 2 GBit NICs in mode 1
(active/backup) bonding. Oh, and we're on RHEL 6.7 (64-bit) at the OS layer.
Has anyone experienced this sort of thing after an upgrade? Just
wondering.
I can supply more information if need be.
Generally the web UI seems to work fine for the most part, despite the
tonnes of "API call timed out" errors written to the logs.
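One thing we may try on the web side is raising the REST call timeout in graylog-web-interface.conf — I believe it supports a setting along these lines (the setting name is from memory, so please double-check it against the docs for your version before relying on it):

```
# graylog-web-interface.conf -- raise the default timeout for REST calls
# from the web interface to the graylog-server nodes.
# Setting name is from memory; verify against the docs for your version.
timeout.DEFAULT=10s
```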
Thanks,
Christopher Murchison