I think I just found the issue. I thought we had a box big enough to run
the Graylog2 server plus the web interface, but we had enabled a bunch of
Streams recently. We disabled them to see what would happen, and we came
back to full processing capacity (~1750 msg/s). I'm suggesting that we get
a separate box for the web interface now.
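
For anyone comparing their own numbers, here is a rough sketch of how the
periodic status lines quoted below can be read. It is only an illustration:
the function names (parse_status, backlog_location) are made up for this
example and are not part of Graylog2, and the regular expression assumes the
exact line layout shown in this thread. The idea is the same reasoning as in
my original message: a large InputCache with an empty OutputCache suggests
the backlog sits in front of message processing rather than in front of ES.

```python
import re

# Matches status lines of the form quoted in this thread, e.g.
# "[util][caches][2014-05-06T08:42:18.109-07:00] InputCache size: 2487587"
# Lines without a trailing "name: integer" pair (the buffer lines) are skipped.
LINE_RE = re.compile(r"^\[util\]\[(\w+)\]\[([^\]]+)\]\s+(.+?):\s+(\d+)\s*$")


def parse_status(lines):
    """Return {metric name: integer value} for every line that matches."""
    metrics = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            metrics[match.group(3)] = int(match.group(4))
    return metrics


def backlog_location(metrics):
    """Guess where messages pile up (hypothetical helper, not a Graylog API)."""
    input_size = metrics.get("InputCache size", 0)
    output_size = metrics.get("OutputCache size", 0)
    if input_size > output_size:
        return "input"   # messages wait to be processed
    if output_size > 0:
        return "output"  # messages wait to be written out
    return "none"


def heap_used_fraction(metrics):
    """Fraction of the maximum heap currently in use."""
    return metrics["Used memory (MB)"] / metrics["Max memory (MB)"]


sample = [
    "[util][caches][2014-05-06T08:42:18.109-07:00] InputCache size: 2487587",
    "[util][caches][2014-05-06T08:42:18.109-07:00] OutputCache size: 0",
    "[util][buffers][2014-05-06T08:42:18.109-07:00] OutputBuffer is at 0.0%. [0/2048]",
    "[util][heap][2014-05-06T08:42:18.109-07:00] Used memory (MB): 6392",
    "[util][heap][2014-05-06T08:42:18.109-07:00] Max memory (MB): 10129",
]
stats = parse_status(sample)
print(backlog_location(stats), round(heap_used_fraction(stats), 2))
```

With the second set of numbers from my original message, this points at the
input side, which fits what we saw once the Streams were disabled.
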
On Tuesday, May 6, 2014 12:53:44 PM UTC-6, Tyler Bell wrote:
>
> There are no ES errors. Cluster Health is Green. I see data being added to
> my /data partition. Is there a way to see what else ES could be doing that
> would force Graylog to only process 1/3 of the logs it was processing a
> week ago?
>
> {
>   "cluster_name" : "XXXXXXXXX",
>   "status" : "green",
>   "timed_out" : false,
>   "number_of_nodes" : 3,
>   "number_of_data_nodes" : 2,
>   "active_primary_shards" : 320,
>   "active_shards" : 320,
>   "relocating_shards" : 0,
>   "initializing_shards" : 0,
>   "unassigned_shards" : 0
> }
>
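
A note on reading the health output above: "green" only says that every
shard is allocated, not how much work the nodes are doing. The sketch below
is a minimal illustration, assuming the JSON has been saved as a string; the
summarize function is a hypothetical helper, not an ES or Graylog API. It
derives the two numbers that stand out here: there are no replica shards
(active_shards equals active_primary_shards), and each of the two data nodes
holds 160 shards.

```python
import json

# Cluster health as quoted above (cluster name redacted in the original).
HEALTH_JSON = """
{
  "cluster_name" : "XXXXXXXXX",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 320,
  "active_shards" : 320,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
"""


def summarize(health):
    """Derive replica count and average shards per data node."""
    primaries = health["active_primary_shards"]
    total = health["active_shards"]
    return {
        # active_shards counts primaries plus replicas
        "replica_shards": total - primaries,
        "shards_per_data_node": total / health["number_of_data_nodes"],
    }


summary = summarize(json.loads(HEALTH_JSON))
print(summary)
```
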
>
> On Tuesday, May 6, 2014 12:29:53 PM UTC-6, lennart wrote:
>>
>> Can you check your ElasticSearch logs for errors? I am pretty sure it
>> is the reason.
>>
>> On Tue, May 6, 2014 at 5:57 PM, Tyler Bell <[email protected]>
>> wrote:
>> > I'm having an issue with Graylog continuously falling behind on log
>> > processing, with the MasterCache filling up until the 10G of heap space
>> > maxes out and the server crashes. The really weird thing is that a week
>> > ago everything was processing fine and I was taking between 1500 and
>> > 2000 msg/s. Now I barely get over 500-750 msg/s. I don't think
>> > ElasticSearch is the issue because neither the OutputCache nor the
>> > OutputBuffer is increasing.
>> >
>> > I'm wondering if it has something to do with this log message: "Number
>> > of indices (80) higher than limit (20). Running retention for 60
>> > indices." It looks like Graylog isn't rotating indices properly and is
>> > running this retention instead.
>> >
>> > After restarting graylog2 and emptying cache...
>> > [util][caches][2014-05-06T08:46:04.850-07:00] InputCache size: 5758
>> > [util][caches][2014-05-06T08:46:04.850-07:00] OutputCache size: 0
>> > [util][buffers][2014-05-06T08:46:04.850-07:00] OutputBuffer is at 0.0%. [0/2048]
>> > [util][buffers][2014-05-06T08:46:04.850-07:00] ProcessBuffer is at 33.251953%. [681/2048]
>> > [util][heap][2014-05-06T08:46:04.850-07:00] Used memory (MB): 1465
>> > [util][heap][2014-05-06T08:46:04.850-07:00] Free memory (MB): 8330
>> > [util][heap][2014-05-06T08:46:04.850-07:00] Total memory (MB): 9814
>> > [util][heap][2014-05-06T08:46:04.850-07:00] Max memory (MB): 9814
>> > [util][written][2014-05-06T08:46:04.850-07:00] Messages written to all outputs: 1561
>> >
>> >
>> > After MasterCache fills up a bit
>> > [util][caches][2014-05-06T08:42:18.109-07:00] InputCache size: 2487587
>> > [util][caches][2014-05-06T08:42:18.109-07:00] OutputCache size: 0
>> > [util][buffers][2014-05-06T08:42:18.109-07:00] OutputBuffer is at 0.0%. [0/2048]
>> > [util][buffers][2014-05-06T08:42:18.109-07:00] ProcessBuffer is at 40.429688%. [828/2048]
>> > [util][heap][2014-05-06T08:42:18.109-07:00] Used memory (MB): 6392
>> > [util][heap][2014-05-06T08:42:18.109-07:00] Free memory (MB): 3736
>> > [util][heap][2014-05-06T08:42:18.109-07:00] Total memory (MB): 10129
>> > [util][heap][2014-05-06T08:42:18.109-07:00] Max memory (MB): 10129
>> > [util][written][2014-05-06T08:42:18.109-07:00] Messages written to all outputs: 3100
>> >
>> >
>> > ES node config (GLNode0 is the Graylog server). I know mlockall reports
>> > false even though it is configured to be true, but these are virtualized
>> > servers and there are some issues there.
>> >
>> > {
>> >   "ok" : true,
>> >   "cluster_name" : "Graylog2",
>> >   "nodes" : {
>> >     "X.X.X.X" : {
>> >       "name" : "GLNode1",
>> >       "transport_address" : "inet[/X.X.X.X:9300]",
>> >       "hostname" : "X.X.X.X",
>> >       "version" : "0.90.10",
>> >       "http_address" : "inet[/X.X.X.X:9200]",
>> >       "attributes" : {
>> >         "master" : "true"
>> >       },
>> >       "process" : {
>> >         "refresh_interval" : 1000,
>> >         "id" : 1611,
>> >         "max_file_descriptors" : 32000,
>> >         "mlockall" : false
>> >       }
>> >     },
>> >     "X.X.X.X" : {
>> >       "name" : "GLNode0",
>> >       "transport_address" : "inet[/X.X.X.X:9350]",
>> >       "hostname" : "X.X.X.X",
>> >       "version" : "0.90.10",
>> >       "attributes" : {
>> >         "client" : "true",
>> >         "data" : "false",
>> >         "master" : "false"
>> >       },
>> >       "process" : {
>> >         "refresh_interval" : 1000,
>> >         "id" : 28382,
>> >         "max_file_descriptors" : 4096,
>> >         "mlockall" : false
>> >       }
>> >     },
>> >     "X.X.X.X" : {
>> >       "name" : "GLNode2",
>> >       "transport_address" : "inet[/X.X.X.X:9300]",
>> >       "hostname" : "X.X.X.X",
>> >       "version" : "0.90.10",
>> >       "http_address" : "inet[/X.X.X.X:9200]",
>> >       "attributes" : {
>> >         "master" : "false"
>> >       },
>> >       "process" : {
>> >         "refresh_interval" : 1000,
>> >         "id" : 4508,
>> >         "max_file_descriptors" : 32000,
>> >         "mlockall" : false
>> >       }
>> >     }
>> >   }
>> > }
>> >
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "graylog2" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an email to [email protected].
>> > For more options, visit https://groups.google.com/d/optout.
>>
>