I think I just found the issue. I thought we had a box big enough to run 
the Graylog2 server plus the web interface, but we had enabled a bunch of 
Streams recently. We disabled them to see what would happen, and processing 
came back to full capacity (~1750 msg/s). I'm suggesting that we get 
a separate box for the web interface now.
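
A minimal Python sketch of the reasoning above, for anyone comparing their 
own status dumps: it parses the [util] lines quoted further down and guesses 
where messages are piling up. With OutputCache and OutputBuffer at zero while 
InputCache is large, the backlog sits before the output stage, which is 
consistent with Streams rather than ElasticSearch being the bottleneck. The 
function names and the 100,000-message threshold are assumptions made for 
illustration, not anything Graylog2 provides.

```python
import re

# Status lines copied from the "After MasterCache fills up a bit" dump in
# this thread, with the mail client's line wrapping undone.
SAMPLE = """\
[util][caches][2014-05-06T08:42:18.109-07:00] InputCache size: 2487587
[util][caches][2014-05-06T08:42:18.109-07:00] OutputCache size: 0
[util][buffers][2014-05-06T08:42:18.109-07:00] OutputBuffer is at 0.0%. [0/2048]
[util][buffers][2014-05-06T08:42:18.109-07:00] ProcessBuffer is at 40.429688%. [828/2048]
[util][heap][2014-05-06T08:42:18.109-07:00] Used memory (MB): 6392
[util][heap][2014-05-06T08:42:18.109-07:00] Free memory (MB): 3736
[util][heap][2014-05-06T08:42:18.109-07:00] Total memory (MB): 10129
[util][heap][2014-05-06T08:42:18.109-07:00] Max memory (MB): 10129
[util][written][2014-05-06T08:42:18.109-07:00] Messages written to all outputs: 3100
"""

LINE = re.compile(r"^\[util\]\[\w+\]\[(?P<ts>[^\]]+)\]\s+(?P<msg>.+)$")
CACHE = re.compile(r"(\w+Cache) size: (\d+)")
BUFFER = re.compile(r"(\w+Buffer) is at [\d.]+%\. \[(\d+)/(\d+)\]")
HEAP = re.compile(r"(Used|Free|Total|Max) memory \(MB\): (\d+)")
WRITTEN = re.compile(r"Messages written to all outputs: (\d+)")


def parse_snapshot(text):
    """Turn one status dump into a dict of cache, buffer and heap figures."""
    snap = {"caches": {}, "buffers": {}, "heap_mb": {}, "written": None}
    for raw in text.splitlines():
        line = LINE.match(raw.strip())
        if line is None:
            continue  # not a [util] status line
        snap["timestamp"] = line.group("ts")
        msg = line.group("msg")
        m = CACHE.match(msg)
        if m:
            snap["caches"][m.group(1)] = int(m.group(2))
            continue
        m = BUFFER.match(msg)
        if m:
            # Store (used slots, total slots).
            snap["buffers"][m.group(1)] = (int(m.group(2)), int(m.group(3)))
            continue
        m = HEAP.match(msg)
        if m:
            snap["heap_mb"][m.group(1)] = int(m.group(2))
            continue
        m = WRITTEN.match(msg)
        if m:
            snap["written"] = int(m.group(1))
    return snap


def backlog_stage(snap, input_backlog=100_000):
    """Hypothetical heuristic: guess which stage messages are queuing in.

    The input_backlog threshold is an arbitrary assumption; tune it.
    """
    out_used, _ = snap["buffers"].get("OutputBuffer", (0, 1))
    if snap["caches"].get("OutputCache", 0) > 0 or out_used > 0:
        return "output"      # messages waiting on the outputs (e.g. ES)
    if snap["caches"].get("InputCache", 0) >= input_backlog:
        return "processing"  # messages waiting to be processed
    return "none"


if __name__ == "__main__":
    snap = parse_snapshot(SAMPLE)
    print(snap["caches"], snap["buffers"], backlog_stage(snap))
```

Comparing two dumps taken a known interval apart would also give a rough 
growth rate for InputCache, which is a better signal than a single reading.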

On Tuesday, May 6, 2014 12:53:44 PM UTC-6, Tyler Bell wrote:
>
> There are no ES errors. Cluster Health is Green. I see data being added to 
> my /data partition. Is there a way to see what else ES could be doing that 
> would force Graylog to only process 1/3 of the logs it was processing a 
> week ago?
>
> {
>   "cluster_name" : "XXXXXXXXX",
>   "status" : "green",
>   "timed_out" : false,
>   "number_of_nodes" : 3,
>   "number_of_data_nodes" : 2,
>   "active_primary_shards" : 320,
>   "active_shards" : 320,
>   "relocating_shards" : 0,
>   "initializing_shards" : 0,
>   "unassigned_shards" : 0
> }
>
>
> On Tuesday, May 6, 2014 12:29:53 PM UTC-6, lennart wrote:
>>
>> Can you check your ElasticSearch logs for errors? I am pretty sure it 
>> is the reason. 
>>
>> On Tue, May 6, 2014 at 5:57 PM, Tyler Bell <[email protected]> wrote: 
>> > I'm having an issue with Graylog continuously falling behind with log 
>> > processing, and the MasterCache filling up til the 10G of Heap Space 
>> > maxes out and crashes. The really weird thing is that a week ago, 
>> > everything was processing fine and I was taking between 1500-2000 
>> > msg/s. Now I barely get over 500-750 msg/s. I don't think 
>> > ElasticSearch is the issue because none of the OutputCache or Buffer 
>> > is increasing. 
>> > 
>> > I'm wondering if it has something to do with this: Number of indices 
>> > (80) higher than limit (20). Running retention for 60 indices. It 
>> > doesn't look like Graylog is properly rotating indexes and running 
>> > this retention instead. 
>> > 
>> > After restarting graylog2 and emptying cache... 
>> > [util][caches][2014-05-06T08:46:04.850-07:00] InputCache size: 5758 
>> > [util][caches][2014-05-06T08:46:04.850-07:00] OutputCache size: 0 
>> > [util][buffers][2014-05-06T08:46:04.850-07:00] OutputBuffer is at 0.0%. [0/2048] 
>> > [util][buffers][2014-05-06T08:46:04.850-07:00] ProcessBuffer is at 33.251953%. [681/2048] 
>> > [util][heap][2014-05-06T08:46:04.850-07:00] Used memory (MB): 1465 
>> > [util][heap][2014-05-06T08:46:04.850-07:00] Free memory (MB): 8330 
>> > [util][heap][2014-05-06T08:46:04.850-07:00] Total memory (MB): 9814 
>> > [util][heap][2014-05-06T08:46:04.850-07:00] Max memory (MB): 9814 
>> > [util][written][2014-05-06T08:46:04.850-07:00] Messages written to all outputs: 1561 
>> > 
>> > 
>> > After MasterCache fills up a bit 
>> > [util][caches][2014-05-06T08:42:18.109-07:00] InputCache size: 2487587 
>> > [util][caches][2014-05-06T08:42:18.109-07:00] OutputCache size: 0 
>> > [util][buffers][2014-05-06T08:42:18.109-07:00] OutputBuffer is at 0.0%. [0/2048] 
>> > [util][buffers][2014-05-06T08:42:18.109-07:00] ProcessBuffer is at 40.429688%. [828/2048] 
>> > [util][heap][2014-05-06T08:42:18.109-07:00] Used memory (MB): 6392 
>> > [util][heap][2014-05-06T08:42:18.109-07:00] Free memory (MB): 3736 
>> > [util][heap][2014-05-06T08:42:18.109-07:00] Total memory (MB): 10129 
>> > [util][heap][2014-05-06T08:42:18.109-07:00] Max memory (MB): 10129 
>> > [util][written][2014-05-06T08:42:18.109-07:00] Messages written to all outputs: 3100 
>> > 
>> > 
>> > ES Node config: (GLNode0 is the Graylog server). I know mlockall is 
>> > false, and is configured to be true, but these are virtualized servers 
>> > and there are some issues there. 
>> > 
>> > { 
>> >   "ok" : true, 
>> >   "cluster_name" : "Graylog2", 
>> >   "nodes" : { 
>> >     "X.X.X.X" : { 
>> >       "name" : "GLNode1", 
>> >       "transport_address" : "inet[/X.X.X.X:9300]", 
>> >       "hostname" : "X.X.X.X", 
>> >       "version" : "0.90.10", 
>> >       "http_address" : "inet[/X.X.X.X:9200]", 
>> >       "attributes" : { 
>> >         "master" : "true" 
>> >       }, 
>> >       "process" : { 
>> >         "refresh_interval" : 1000, 
>> >         "id" : 1611, 
>> >         "max_file_descriptors" : 32000, 
>> >         "mlockall" : false 
>> >       } 
>> >     }, 
>> >     "X.X.X.X" : { 
>> >       "name" : "GLNode0", 
>> >       "transport_address" : "inet[/X.X.X.X:9350]", 
>> >       "hostname" : "X.X.X.X", 
>> >       "version" : "0.90.10", 
>> >       "attributes" : { 
>> >         "client" : "true", 
>> >         "data" : "false", 
>> >         "master" : "false" 
>> >       }, 
>> >       "process" : { 
>> >         "refresh_interval" : 1000, 
>> >         "id" : 28382, 
>> >         "max_file_descriptors" : 4096, 
>> >         "mlockall" : false 
>> >       } 
>> >     }, 
>> >     "X.X.X.X" : { 
>> >       "name" : "GLNode2", 
>> >       "transport_address" : "inet[/X.X.X.X:9300]", 
>> >       "hostname" : "X.X.X.X", 
>> >       "version" : "0.90.10", 
>> >       "http_address" : "inet[/X.X.X.X:9200]", 
>> >       "attributes" : { 
>> >         "master" : "false" 
>> >       }, 
>> >       "process" : { 
>> >         "refresh_interval" : 1000, 
>> >         "id" : 4508, 
>> >         "max_file_descriptors" : 32000, 
>> >         "mlockall" : false 
>> >       } 
>> >     } 
>> >   } 
>> > } 
>> > 
>> > -- 
>> > You received this message because you are subscribed to the Google 
>> > Groups "graylog2" group. 
>> > To unsubscribe from this group and stop receiving emails from it, send 
>> > an email to [email protected]. 
>> > For more options, visit https://groups.google.com/d/optout. 
>>
>

