Re: [graylog2] Re: High CPU and did not find meta info issues since adding new Graylog servers and increased input messages/second

Mathieu Grzybek Mon, 04 May 2015 23:03:55 -0700

Hi,

What's the load average of the servers? CPU usage is only one piece of
information and is not enough on its own to diagnose anything.
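
As a rough sketch of that comparison, assuming a Linux host with Python
available, the load average can be put next to the number of CPUs. The
function names and the 1.0 threshold below are illustrative assumptions,
not anything Graylog provides:

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Divide the 1-, 5- and 15-minute load averages by the CPU count."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return tuple(round(value / cpu_count, 2) for value in load_avg)


def describe(load_avg, cpu_count):
    """Classify the 1-minute load relative to the number of CPUs."""
    one_minute = load_per_cpu(load_avg, cpu_count)[0]
    if one_minute > 1.0:
        # Illustrative threshold: more tasks want to run than there are CPUs.
        return "tasks are queueing"
    return "CPUs are keeping up"


def current_load():
    """Read the live values; os.getloadavg() is only available on Unix."""
    return os.getloadavg(), os.cpu_count() or 1


if __name__ == "__main__":
    load_avg, cpus = current_load()
    print(load_per_cpu(load_avg, cpus), describe(load_avg, cpus))
```

On Linux the load average also counts tasks blocked on I/O, so a high
value next to moderate CPU usage would point away from pure CPU
saturation.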


Mathieu
On 5 May 2015 02:23, "Pete GS" <[email protected]> wrote:

> Yesterday I did a yum update on all Graylog and MongoDB nodes and since
> doing that and rebooting them all (there was a kernel update) it seems that
> there are no longer issues connecting to the Mongo database.
>
> However, I'm still seeing excessively high CPU usage on the Graylog nodes,
> where all vCPUs are regularly exceeding 95%.
>
> What can contribute to this? I'm a little stumped at present.
>
> I would say our average messages/second is around 5,000 to 6,000 with
> peaks up to about 12,000.
>
> Cheers, Pete
>
> On Friday, 1 May 2015 08:20:35 UTC+10, Pete GS wrote:
>>
>> Does anyone have any thoughts on this?
>>
>> Even identifying some scenarios that would cause high CPU on Graylog
>> servers, and the circumstances in which Graylog would have trouble
>> contacting the MongoDB servers, would help.
>>
>> Cheers, Pete
>>
>> On Wednesday, 29 April 2015 10:34:28 UTC+10, Pete GS wrote:
>>>
>>> Hi all,
>>>
>>> We acquired a company a while ago and last week we added all of their
>>> logs to our Graylog environment which all come in from their Syslog server
>>> via UDP.
>>>
>>> After this, I noticed that the Graylog servers were maxing CPU so to
>>> alleviate this I increased CPU resources to the existing servers and added
>>> two new servers.
>>>
>>> I'm still seeing generally high CPU usage, with peaks of 100% on all four
>>> of the Graylog servers, but they now also seem to have trouble connecting
>>> to MongoDB.
>>>
>>> I see lots of "[NodePingThread] Did not find meta info of this node.
>>> Re-registering." streaming through the log files but it only seems to
>>> happen when I have more than two Graylog servers running.
>>>
>>> I have verified NTP is installed and configured, and all servers,
>>> including the MongoDB and ElasticSearch servers, are syncing with the same
>>> NTP servers.
>>>
>>> We're doing less than 10,000 messages per second so with the resources
>>> I've allocated I would have expected no issues whatsoever.
>>>
>>> I have seen this link:
>>> https://groups.google.com/forum/?hl=en#!topic/graylog2/bW2glCdBIUI but
>>> I don't believe it is our issue.
>>>
>>> If it truly is being caused by doing lots of reverse DNS lookups, I
>>> would expect tcpdump to show me that traffic to our DNS servers, but I see
>>> almost no DNS lookups at all.
>>>
>>> We have 6 inputs in total but only one receives the bulk of the Syslog
>>> UDP messages. Most of the other inputs are GELF UDP inputs.
>>>
>>> We also have 11 streams, however pausing these streams seems to have
>>> little to no impact on the CPU usage.
>>>
>>> All the Graylog servers are virtualised on top of vSphere 5.5 Update 2
>>> with plenty of physical hardware available to service the workload (little
>>> to no contention).
>>>
>>> The original two have 20 vCPUs and 32GB RAM; the additional two have 16
>>> vCPUs and 32GB RAM.
>>>
>>> Java heap on all is set to 16GB.
>>>
>>> This is all running on CentOS 6.
>>>
>>> Any input would be greatly appreciated as I'm a bit stumped on how to
>>> get this resolved at present.
>>>
>>> Here is the config file I'm using (censored where appropriate):
>>>
>>> is_master = false
>>> node_id_file = /etc/graylog2/server/node-id
>>> password_secret = <Censored>
>>> root_username = <Censored>
>>> root_password_sha2 = <Censored>
>>> plugin_dir = /usr/share/graylog2-server/plugin
>>> rest_listen_uri = http://172.22.20.66:12900/
>>>
>>> elasticsearch_max_docs_per_index = 20000000
>>> elasticsearch_max_number_of_indices = 999
>>> retention_strategy = close
>>> elasticsearch_shards = 4
>>> elasticsearch_replicas = 1
>>> elasticsearch_index_prefix = graylog2
>>> allow_leading_wildcard_searches = true
>>> allow_highlighting = true
>>> elasticsearch_cluster_name = graylog2
>>> elasticsearch_node_name = bne3-0002las
>>> elasticsearch_node_master = false
>>> elasticsearch_node_data = false
>>> elasticsearch_discovery_zen_ping_multicast_enabled = false
>>> elasticsearch_discovery_zen_ping_unicast_hosts = bne3-0001lai.server-web.com:9300,bne3-0002lai.server-web.com:9300,bne3-0003lai.server-web.com:9300,bne3-0004lai.server-web.com:9300,bne3-0005lai.server-web.com:9300,bne3-0006lai.server-web.com:9300,bne3-0007lai.server-web.com:9300,bne3-0008lai.server-web.com:9300,bne3-0009lai.server-web.com:9300
>>> elasticsearch_cluster_discovery_timeout = 5000
>>> elasticsearch_discovery_initial_state_timeout = 3s
>>> elasticsearch_analyzer = standard
>>>
>>> output_batch_size = 5000
>>> output_flush_interval = 1
>>> processbuffer_processors = 20
>>> outputbuffer_processors = 5
>>> #outputbuffer_processor_keep_alive_time = 5000
>>> #outputbuffer_processor_threads_core_pool_size = 3
>>> #outputbuffer_processor_threads_max_pool_size = 30
>>> #udp_recvbuffer_sizes = 1048576
>>> processor_wait_strategy = blocking
>>> ring_size = 65536
>>>
>>> inputbuffer_ring_size = 65536
>>> inputbuffer_processors = 2
>>> inputbuffer_wait_strategy = blocking
>>>
>>> message_journal_enabled = true
>>> message_journal_dir = /var/lib/graylog-server/journal
>>> message_journal_max_age = 24h
>>> message_journal_max_size = 150gb
>>> message_journal_flush_age = 1m
>>> message_journal_flush_interval = 1000000
>>> message_journal_segment_age = 1h
>>> message_journal_segment_size = 1gb
>>>
>>> dead_letters_enabled = false
>>> lb_recognition_period_seconds = 3
>>>
>>> mongodb_useauth = true
>>> mongodb_user = <Censored>
>>> mongodb_password = <Censored>
>>> mongodb_replica_set = bne3-0001ladb.server-web.com:27017,bne3-0002ladb.server-web.com:27017
>>> mongodb_database = graylog2
>>> mongodb_max_connections = 200
>>> mongodb_threads_allowed_to_block_multiplier = 5
>>>
>>> #rules_file = /etc/graylog2.drl
>>>
>>> # Email transport
>>> transport_email_enabled = true
>>> transport_email_hostname = <Censored>
>>> transport_email_port = 25
>>> transport_email_use_auth = false
>>> transport_email_use_tls = false
>>> transport_email_use_ssl = false
>>> transport_email_auth_username = [email protected]
>>> transport_email_auth_password = secret
>>> transport_email_subject_prefix = [graylog2]
>>> transport_email_from_email = <Censored>
>>> transport_email_web_interface_url = <Censored>
>>>
>>> message_cache_off_heap = false
>>> message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
>>> #message_cache_commit_interval = 1000
>>> #input_cache_max_size = 0
>>>
>>> #ldap_connection_timeout = 2000
>>>
>>> versionchecks = false
>>>
>>> #enable_metrics_collection = false
>>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "graylog2" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

