Also check the "CO-STOP" metric (%CSTP in esxtop) on the VMware side. I am sure you have too many vCPUs.
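To make the "too many vCPUs" point concrete, here is a rough back-of-the-envelope check (my arithmetic, not Graylog output) using the processor counts from the server.conf quoted below. It undercounts, since Graylog also runs journal, REST, and JVM GC threads:

```python
# Buffer processor thread counts taken from the posted server.conf.
processbuffer_processors = 20
outputbuffer_processors = 5
inputbuffer_processors = 2

# Threads that can be busy at once, before any JVM/OS overhead.
busy_threads = (processbuffer_processors
                + outputbuffer_processors
                + inputbuffer_processors)

vcpus = 20  # the original two nodes

print(busy_threads)          # 27 configured processor threads
print(busy_threads > vcpus)  # True: more busy threads than vCPUs
```

With more runnable threads than vCPUs, the guest keeps all 20 vCPUs hot, and a wide VM on a shared host is exactly where co-stop scheduling delays show up.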

> On 5 May 2015, at 16:21, Arie <[email protected]> wrote:
> 
> What happens when you raise "outputbuffer_processors = 5" to 
> "outputbuffer_processors = 10" ?
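For reference, that suggestion amounts to changing one line in Graylog's server.conf (a sketch; the current value is from the config posted below):

```ini
# Threads flushing batches from the output buffer to Elasticsearch.
# Raising this can help when the output path is the bottleneck, at the
# cost of more concurrent bulk requests against the ES cluster.
outputbuffer_processors = 10
```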
> 
> On Tuesday, 5 May 2015 at 02:23:37 UTC+2, Pete GS wrote:
> Yesterday I did a yum update on all Graylog and MongoDB nodes and since doing 
> that and rebooting them all (there was a kernel update) it seems that there 
> are no longer issues connecting to the Mongo database.
> 
> However, I'm still seeing excessively high CPU usage on the Graylog nodes, 
> where all vCPUs are regularly exceeding 95%.
> 
> What can contribute to this? I'm a little stumped at present.
> 
> I would say our average messages/second is around 5,000 to 6,000 with peaks 
> up to about 12,000.
> 
> Cheers, Pete
> 
> On Friday, 1 May 2015 08:20:35 UTC+10, Pete GS wrote:
> Does anyone have any thoughts on this?
> 
> Even if someone could identify some scenarios that cause high CPU on 
> Graylog servers, and the circumstances in which Graylog would have trouble 
> contacting the MongoDB servers, that would help.
> 
> Cheers, Pete
> 
> On Wednesday, 29 April 2015 10:34:28 UTC+10, Pete GS wrote:
> Hi all,
> 
> We acquired a company a while ago and last week we added all of their logs to 
> our Graylog environment which all come in from their Syslog server via UDP.
> 
> After this, I noticed that the Graylog servers were maxing CPU so to 
> alleviate this I increased CPU resources to the existing servers and added 
> two new servers.
> 
> I'm still seeing generally high CPU usage with peaks of 100% on all four of 
> the Graylog servers, but now they also seem to have trouble connecting to 
> MongoDB.
> 
> I see lots of "[NodePingThread] Did not find meta info of this node. 
> Re-registering." streaming through the log files but it only seems to happen 
> when I have more than two Graylog servers running.
> 
> I have verified NTP is installed and configured, and all servers, including 
> the MongoDB and Elasticsearch servers, are syncing with the same NTP servers.
> 
> We're doing less than 10,000 messages per second so with the resources I've 
> allocated I would have expected no issues whatsoever.
> 
> I have seen this link: 
> https://groups.google.com/forum/?hl=en#!topic/graylog2/bW2glCdBIUI 
> but I don't believe it is our issue.
> 
> If it truly is being caused by doing lots of reverse DNS lookups, I would 
> expect tcpdump to show me that traffic to our DNS servers, but I see almost 
> no DNS lookups at all.
> 
> We have 6 inputs in total but only one receives the bulk of the Syslog UDP 
> messages. Most of the other inputs are GELF UDP inputs.
> 
> We also have 11 streams; however, pausing them seems to have little to no 
> impact on the CPU usage.
> 
> All the Graylog servers are virtualised on top of vSphere 5.5 Update 2 with 
> plenty of physical hardware available to service the workload (little to no 
> contention).
> 
> The original two have 20 vCPUs and 32GB RAM; the additional two have 16 
> vCPUs and 32GB RAM.
> 
> Java heap on all is set to 16GB.
> 
> This is all running on CentOS 6.
> 
> Any input would be greatly appreciated as I'm a bit stumped on how to get 
> this resolved at present.
> 
> Here is the config file I'm using (censored where appropriate):
> 
> is_master = false
> node_id_file = /etc/graylog2/server/node-id
> password_secret = <Censored>
> root_username = <Censored>
> root_password_sha2 = <Censored>
> plugin_dir = /usr/share/graylog2-server/plugin
> rest_listen_uri = http://172.22.20.66:12900/
> 
> elasticsearch_max_docs_per_index = 20000000
> elasticsearch_max_number_of_indices = 999
> retention_strategy = close
> elasticsearch_shards = 4
> elasticsearch_replicas = 1
> elasticsearch_index_prefix = graylog2
> allow_leading_wildcard_searches = true
> allow_highlighting = true
> elasticsearch_cluster_name = graylog2
> elasticsearch_node_name = bne3-0002las
> elasticsearch_node_master = false
> elasticsearch_node_data = false
> elasticsearch_discovery_zen_ping_multicast_enabled = false
> elasticsearch_discovery_zen_ping_unicast_hosts = 
> bne3-0001lai.server-web.com:9300,bne3-0002lai.server-web.com:9300,
> bne3-0003lai.server-web.com:9300,bne3-0004lai.server-web.com:9300,
> bne3-0005lai.server-web.com:9300,bne3-0006lai.server-web.com:9300,
> bne3-0007lai.server-web.com:9300,bne3-0008lai.server-web.com:9300,
> bne3-0009lai.server-web.com:9300
> elasticsearch_cluster_discovery_timeout = 5000
> elasticsearch_discovery_initial_state_timeout = 3s
> elasticsearch_analyzer = standard
> 
> output_batch_size = 5000
> output_flush_interval = 1
> processbuffer_processors = 20
> outputbuffer_processors = 5
> #outputbuffer_processor_keep_alive_time = 5000
> #outputbuffer_processor_threads_core_pool_size = 3
> #outputbuffer_processor_threads_max_pool_size = 30
> #udp_recvbuffer_sizes = 1048576
> processor_wait_strategy = blocking
> ring_size = 65536
> 
> inputbuffer_ring_size = 65536
> inputbuffer_processors = 2
> inputbuffer_wait_strategy = blocking
> 
> message_journal_enabled = true
> message_journal_dir = /var/lib/graylog-server/journal
> message_journal_max_age = 24h
> message_journal_max_size = 150gb
> message_journal_flush_age = 1m
> message_journal_flush_interval = 1000000
> message_journal_segment_age = 1h
> message_journal_segment_size = 1gb
> 
> dead_letters_enabled = false
> lb_recognition_period_seconds = 3
> 
> mongodb_useauth = true
> mongodb_user = <Censored>
> mongodb_password = <Censored>
> mongodb_replica_set = bne3-0001ladb.server-web.com:27017,bne3-0002ladb.server-web.com:27017
> mongodb_database = graylog2
> mongodb_max_connections = 200
> mongodb_threads_allowed_to_block_multiplier = 5
> 
> #rules_file = /etc/graylog2.drl
> 
> # Email transport
> transport_email_enabled = true
> transport_email_hostname = <Censored>
> transport_email_port = 25
> transport_email_use_auth = false
> transport_email_use_tls = false
> transport_email_use_ssl = false
> transport_email_auth_username = y...@example.com
> transport_email_auth_password = secret
> transport_email_subject_prefix = [graylog2]
> transport_email_from_email = <Censored>
> transport_email_web_interface_url = <Censored>
> 
> message_cache_off_heap = false
> message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
> #message_cache_commit_interval = 1000
> #input_cache_max_size = 0
> 
> #ldap_connection_timeout = 2000
> 
> versionchecks = false
> 
> #enable_metrics_collection = false
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "graylog2" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

