Does anyone have any thoughts on this?

Even pointers to known scenarios that cause high CPU on Graylog servers, or 
circumstances under which Graylog would have trouble contacting the MongoDB 
servers, would be a big help.

Cheers, Pete

On Wednesday, 29 April 2015 10:34:28 UTC+10, Pete GS wrote:
>
> Hi all,
>
> We acquired a company a while ago and last week we added all of their logs 
> to our Graylog environment which all come in from their Syslog server via 
> UDP.
>
> After this, I noticed that the Graylog servers were maxing out their CPUs, 
> so to alleviate this I increased CPU resources on the existing servers and 
> added two new servers.
>
> I'm still seeing generally high CPU usage, with peaks of 100% on all four 
> of the Graylog servers, but now they also seem to have trouble connecting 
> to MongoDB.
>
> I see lots of "[NodePingThread] Did not find meta info of this node. 
> Re-registering." streaming through the log files but it only seems to 
> happen when I have more than two Graylog servers running.
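
For anyone hitting the same log line: as far as I can tell, NodePingThread logs it when a node's heartbeat entry in the cluster state Graylog keeps in MongoDB has gone stale and been purged, so the node re-registers itself. A rough Python sketch of that staleness logic (field names, sample data and the 10-second threshold are all hypothetical, not Graylog's actual internals):

```python
from datetime import datetime, timedelta

# Illustration only: hypothetical node records, not Graylog's real schema.
nodes = [
    {"node_id": "node-a", "last_seen": datetime(2015, 4, 29, 10, 0, 0)},
    {"node_id": "node-b", "last_seen": datetime(2015, 4, 29, 9, 59, 30)},
]

def stale_nodes(nodes, now, max_age=timedelta(seconds=10)):
    """Return node_ids whose last heartbeat is older than max_age."""
    return [n["node_id"] for n in nodes if now - n["last_seen"] > max_age]

# If a node's heartbeat write to MongoDB is delayed (busy CPU, slow Mongo),
# its entry goes stale, gets purged, and the node re-registers on its next
# ping - producing exactly that log line.
print(stale_nodes(nodes, now=datetime(2015, 4, 29, 10, 0, 5)))  # ['node-b']
```

If that reading is right, the message is a symptom rather than a cause: anything that delays the heartbeat (CPU starvation, GC pauses, slow MongoDB writes) makes nodes look stale, and more nodes means more heartbeats competing.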
>
> I have verified NTP is installed and configured, and all servers, 
> including the MongoDB and Elasticsearch servers, are syncing with the same 
> NTP servers.
>
> We're doing less than 10,000 messages per second so with the resources 
> I've allocated I would have expected no issues whatsoever.
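
As a back-of-envelope check on that expectation (figures from the post plus the processbuffer_processors setting from the config below; purely arithmetic, no Graylog internals assumed):

```python
# Figures from the post; treat this as arithmetic only.
msgs_per_sec = 10_000        # worst-case inbound rate
graylog_nodes = 4            # servers sharing the load
process_threads = 20         # processbuffer_processors per node

per_node = msgs_per_sec / graylog_nodes      # msg/s each server sees
per_thread = per_node / process_threads      # msg/s per processor thread
print(per_node, per_thread)  # 2500.0 125.0
```

125 messages per second per processing thread is a very light load unless each message is expensive to process. Costly regex extractors on the busy Syslog input are a common way to burn whole cores per message, so that's worth ruling out before adding more hardware.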
>
> I have seen this link: 
> https://groups.google.com/forum/?hl=en#!topic/graylog2/bW2glCdBIUI but I 
> don't believe it is our issue.
>
> If it truly is being caused by doing lots of reverse DNS lookups, I would 
> expect tcpdump to show me that traffic to our DNS servers, but I see almost 
> no DNS lookups at all.
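
One way to quantify that rather than eyeballing the capture: save a minute of port-53 traffic as text (e.g. `tcpdump -nn port 53`) and count queries by record type. The parsing below assumes tcpdump's default one-line text output; the sample lines and addresses are made up:

```python
import re
from collections import Counter

# Hypothetical lines in tcpdump's default text format (`tcpdump -nn port 53`).
capture = """\
10:34:01.000001 IP 172.22.20.66.41000 > 172.22.20.10.53: 12345+ PTR? 66.20.22.172.in-addr.arpa. (44)
10:34:01.000500 IP 172.22.20.10.53 > 172.22.20.66.41000: 12345 1/0/0 PTR host.example.com. (70)
10:34:02.100000 IP 172.22.20.66.41001 > 172.22.20.10.53: 12346+ A? example.com. (29)
"""

# Count queries by record type: a flood of PTR? lines would confirm the
# reverse-DNS theory; near-silence rules it out.
queries = Counter(
    m.group(1)
    for line in capture.splitlines()
    if (m := re.search(r"\s(\w+)\?\s", line))
)
print(queries)  # Counter({'PTR': 1, 'A': 1})
```

A high PTR count relative to the message rate would point back at reverse lookups; almost no PTR traffic, as seen here, means the CPU is going somewhere else.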
>
> We have 6 inputs in total but only one receives the bulk of the Syslog UDP 
> messages. Most of the other inputs are GELF UDP inputs.
>
> We also have 11 streams; however, pausing these streams seems to have 
> little to no impact on the CPU usage.
>
> All the Graylog servers are virtualised on top of vSphere 5.5 Update 2 
> with plenty of physical hardware available to service the workload (little 
> to no contention).
>
> The original two have 20 vCPUs and 32GB RAM; the additional two have 16 
> vCPUs and 32GB RAM.
>
> Java heap on all is set to 16GB.
>
> This is all running on CentOS 6.
>
> Any input would be greatly appreciated as I'm a bit stumped on how to get 
> this resolved at present.
>
> Here is the config file I'm using (censored where appropriate):
>
> is_master = false
> node_id_file = /etc/graylog2/server/node-id
> password_secret = <Censored>
> root_username = <Censored>
> root_password_sha2 = <Censored>
> plugin_dir = /usr/share/graylog2-server/plugin
> rest_listen_uri = http://172.22.20.66:12900/
>
> elasticsearch_max_docs_per_index = 20000000
> elasticsearch_max_number_of_indices = 999
> retention_strategy = close
> elasticsearch_shards = 4
> elasticsearch_replicas = 1
> elasticsearch_index_prefix = graylog2
> allow_leading_wildcard_searches = true
> allow_highlighting = true
> elasticsearch_cluster_name = graylog2
> elasticsearch_node_name = bne3-0002las
> elasticsearch_node_master = false
> elasticsearch_node_data = false
> elasticsearch_discovery_zen_ping_multicast_enabled = false
> elasticsearch_discovery_zen_ping_unicast_hosts = 
> bne3-0001lai.server-web.com:9300,bne3-0002lai.server-web.com:9300,
> bne3-0003lai.server-web.com:9300,bne3-0004lai.server-web.com:9300,
> bne3-0005lai.server-web.com:9300,bne3-0006lai.server-web.com:9300,
> bne3-0007lai.server-web.com:9300,bne3-0008lai.server-web.com:9300,
> bne3-0009lai.server-web.com:9300
> elasticsearch_cluster_discovery_timeout = 5000
> elasticsearch_discovery_initial_state_timeout = 3s
> elasticsearch_analyzer = standard
>
> output_batch_size = 5000
> output_flush_interval = 1
> processbuffer_processors = 20
> outputbuffer_processors = 5
> #outputbuffer_processor_keep_alive_time = 5000
> #outputbuffer_processor_threads_core_pool_size = 3
> #outputbuffer_processor_threads_max_pool_size = 30
> #udp_recvbuffer_sizes = 1048576
> processor_wait_strategy = blocking
> ring_size = 65536
>
> inputbuffer_ring_size = 65536
> inputbuffer_processors = 2
> inputbuffer_wait_strategy = blocking
>
> message_journal_enabled = true
> message_journal_dir = /var/lib/graylog-server/journal
> message_journal_max_age = 24h
> message_journal_max_size = 150gb
> message_journal_flush_age = 1m
> message_journal_flush_interval = 1000000
> message_journal_segment_age = 1h
> message_journal_segment_size = 1gb
>
> dead_letters_enabled = false
> lb_recognition_period_seconds = 3
>
> mongodb_useauth = true
> mongodb_user = <Censored>
> mongodb_password = <Censored>
> mongodb_replica_set = bne3-0001ladb.server-web.com:27017,
> bne3-0002ladb.server-web.com:27017
> mongodb_database = graylog2
> mongodb_max_connections = 200
> mongodb_threads_allowed_to_block_multiplier = 5
>
> #rules_file = /etc/graylog2.drl
>
> # Email transport
> transport_email_enabled = true
> transport_email_hostname = <Censored>
> transport_email_port = 25
> transport_email_use_auth = false
> transport_email_use_tls = false
> transport_email_use_ssl = false
> transport_email_auth_username = [email protected]
> transport_email_auth_password = secret
> transport_email_subject_prefix = [graylog2]
> transport_email_from_email = <Censored>
> transport_email_web_interface_url = <Censored>
>
> message_cache_off_heap = false
> message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
> #message_cache_commit_interval = 1000
> #input_cache_max_size = 0
>
> #ldap_connection_timeout = 2000
>
> versionchecks = false
>
> #enable_metrics_collection = false
>
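
One thing that stands out in that config: processbuffer_processors (20) + outputbuffer_processors (5) + inputbuffer_processors (2) is 27 dedicated pipeline threads per node, before counting JVM GC, journal, and input threads. That already oversubscribes the 16-vCPU nodes. A quick sanity check (the "pipeline threads should not exceed cores" rule of thumb is mine, not an official Graylog limit):

```python
# Thread counts from the posted config; vCPU counts from the post.
def pipeline_threads(process=20, output=5, inputs=2):
    """Dedicated buffer-processor threads a single Graylog node runs."""
    return process + output + inputs

threads = pipeline_threads()  # 27
for vcpus in (20, 16):
    status = "oversubscribed" if threads > vcpus else "ok"
    print(f"{vcpus} vCPUs, {threads} pipeline threads: {status}")
```

Under sustained load those threads can starve low-priority housekeeping such as the node heartbeat, which would tie the "Re-registering" messages to the CPU peaks. Dropping processbuffer_processors closer to the core count is a cheap experiment.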

-- 
You received this message because you are subscribed to the Google Groups 
"graylog2" group.
