Yesterday I ran a yum update on all Graylog and MongoDB nodes and rebooted 
them all (there was a kernel update). Since then, there no longer seem to be 
any issues connecting to the MongoDB database.

However, I'm still seeing excessively high CPU usage on the Graylog nodes, 
with all vCPUs regularly exceeding 95%.

What can contribute to this? I'm a little stumped at present.

I would say our average messages/second is around 5,000 to 6,000 with peaks 
up to about 12,000.
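
One thing that may be worth ruling out is thread oversubscription. The 
sketch below (Python, purely illustrative) sums the buffer processor 
settings from the config quoted below and compares the total with the vCPU 
counts from the original post. It assumes the same settings are used on all 
four nodes, which the thread does not confirm, and it ignores other JVM 
threads (inputs, GC, the Elasticsearch client). The helper names are 
hypothetical, and a high ratio is only a hint, not a diagnosis.

```python
# Buffer processor settings copied from the config quoted below.
# Assumption: every Graylog node uses these same values.
PROCESSOR_SETTINGS = {
    "processbuffer_processors": 20,
    "outputbuffer_processors": 5,
    "inputbuffer_processors": 2,
}

# vCPU counts of the two node types described in the original post.
NODE_VCPUS = {"original nodes": 20, "additional nodes": 16}


def configured_threads(settings):
    """Sum the configured buffer processor threads (hypothetical helper)."""
    return sum(settings.values())


def threads_per_vcpu(settings, vcpus):
    """Return configured processor threads per vCPU (hypothetical helper)."""
    return configured_threads(settings) / vcpus


total = configured_threads(PROCESSOR_SETTINGS)
for name, vcpus in NODE_VCPUS.items():
    ratio = threads_per_vcpu(PROCESSOR_SETTINGS, vcpus)
    print(f"{name}: {total} processor threads on {vcpus} vCPUs "
          f"(ratio {ratio:.2f})")
```

A ratio above 1 means more busy processor threads are configured than 
there are vCPUs to run them, so lowering processbuffer_processors on the 
smaller nodes might be a cheap experiment.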

Cheers, Pete

On Friday, 1 May 2015 08:20:35 UTC+10, Pete GS wrote:
>
> Does anyone have any thoughts on this?
>
> Even if someone could identify some scenarios that would cause high CPU on 
> Graylog servers, and the circumstances in which Graylog would have trouble 
> contacting the MongoDB servers, that would help.
>
> Cheers, Pete
>
> On Wednesday, 29 April 2015 10:34:28 UTC+10, Pete GS wrote:
>>
>> Hi all,
>>
>> We acquired a company a while ago, and last week we added all of their 
>> logs to our Graylog environment. They all come in from their Syslog 
>> server via UDP.
>>
>> After this, I noticed that the Graylog servers were maxing out their CPU, 
>> so to alleviate this I increased CPU resources on the existing servers and 
>> added two new servers.
>>
>> I'm still seeing generally high CPU usage with peaks of 100% on all four 
>> of the Graylog servers, but they now also seem to have issues connecting 
>> to MongoDB.
>>
>> I see lots of "[NodePingThread] Did not find meta info of this node. 
>> Re-registering." streaming through the log files but it only seems to 
>> happen when I have more than two Graylog servers running.
>>
>> I have verified NTP is installed and configured, and all servers, including 
>> the MongoDB and Elasticsearch servers, are syncing with the same NTP 
>> servers.
>>
>> We're doing less than 10,000 messages per second so with the resources 
>> I've allocated I would have expected no issues whatsoever.
>>
>> I have seen this link: 
>> https://groups.google.com/forum/?hl=en#!topic/graylog2/bW2glCdBIUI but I 
>> don't believe it is our issue.
>>
>> If it truly is being caused by doing lots of reverse DNS lookups, I would 
>> expect tcpdump to show me that traffic to our DNS servers, but I see almost 
>> no DNS lookups at all.
>>
>> We have 6 inputs in total but only one receives the bulk of the Syslog 
>> UDP messages. Most of the other inputs are GELF UDP inputs.
>>
>> We also have 11 streams; however, pausing them seems to have little to 
>> no impact on the CPU usage.
>>
>> All the Graylog servers are virtualised on top of vSphere 5.5 Update 2 
>> with plenty of physical hardware available to service the workload (little 
>> to no contention).
>>
>> The original two have 20 vCPUs and 32GB RAM, the additional two have 16 
>> vCPUs and 32GB RAM.
>>
>> Java heap on all is set to 16GB.
>>
>> This is all running on CentOS 6.
>>
>> Any input would be greatly appreciated as I'm a bit stumped on how to get 
>> this resolved at present.
>>
>> Here is the config file I'm using (censored where appropriate):
>>
>> is_master = false
>> node_id_file = /etc/graylog2/server/node-id
>> password_secret = <Censored>
>> root_username = <Censored>
>> root_password_sha2 = <Censored>
>> plugin_dir = /usr/share/graylog2-server/plugin
>> rest_listen_uri = http://172.22.20.66:12900/
>>
>> elasticsearch_max_docs_per_index = 20000000
>> elasticsearch_max_number_of_indices = 999
>> retention_strategy = close
>> elasticsearch_shards = 4
>> elasticsearch_replicas = 1
>> elasticsearch_index_prefix = graylog2
>> allow_leading_wildcard_searches = true
>> allow_highlighting = true
>> elasticsearch_cluster_name = graylog2
>> elasticsearch_node_name = bne3-0002las
>> elasticsearch_node_master = false
>> elasticsearch_node_data = false
>> elasticsearch_discovery_zen_ping_multicast_enabled = false
>> elasticsearch_discovery_zen_ping_unicast_hosts = bne3-0001lai.server-web.com:9300,bne3-0002lai.server-web.com:9300,bne3-0003lai.server-web.com:9300,bne3-0004lai.server-web.com:9300,bne3-0005lai.server-web.com:9300,bne3-0006lai.server-web.com:9300,bne3-0007lai.server-web.com:9300,bne3-0008lai.server-web.com:9300,bne3-0009lai.server-web.com:9300
>> elasticsearch_cluster_discovery_timeout = 5000
>> elasticsearch_discovery_initial_state_timeout = 3s
>> elasticsearch_analyzer = standard
>>
>> output_batch_size = 5000
>> output_flush_interval = 1
>> processbuffer_processors = 20
>> outputbuffer_processors = 5
>> #outputbuffer_processor_keep_alive_time = 5000
>> #outputbuffer_processor_threads_core_pool_size = 3
>> #outputbuffer_processor_threads_max_pool_size = 30
>> #udp_recvbuffer_sizes = 1048576
>> processor_wait_strategy = blocking
>> ring_size = 65536
>>
>> inputbuffer_ring_size = 65536
>> inputbuffer_processors = 2
>> inputbuffer_wait_strategy = blocking
>>
>> message_journal_enabled = true
>> message_journal_dir = /var/lib/graylog-server/journal
>> message_journal_max_age = 24h
>> message_journal_max_size = 150gb
>> message_journal_flush_age = 1m
>> message_journal_flush_interval = 1000000
>> message_journal_segment_age = 1h
>> message_journal_segment_size = 1gb
>>
>> dead_letters_enabled = false
>> lb_recognition_period_seconds = 3
>>
>> mongodb_useauth = true
>> mongodb_user = <Censored>
>> mongodb_password = <Censored>
>> mongodb_replica_set = bne3-0001ladb.server-web.com:27017,bne3-0002ladb.server-web.com:27017
>> mongodb_database = graylog2
>> mongodb_max_connections = 200
>> mongodb_threads_allowed_to_block_multiplier = 5
>>
>> #rules_file = /etc/graylog2.drl
>>
>> # Email transport
>> transport_email_enabled = true
>> transport_email_hostname = <Censored>
>> transport_email_port = 25
>> transport_email_use_auth = false
>> transport_email_use_tls = false
>> transport_email_use_ssl = false
>> transport_email_auth_username = [email protected]
>> transport_email_auth_password = secret
>> transport_email_subject_prefix = [graylog2]
>> transport_email_from_email = <Censored>
>> transport_email_web_interface_url = <Censored>
>>
>> message_cache_off_heap = false
>> message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
>> #message_cache_commit_interval = 1000
>> #input_cache_max_size = 0
>>
>> #ldap_connection_timeout = 2000
>>
>> versionchecks = false
>>
>> #enable_metrics_collection = false
>>
>
