Hi all,

We acquired a company a while ago, and last week we added all of their logs 
to our Graylog environment. They all come in from their syslog server via 
UDP.

After this, I noticed that the Graylog servers were maxing out their CPUs. 
To alleviate this, I increased the CPU resources on the existing servers and 
added two new servers.

I'm still seeing generally high CPU usage, with peaks of 100% on all four of 
the Graylog servers, and they now also seem to have trouble connecting to 
MongoDB.

I see lots of "[NodePingThread] Did not find meta info of this node. 
Re-registering." streaming through the log files but it only seems to 
happen when I have more than two Graylog servers running.

I have verified that NTP is installed and configured, and that all servers, 
including the MongoDB and Elasticsearch servers, are syncing with the same 
NTP servers.

We're doing fewer than 10,000 messages per second, so with the resources 
I've allocated I would have expected no issues whatsoever.

I have seen this 
link: https://groups.google.com/forum/?hl=en#!topic/graylog2/bW2glCdBIUI 
but I don't believe it is our issue.

If it truly is being caused by doing lots of reverse DNS lookups, I would 
expect tcpdump to show me that traffic to our DNS servers, but I see almost 
no DNS lookups at all.
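
As a cross-check on the tcpdump observation, reverse lookups can also be 
timed directly from one of the Graylog nodes. This is only a minimal Python 
sketch: the function name time_reverse_lookups is hypothetical, and it 
assumes the system resolver behaves the same way as the one the JVM uses, 
which may not hold because Java does its own DNS caching.

```python
import socket
import time


def time_reverse_lookups(addresses, resolver=socket.gethostbyaddr):
    """Return (address, hostname_or_None, seconds) for each reverse lookup.

    `resolver` is injectable so the function can be exercised without
    touching a real DNS server.
    """
    results = []
    for address in addresses:
        start = time.perf_counter()
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
            hostname = resolver(address)[0]
        except OSError:
            # socket.herror and socket.gaierror are subclasses of OSError
            hostname = None
        elapsed = time.perf_counter() - start
        results.append((address, hostname, elapsed))
    return results
```

Calling time_reverse_lookups() with a list of the syslog senders' source 
addresses and printing the elapsed times should show whether any single 
lookup is slow or failing, independently of what tcpdump captures.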

We have 6 inputs in total but only one receives the bulk of the Syslog UDP 
messages. Most of the other inputs are GELF UDP inputs.

We also have 11 streams; however, pausing these streams seems to have little 
to no impact on the CPU usage.

All the Graylog servers are virtualised on top of vSphere 5.5 Update 2 with 
plenty of physical hardware available to service the workload (little to no 
contention).

The original two have 20 vCPUs and 32GB RAM; the additional two have 16 
vCPUs and 32GB RAM.

The Java heap on all of them is set to 16GB.

This is all running on CentOS 6.

Any input would be greatly appreciated as I'm a bit stumped on how to get 
this resolved at present.

Here is the config file I'm using (censored where appropriate):

is_master = false
node_id_file = /etc/graylog2/server/node-id
password_secret = <Censored>
root_username = <Censored>
root_password_sha2 = <Censored>
plugin_dir = /usr/share/graylog2-server/plugin
rest_listen_uri = http://172.22.20.66:12900/

elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 999
retention_strategy = close
elasticsearch_shards = 4
elasticsearch_replicas = 1
elasticsearch_index_prefix = graylog2
allow_leading_wildcard_searches = true
allow_highlighting = true
elasticsearch_cluster_name = graylog2
elasticsearch_node_name = bne3-0002las
elasticsearch_node_master = false
elasticsearch_node_data = false
elasticsearch_discovery_zen_ping_multicast_enabled = false
elasticsearch_discovery_zen_ping_unicast_hosts = bne3-0001lai.server-web.com:9300,bne3-0002lai.server-web.com:9300,bne3-0003lai.server-web.com:9300,bne3-0004lai.server-web.com:9300,bne3-0005lai.server-web.com:9300,bne3-0006lai.server-web.com:9300,bne3-0007lai.server-web.com:9300,bne3-0008lai.server-web.com:9300,bne3-0009lai.server-web.com:9300
elasticsearch_cluster_discovery_timeout = 5000
elasticsearch_discovery_initial_state_timeout = 3s
elasticsearch_analyzer = standard

output_batch_size = 5000
output_flush_interval = 1
processbuffer_processors = 20
outputbuffer_processors = 5
#outputbuffer_processor_keep_alive_time = 5000
#outputbuffer_processor_threads_core_pool_size = 3
#outputbuffer_processor_threads_max_pool_size = 30
#udp_recvbuffer_sizes = 1048576
processor_wait_strategy = blocking
ring_size = 65536

inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking

message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_age = 24h
message_journal_max_size = 150gb
message_journal_flush_age = 1m
message_journal_flush_interval = 1000000
message_journal_segment_age = 1h
message_journal_segment_size = 1gb

dead_letters_enabled = false
lb_recognition_period_seconds = 3

mongodb_useauth = true
mongodb_user = <Censored>
mongodb_password = <Censored>
mongodb_replica_set = bne3-0001ladb.server-web.com:27017,bne3-0002ladb.server-web.com:27017
mongodb_database = graylog2
mongodb_max_connections = 200
mongodb_threads_allowed_to_block_multiplier = 5

#rules_file = /etc/graylog2.drl

# Email transport
transport_email_enabled = true
transport_email_hostname = <Censored>
transport_email_port = 25
transport_email_use_auth = false
transport_email_use_tls = false
transport_email_use_ssl = false
transport_email_auth_username = [email protected]
transport_email_auth_password = secret
transport_email_subject_prefix = [graylog2]
transport_email_from_email = <Censored>
transport_email_web_interface_url = <Censored>

message_cache_off_heap = false
message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
#message_cache_commit_interval = 1000
#input_cache_max_size = 0

#ldap_connection_timeout = 2000

versionchecks = false

#enable_metrics_collection = false
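
For reference, the buffer processor thread counts in this config can be 
compared against the vCPU counts mentioned earlier. The following is a 
rough Python sketch, not a Graylog tool: parse_properties and 
total_buffer_processors are hypothetical helper names, and treating a 
missing key as 0 is a simplification rather than Graylog's real defaults.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


PROCESSOR_KEYS = (
    "processbuffer_processors",
    "outputbuffer_processors",
    "inputbuffer_processors",
)


def total_buffer_processors(settings):
    """Sum the configured buffer processor counts.

    Missing keys count as 0 here, which is an assumption made for this
    sketch and not how Graylog itself fills in defaults.
    """
    return sum(int(settings.get(key, 0)) for key in PROCESSOR_KEYS)


# Values copied from the config above.
config_text = """
processbuffer_processors = 20
outputbuffer_processors = 5
inputbuffer_processors = 2
"""

settings = parse_properties(config_text)
threads = total_buffer_processors(settings)
for vcpus in (20, 16):
    print(f"{threads} buffer processor threads on a {vcpus} vCPU node")
```

With the values above, the three settings add up to 27 threads, which is 
more than the vCPU count on either node type. Whether that contributes to 
the CPU peaks is not something I have confirmed.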
