Hi all,

We acquired a company a while ago, and last week we added all of their logs to our Graylog environment. They all come in from their Syslog server via UDP.
After this, I noticed that the Graylog servers were maxing out CPU, so to alleviate this I increased the CPU resources on the existing servers and added two new servers. I'm still seeing generally high CPU usage with peaks of 100% on all four Graylog servers, and they now also seem to have trouble connecting to MongoDB. I see lots of "[NodePingThread] Did not find meta info of this node. Re-registering." streaming through the log files, but it only seems to happen when I have more than two Graylog servers running.

I have verified that NTP is installed and configured, and all servers, including the MongoDB and Elasticsearch servers, are syncing with the same NTP servers. We're doing fewer than 10,000 messages per second, so with the resources I've allocated I would have expected no issues whatsoever.

I have seen this link: https://groups.google.com/forum/?hl=en#!topic/graylog2/bW2glCdBIUI but I don't believe it is our issue. If the load were really caused by lots of reverse DNS lookups, I would expect tcpdump to show that traffic going to our DNS servers, but I see almost no DNS lookups at all.

We have 6 inputs in total, but only one receives the bulk of the Syslog UDP messages. Most of the other inputs are GELF UDP inputs. We also have 11 streams; however, pausing these streams seems to have little to no impact on CPU usage.

All the Graylog servers are virtualised on top of vSphere 5.5 Update 2 with plenty of physical hardware available to service the workload (little to no contention). The original two have 20 vCPUs and 32GB RAM; the additional two have 16 vCPUs and 32GB RAM. Java heap on all of them is set to 16GB. This is all running on CentOS 6.

Any input would be greatly appreciated, as I'm a bit stumped on how to get this resolved at present.
Here is the config file I'm using (censored where appropriate):

is_master = false
node_id_file = /etc/graylog2/server/node-id
password_secret = <Censored>
root_username = <Censored>
root_password_sha2 = <Censored>
plugin_dir = /usr/share/graylog2-server/plugin
rest_listen_uri = http://172.22.20.66:12900/
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 999
retention_strategy = close
elasticsearch_shards = 4
elasticsearch_replicas = 1
elasticsearch_index_prefix = graylog2
allow_leading_wildcard_searches = true
allow_highlighting = true
elasticsearch_cluster_name = graylog2
elasticsearch_node_name = bne3-0002las
elasticsearch_node_master = false
elasticsearch_node_data = false
elasticsearch_discovery_zen_ping_multicast_enabled = false
elasticsearch_discovery_zen_ping_unicast_hosts = bne3-0001lai.server-web.com:9300,bne3-0002lai.server-web.com:9300,bne3-0003lai.server-web.com:9300,bne3-0004lai.server-web.com:9300,bne3-0005lai.server-web.com:9300,bne3-0006lai.server-web.com:9300,bne3-0007lai.server-web.com:9300,bne3-0008lai.server-web.com:9300,bne3-0009lai.server-web.com:9300
elasticsearch_cluster_discovery_timeout = 5000
elasticsearch_discovery_initial_state_timeout = 3s
elasticsearch_analyzer = standard
output_batch_size = 5000
output_flush_interval = 1
processbuffer_processors = 20
outputbuffer_processors = 5
#outputbuffer_processor_keep_alive_time = 5000
#outputbuffer_processor_threads_core_pool_size = 3
#outputbuffer_processor_threads_max_pool_size = 30
#udp_recvbuffer_sizes = 1048576
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_age = 24h
message_journal_max_size = 150gb
message_journal_flush_age = 1m
message_journal_flush_interval = 1000000
message_journal_segment_age = 1h
message_journal_segment_size = 1gb
dead_letters_enabled = false
lb_recognition_period_seconds = 3
mongodb_useauth = true
mongodb_user = <Censored>
mongodb_password = <Censored>
mongodb_replica_set = bne3-0001ladb.server-web.com:27017,bne3-0002ladb.server-web.com:27017
mongodb_database = graylog2
mongodb_max_connections = 200
mongodb_threads_allowed_to_block_multiplier = 5
#rules_file = /etc/graylog2.drl

# Email transport
transport_email_enabled = true
transport_email_hostname = <Censored>
transport_email_port = 25
transport_email_use_auth = false
transport_email_use_tls = false
transport_email_use_ssl = false
transport_email_auth_username = [email protected]
transport_email_auth_password = secret
transport_email_subject_prefix = [graylog2]
transport_email_from_email = <Censored>
transport_email_web_interface_url = <Censored>
message_cache_off_heap = false
message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
#message_cache_commit_interval = 1000
#input_cache_max_size = 0
#ldap_connection_timeout = 2000
versionchecks = false
#enable_metrics_collection = false
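In case it helps anyone reading the config, here is a rough sketch that adds up the three buffer processor settings so the total can be compared with the vCPU counts mentioned above. The helper names (parse_properties, processor_threads) are made up for illustration, the sample only copies the relevant lines from the config, and I am only assuming that thread count versus vCPU count is worth checking; I have not confirmed it has anything to do with the CPU usage.

```python
# Rough sanity check only: sums the buffer processor settings from a
# properties-style config. Helper names are hypothetical, not part of Graylog.

PROCESSOR_KEYS = (
    "inputbuffer_processors",
    "processbuffer_processors",
    "outputbuffer_processors",
)


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def processor_threads(settings):
    """Sum the configured buffer processor counts (missing keys count as 0)."""
    return sum(int(settings.get(key, 0)) for key in PROCESSOR_KEYS)


# Lines copied from the config in this post.
SAMPLE = """
processbuffer_processors = 20
outputbuffer_processors = 5
#udp_recvbuffer_sizes = 1048576
inputbuffer_processors = 2
"""

if __name__ == "__main__":
    total = processor_threads(parse_properties(SAMPLE))
    # vCPU counts of the original and the additional servers.
    for vcpus in (20, 16):
        print(f"{total} buffer processor threads configured, {vcpus} vCPUs")
```

To run it against the real file, the sample string could be replaced with the contents of the server config read from disk.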
