Also check the "co-stop" metric on VMware. I suspect you have too many vCPUs.
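For reference, co-stop can be read from an esxtop batch export taken on the ESXi host (`esxtop -b -n 10 > esxtop.csv`). A sketch of pulling the average %CSTP out of such an export — the VM name `graylog01`, the column label, and the sample numbers below are made-up stand-ins for a real capture:

```shell
# Stand-in for a real esxtop batch export (esxtop -b -n 10 > esxtop.csv).
cat > esxtop.csv <<'EOF'
"Time","Group Cpu(graylog01)\% Used","Group Cpu(graylog01)\% CoStop"
"10:00:00","180.00","4.20"
"10:00:05","175.00","6.10"
EOF

# Locate the co-stop column by header name (its position varies by build),
# then average it. Sustained values above a few percent suggest the host
# cannot co-schedule all of the VM's vCPUs at once.
awk -F'","' '
NR==1 { for (i = 1; i <= NF; i++) if ($i ~ /CoStop/) c = i; next }
      { v = $c; gsub(/"/, "", v); sum += v; n++ }
END   { printf "avg co-stop: %.2f%%\n", sum / n }
' esxtop.csv
```

With the sample rows above this prints an average of 5.15%, which on a real host would already point at vCPU oversizing.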
> On 5 May 2015 at 16:21, Arie <[email protected]> wrote:
>
> What happens when you raise "outputbuffer_processors = 5" to
> "outputbuffer_processors = 10"?
>
> On Tuesday, 5 May 2015 02:23:37 UTC+2, Pete GS wrote:
> Yesterday I ran a yum update on all Graylog and MongoDB nodes, and since
> rebooting them all (there was a kernel update) there no longer seem to be
> any issues connecting to the Mongo database.
>
> However, I'm still seeing excessively high CPU usage on the Graylog nodes,
> where all vCPUs regularly exceed 95%.
>
> What can contribute to this? I'm a little stumped at present.
>
> I would say our average rate is around 5,000 to 6,000 messages/second, with
> peaks up to about 12,000.
>
> Cheers, Pete
>
> On Friday, 1 May 2015 08:20:35 UTC+10, Pete GS wrote:
> Does anyone have any thoughts on this?
>
> Even if someone could identify some scenarios that cause high CPU on
> Graylog servers, and the circumstances in which Graylog would have trouble
> contacting the MongoDB servers.
>
> Cheers, Pete
>
> On Wednesday, 29 April 2015 10:34:28 UTC+10, Pete GS wrote:
> Hi all,
>
> We acquired a company a while ago, and last week we added all of their logs
> to our Graylog environment; they all come in from their syslog server via
> UDP.
>
> After this I noticed the Graylog servers were maxing out their CPUs, so to
> alleviate it I increased the CPU resources on the existing servers and
> added two new servers.
>
> I'm still seeing generally high CPU usage, with peaks of 100% on all four
> Graylog servers, but now they also seem to have trouble connecting to
> MongoDB.
>
> I see lots of "[NodePingThread] Did not find meta info of this node.
> Re-registering." streaming through the log files, but it only seems to
> happen when I have more than two Graylog servers running.
>
> I have verified that NTP is installed and configured, and that all servers,
> including the MongoDB and Elasticsearch servers, are syncing with the same
> NTP servers.
> We're doing less than 10,000 messages per second, so with the resources
> I've allocated I would have expected no issues whatsoever.
>
> I have seen this thread:
> https://groups.google.com/forum/?hl=en#!topic/graylog2/bW2glCdBIUI
> but I don't believe it describes our issue.
>
> If the cause really were lots of reverse DNS lookups, I would expect
> tcpdump to show that traffic to our DNS servers, but I see almost no DNS
> lookups at all.
>
> We have 6 inputs in total, but only one receives the bulk of the syslog
> UDP messages. Most of the other inputs are GELF UDP inputs.
>
> We also have 11 streams; however, pausing them seems to have little to no
> impact on CPU usage.
>
> All the Graylog servers are virtualised on vSphere 5.5 Update 2, with
> plenty of physical hardware available to service the workload (little to
> no contention).
>
> The original two have 20 vCPUs and 32 GB RAM; the additional two have 16
> vCPUs and 32 GB RAM.
>
> The Java heap on all of them is set to 16 GB.
>
> This is all running on CentOS 6.
>
> Any input would be greatly appreciated, as I'm a bit stumped on how to get
> this resolved at present.
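On the reverse-DNS point above, the lookup rate can be quantified rather than eyeballed from a short capture. A sketch — the interface name `eth0` and the 30-second duration are placeholders, and the canned sample below stands in for real tcpdump output so the counting step can run anywhere:

```shell
# Stand-in for a real capture, which would be:
#   timeout 30 tcpdump -nn -i eth0 'udp dst port 53' > dns.txt
cat > dns.txt <<'EOF'
16:21:01.000000 IP 172.22.20.66.40000 > 10.0.0.53.53: 1+ PTR? 66.20.22.172.in-addr.arpa. (44)
16:21:15.500000 IP 172.22.20.66.40001 > 10.0.0.53.53: 2+ PTR? 10.20.22.172.in-addr.arpa. (44)
EOF

# Count reverse (PTR) queries in the capture. A node doing a reverse lookup
# per message at ~6,000 msg/s would show tens of thousands of lines over
# 30 s; a near-zero count rules that theory out.
queries=$(grep -c 'PTR?' dns.txt)
echo "reverse lookups captured: $queries"
```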
> Here is the config file I'm using (censored where appropriate):
>
> is_master = false
> node_id_file = /etc/graylog2/server/node-id
> password_secret = <Censored>
> root_username = <Censored>
> root_password_sha2 = <Censored>
> plugin_dir = /usr/share/graylog2-server/plugin
> rest_listen_uri = http://172.22.20.66:12900/
>
> elasticsearch_max_docs_per_index = 20000000
> elasticsearch_max_number_of_indices = 999
> retention_strategy = close
> elasticsearch_shards = 4
> elasticsearch_replicas = 1
> elasticsearch_index_prefix = graylog2
> allow_leading_wildcard_searches = true
> allow_highlighting = true
> elasticsearch_cluster_name = graylog2
> elasticsearch_node_name = bne3-0002las
> elasticsearch_node_master = false
> elasticsearch_node_data = false
> elasticsearch_discovery_zen_ping_multicast_enabled = false
> elasticsearch_discovery_zen_ping_unicast_hosts = bne3-0001lai.server-web.com:9300,bne3-0002lai.server-web.com:9300,bne3-0003lai.server-web.com:9300,bne3-0004lai.server-web.com:9300,bne3-0005lai.server-web.com:9300,bne3-0006lai.server-web.com:9300,bne3-0007lai.server-web.com:9300,bne3-0008lai.server-web.com:9300,bne3-0009lai.server-web.com:9300
> elasticsearch_cluster_discovery_timeout = 5000
> elasticsearch_discovery_initial_state_timeout = 3s
> elasticsearch_analyzer = standard
>
> output_batch_size = 5000
> output_flush_interval = 1
> processbuffer_processors = 20
> outputbuffer_processors = 5
> #outputbuffer_processor_keep_alive_time = 5000
> #outputbuffer_processor_threads_core_pool_size = 3
> #outputbuffer_processor_threads_max_pool_size = 30
> #udp_recvbuffer_sizes = 1048576
> processor_wait_strategy = blocking
> ring_size = 65536
>
> inputbuffer_ring_size = 65536
> inputbuffer_processors = 2
> inputbuffer_wait_strategy = blocking
>
> message_journal_enabled = true
> message_journal_dir = /var/lib/graylog-server/journal
> message_journal_max_age = 24h
> message_journal_max_size = 150gb
> message_journal_flush_age = 1m
> message_journal_flush_interval = 1000000
> message_journal_segment_age = 1h
> message_journal_segment_size = 1gb
>
> dead_letters_enabled = false
> lb_recognition_period_seconds = 3
>
> mongodb_useauth = true
> mongodb_user = <Censored>
> mongodb_password = <Censored>
> mongodb_replica_set = bne3-0001ladb.server-web.com:27017,bne3-0002ladb.server-web.com:27017
> mongodb_database = graylog2
> mongodb_max_connections = 200
> mongodb_threads_allowed_to_block_multiplier = 5
>
> #rules_file = /etc/graylog2.drl
>
> # Email transport
> transport_email_enabled = true
> transport_email_hostname = <Censored>
> transport_email_port = 25
> transport_email_use_auth = false
> transport_email_use_tls = false
> transport_email_use_ssl = false
> transport_email_auth_username = y...@example.com
> transport_email_auth_password = secret
> transport_email_subject_prefix = [graylog2]
> transport_email_from_email = <Censored>
> transport_email_web_interface_url = <Censored>
>
> message_cache_off_heap = false
> message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
> #message_cache_commit_interval = 1000
> #input_cache_max_size = 0
>
> #ldap_connection_timeout = 2000
>
> versionchecks = false
>
> #enable_metrics_collection = false
>
> --
> You received this message because you are subscribed to the Google Groups
> "graylog2" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
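One more observation on the quoted config, in the same vein as the vCPU sizing point: the buffer processors alone add up to more busy-capable threads than the smaller nodes have vCPUs, before GC and input threads are counted. A quick sanity check, with the thread counts copied from the quoted server.conf and `nproc` standing in for the node's vCPU count (with the blocking wait strategy idle threads do sleep, but under sustained load this is oversubscription):

```shell
# Thread counts from the quoted server.conf:
#   processbuffer_processors = 20
#   outputbuffer_processors  = 5
#   inputbuffer_processors   = 2
process=20; output=5; input=2
total=$((process + output + input))

# 27 buffer threads on 16- or 20-vCPU nodes leaves no headroom for GC,
# Netty input threads, or the OS.
echo "buffer processor threads: $total (vCPUs on this node: $(nproc))"
```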
