Let's try some more options. I see you are running your stuff virtualized, so you can consider the following for CentOS 6.
In your kernel boot config (/etc/grub.conf) you can add the following options:

  nohz=off               (for highly CPU-intensive systems)
  elevator=noop          (disk scheduling is done by the virtual layer, so disable it in the guest)
  cgroup_disable=memory  (possibly not used anyway; disabling it frees up some memory and allocation overhead)

If you use the pvscsi device, also add:

  vmw_pvscsi.cmd_per_lun=254
  vmw_pvscsi.ring_pages=32

Check disk buffers on the virtual layer too; see VMware KB 2053145:
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=2053145&sliceId=1&docTypeID=DT_KB_1_1&dialogID=621755330&stateId=1%200%20593866502

Optimize your disks for performance (up to 30%, yes!): for the filesystems where Graylog and/or Elasticsearch are located, add noatime,nobarrier,data=writeback in /etc/fstab. Example:

  /dev/mapper/vg_nagios-lv_root / ext4 defaults,noatime,nobarrier,data=writeback 1 1

And if you want to be more on the safe side, leave out data=writeback:

  /dev/mapper/vg_nagios-lv_root / ext4 defaults,noatime,nobarrier 1 1

Is ES_HEAP_SIZE configured in the correct place? (I got that wrong at first.) It belongs in /etc/sysconfig/elasticsearch.

All these options together can improve system performance hugely, especially on virtual machines.

PS: did you change your file descriptor limits correctly?

/etc/sysctl.conf:

  fs.file-max = 65536

/etc/security/limits.conf:

  * soft nproc 65535
  * hard nproc 65535
  * soft nofile 65535
  * hard nofile 65535

/etc/security/limits.d/90-nproc.conf:

  * soft nproc 65535
  * hard nproc 65535
  * soft nofile 65535
  * hard nofile 65535

Check filesystem performance with iotop -a to see how it is doing. There are some worked examples of all these settings at the very bottom of this mail.

hth,
Arie

On Tuesday, 12 May 2015 23:52:19 UTC+2, Pete GS wrote:
>
> No further input on this?
>
> The Graylog master node now seems to regularly drop out also with the "Did
> not find meta info of this node. Re-registering." message, and it is under
> no load as our load balancer doesn't direct any input messages to it.
>
> Cheers, Pete
>
> On Thursday, 7 May 2015 07:44:41 UTC+10, Pete GS wrote:
>>
>> I've come back to the office this morning and discovered we had an
>> ElasticSearch issue last night which has resulted in lots of unprocessed
>> messages in the journal.
>>
>> All the Graylog nodes are busy processing these and it seems to be slowly
>> crunching through them.
>>
>> Load average (using htop) varies across the four nodes, but I'm seeing a
>> minimum of 13.59 11.80 and a maximum of 24.81 24.64.
>>
>> Interestingly enough, the process buffer is only full on one of the nodes;
>> the other three appear to be 10% full or less.
>>
>> The output buffers are all empty.
>>
>> The issue with ElasticSearch was running out of disk space, which I've
>> resolved for the moment, but my business case for new hardware should solve
>> that permanently.
>>
>> What other info can I give you guys to help me look in the right
>> direction?
>>
>> Cheers, Pete
>>
>> On Wednesday, 6 May 2015 07:33:31 UTC+10, Pete GS wrote:
>>>
>>> Thanks for the replies guys. I'm away from the office today but will
>>> check these things tomorrow.
>>>
>>> Mathieu, I will check the load average, but from memory the 5 minute
>>> average was around 12 or 18. I will confirm this tomorrow though.
>>>
>>> As for the "co-stop" metric, I haven't used esxtop on these hosts, but I
>>> have looked at the CPU Ready metric and it seems to be OK (sub 5%
>>> sustained). One of the physical hosts has exactly the same number of CPUs
>>> allocated as the VMs running on it, but the other two physical hosts have
>>> no over-subscription of CPUs at all. There is no memory over-subscription
>>> on any hosts either.
>>>
>>> For the moment I have simply increased the CPUs on the existing nodes
>>> as well as adding the two new ones. I am putting together a business case
>>> for new hardware for the ElasticSearch cluster, and if this goes ahead I
>>> will move to a model of more Graylog nodes with fewer CPUs and less memory
>>> per node, as I think that will scale better.
>>>
>>> Arie, I will increase the output buffer processors tomorrow to see what
>>> happens, but I do know that the process buffer gets quite full at times
>>> while the output buffer is usually almost empty.
>>>
>>> On Wed, May 6, 2015 at 3:05 AM, Mathieu Grzybek <[email protected]>
>>> wrote:
>>>
>>>> Also check the "co-stop" metric on VMware. I am sure you have too many
>>>> vCPUs.
>>>>
>>>> On 5 May 2015 at 16:21, Arie <[email protected]> wrote:
>>>>
>>>> What happens when you raise "outputbuffer_processors = 5" to
>>>> "outputbuffer_processors = 10"?
>>>>
>>>> On Tuesday, 5 May 2015 02:23:37 UTC+2, Pete GS wrote:
>>>>>
>>>>> Yesterday I did a yum update on all Graylog and MongoDB nodes, and
>>>>> since doing that and rebooting them all (there was a kernel update) it
>>>>> seems that there are no longer issues connecting to the Mongo database.
>>>>>
>>>>> However, I'm still seeing excessively high CPU usage on the Graylog
>>>>> nodes, where all vCPUs are regularly exceeding 95%.
>>>>>
>>>>> What can contribute to this? I'm a little stumped at present.
>>>>>
>>>>> I would say our average messages/second is around 5,000 to 6,000, with
>>>>> peaks up to about 12,000.
>>>>>
>>>>> Cheers, Pete
>>>>>
>>>>> On Friday, 1 May 2015 08:20:35 UTC+10, Pete GS wrote:
>>>>>>
>>>>>> Does anyone have any thoughts on this?
>>>>>>
>>>>>> Even if someone could identify some scenarios that would cause high
>>>>>> CPU on Graylog servers, and in what circumstances Graylog would have
>>>>>> trouble contacting the MongoDB servers.
>>>>>>
>>>>>> Cheers, Pete
>>>>>>
>>>>>> On Wednesday, 29 April 2015 10:34:28 UTC+10, Pete GS wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> We acquired a company a while ago, and last week we added all of
>>>>>>> their logs to our Graylog environment, which all come in from their
>>>>>>> Syslog server via UDP.
>>>>>>>
>>>>>>> After this, I noticed that the Graylog servers were maxing CPU, so to
>>>>>>> alleviate this I increased CPU resources on the existing servers and
>>>>>>> added two new servers.
>>>>>>>
>>>>>>> I'm still seeing generally high CPU usage with peaks of 100% on all
>>>>>>> four of the Graylog servers, but I now have issues where they also
>>>>>>> seem to have trouble connecting to MongoDB.
>>>>>>>
>>>>>>> I see lots of "[NodePingThread] Did not find meta info of this node.
>>>>>>> Re-registering." streaming through the log files, but it only seems to
>>>>>>> happen when I have more than two Graylog servers running.
>>>>>>>
>>>>>>> I have verified NTP is installed and configured, and all servers,
>>>>>>> including the MongoDB and ElasticSearch servers, are syncing with the
>>>>>>> same NTP servers.
>>>>>>>
>>>>>>> We're doing less than 10,000 messages per second, so with the
>>>>>>> resources I've allocated I would have expected no issues whatsoever.
>>>>>>>
>>>>>>> I have seen this link:
>>>>>>> https://groups.google.com/forum/?hl=en#!topic/graylog2/bW2glCdBIUI but
>>>>>>> I don't believe it is our issue.
>>>>>>>
>>>>>>> If it truly is being caused by doing lots of reverse DNS lookups, I
>>>>>>> would expect tcpdump to show me that traffic to our DNS servers, but I
>>>>>>> see almost no DNS lookups at all.
>>>>>>>
>>>>>>> We have 6 inputs in total, but only one receives the bulk of the
>>>>>>> Syslog UDP messages. Most of the other inputs are GELF UDP inputs.
>>>>>>>
>>>>>>> We also have 11 streams; however, pausing these streams seems to have
>>>>>>> little to no impact on the CPU usage.
>>>>>>>
>>>>>>> All the Graylog servers are virtualised on top of vSphere 5.5 Update
>>>>>>> 2 with plenty of physical hardware available to service the workload
>>>>>>> (little to no contention).
>>>>>>>
>>>>>>> The original two have 20 vCPUs and 32GB RAM; the additional two
>>>>>>> have 16 vCPUs and 32GB RAM.
>>>>>>>
>>>>>>> Java heap on all is set to 16GB.
>>>>>>>
>>>>>>> This is all running on CentOS 6.
>>>>>>>
>>>>>>> Any input would be greatly appreciated as I'm a bit stumped on how
>>>>>>> to get this resolved at present.
>>>>>>>
>>>>>>> Here is the config file I'm using (censored where appropriate):
>>>>>>>
>>>>>>> is_master = false
>>>>>>> node_id_file = /etc/graylog2/server/node-id
>>>>>>> password_secret = <Censored>
>>>>>>> root_username = <Censored>
>>>>>>> root_password_sha2 = <Censored>
>>>>>>> plugin_dir = /usr/share/graylog2-server/plugin
>>>>>>> rest_listen_uri = http://172.22.20.66:12900/
>>>>>>>
>>>>>>> elasticsearch_max_docs_per_index = 20000000
>>>>>>> elasticsearch_max_number_of_indices = 999
>>>>>>> retention_strategy = close
>>>>>>> elasticsearch_shards = 4
>>>>>>> elasticsearch_replicas = 1
>>>>>>> elasticsearch_index_prefix = graylog2
>>>>>>> allow_leading_wildcard_searches = true
>>>>>>> allow_highlighting = true
>>>>>>> elasticsearch_cluster_name = graylog2
>>>>>>> elasticsearch_node_name = bne3-0002las
>>>>>>> elasticsearch_node_master = false
>>>>>>> elasticsearch_node_data = false
>>>>>>> elasticsearch_discovery_zen_ping_multicast_enabled = false
>>>>>>> elasticsearch_discovery_zen_ping_unicast_hosts = bne3-0001lai.server-web.com:9300,bne3-0002lai.server-web.com:9300,bne3-0003lai.server-web.com:9300,bne3-0004lai.server-web.com:9300,bne3-0005lai.server-web.com:9300,bne3-0006lai.server-web.com:9300,bne3-0007lai.server-web.com:9300,bne3-0008lai.server-web.com:9300,bne3-0009lai.server-web.com:9300
>>>>>>> elasticsearch_cluster_discovery_timeout = 5000
>>>>>>> elasticsearch_discovery_initial_state_timeout = 3s
>>>>>>> elasticsearch_analyzer = standard
>>>>>>>
>>>>>>> output_batch_size = 5000
>>>>>>> output_flush_interval = 1
>>>>>>> processbuffer_processors = 20
>>>>>>> outputbuffer_processors = 5
>>>>>>> #outputbuffer_processor_keep_alive_time = 5000
>>>>>>> #outputbuffer_processor_threads_core_pool_size = 3
>>>>>>> #outputbuffer_processor_threads_max_pool_size = 30
>>>>>>> #udp_recvbuffer_sizes = 1048576
>>>>>>> processor_wait_strategy = blocking
>>>>>>> ring_size = 65536
>>>>>>>
>>>>>>> inputbuffer_ring_size = 65536
>>>>>>> inputbuffer_processors = 2
>>>>>>> inputbuffer_wait_strategy = blocking
>>>>>>>
>>>>>>> message_journal_enabled = true
>>>>>>> message_journal_dir = /var/lib/graylog-server/journal
>>>>>>> message_journal_max_age = 24h
>>>>>>> message_journal_max_size = 150gb
>>>>>>> message_journal_flush_age = 1m
>>>>>>> message_journal_flush_interval = 1000000
>>>>>>> message_journal_segment_age = 1h
>>>>>>> message_journal_segment_size = 1gb
>>>>>>>
>>>>>>> dead_letters_enabled = false
>>>>>>> lb_recognition_period_seconds = 3
>>>>>>>
>>>>>>> mongodb_useauth = true
>>>>>>> mongodb_user = <Censored>
>>>>>>> mongodb_password = <Censored>
>>>>>>> mongodb_replica_set = bne3-0001ladb.server-web.com:27017,bne3-0002ladb.server-web.com:27017
>>>>>>> mongodb_database = graylog2
>>>>>>> mongodb_max_connections = 200
>>>>>>> mongodb_threads_allowed_to_block_multiplier = 5
>>>>>>>
>>>>>>> #rules_file = /etc/graylog2.drl
>>>>>>>
>>>>>>> # Email transport
>>>>>>> transport_email_enabled = true
>>>>>>> transport_email_hostname = <Censored>
>>>>>>> transport_email_port = 25
>>>>>>> transport_email_use_auth = false
>>>>>>> transport_email_use_tls = false
>>>>>>> transport_email_use_ssl = false
>>>>>>> transport_email_auth_username = [email protected]
>>>>>>> transport_email_auth_password = secret
>>>>>>> transport_email_subject_prefix = [graylog2]
>>>>>>> transport_email_from_email = <Censored>
>>>>>>> transport_email_web_interface_url = <Censored>
>>>>>>>
>>>>>>> message_cache_off_heap = false
>>>>>>> message_cache_spool_dir = /var/lib/graylog2-server/message-cache-spool
>>>>>>> #message_cache_commit_interval = 1000
>>>>>>> #input_cache_max_size = 0
>>>>>>>
>>>>>>> #ldap_connection_timeout = 2000
>>>>>>>
>>>>>>> versionchecks = false
>>>>>>>
>>>>>>> #enable_metrics_collection = false
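PPS, the worked examples I promised above. First, roughly what the kernel line in /etc/grub.conf could look like with all the boot options from the top of this mail added. The kernel version and root device here are made-up placeholders; keep your own existing values and only append the options:

  title CentOS 6
          root (hd0,0)
          # hypothetical kernel version and root device -- use whatever is
          # already in your grub.conf, only the appended options matter here
          kernel /vmlinuz-2.6.32-504.el6.x86_64 ro root=/dev/mapper/vg_nagios-lv_root nohz=off elevator=noop cgroup_disable=memory vmw_pvscsi.cmd_per_lun=254 vmw_pvscsi.ring_pages=32
          initrd /initramfs-2.6.32-504.el6.x86_64.img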
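The heap setting belongs in /etc/sysconfig/elasticsearch, which is a plain shell variable file sourced by the init script. Assuming you stick with the 16GB heap you mentioned:

  # /etc/sysconfig/elasticsearch
  # Heap size for the Elasticsearch JVM; the startup script derives
  # ES_MIN_MEM and ES_MAX_MEM from this, so min and max stay equal.
  ES_HEAP_SIZE=16g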
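After a reboot you can verify that everything actually took effect. A quick sketch; sda is an assumption, substitute your own disk device:

  cat /proc/cmdline                    # kernel options really on the boot line?
  cat /sys/block/sda/queue/scheduler   # the scheduler in [brackets] is the active one
  mount | grep -E 'noatime|nobarrier'  # fstab options applied?
  sysctl fs.file-max                   # system-wide fd limit
  ulimit -n                            # per-process fd limit; run as the graylog user
  iotop -a                             # accumulated I/O per process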
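And for the co-stop discussion in the quoted thread: on the ESXi host itself you can watch both metrics live with esxtop, something like this:

  # on the ESXi host (SSH / tech support mode):
  esxtop          # press "c" for the CPU panel
  # watch %RDY (ready time) and %CSTP (co-stop) for the Graylog VMs;
  # sustained %CSTP of more than a few percent is usually taken as a
  # sign of too many vCPUs on the VM

CPU Ready in the vSphere client, which Pete already checked, covers %RDY, but co-stop only shows up clearly in esxtop.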
