Turns out this was a resource issue. My 3 nodes were running under VMware and only had 2 cores/4 GB, and I was trying to throw about 150K log messages per second at them. :)
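
As a rough sanity check on those numbers, here is a minimal Python sketch of the load each node was being asked to carry. The message rate, node count, core count, and processor settings come from this thread; the even split across nodes, the reading of "2 cores" as per node, and the idea of comparing processor threads to cores are assumptions made for illustration, not something Graylog computes or the thread confirms.

```python
# Figures quoted in this thread. The even per-node split assumes the load
# balancer spreads traffic uniformly, which the thread does not confirm.
TOTAL_MSGS_PER_SEC = 150_000
NODES = 3
CORES_PER_NODE = 2  # assumption: "2 cores/4Gb" is read as per node

# Processor thread counts taken from the graylog config posted in the thread.
BUFFER_PROCESSORS = {
    "inputbuffer_processors": 2,
    "processbuffer_processors": 5,
    "outputbuffer_processors": 3,
}


def per_node_rate(total_msgs_per_sec, nodes):
    """Messages per second each node receives under an even split."""
    return total_msgs_per_sec / nodes


def threads_per_core(processors, cores):
    """Configured buffer processor threads divided by available cores."""
    return sum(processors.values()) / cores


rate = per_node_rate(TOTAL_MSGS_PER_SEC, NODES)
ratio = threads_per_core(BUFFER_PROCESSORS, CORES_PER_NODE)

print(f"per-node rate:        {rate:.0f} msgs/sec")
print(f"per-core rate:        {rate / CORES_PER_NODE:.0f} msgs/sec")
print(f"processor threads:    {sum(BUFFER_PROCESSORS.values())}")
print(f"threads per core:     {ratio:.1f}")
```

With the figures above, each node would see roughly 50K messages per second and run 10 buffer processor threads on 2 cores, before counting Elasticsearch and MongoDB on the same VM.
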
Increasing the memory/CPU allocations, raising the Graylog memory values (originally 1G, now 4G), and doing the same in the shared Elasticsearch configs (up to 18G instead of 2G) cleared the issue.

On Thursday, May 12, 2016 at 1:17:23 PM UTC-7, Jeff McCombs wrote:
> Hi gang,
>
> I'm running into a strange problem where my graylog nodes are complaining about not being able to find their meta info:
>
> 2016-05-12T11:50:09.691-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:12.878-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:13.417-07:00 WARN [ProxiedResource] Node <00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
> 2016-05-12T11:50:15.808-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:19.175-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:24.767-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:28.020-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:37.849-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:40.978-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:41.904-07:00 WARN [ProxiedResource] Node <00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
> 2016-05-12T11:50:47.400-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:50.670-07:00 WARN [NodePingThread] Did not find meta info of this node. Re-registering.
>
> In addition to the log entries above, I see occasional timeouts and errors in the web UI about master nodes no longer being available, or the web-UI just disappears for a few seconds and comes back.. I've also seen nodes drop in/out of the webUI.. I'm assuming these are related.
>
> Doing some basic google searches on this, the only thing I've seen on the log entries, is that the time for the nodes may be out of sync.. I've checked this and that's not the case here. All three nodes are running NTP and chiming off the local ntp server on the network:
>
> [root@gray00 /data]# ntpdate -q ntp0
> server 10.201.136.38, stratum 3, offset -0.000653, delay 0.02576
> 12 May 12:29:33 ntpdate[317]: adjust time server 10.201.136.38 offset -0.000653 sec
> [root@gray00 /data]# date
> Thu May 12 12:30:31 PDT 2016
>
> [root@gray01 graylog]# ntpdate -q ntp0
> server 10.201.136.38, stratum 3, offset -0.000568, delay 0.02576
> 12 May 12:29:22 ntpdate[31508]: adjust time server 10.201.136.38 offset -0.000568 sec
> [root@gray01 graylog]# date
> Thu May 12 12:30:31 PDT 2016
>
> [root@gray02 /data]# ntpdate -q ntp0
> server 10.201.136.38, stratum 3, offset -0.000055, delay 0.02580
> 12 May 12:29:21 ntpdate[535]: adjust time server 10.201.136.38 offset -0.000055 sec
> [root@gray02 /data]# date
> Thu May 12 12:30:32 PDT 2016
>
> So what am I doing wrong here? Is there some additional troubleshooting I can perform to try and pinpoint the issue? Strangely, everything is fine if I restart the graylog instances for about 5-10 minutes, then these log entries start popping back up.
>
> Here's some deets on how I have things configured:
>
> 3x nodes - RHEL6 x64 (gray00, gray01, gray02). Installation via the repo's for mongo, elasticsearch, and graylog.
>
> all three nodes run:
> elasticsearch
> mongo
> graylog
>
> In front is an F5 LTM, Virtual IP on the F5 is known as "graylog". Services ports 9000, and 12900.
> Sticky sessions enabled on both.
>
> Configuration data for graylog below. All nodes have the same core config except for "is_master=false" and IP address changes:
>
> is_master = true
> node_id_file = /etc/graylog/server/node-id
> password_secret = WQBdx6xgWTTykN9LHJhEGxfiSJbeYdaZhHhKEwbvAKQEWkVrl8lgTLvDDkfUtwhe7jgdFDFCBqpmVvY4aea1GyrbQ791UOCv
> root_password_sha2 = e3ed009797ada49a3fd38a04069b13d5a7f62001a153ed4d9a3da22fa7a75c7b
> plugin_dir = /usr/share/graylog-server/plugin
> rest_listen_uri = http://10.201.137.208:12900/
> rest_transport_uri = http://graylog.somewhere.com:12900/
> rest_enable_gzip = true
> web_listen_uri = http://10.201.137.208:9000/
> web_enable_gzip = true
> rotation_strategy = count
> elasticsearch_max_docs_per_index = 20000000
> elasticsearch_max_number_of_indices = 20
> retention_strategy = delete
> elasticsearch_shards = 4
> elasticsearch_replicas = 1
> elasticsearch_index_prefix = graylog
> allow_leading_wildcard_searches = false
> allow_highlighting = false
> elasticsearch_cluster_name = graylog
> elasticsearch_node_name_prefix = graylog-
> elasticsearch_discovery_zen_ping_unicast_hosts = gray00.somewhere.com:9300, gray01.somewhere.com:9300, gray02.somewhere.com:9300, gray00.somewhere.com:9350, gray01.somewhere.com:9350, gray02.somewhere.com:9350
> elasticsearch_transport_tcp_port = 9350
> elasticsearch_discovery_zen_ping_multicast_enabled = false
> elasticsearch_network_host = gray00.somewhere.com
> elasticsearch_network_bind_host = gray00.somewhere.com
> elasticsearch_network_publish_host = gray00.somewhere.com
> elasticsearch_analyzer = standard
> output_batch_size = 500
> output_flush_interval = 1
> output_fault_count_threshold = 5
> output_fault_penalty_seconds = 30
> processbuffer_processors = 5
> outputbuffer_processors = 3
> processor_wait_strategy = blocking
> ring_size = 65536
> inputbuffer_ring_size = 65536
> inputbuffer_processors = 2
> inputbuffer_wait_strategy = blocking
> message_journal_enabled = true
> message_journal_dir = /data/graylog/journal
> lb_recognition_period_seconds = 3
> mongodb_uri = mongodb://gray00.somewherecom:27017,gray01.somewhere.com, gray02.somewhere.com/graylog
> mongodb_max_connections = 1000
> mongodb_threads_allowed_to_block_multiplier = 5
> transport_email_enabled = true
> transport_email_hostname = smtp
> transport_email_port =25
> transport_email_subject_prefix = [graylog]
> transport_email_from_email = [email protected]
> transport_email_web_interface_url = https://graylog.somewhere.com
> content_packs_dir = /usr/share/graylog-server/contentpacks
> content_packs_auto_load = grok-patterns.json
>
> I'd appreciate any help or pointers anyone could give!
>
> Thanks!
> -Jeff
