Turns out this was a resource issue. My 3 nodes were running under VMware
with only 2 cores/4GB each, and I was trying to throw about 150K log messages at
it per second. :)

Increasing the memory/CPU allocations, tweaking the graylog memory values
(originally 1G -> 4G), and doing the same in the shared elasticsearch configs (up
to 18G instead of 2G) cleared the issue.
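For anyone sizing a similar setup, a back-of-the-envelope check with the numbers above shows how thin the original VMs were. This is a rough sketch that assumes the 150K msg/s was spread evenly over the three nodes and that the heap figures are per node. It only counts JVM heap, so the real headroom for MongoDB, the OS and the filesystem cache is smaller than it reports. The function names are made up for illustration.

```python
# Back-of-the-envelope sizing using the figures from this thread.
# Assumption: the incoming load is spread evenly over all nodes.


def per_node_rate(total_msgs_per_sec: float, nodes: int) -> float:
    """Messages per second each node has to handle."""
    return total_msgs_per_sec / nodes


def per_core_rate(total_msgs_per_sec: float, nodes: int, cores_per_node: int) -> float:
    """Messages per second each CPU core has to handle."""
    return total_msgs_per_sec / (nodes * cores_per_node)


def ram_left_after_heaps_gb(ram_gb: float, *heaps_gb: float) -> float:
    """RAM left over once the JVM heaps are reserved.

    Non-heap JVM memory is ignored, so the real headroom is smaller.
    """
    return ram_gb - sum(heaps_gb)


if __name__ == "__main__":
    # Original VMs: 3 nodes, 2 cores / 4 GB each, ~150K msg/s in total.
    print(f"per node: {per_node_rate(150_000, 3):,.0f} msg/s")
    print(f"per core: {per_core_rate(150_000, 3, 2):,.0f} msg/s")
    # Original heaps: elasticsearch 2G + graylog 1G on a 4 GB VM.
    print(f"RAM left for mongo + OS: {ram_left_after_heaps_gb(4, 2, 1):.0f} GB")
    # New heaps: elasticsearch 18G + graylog 4G.
    print(f"heap alone after the change: {18 + 4} GB per node")
```

On the original VMs that is 50,000 msg/s per node (25,000 per core), with about 1 GB of RAM left over for MongoDB, the OS and elasticsearch's filesystem cache.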


On Thursday, May 12, 2016 at 1:17:23 PM UTC-7, Jeff McCombs wrote:
>
> Hi gang,
>
>
>   I'm running into a strange problem where my graylog nodes are 
> complaining about not being able to find their meta info:
>
>
> 2016-05-12T11:50:09.691-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:12.878-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:13.417-07:00 WARN  [ProxiedResource] Node <00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
> 2016-05-12T11:50:15.808-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:19.175-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:24.767-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:28.020-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:37.849-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:40.978-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:41.904-07:00 WARN  [ProxiedResource] Node <00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.
> 2016-05-12T11:50:47.400-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
> 2016-05-12T11:50:50.670-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
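
In the excerpt above the NodePingThread warnings arrive roughly every 3 to 10 seconds. To see whether that cadence tracks load, one could pull the timestamps out of the graylog server log and look at the gaps between consecutive warnings. This is a minimal sketch, assuming the line format shown in this thread; the function and sample names are hypothetical, and the log file location depends on the install.

```python
import re
from datetime import datetime

# Matches the NodePingThread warning exactly as it appears in the excerpt above.
NODE_PING_WARN = re.compile(
    r"^(?P<ts>\S+)\s+WARN\s+\[NodePingThread\] Did not find meta info of this node"
)


def reregistration_gaps(lines):
    """Seconds between consecutive 'Did not find meta info' warnings."""
    stamps = [
        datetime.fromisoformat(m.group("ts"))  # ISO 8601 with UTC offset (Python 3.7+)
        for m in map(NODE_PING_WARN.match, lines)
        if m
    ]
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


SAMPLE = [
    "2016-05-12T11:50:09.691-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.",
    "2016-05-12T11:50:12.878-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.",
    "2016-05-12T11:50:13.417-07:00 WARN  [ProxiedResource] Node <00ac0ad1-b96f-46c0-a2bc-bc9e7a90777f> not found while trying to call org.graylog2.shared.rest.resources.system.RemoteMetricsResource on it.",
    "2016-05-12T11:50:15.808-07:00 WARN  [NodePingThread] Did not find meta info of this node. Re-registering.",
]

if __name__ == "__main__":
    # Only the NodePingThread lines are counted; the ProxiedResource line is skipped.
    print(reregistration_gaps(SAMPLE))
```

Lining these gaps up against the ingest rate (or CPU/GC activity) over the same window is one way to check whether the re-registrations start once the nodes are saturated.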
>
>
> In addition to the log entries above, I see occasional timeouts and errors
> in the web UI about master nodes no longer being available, or the web UI
> just disappears for a few seconds and comes back. I've also seen nodes
> drop in and out of the web UI. I'm assuming these are related.
>
>
> From some basic Google searches, the only explanation I've found for these
> log entries is that the clocks on the nodes may be out of sync. I've
> checked this and that's not the case here. All three nodes are running NTP
> and chiming off the local NTP server on the network:
>
>
> [root@gray00 /data]# ntpdate -q ntp0
> server 10.201.136.38, stratum 3, offset -0.000653, delay 0.02576
> 12 May 12:29:33 ntpdate[317]: adjust time server 10.201.136.38 offset -0.000653 sec
> [root@gray00 /data]# date
> Thu May 12 12:30:31 PDT 2016
>
> [root@gray01 graylog]# ntpdate -q ntp0
> server 10.201.136.38, stratum 3, offset -0.000568, delay 0.02576
> 12 May 12:29:22 ntpdate[31508]: adjust time server 10.201.136.38 offset -0.000568 sec
> [root@gray01 graylog]# date
> Thu May 12 12:30:31 PDT 2016
>
> [root@gray02 /data]# ntpdate -q ntp0
> server 10.201.136.38, stratum 3, offset -0.000055, delay 0.02580
> 12 May 12:29:21 ntpdate[535]: adjust time server 10.201.136.38 offset -0.000055 sec
> [root@gray02 /data]# date
> Thu May 12 12:30:32 PDT 2016
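
The ntpdate output above can be checked mechanically. All three nodes query the same server, so the spread between their offsets approximates how far apart their clocks are from each other. A small sketch (the helper names are hypothetical) puts that spread at about 0.6 ms, which supports ruling out clock skew as the cause.

```python
import re

# Pulls the "offset" field out of an `ntpdate -q` summary line.
OFFSET = re.compile(r"\boffset (-?\d+(?:\.\d+)?)")


def parse_offset(line: str) -> float:
    """Offset in seconds reported by ntpdate for one node."""
    match = OFFSET.search(line)
    if match is None:
        raise ValueError(f"no offset found in {line!r}")
    return float(match.group(1))


def max_skew_ms(offsets):
    """Largest difference between any two nodes' offsets, in milliseconds."""
    return (max(offsets) - min(offsets)) * 1000


# Copied from the ntpdate -q output in this thread.
NTPDATE_LINES = {
    "gray00": "server 10.201.136.38, stratum 3, offset -0.000653, delay 0.02576",
    "gray01": "server 10.201.136.38, stratum 3, offset -0.000568, delay 0.02576",
    "gray02": "server 10.201.136.38, stratum 3, offset -0.000055, delay 0.02580",
}

if __name__ == "__main__":
    offsets = [parse_offset(line) for line in NTPDATE_LINES.values()]
    print(f"max skew between nodes: {max_skew_ms(offsets):.3f} ms")
```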
>
>
> So what am I doing wrong here? Is there some additional troubleshooting I
> can perform to try and pinpoint the issue? Strangely, after I restart the
> graylog instances everything is fine for about 5-10 minutes, then these log
> entries start popping back up.
>
>
> Here's some deets on how I have things configured:
>
>
> 3x nodes - RHEL6 x64 (gray00, gray01, gray02). Installation via the repos
> for mongo, elasticsearch, and graylog.
>
>
> All three nodes run:
>
>    elasticsearch
>    mongo
>    graylog
>
>
> In front is an F5 LTM; the virtual IP on the F5 is known as "graylog".
> Service ports are 9000 and 12900, with sticky sessions enabled on both.
>
>
> Configuration data for graylog below. All nodes have the same core config 
> except for "is_master=false" and IP address changes:
>
> is_master = true
> node_id_file = /etc/graylog/server/node-id
> password_secret = WQBdx6xgWTTykN9LHJhEGxfiSJbeYdaZhHhKEwbvAKQEWkVrl8lgTLvDDkfUtwhe7jgdFDFCBqpmVvY4aea1GyrbQ791UOCv
> root_password_sha2 = e3ed009797ada49a3fd38a04069b13d5a7f62001a153ed4d9a3da22fa7a75c7b
> plugin_dir = /usr/share/graylog-server/plugin
> rest_listen_uri = http://10.201.137.208:12900/
> rest_transport_uri = http://graylog.somewhere.com:12900/
> rest_enable_gzip = true
> web_listen_uri = http://10.201.137.208:9000/
> web_enable_gzip = true
> rotation_strategy = count
> elasticsearch_max_docs_per_index = 20000000
> elasticsearch_max_number_of_indices = 20
> retention_strategy = delete
> elasticsearch_shards = 4
> elasticsearch_replicas = 1
> elasticsearch_index_prefix = graylog
> allow_leading_wildcard_searches = false
> allow_highlighting = false
> elasticsearch_cluster_name = graylog
> elasticsearch_node_name_prefix = graylog-
> elasticsearch_discovery_zen_ping_unicast_hosts = gray00.somewhere.com:9300, gray01.somewhere.com:9300, gray02.somewhere.com:9300, gray00.somewhere.com:9350, gray01.somewhere.com:9350, gray02.somewhere.com:9350
> elasticsearch_transport_tcp_port = 9350
> elasticsearch_discovery_zen_ping_multicast_enabled = false
> elasticsearch_network_host = gray00.somewhere.com
> elasticsearch_network_bind_host = gray00.somewhere.com
> elasticsearch_network_publish_host = gray00.somewhere.com
> elasticsearch_analyzer = standard
> output_batch_size = 500
> output_flush_interval = 1
> output_fault_count_threshold = 5
> output_fault_penalty_seconds = 30
> processbuffer_processors = 5
> outputbuffer_processors = 3
> processor_wait_strategy = blocking
> ring_size = 65536
> inputbuffer_ring_size = 65536
> inputbuffer_processors = 2
> inputbuffer_wait_strategy = blocking
> message_journal_enabled = true
> message_journal_dir = /data/graylog/journal
> lb_recognition_period_seconds = 3
> mongodb_uri = mongodb://gray00.somewhere.com:27017,gray01.somewhere.com,gray02.somewhere.com/graylog
> mongodb_max_connections = 1000
> mongodb_threads_allowed_to_block_multiplier = 5
> transport_email_enabled = true
> transport_email_hostname = smtp
> transport_email_port = 25
> transport_email_subject_prefix = [graylog]
> transport_email_from_email = [email protected]
> transport_email_web_interface_url = https://graylog.somewhere.com
> content_packs_dir = /usr/share/graylog-server/contentpacks
> content_packs_auto_load = grok-patterns.json
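
With count-based rotation, elasticsearch_max_docs_per_index and elasticsearch_max_number_of_indices together also cap how much history is kept. At the ~150K msg/s mentioned at the top of the thread, that cap is reached quickly. A quick sketch of the arithmetic, assuming one elasticsearch document per message and a steady ingest rate (the function names are illustrative only):

```python
# Retention window implied by the count-based rotation settings in this config.
# Assumption: every incoming message becomes exactly one elasticsearch
# document, and the ingest rate is steady.
MAX_DOCS_PER_INDEX = 20_000_000  # elasticsearch_max_docs_per_index
MAX_INDICES = 20                 # elasticsearch_max_number_of_indices


def seconds_to_fill_index(msgs_per_sec: float,
                          docs_per_index: int = MAX_DOCS_PER_INDEX) -> float:
    """Roughly how long one index stays active before it is rotated."""
    return docs_per_index / msgs_per_sec


def retention_seconds(msgs_per_sec: float,
                      docs_per_index: int = MAX_DOCS_PER_INDEX,
                      max_indices: int = MAX_INDICES) -> float:
    """Approximate age at which retention_strategy = delete drops a message's index."""
    return docs_per_index * max_indices / msgs_per_sec


if __name__ == "__main__":
    rate = 150_000  # msg/s, the load mentioned at the top of this thread
    print(f"one index fills in ~{seconds_to_fill_index(rate):.0f} s")
    print(f"total retention: ~{retention_seconds(rate) / 60:.0f} min")
```

That works out to a new index roughly every two minutes and well under an hour of searchable data, so these values may be worth revisiting at that volume.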
>
>
> I'd appreciate any help or pointers anyone could give! 
>
>
> Thanks!
>
> -Jeff
>

-- 
You received this message because you are subscribed to the Google Groups 
"Graylog Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/graylog2/4aa95153-b3e2-4bce-9248-9e92f29ee801%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
