Re: [Ganglia-general] gmond's on same multicast port won't communicate at same time
Wow... I didn't know there was an O'Reilly book on Ganglia! I will look into that for sure. Thanks for mentioning it.

To answer your questions:

1. deaf and mute are set to 'no'. That must be the default setting, since I've never touched those settings in all my years of working with ganglia.

2. Based on my answer to #1, I suppose that yes, all my gmond hosts are aggregators. So the total number of gmonds is 150 right now, and I've got about another 20 that I'm trying to bring online (most of these systems have recently been rebuilt as RHEL 6.5 systems, and installing gmond on them is where my problem started).

3. You didn't say *where* to issue the telnet from. I know that I can be logged into one of my gmonds (one that is working), run 'telnet localhost port', and see a 'HOST NAME' line for itself and for the other gmonds that share the same port. I see the same thing for the other ports that my gmonds are grouped on. Now for the gmonds that I'm having problems with, I only see one 'HOST NAME' per gmond. They're not seeing their gmond buddies. Picking one of those, the size of the XML content is 13Kb. When I tried this on a different gmond where it *does* see its fellow gmonds on the same port (total of 41 hosts), the size of the XML content was 252Kb.

4. We don't make use of ACLs in anything ganglia related. So no... none set.

5. That definitely was some kind of typo with that extra space, since I can't even find that. :)

On 12/4/14, 3:44 PM, Maciej Lasyk wrote:

Are you afraid that we could see performance data of the Curiosity? :D

First of all I would really suggest you read the Monitoring with Ganglia book (2012). It answers many questions and solves major problems.

About your issue:

1. How do you set deaf and mute in gmond nodes?
2. How many listening gmonds (aggregators, hosts with deaf=no) do you have? (If using multicast, then probably by default all gmond hosts are aggregators.)
3. What is the size of the downloaded XML (telnet to a gmond aggregator on the port set in tcp_accept_channel)? Does it contain all the hosts you monitor? (Write the XML content to a file and grep for 'HOST NAME' or something like that.)
4. Do you have any ACLs set in the gmond configuration?
5. Btw - in the config section you shared you have a white-space in port number 8 204:

    /* You can specify as many udp_recv_channels as you like as well. */
    udp_recv_channel {
      mcast_join = 239.2.11.71
      port = 8 204
      bind = 239.2.11.71
    }

Cheers,
Maciej Lasyk
GPG key ID: 4FED49C5
GPG public key: http://maciek.lasyk.info/gpg_maciej_lasyk.asc

On Thu, Dec 4, 2014 at 9:20 PM, Chris Jones christopher.r.jo...@nasa.gov wrote:

Being that I work at NASA, I'd rather not put entire files out there with names of hosts and ports and the like. :) My initial post included part of the gmond configs. My data_source line in my gmetad.conf file (for this one port) is simply something like this:

    data_source my_name gmond_hostA:8204 gmond_hostB:8204

If there's anything else specifically, just ask and I'll give it (with names changed to protect the innocent).

-chris

On 12/4/14, 3:15 PM, Maciej Lasyk wrote:

Plz share your configs via pastebin

Cheers,

On December 4, 2014 9:06:08 PM CET, Chris Jones christopher.r.jo...@nasa.gov wrote:

I'm still racking my brain with this problem I'm having. I've even run 'tcpdump -i any port 8204' on my gmetad server and watched the traffic. When I've got two gmond clients sending out multicast packets on port 8204, I can see handshaking between my server and *one* client. The other client, via tcpdump, just shows packets being sent out - and no replies. On the server GUI, I see only the one client showing up.

I then stop gmond on the client that's 'working' and immediately on my other client, the tcpdump output changes to handshaking between the client and server - and the server's tcpdump also then changes to show the new client (the old one stops). Then eventually on the server GUI I stop seeing the old client updating (the icon for the host turns that block of red... 'host down') and my new client shows up like nothing ever happened.

This makes no sense. I don't believe I've oversubscribed the number of gmonds on my server (around 150 maybe?). The gmetad server is running RHEL 6.2, and my two gmond clients are running RHEL 6.5. The strange
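The check described in question 3 above (read the XML that gmond serves on its tcp_accept_channel port and count the 'HOST NAME' entries) can also be scripted instead of done by hand with telnet. The following is only a rough sketch: the function names are made up for illustration, and it assumes the dump is well-formed XML whose hosts appear as HOST elements with a NAME attribute, as the 'HOST NAME' lines in the thread suggest.

```python
import socket
import xml.etree.ElementTree as ET
from typing import List


def fetch_gmond_xml(host: str, port: int, timeout: float = 10.0) -> bytes:
    """Read everything gmond writes to a TCP client until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # an empty read means the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def host_names(xml_bytes: bytes) -> List[str]:
    """Return the NAME attribute of every HOST element in the dump."""
    root = ET.fromstring(xml_bytes)
    return [host.get("NAME", "") for host in root.iter("HOST")]
```

On a gmond node this could be used as `dump = fetch_gmond_xml("localhost", 8204)` followed by `print(len(dump), len(host_names(dump)))`, which gives the same two numbers quoted in the reply (size of the XML and how many hosts it lists). Port 8204 is the one from this thread; substitute your own.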
Re: [Ganglia-general] gmond's on same multicast port won't communicate at same time
I'm still racking my brain with this problem I'm having. I've even run 'tcpdump -i any port 8204' on my gmetad server and watched the traffic. When I've got two gmond clients sending out multicast packets on port 8204, I can see handshaking between my server and *one* client. The other client, via tcpdump, just shows packets being sent out - and no replies. On the server GUI, I see only the one client showing up.

I then stop gmond on the client that's 'working' and immediately on my other client, the tcpdump output changes to handshaking between the client and server - and the server's tcpdump also then changes to show the new client (the old one stops). Then eventually on the server GUI I stop seeing the old client updating (the icon for the host turns that block of red... 'host down') and my new client shows up like nothing ever happened.

This makes no sense. I don't believe I've oversubscribed the number of gmonds on my server (around 150 maybe?). The gmetad server is running RHEL 6.2, and my two gmond clients are running RHEL 6.5. The strange thing is, it appears that only my RHEL 6.5 clients are having this problem. Every other gmond client is either RHEL 5.x or SuSE 11.1 or 11.2.

I've googled this problem till I'm blue in the face, gone back through the last few years of the ganglia-general mailing list archives as best I could with keyword searches, consulted many of my system admin co-workers, and even tried using unicast instead of multicast (that didn't make a difference either). Nothing seems to matter. There's got to be somebody out there reading this mailing list who's got RHEL 6.5 gmond clients. Anybody? Please? :)

Thanks,
-chris

On 9/4/14, 12:46 PM, Karol Korytkowski wrote:

I'm curious as to what the correct answer would be, but.. We have a similar problem (forgive me if not, I just scanned through your email), and some kind of solution was to use a different data_source (@gmetad) for each of such issues and give them the same cluster { name = } (@gmond). I think this has something to do with multicasts between switches, but so far no one has looked into this..

KK

On Thu, Sep 4, 2014 at 4:59 PM, Chris Jones christopher.r.jo...@nasa.gov wrote:

Here's my scenario. I've got some systems that were happily reporting in ganglia and they had to have their OSes rebuilt. They're now running RHEL 6.5. I can be on my gmetad server, running tcpdump looking for packets from host1 and host2, and only see one. Both host1 and host2 are running with the exact same gmond.conf configuration... same port. They both appear to be running correctly. But one shows more activity than the other when I run 'netstat -an | grep 8204' (8204 is the port they run on). When I run 'telnet localhost 8204' on them both, they show me all the XML data that they're sending out. Both gmond clients are sending their multicast traffic across the same network also. But the server only seems to want to pick up one at a time.

In my gmetad.conf file, the data_source line for this port only has two entries... host1:8204 host2:8204 (and these hosts are the fully qualified domain names... on the same network that the two hosts are sending their multicast across). I can have both gmonds running, but only one seems to generate all the TCP connections (like you see via 'netstat -an | grep 8204') where the other one doesn't. The one that does is the one I see on my gmetad server.

On the gmetad server, I can run tcpdump on the appropriate network interface and look for traffic coming from my host1 and host2. I can only see one at a time. I should see both my hosts. I make that assumption because I can run that same type of command on another port for other hosts that are on it and get back results with lots of different hosts showing up, because I have lots of hosts on that particular port.

Here's what I'm guessing are the relevant entries from the gmond.conf file on my two hosts in question:

    /* The host section describes attributes of the host, like the location */
    host {
      location = unspecified
    }

    /* Feel free to specify as many udp_send_channels as you like. Gmond
       used to only support having a single channel */
    udp_send_channel {
      #bind_hostname = yes # Highly recommended, soon to be default.
                           # This option tells gmond to use a source address
                           # that resolves to the machine's hostname. Without
                           # this, the metrics may appear to come from any
                           # interface and the DNS names associated with
                           # those IPs will be used to create the RRDs.
      mcast_join = 239.2.11.71
      port =
Re: [Ganglia-general] gmond's on same multicast port won't communicate at same time
Being that I work at NASA, I'd rather not put entire files out there with names of hosts and ports and the like. :) My initial post included part of the gmond configs. My data_source line in my gmetad.conf file (for this one port) is simply something like this:

    data_source my_name gmond_hostA:8204 gmond_hostB:8204

If there's anything else specifically, just ask and I'll give it (with names changed to protect the innocent).

-chris

On 12/4/14, 3:15 PM, Maciej Lasyk wrote:

Plz share your configs via pastebin

Cheers,

On December 4, 2014 9:06:08 PM CET, Chris Jones christopher.r.jo...@nasa.gov wrote:

I'm still racking my brain with this problem I'm having. I've even run 'tcpdump -i any port 8204' on my gmetad server and watched the traffic. When I've got two gmond clients sending out multicast packets on port 8204, I can see handshaking between my server and *one* client. The other client, via tcpdump, just shows packets being sent out - and no replies. On the server GUI, I see only the one client showing up.

I then stop gmond on the client that's 'working' and immediately on my other client, the tcpdump output changes to handshaking between the client and server - and the server's tcpdump also then changes to show the new client (the old one stops). Then eventually on the server GUI I stop seeing the old client updating (the icon for the host turns that block of red... 'host down') and my new client shows up like nothing ever happened.

This makes no sense. I don't believe I've oversubscribed the number of gmonds on my server (around 150 maybe?). The gmetad server is running RHEL 6.2, and my two gmond clients are running RHEL 6.5. The strange thing is, it appears that only my RHEL 6.5 clients are having this problem. Every other gmond client is either RHEL 5.x or SuSE 11.1 or 11.2.

I've googled this problem till I'm blue in the face, gone back through the last few years of the ganglia-general mailing list archives as best I could with keyword searches, consulted many of my system admin co-workers, and even tried using unicast instead of multicast (that didn't make a difference either). Nothing seems to matter. There's got to be somebody out there reading this mailing list who's got RHEL 6.5 gmond clients. Anybody? Please? :)

Thanks,
-chris

On 9/4/14, 12:46 PM, Karol Korytkowski wrote:

I'm curious as to what the correct answer would be, but.. We have a similar problem (forgive me if not, I just scanned through your email), and some kind of solution was to use a different data_source (@gmetad) for each of such issues and give them the same cluster { name = } (@gmond). I think this has something to do with multicasts between switches, but so far no one has looked into this..

KK

On Thu, Sep 4, 2014 at 4:59 PM, Chris Jones christopher.r.jo...@nasa.gov wrote:

Here's my scenario. I've got some systems that were happily reporting in ganglia and they had to have their OSes rebuilt. They're now running RHEL 6.5. I can be on my gmetad server, running tcpdump looking for packets from host1 and host2, and only see one. Both host1 and host2 are running with the exact same gmond.conf configuration... same port. They both appear to be running correctly. But one shows more activity than the other when I run 'netstat -an | grep 8204' (8204 is the port they run on). When I run 'telnet localhost 8204' on them both, they show me all the XML data that they're sending out. Both gmond clients are sending their multicast traffic across the same network also. But the server only seems to want to pick up one at a time.

In my gmetad.conf file, the data_source line for this port only has two entries... host1:8204 host2:8204 (and these hosts are the fully qualified domain names... on the same network that the two hosts are sending their multicast across). I can have both gmonds running, but only one seems to generate all the TCP connections (like you see via 'netstat -an | grep 8204') where the other one doesn't. The one that does is the one I see on my gmetad server.

On the gmetad server, I can run tcpdump on the appropriate network interface and look for traffic coming from my host1 and host2. I can only see one at a time. I should see both my hosts. I make that assumption because I can run that same type of command on another port for other hosts that are on it and get back results with lots of different
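For reference, the data_source line quoted in this message can be split into its parts as sketched below. This is not gmetad's own parser, just an illustration with a made-up helper name. It assumes the layout documented in the stock gmetad.conf comments (keyword, source name, optional polling interval in seconds, then host or host:port entries) and gmond's default port of 8649 when no port is given. As far as I understand the gmetad documentation, the hosts on one line are treated as alternative sources for the same cluster and tried for redundancy, not all polled at once; if that reading is right, seeing TCP sessions with only one of the listed hosts at a time would not by itself be a fault.

```python
import shlex
from typing import List, Optional, Tuple

DEFAULT_GMOND_PORT = 8649  # used when an entry carries no ":port" suffix


def parse_data_source(line: str) -> Tuple[str, Optional[int], List[Tuple[str, int]]]:
    """Split a gmetad.conf data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)  # shlex keeps a quoted name such as "my cluster" together
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name = tokens[1]
    rest = tokens[2:]
    interval = None
    if rest[0].isdigit():  # optional polling interval in seconds
        interval = int(rest[0])
        rest = rest[1:]
    sources = []
    for item in rest:
        host, sep, port = item.rpartition(":")
        if sep:
            sources.append((host, int(port)))
        else:
            sources.append((item, DEFAULT_GMOND_PORT))
    return name, interval, sources
```

The sketch does not handle IPv6 literals, which contain colons of their own.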
Re: [Ganglia-general] gmond's on same multicast port won't communicate at same time
Are you afraid that we could see performance data of the Curiosity? :D

First of all I would really suggest you read the Monitoring with Ganglia book (2012). It answers many questions and solves major problems.

About your issue:

1. How do you set deaf and mute in gmond nodes?
2. How many listening gmonds (aggregators, hosts with deaf=no) do you have? (If using multicast, then probably by default all gmond hosts are aggregators.)
3. What is the size of the downloaded XML (telnet to a gmond aggregator on the port set in tcp_accept_channel)? Does it contain all the hosts you monitor? (Write the XML content to a file and grep for 'HOST NAME' or something like that.)
4. Do you have any ACLs set in the gmond configuration?
5. Btw - in the config section you shared you have a white-space in port number 8 204:

    /* You can specify as many udp_recv_channels as you like as well. */
    udp_recv_channel {
      mcast_join = 239.2.11.71
      port = 8 204
      bind = 239.2.11.71
    }

Cheers,
Maciej Lasyk
GPG key ID: 4FED49C5
GPG public key: http://maciek.lasyk.info/gpg_maciej_lasyk.asc

On Thu, Dec 4, 2014 at 9:20 PM, Chris Jones christopher.r.jo...@nasa.gov wrote:

Being that I work at NASA, I'd rather not put entire files out there with names of hosts and ports and the like. :) My initial post included part of the gmond configs. My data_source line in my gmetad.conf file (for this one port) is simply something like this:

    data_source my_name gmond_hostA:8204 gmond_hostB:8204

If there's anything else specifically, just ask and I'll give it (with names changed to protect the innocent).

-chris

On 12/4/14, 3:15 PM, Maciej Lasyk wrote:

Plz share your configs via pastebin

Cheers,

On December 4, 2014 9:06:08 PM CET, Chris Jones christopher.r.jo...@nasa.gov wrote:

I'm still racking my brain with this problem I'm having. I've even run 'tcpdump -i any port 8204' on my gmetad server and watched the traffic. When I've got two gmond clients sending out multicast packets on port 8204, I can see handshaking between my server and *one* client. The other client, via tcpdump, just shows packets being sent out - and no replies. On the server GUI, I see only the one client showing up.

I then stop gmond on the client that's 'working' and immediately on my other client, the tcpdump output changes to handshaking between the client and server - and the server's tcpdump also then changes to show the new client (the old one stops). Then eventually on the server GUI I stop seeing the old client updating (the icon for the host turns that block of red... 'host down') and my new client shows up like nothing ever happened.

This makes no sense. I don't believe I've oversubscribed the number of gmonds on my server (around 150 maybe?). The gmetad server is running RHEL 6.2, and my two gmond clients are running RHEL 6.5. The strange thing is, it appears that only my RHEL 6.5 clients are having this problem. Every other gmond client is either RHEL 5.x or SuSE 11.1 or 11.2.

I've googled this problem till I'm blue in the face, gone back through the last few years of the ganglia-general mailing list archives as best I could with keyword searches, consulted many of my system admin co-workers, and even tried using unicast instead of multicast (that didn't make a difference either). Nothing seems to matter. There's got to be somebody out there reading this mailing list who's got RHEL 6.5 gmond clients. Anybody? Please? :)

Thanks,
-chris

On 9/4/14, 12:46 PM, Karol Korytkowski wrote:

I'm curious as to what the correct answer would be, but.. We have a similar problem (forgive me if not, I just scanned through your email), and some kind of solution was to use a different data_source (@gmetad) for each of such issues and give them the same cluster { name = } (@gmond). I think this has something to do with multicasts between switches, but so far no one has looked into this..

KK

On Thu, Sep 4, 2014 at 4:59 PM, Chris Jones christopher.r.jo...@nasa.gov wrote:

Here's my scenario. I've got some systems that were happily reporting in ganglia and they had to have their OSes rebuilt. They're now running RHEL 6.5. I can be on my gmetad server, running tcpdump looking for packets from host1 and host2, and only see one. Both host1 and host2 are running with the exact same gmond.conf configuration... same port. They both appear to be running correctly. But one shows more activity than the other when I run 'netstat -an | grep 8204' (8204 is the port they run on). When I run 'telnet localhost
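Question 1 in this message (how deaf and mute are set) can be answered by reading the values straight out of gmond.conf. Here is a small sketch of that; the helper name is hypothetical, and it assumes deaf and mute appear only as simple `key = value` settings (normally in the globals section) and that comments use the `/* ... */` and `#` styles seen in the config excerpts in this thread.

```python
import re
from typing import Dict


def deaf_mute_flags(conf_text: str) -> Dict[str, str]:
    """Return the deaf/mute values found in gmond.conf text, ignoring comments."""
    text = re.sub(r"/\*.*?\*/", "", conf_text, flags=re.S)  # drop /* ... */ comments
    text = re.sub(r"#.*", "", text)                          # drop # comments
    flags = {}
    for key in ("deaf", "mute"):
        match = re.search(rf'\b{key}\s*=\s*"?(\w+)"?', text)
        if match:
            flags[key] = match.group(1).lower()
    return flags
```

A key that is missing from the returned dictionary simply was not set in the file, in which case gmond falls back to its built-in default.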
Re: [Ganglia-general] gmond's on same multicast port won't communicate at same time
On Dec 4, 2014, at 2:06 PM, Chris Jones christopher.r.jo...@nasa.gov wrote:

This makes no sense. I don't believe I've oversubscribed the number of gmonds on my server (around 150 maybe?). The gmetad server is running RHEL 6.2, and my two gmond clients are running RHEL 6.5. The strange thing is, it appears that only my RHEL 6.5 clients are having this problem. Every other gmond client is either RHEL 5.x or SuSE 11.1 or 11.2.

I've googled this problem till I'm blue in the face, gone back through the last few years of the ganglia-general mailing list archives as best I could with keyword searches, consulted many of my system admin co-workers, and even tried using unicast instead of multicast (that didn't make a difference either). Nothing seems to matter. There's got to be somebody out there reading this mailing list who's got RHEL 6.5 gmond clients. Anybody? Please? :)

We have a random array of systems falling somewhere between RHEL 5.1 and RHEL 6.6 and we don’t see any issues like you’re describing. We’re running gmond 3.6.0, which is a little old, but only one dot release behind the latest and greatest. I am using unicast, but you said you tried that and saw the same issue, so I don’t really have any suggestions on what to try next from ganglia’s perspective.

150 clients is not oversubscribing ganglia; we have clusters with 300+ nodes in them. The fact that you can only see one host communicating with the gmetad server at a time is pretty suspicious; it points to some kind of network health issue. Do the netmasks check out? Does the switch support multicast properly? Are jumbo frames enabled on some ports but not others? Is the switch saturated?

Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
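The "do netmasks check out?" question above can be checked quickly once you have the address and prefix length of the relevant interface on each host (for example from 'ip addr'). A minimal sketch, using a made-up helper name and documentation-range addresses rather than anything from the real hosts:

```python
import ipaddress


def same_subnet(interface_a: str, interface_b: str) -> bool:
    """True if two 'address/prefix' strings describe the same IP network."""
    net_a = ipaddress.ip_interface(interface_a).network
    net_b = ipaddress.ip_interface(interface_b).network
    return net_a == net_b
```

For instance, `same_subnet("192.0.2.10/24", "192.0.2.20/24")` is true, while a host that was rebuilt with a /25 mask by mistake would not match its /24 neighbours even though the addresses look similar.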
[Ganglia-general] gmond's on same multicast port won't communicate at same time
Here's my scenario. I've got some systems that were happily reporting in ganglia and they had to have their OSes rebuilt. They're now running RHEL 6.5. I can be on my gmetad server, running tcpdump looking for packets from host1 and host2, and only see one. Both host1 and host2 are running with the exact same gmond.conf configuration... same port. They both appear to be running correctly. But one shows more activity than the other when I run 'netstat -an | grep 8204' (8204 is the port they run on). When I run 'telnet localhost 8204' on them both, they show me all the XML data that they're sending out. Both gmond clients are sending their multicast traffic across the same network also. But the server only seems to want to pick up one at a time.

In my gmetad.conf file, the data_source line for this port only has two entries... host1:8204 host2:8204 (and these hosts are the fully qualified domain names... on the same network that the two hosts are sending their multicast across). I can have both gmonds running, but only one seems to generate all the TCP connections (like you see via 'netstat -an | grep 8204') where the other one doesn't. The one that does is the one I see on my gmetad server.

On the gmetad server, I can run tcpdump on the appropriate network interface and look for traffic coming from my host1 and host2. I can only see one at a time. I should see both my hosts. I make that assumption because I can run that same type of command on another port for other hosts that are on it and get back results with lots of different hosts showing up, because I have lots of hosts on that particular port.

Here's what I'm guessing are the relevant entries from the gmond.conf file on my two hosts in question:

    /* The host section describes attributes of the host, like the location */
    host {
      location = "unspecified"
    }

    /* Feel free to specify as many udp_send_channels as you like. Gmond
       used to only support having a single channel */
    udp_send_channel {
      #bind_hostname = yes # Highly recommended, soon to be default.
                           # This option tells gmond to use a source address
                           # that resolves to the machine's hostname. Without
                           # this, the metrics may appear to come from any
                           # interface and the DNS names associated with
                           # those IPs will be used to create the RRDs.
      mcast_join = 239.2.11.71
      port = 8204
      ttl = 1
    }

    /* You can specify as many udp_recv_channels as you like as well. */
    udp_recv_channel {
      mcast_join = 239.2.11.71
      port = 8204
      bind = 239.2.11.71
    }

    /* You can specify as many tcp_accept_channels as you like to share
       an xml description of the state of the cluster */
    tcp_accept_channel {
      port = 8204
    }

Any insight would be appreciated. :)

Thanks,
-chris

--
Chris Jones
SSAI - ASDC
Senior Systems Administrator
Note to self: Insert cool signature here.
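One way to test whether multicast packets from both hosts actually reach a given machine, independent of gmond and gmetad, is to join the same group and record which source addresses show up. The sketch below does that with plain UDP sockets; the function names are invented for illustration, and the group and port are the ones from the config above. It assumes an IPv4 network and that the port can be bound on the machine where it runs; if gmond already holds the port without address reuse, the bind may fail, in which case it is simplest to run it on a host where gmond is stopped.

```python
import socket
import time
from typing import Set


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq payload: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def multicast_senders(group: str, port: int, seconds: float = 30.0) -> Set[str]:
    """Join the group and collect the source IPs of datagrams seen within the time limit."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        seen = set()
        deadline = time.monotonic() + seconds
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                _payload, (addr, _src_port) = sock.recvfrom(65535)
            except socket.timeout:
                break
            seen.add(addr)
        return seen
    finally:
        sock.close()
```

Calling `multicast_senders("239.2.11.71", 8204)` on a host in the group and comparing the returned addresses against the expected hosts shows whether the missing 'HOST NAME' entries are a delivery problem on the network or something on the gmond side. Note that ttl = 1 in the send channel means the packets are not expected to cross a router.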
Re: [Ganglia-general] gmond's on same multicast port won't communicate at same time
I'm curious as to what the correct answer would be, but... We have a similar problem (forgive me if not, I just scanned through your email), and a workaround of sorts was to use a different data_source (@gmetad) for each such case and give them the same cluster { name = } (@gmond). I think this has something to do with multicast between switches, but so far no one has looked into it.

KK

On Thu, Sep 4, 2014 at 4:59 PM, Chris Jones christopher.r.jo...@nasa.gov wrote:

Here's my scenario. I've got some systems that were happily reporting in ganglia and they had to have their OSes rebuilt. They're now running RHEL 6.5. I can be on my gmetad server, and tcpdump looking for packets from host1 and host2 and only see one.

Both host1 and host2 are running with the exact same gmond.conf configuration... same port. They both appear to be running correctly. But one shows more activity than the other when I run a 'netstat -an | grep 8204' (8204 is the port they run on). When I run 'telnet localhost 8204' on them both, they show me all the xml data that they're sending out. Both gmond clients are sending their multicast traffic across the same network also. But the server only seems to want to pick up one at a time.

In my gmetad.conf file, the data_source line for this port only has two entries... host1:8204 host2:8204 (and these hosts are the fully qualified domain names... on the same network that the two hosts are sending their multicast across). I can have both gmond's running but only one seems to generate all the tcp connections (like you see via 'netstat -an | grep 8204') where the other one doesn't. The one that does is the one I see on my gmetad server.

On the gmetad server, I can run tcpdump on the appropriate network interface and look for traffic coming from my host1 and host2. I can only see one at a time. I should see both my hosts. I make that assumption because I can run that same type of command on another port for other hosts that are on it and get back results with lots of different hosts showing up, because I have lots of hosts on that particular port.

Here's what I'm guessing are the relevant entries from the gmond.conf file on my two hosts in question:

/* The host section describes attributes of the host, like the location */
host {
  location = unspecified
}

/* Feel free to specify as many udp_send_channels as you like.
   Gmond used to only support having a single channel */
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname. Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  mcast_join = 239.2.11.71
  port = 8204
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8204
  bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8204
}

Any insight would be appreciated. :)

Thanks,
-chris

--
Chris Jones
SSAI - ASDC Senior Systems Administrator
Note to self: Insert cool signature here.
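One way to check the "multicast isn't getting through" theory without tcpdump is to join the group yourself and see who is actually sending. What follows is only a rough sketch, not anything shipped with Ganglia: the function names `membership_request` and `tally_senders` are made up for illustration, the group and port are simply the ones from the quoted gmond.conf, and it assumes a Linux-style host where a second socket can bind the same UDP port alongside gmond with SO_REUSEADDR.

```python
import socket
import struct
import time
from collections import Counter


def membership_request(group, interface="0.0.0.0"):
    """Build the 8-byte ip_mreq value passed to IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def tally_senders(group="239.2.11.71", port=8204, seconds=30.0, interface="0.0.0.0"):
    """Count multicast datagrams per source address for a fixed time window.

    group/port default to the values in the gmond.conf quoted above;
    interface="0.0.0.0" lets the kernel choose the interface (an assumption).
    """
    counts = Counter()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # Allow sharing the port with a gmond that is already listening.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(
            socket.IPPROTO_IP,
            socket.IP_ADD_MEMBERSHIP,
            membership_request(group, interface),
        )
        deadline = time.monotonic() + seconds
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            sock.settimeout(remaining)
            try:
                _, (addr, _src_port) = sock.recvfrom(65535)
            except socket.timeout:
                break
            counts[addr] += 1
    finally:
        sock.close()
    return counts
```

Calling `tally_senders(seconds=60)` on host1 and then on host2 returns a Counter keyed by source IP. If each host only ever counts its own address, that would suggest the packets are not crossing the network between them, rather than a problem in gmond.conf itself.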
[Ganglia-general] gmond's on same multicast port won't communicate at same time
Here's my scenario. I've got some systems that were happily reporting in ganglia and they had to have their OSes rebuilt. They're now running RHEL 6.5. I can be on my gmetad server, and tcpdump looking for packets from host1 and host2 and only see one.

Both host1 and host2 are running with the exact same gmond.conf configuration... same port. They both appear to be running correctly. But one shows more activity than the other when I run a 'netstat -an | grep 8204' (8204 is the port they run on). When I run 'telnet localhost 8204' on them both, they show me all the xml data that they're sending out. Both gmond clients are sending their multicast traffic across the same network also. But the server only seems to want to pick up one at a time.

In my gmetad.conf file, the data_source line for this port only has two entries... host1:8204 host2:8204 (and these hosts are the fully qualified domain names... on the same network that the two hosts are sending their multicast across). I can have both gmond's running but only one seems to generate all the tcp connections (like you see via 'netstat -an | grep 8204') where the other one doesn't. The one that does is the one I see on my gmetad server.

On the gmetad server, I can run tcpdump on the appropriate network interface and look for traffic coming from my host1 and host2. I can only see one at a time. I should see both my hosts. I make that assumption because I can run that same type of command on another port for other hosts that are on it and get back results with lots of different hosts showing up, because I have lots of hosts on that particular port.

Here's what I'm guessing are the relevant entries from the gmond.conf file on my two hosts in question:

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

/* Feel free to specify as many udp_send_channels as you like.
   Gmond used to only support having a single channel */
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname. Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  mcast_join = 239.2.11.71
  port = 8204
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8204
  bind = 239.2.11.71
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8204
}

Any insight would be appreciated. :)

Thanks,
-chris

--
Chris Jones
SSAI - ASDC Senior Systems Administrator
Note to self: Insert cool signature here.
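The 'telnet localhost 8204' check described above can also be scripted, which makes it easier to compare what each gmond believes about its neighbours. This is a minimal sketch under a few assumptions: the helper names `read_gmond_xml` and `hosts_by_cluster` are invented here, the port is the 8204 from the config above, and it relies on gmond's XML dump using CLUSTER and HOST elements with a NAME attribute (the 'HOST NAME' lines seen in the telnet output).

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8204, timeout=5.0):
    """Read the XML dump a gmond writes on its tcp_accept_channel port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Map each CLUSTER NAME to the sorted list of HOST NAME values inside it."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for cluster in root.iter("CLUSTER"):
        names = sorted(h.get("NAME") for h in cluster.iter("HOST"))
        result[cluster.get("NAME")] = names
    return result
```

For example, `hosts_by_cluster(read_gmond_xml("host1"))` and the same call for "host2" (placeholder hostnames, as in the message) would show whether each gmond lists only itself or both hosts.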