Rick,

Thanks for your help on this. That's disappointing. I guess I'll either
move to unicast or set up multicast routing. I like the ease and
redundancy of multicast, but I don't really like having all my nodes
broadcast continually across the entire cluster.
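
For reference, the unicast option might look roughly like the sketch
below. It is untested and rests on assumptions not stated in the thread:
that r01n40-ge and r11n40-ge are picked as collectors, and that unicast
UDP on port 8649 is routed between the three VLANs. In each datanode's
gmond.conf, the multicast channel settings (mcast_join, bind) would give
way to plain host/port channels:

    cluster {
      name = "Datanodes"
    }

    /* Send metrics to each collector; two channels keep some redundancy. */
    udp_send_channel {
      host = r01n40-ge
      port = 8649
    }
    udp_send_channel {
      host = r11n40-ge
      port = 8649
    }

    /* Needed on the collector nodes so they can receive and serve data. */
    udp_recv_channel {
      port = 8649
    }
    tcp_accept_channel {
      port = 8649
    }

gmetad.conf would then need only one line, with the second host acting
as a failover source for the same cluster:

    data_source "Datanodes" r01n40-ge:8649 r11n40-ge:8649
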
Have a great weekend!

dbr

Rick Cobb wrote:
> One more time trying to get this thread back on the list instead of
> just between David & me.
>
> And I'll disclaim expertise on 3.1 here. On the other hand, I've been
> in this code more times than I wanted to in 3.0, and I don't think the
> fundamental design of gmetad was affected by 3.0 -> 3.1.
>
> I'm not claiming the behavior is consistent, of course -- but really,
> the fact that this looks like it kind-of-works is the core bug.
>
> Gmetad thinks 1 datasource == 1 cluster. In particular, it runs one
> thread per datasource, and that thread maintains the cluster summary
> metrics. You can construct things so the front-end & cluster
> directories pretend that 3 datasources == 1 cluster, but it doesn't
> work, and that's what you're running into. I've done it
> intentionally, actually, and simply ignored the fact that summaries
> were wrong for that cluster, but I don't recommend it. In particular,
> you're probably getting a ton of 'rrd_update' errors for summary_info
> RRDfiles in your syslog.
>
> To get consistent behavior, your options are:
> * Treat this as one *grid* of 3 clusters. Grid summaries will work,
>   but the meta view isn't nearly as functional as the cluster view, so
>   it's not really optimal for what you want.
> * Get these to come in as one datasource. Since you have separate
>   multicast domains, you may have to resort to unicast to do this.
>
> Sorry to be the bearer of bad news, but it would take a fairly nasty
> bit of gmetad hacking to fix this -- and it would deeply affect
> scalability, since the single-thread-per-datasource solution removes a
> lot of opportunities for lock contention.
>
> -- ReC
>
> On Fri, Oct 29, 2010 at 9:52 AM, David B Ritch <[email protected]> wrote:
>
> > Thanks, Rick. Unfortunately, that doesn't seem to be the problem
> > I'm running into. I do have the cluster name set to Datanodes in
> > all the client.
> > Otherwise, I wouldn't expect it to show all of
> > them when I click Show Hosts.
> >
> > dbr
> >
> > Rick Cobb wrote:
> >
> > > This is such a common misconception that the development team
> > > should consider removing the name field from the data_source
> > > configuration line entirely.
> > >
> > > Fundamentally, cluster names come from the gmond.conf files.
> > > The names of datasources exist only to confuse the hell out
> > > of you and create bugs. You need to change those gmond.conf's
> > > to match the cluster names you want.
> > > IIRC, it's a good idea for the datasources lines to match
> > > those because they actually are used in a few places and
> > > having them *not* match just confuses the next guy who
> > > maintains your system.
> > >
> > > -- ReC
> > >
> > > On Fri, Oct 29, 2010 at 6:15 AM, David B. Ritch
> > > <[email protected]> wrote:
> > >
> > > > I'm running Ganglia-3.1.7 under RHEL-5.5 on a cluster. My
> > > > nodes are divided into different classes for monitoring. My
> > > > largest class of nodes, datanodes, spans 3 VLANs, and I don't
> > > > route multicast between those domains. I have the following
> > > > in gmetad.conf on my master node:
> > > >
> > > > data_source "Datanodes" r01n40-ge:8649 r03n40-ge:8649 r05n40-ge:8649
> > > > data_source "Datanodes2" r11n40-ge:8649 r13n40-ge:8649 r15n40-ge:8649
> > > > data_source "Datanodes3" r21n40-ge:8649 r23n40-ge:8649 r25n40-ge:8649
> > > >
> > > > Each datanode has "Datanodes" specified as its cluster name.
> > > >
> > > > When I look at the web interface, at the grid level, the
> > > > summary of my Datanodes only shows 1/3 of my datanodes. When
> > > > I select the Datanodes cluster (Grid > Datanodes), and select
> > > > Show Hosts: no, I see the same graph and the same number of
> > > > nodes. However, when I select Show Hosts: yes, The Hosts up:
> > > > and CPUs Total both jump up to the proper totals.
> > > >
> > > > Apparently, gmetad sees all the nodes and puts them in the
> > > > right cluster, but doesn't calculate the summaries properly.
> > > >
> > > > Am I doing something wrong, or is this a problem in Ganglia?
> > > >
> > > > Thanks!
> > > >
> > > > David
> > > >
> > > > ------------------------------------------------------------------------------
> > > > Nokia and AT&T present the 2010 Calling All Innovators-North
> > > > America contest
> > > > Create new apps & games for the Nokia N8 for consumers in U.S.
> > > > and Canada
> > > > $10 million total in prizes - $4M cash, 500 devices, nearly
> > > > $6M in marketing
> > > > Develop with Nokia Qt SDK, Web Runtime, or Java and Publish
> > > > to Ovi Store
> > > > http://p.sf.net/sfu/nokia-dev2dev
> > > > _______________________________________________
> > > > Ganglia-general mailing list
> > > > [email protected]
> > > > https://lists.sourceforge.net/lists/listinfo/ganglia-general

