Hi Jeremy:

I would suggest running gmond and gmetad in debug mode to see whether they give you any additional information.

How big are your clusters?

Cheers,
Bernard

On Tue, Jan 5, 2010 at 5:05 PM, Jeremy Stout wrote:
> On Thu, Dec 10, 2009 at 16:51:00PM +0100, Carlo Marcelo Arenas Belon wrote:
>> On Thu, Dec 10, 2009 at 04:17:18PM +0100, Samuel Gimeno wrote:
>>> All XML from all gmond and gmetad instances is well-formed, and all echos are OK.
>>>
>>> Did you say something about AppArmor problems? What can I do to fix it? I
>>> think that may be the problem, since all the other things I tried are fine...
>>
>> No idea, as I don't use OpenSUSE, but Google suggested you try:
>>
>> http://developer.novell.com/wiki/index.php/Apparmor_FAQ#How_do_I_enable.2Fdisable_AppArmor.3F
>>
>> Carlo
>
> It has been almost a month since the last reply in this thread, but I
> started running into this problem too on a couple of OpenSUSE
> clusters.
>
> Here is my setup:
> Cluster #1 - Distro: OpenSUSE 11.1, Architecture: x86,
>   Ganglia Version: 3.0.7, gmond port: 9551, gmetad port: standard
> Cluster #2 - Distro: OpenSUSE 11.0, Architecture: x86_64,
>   Ganglia Version: 3.1.2, gmond port: 8670, gmetad port: standard
>
> Each cluster has a node with gmetad and the web frontend installed.
>
> Ganglia had been working fine on these clusters for several months. I
> suspect some recent security patches may have broken things in late
> December.
>
> Like Samuel Gimeno, I was having problems getting the web frontend
> to work. On both clusters, I was receiving the following error message:
>
>     There was an error collecting ganglia data (127.0.0.1:8652): XML error: Invalid document end at line_number
>
> I ran the output of gmond and gmetad through xmllint, which reported
> no errors. So I started reviewing the web frontend code. After spending
> a few hours going through it, I discovered the frontend was only reading
> in small portions of the gmetad XML tree.
> Here is the offending code:
>
> File: ganglia.php, function: Gmetad
>     $start = gettimeofday();
>
>     while(!feof($fp))
>     {
>         $data = fread($fp, 16384);
>         if (!xml_parse($parser, $data, feof($fp)))
>         {
>             $error = sprintf("XML error: %s at %d",
>                 xml_error_string(xml_get_error_code($parser)),
>                 xml_get_current_line_number($parser));
>             fclose($fp);
>             return FALSE;
>         }
>     }
>     fclose($fp);
>
> fread was returning only a fraction of the 16,384 bytes requested on
> each pass. This caused xml_parse to fail on the first pass for me. So
> I rewrote the code as follows:
>
>     $start = gettimeofday();
>     $data = "";
>     while(!feof($fp))
>     {
>         $data .= fread($fp, 32);
>     }
>     fclose($fp);
>
>     if (!xml_parse($parser, $data))
>     {
>         $error = sprintf("XML error: %s at %d",
>             xml_error_string(xml_get_error_code($parser)),
>             xml_get_current_line_number($parser));
>         return FALSE;
>     }
>
> This eliminated the XML parsing errors. However, two other problems appeared:
>
> 1. The message "Ganglia cannot find a data source. Is gmond running?"
> appears intermittently. When this happens, the following appears in
> /var/log/messages:
>     n1 /.nfs/home/apps/ganglia/3.1.2/sbin/gmetad[24785]: server_thread() 1136322896 unable to write root epilog
> Refreshing will usually bring up the graph page.
>
> 2. On my x86_64 cluster, the number of nodes being reported varies
> between refreshes. Sometimes I'll see all of them; other times, only a
> fraction. The number seems to vary randomly from one refresh to the
> next. What is weird is that the metrics are being added up correctly
> in the "Overall Activity" graphs. It is just that the "Hosts Up" and
> "CPUs Total" figures and the individual node graphs are not displayed
> correctly on every page load.
>
> Disabling AppArmor has not made any noticeable difference. I also
> compiled and tested Ganglia 3.1.5, and I ran into the same set of
> problems.
>
> If anyone can offer some debugging advice, I would appreciate it.
>
> ------------------------------------------------------------------------------
> This SF.Net email is sponsored by the Verizon Developer Community
> Take advantage of Verizon's best-in-class app development support
> A streamlined, 14 day to market process makes app distribution fast and easy
> Join now and get one step closer to millions of Verizon customers
> http://p.sf.net/sfu/verizon-dev2dev
> _______________________________________________
> Ganglia-general mailing list
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
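A minimal PHP sketch for checking the chunking behavior in isolation, assuming PHP 7 or later with the standard xml extension. parse_in_chunks() is a hypothetical helper, not part of ganglia.php, and the sample document is made up rather than real gmetad output. It feeds a document to xml_parse() in fixed-size pieces and flags only the last piece as final, much as the original Gmetad loop does with feof($fp). The calls show that short reads on their own do not make incremental parsing fail; a failure appears when the stream stops before the closing tags and the last chunk is flagged final. The last call shows that when is_final is left at its default of false, as in the rewritten version above, a truncated document is not reported as an error.

```php
<?php
// Hypothetical helper (not part of ganglia.php): feed $xml to a new parser
// in $chunkSize-byte pieces, imitating the fread()/xml_parse() loop.
// Only the last piece gets is_final = true, and only if $markFinal is set.
function parse_in_chunks(string $xml, int $chunkSize, bool $markFinal = true): bool
{
    // The parser is released automatically when it goes out of scope.
    $parser = xml_parser_create();
    $len = strlen($xml);
    for ($offset = 0; $offset < $len; $offset += $chunkSize) {
        $chunk = substr($xml, $offset, $chunkSize);
        // Plays the role of feof($fp) in the original loop.
        $isLast = ($offset + $chunkSize >= $len);
        if (!xml_parse($parser, $chunk, $markFinal && $isLast)) {
            // Same diagnostics the frontend builds into $error.
            error_log(sprintf("XML error: %s at %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)));
            return false;
        }
    }
    return true;
}

// Made-up sample loosely shaped like gmetad output (not real data).
$doc = '<GANGLIA_XML><CLUSTER NAME="demo"><HOST NAME="n1"/>'
     . '<HOST NAME="n2"/></CLUSTER></GANGLIA_XML>';
// Simulate a sender that stops before writing the closing tags.
$truncated = substr($doc, 0, strpos($doc, '</CLUSTER>'));

var_dump(parse_in_chunks($doc, 16384));           // bool(true): one read
var_dump(parse_in_chunks($doc, 32));              // bool(true): many short reads
var_dump(parse_in_chunks($truncated, 32));        // bool(false): stream ended early
var_dump(parse_in_chunks($truncated, 32, false)); // bool(true): never told it was final
```

Under these assumptions, the disappearance of the XML errors after the rewrite would not by itself show that gmetad is now sending complete documents.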

