On Thu, Dec 10, 2009 at 16:51:00PM +0100, Carlo Marcelo Arenas Belon wrote:
>On Thu, Dec 10, 2009 at 04:17:18PM +0100, Samuel Gimeno wrote:
>> All the XML from every gmond and gmetad is well formed, and all the
>> echo tests are OK.
>>
>> Did you say something about AppArmor problems? What can I do to fix it? I
>> think that could be the problem, since all the other things I tried are fine...
>
>No idea, as I don't use OpenSUSE, but Google suggests you try:
>
>http://developer.novell.com/wiki/index.php/Apparmor_FAQ#How_do_I_enable.2Fdisable_AppArmor.3F
>
>Carlo

It has been almost a month since the last reply in this thread, but I
have started running into this problem too, on a couple of OpenSUSE
clusters.

Here is my setup:
  Cluster #1 - Distro: OpenSUSE 11.1, Architecture: x86,
               Ganglia version: 3.0.7, gmond port: 9551, gmetad port: standard
  Cluster #2 - Distro: OpenSUSE 11.0, Architecture: x86_64,
               Ganglia version: 3.1.2, gmond port: 8670, gmetad port: standard

Each cluster has a node with gmetad and the web-frontend installed.

Ganglia had been working fine on these clusters for several months. I
suspect that some recent security patches, applied in late December,
may have broken things.

Like Samuel Gimeno, I was having trouble getting the web frontend to
work. On both clusters, I received the following error message:
   There was an error collecting ganglia data (127.0.0.1:8652):
   XML error: Invalid document end at line_number

I ran the output of gmond and gmetad through xmllint, which reported
that the XML was well formed. So I started reviewing the web-frontend
code. After spending a few hours on it, I discovered that the frontend
was only reading small portions of the gmetad XML tree. Here is the
offending code:

File: ganglia.php, function: Gmetad
   $start = gettimeofday();

   while(!feof($fp))
      {
         $data = fread($fp, 16384);
         if (!xml_parse($parser, $data, feof($fp)))
            {
               $error = sprintf("XML error: %s at %d",
                  xml_error_string(xml_get_error_code($parser)),
                  xml_get_current_line_number($parser));
               fclose($fp);
               return FALSE;
            }
      }
   fclose($fp);

On each pass, fread returned only a fraction of the 16,384 bytes it was
asked for, which caused xml_parse to fail on the first pass for me. So I
rewrote the code as follows:

   $start = gettimeofday();
   $data = "";
   while(!feof($fp))
      {
         $data .= fread($fp, 32);
      }
   fclose($fp);

   if (!xml_parse($parser, $data))
      {
         $error = sprintf("XML error: %s at %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
         return FALSE;
      }

This eliminated the XML parsing errors. However, two other problems appeared:

1. The message "Ganglia cannot find a data source. Is gmond running?"
appears intermittently. When this happens, the following appears in
/var/log/messages:
   n1 /.nfs/home/apps/ganglia/3.1.2/sbin/gmetad[24785]: server_thread()
   1136322896 unable to write root epilog
Refreshing the page usually brings the graphs back up.

2. On my x86_64 cluster, the number of nodes reported varies between
refreshes, seemingly at random. Sometimes I see all of them; other
times, only a fraction. Oddly, the metrics are being summed correctly
in the "Overall Activity" graphs. It is only the "Hosts Up" and
"CPUs Total" figures and the individual node graphs that are not
displayed correctly on every page load.

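A hedged note on the rewrite, offered as something to rule out rather than a confirmed diagnosis: PHP's xml_parse() is designed to be fed a document in chunks, and its optional third argument, is_final, tells the parser whether the current chunk is the last one. The original loop passes feof($fp) there, so short fread() results on a socket should not, by themselves, cause an error. The rewrite omits the argument, which then defaults to false, so a document that simply stops early is no longer reported. If gmetad is closing the stream before sending its closing tags (which an "unable to write root epilog" message would be consistent with), the rewrite would hide the truncation rather than fix it, and a cut-off host list could also account for the varying host counts. The sketch below shows the difference on a made-up, deliberately truncated snippet; parse_result() is a hypothetical helper, not part of the Ganglia frontend.

```php
<?php
// Made-up gmetad-style document, deliberately cut off before the epilog:
// the closing </CLUSTER> and </GANGLIA_XML> tags are missing.
$truncated = '<GANGLIA_XML VERSION="3.1.2"><CLUSTER NAME="c1">'
           . '<HOST NAME="n1"/><HOST NAME="n2"/>';

// Hypothetical helper: parse $data in a single call, passing $isFinal
// straight through as xml_parse()'s third (is_final) argument.
function parse_result(string $data, bool $isFinal): string
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $data, $isFinal)) {
        return 'OK';
    }
    return 'XML error: ' . xml_error_string(xml_get_error_code($parser));
}

// Like the rewrite (is_final omitted, i.e. false): no error is reported,
// because the parser assumes more data may still arrive.
echo parse_result($truncated, false), "\n";

// Like the original loop on its last chunk (is_final = true):
// the missing end tags are reported as an error.
echo parse_result($truncated, true), "\n";
```

If the second call fails on real gmetad output captured at the moment of a bad page load, the problem is more likely on the gmetad side than in how the frontend reads the socket.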
Disabling AppArmor made no appreciable difference. I also compiled and
tested Ganglia 3.1.5 and ran into the same set of problems.
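To look at problem 2 outside the web frontend, one option is to read gmetad's whole response first and then parse it once with is_final set to true, counting HOST elements along the way. Repeating this a few times would show whether completeness or the host count changes between requests, which would point at gmetad (or the connection) rather than the PHP page. summarize_gmetad_xml() below is a hypothetical helper written for illustration, and the host and port in the commented example are assumptions (8651 is gmetad's default xml_port, which sends the full tree on connect).

```php
<?php
// Hypothetical diagnostic helper, not part of Ganglia: parse a complete
// gmetad XML dump in one xml_parse() call with is_final = true, counting
// HOST elements, so a truncated response is reported instead of accepted.
function summarize_gmetad_xml(string $xml): array
{
    $hosts = 0;
    $parser = xml_parser_create(); // case folding is on: names arrive upper-cased
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$hosts) {
            if ($name === 'HOST') {
                $hosts++;
            }
        },
        function ($p, $name) {
            // End tags are not needed for counting.
        }
    );
    $ok = xml_parse($parser, $xml, true);
    $error = null;
    if (!$ok) {
        $error = sprintf('XML error: %s at %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser));
    }
    return ['complete' => (bool) $ok, 'hosts' => $hosts, 'error' => $error];
}

// Example use against a live gmetad (host and port are assumptions):
// $fp = fsockopen('127.0.0.1', 8651, $errno, $errstr, 5);
// $xml = stream_get_contents($fp); // reads until gmetad closes the socket
// fclose($fp);
// print_r(summarize_gmetad_xml($xml));
```

Reading with stream_get_contents() collects everything until gmetad closes the connection, much like the rewrite's read loop; the difference is that the single xml_parse() call passes is_final = true, so an incomplete tree shows up as 'complete' => false with an error message instead of as a partial page.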

If anyone can offer some debugging advice, I would appreciate it.

_______________________________________________
Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general
