Jesse Becker <[EMAIL PROTECTED]> wrote:
> On Fri, Oct 17, 2008 at 16:24, Ofer Inbar <[EMAIL PROTECTED]> wrote:
> > Ganglia 3.1.0 on CentOS 4.
> 
> Ganglia 3.1.1, Solaris 10, Sparc.

> I'm also seeing a blocked gmond, although my situation may be slightly
> different.

> > I checked that gmond was running on that host, and it was.
> > However, attempts to connect to its port 8649 would indeed timeout.
> 
> Same here.  Gmond will run fine for a while, then fail to respond to
> TCP connections.  Running 'telnet localhost 8649' fails to connect.
> In my case, "a while" ranges from minutes to hours--I've been testing
> this off and on since yesterday.
> 
> Restarting gmond on the aggregation host will fix the problem...for a while.
> 
> Another important point is that gmond has *not* completely hung.
> Running it under debug mode (-d5) shows that it is both collecting
> metrics from the local system, and accepting metrics from the two
> other hosts.  The problem appears to be specifically with responding
> to TCP connections.

That does sound somewhat different.

In my case, tracing the running gmond showed:

  # strace -p 16830
  Process 16830 attached - interrupt to quit
  write(7, "<EXTRA_DATA>\n", 13 <unfinished ...>
  Process 16830 detached

I had to ^C to get the <unfinished ...>, so while I was watching, it was
just sitting there waiting for the write to finish.  I think it was
trying to write to a TCP socket: the lsof sample I took a little later
shows no file descriptor 7, but I *had* made an attempt to connect to
its port 8649, which I broke off before running lsof.
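
To spot this state without attaching strace to every gmond, a small
probe along these lines might help.  It is only a sketch: the function
name probe_gmond and the result labels are made up for illustration,
and it assumes gmond starts sending its XML dump as soon as a client
connects to port 8649.

```python
import socket


def probe_gmond(host="localhost", port=8649, timeout=5.0):
    """Classify a gmond TCP port as 'ok', 'empty', 'timeout', 'refused' or 'error'.

    Hypothetical helper, not part of Ganglia. It connects, waits for the
    first chunk of data, and reports what happened.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # A healthy gmond is assumed to start writing XML right away.
            data = sock.recv(4096)
    except socket.timeout:
        # Connect or first read took longer than `timeout` seconds.
        return "timeout"
    except ConnectionRefusedError:
        # Nothing is listening on the port at all.
        return "refused"
    except OSError:
        return "error"
    # An empty read means the peer closed without sending anything.
    return "ok" if data else "empty"
```

Calling probe_gmond("somehost") from cron on the aggregation host and
alerting on anything other than "ok" would at least show how often a
gmond stops answering while its process is still alive.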

Also in my case, once I restarted it, it hasn't happened again.
Doesn't mean it won't ever, but it at least isn't frequent.  I have
more than 40 gmonds running and have only seen one of them do this.
  -- Cos
