Pavel Shevaev <pacha.shevaev <at> gmail.com> writes:

> 
> On Thu, Jul 9, 2009 at 9:57 PM, Bernard Li <bernard <at> vanhpc.org> wrote:
> > Hi Pavel:
> >
> > On Wed, Jul 8, 2009 at 10:12 PM, Pavel Shevaev <pacha.shevaev <at> gmail.com> wrote:
> >
> >> What are the best ways to pinpoint the problem? I guess it makes sense
> >> to run gmond with debug_level > 0...does it use syslog facility?
> >
> > gmond would actually stay in the foreground and log to the terminal.
> 
> So it can't be running in background and log its actions via syslog?
> 
> > Have you tried stracing gmond when it was hung to see what it was
> > doing?
> 
> Thanks for the tip, I'll try it the next time it hangs
> 
> > What OS/arch are you running?
> 
> Oh, it's Linux 2.6.24-gentoo x86_64
> 



Hi,
I have the same issue and would appreciate any help. I have searched the
archives but have been unable to resolve it.
I installed Ganglia 3.1.7. The collector gmond starts up and the host shows
as "up" for some time, and then the host goes down. I ran gmond with debug
level 9 and redirected its output to a log. I don't see any errors. In fact,
messages like these keep coming into the log even after the host is reported
as down, which means gmond is still running:
"
        metric 'lwrite_sec' has value_threshold 1.000000
        metric 'phread_sec' being collected now
        metric 'phread_sec' has value_threshold 1.000000
        metric 'phwrite_sec' being collected now
        metric 'phwrite_sec' has value_threshold 1.000000
        sent message 'heartbeat' of length 52 with 0 errors
Processing a metric value message from machine1
Got a heartbeat message 1298316893
"

1. I am unable to telnet to gmond's port 8649 (the output stops after
   trying 127.0.0.1):
telnet localhost 8649
Trying ::1...
telnet: connect to address ::1: Network is unreachable
Trying 127.0.0.1...

2. Telnet to port 8651 (gmetad's XML port) works and returns XML. I notice
   TN > TMAX in the XML, which another thread suggested may be significant:
TN="21416" TMAX="180" 
3. `gstat -a` hangs.
4. The only difference I see in the gmond log (I am not sure whether it is
   expected behavior) is that messages like the following appear at first
   and then stop:
"saving metadata for metric: phwrite_sec host: machine1
Processing a metric value message from machine1
***Allocating value packet for host--machine1-- and metric --phwrite_sec-- ****
"
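Instead of reading the raw dump through telnet, a short script can list every host and metric whose TN exceeds its TMAX. This is a minimal sketch, assuming gmetad serves its XML dump on the default port 8651 (gmond on 8649) and closes the connection after writing it; the function names `fetch_ganglia_xml` and `find_stale` are hypothetical helpers, not part of Ganglia. TN is read as the seconds since a value was last reported and TMAX as the expected maximum reporting interval.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_ganglia_xml(host="localhost", port=8651, timeout=10.0):
    """Read the XML dump that gmetad (8651) or gmond (8649) writes on connect.

    The daemon closes the connection when it is done, so read until EOF.
    The timeout makes recv() raise socket.timeout instead of blocking
    forever the way telnet or `gstat -a` did.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def find_stale(xml_bytes, factor=1.0):
    """Return (host, metric, TN, TMAX) tuples where TN > factor * TMAX.

    metric is None when the HOST element itself is stale. The default
    factor of 1.0 mirrors the plain TN > TMAX check from the post.
    """
    root = ET.fromstring(xml_bytes)
    stale = []
    for host in root.iter("HOST"):  # works for gmond and gmetad nesting
        name = host.get("NAME")
        elements = [(None, host)] + [(m.get("NAME"), m) for m in host.iter("METRIC")]
        for metric, elem in elements:
            tn = int(elem.get("TN", "0"))
            tmax = int(elem.get("TMAX", "0"))
            if tmax > 0 and tn > factor * tmax:
                stale.append((name, metric, tn, tmax))
    return stale


if __name__ == "__main__":
    import sys

    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8651
    for name, metric, tn, tmax in find_stale(fetch_ganglia_xml(port=port)):
        print(f"{name} {metric or '(host)'}: TN={tn} > TMAX={tmax}")
```

Run it as `python stale.py 8651`, or with `8649` to query gmond directly and see whether gmond itself considers its own metrics stale. Passing `factor=4` is an assumption meant to approximate the web frontend's down-host check, which as far as I recall compares TN against four times TMAX; check it against your ganglia-web version.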

I am running the simplest configuration: one gmetad and one gmond on the
same machine (SunOS), using the default multicast configuration:
udp_send_channel {
  #bind_hostname = yes # Highly recommended, soon to be default.
                       # This option tells gmond to use a source address
                       # that resolves to the machine's hostname.  Without
                       # this, the metrics may appear to come from any
                       # interface and the DNS names associated with
                       # those IPs will be used to create the RRDs.
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  mcast_join = 239.2.11.71
  #family = inet4
  port = 8649
  bind = 239.2.11.71
}
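To check whether the multicast heartbeats are actually delivered on this host, a small listener can join 239.2.11.71 on port 8649 and count incoming datagrams for a while. This is a hedged sketch: the helper names `membership_request` and `count_packets` are hypothetical, it assumes SO_REUSEADDR lets it share port 8649 with the running gmond (this behaves differently on Linux and Solaris), and it joins the group on the default interface ("0.0.0.0"), which may not be the interface gmond uses.

```python
import socket
import struct
import time


def membership_request(group, iface="0.0.0.0"):
    """Build the ip_mreq structure (group address + local interface)
    passed to IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def count_packets(group="239.2.11.71", port=8649, seconds=30.0, iface="0.0.0.0"):
    """Join the multicast group and count datagrams received on the port.

    Returns (packet_count, set_of_sender_ips). Binding to "" means unicast
    datagrams sent to the same port are counted as well.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # assumption: allows sharing with gmond
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group, iface))
    sock.settimeout(1.0)  # wake up regularly to check the deadline
    deadline = time.monotonic() + seconds
    count, senders = 0, set()
    try:
        while time.monotonic() < deadline:
            try:
                _data, addr = sock.recvfrom(65535)
            except socket.timeout:
                continue
            count += 1
            senders.add(addr[0])
    finally:
        sock.close()
    return count, senders


if __name__ == "__main__":
    n, who = count_packets()
    print(f"received {n} packets from {sorted(who)}")
```

If the count stays at zero while gmond keeps logging "sent message 'heartbeat' ... with 0 errors", the packets are probably not looping back to the receiving socket, which would point at routing or interface selection rather than at gmond itself. If the bind fails with "Address already in use", the result is inconclusive.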

Thanks for helping,
-Avani

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
