Re: [Ganglia-general] gmond hangs every time

Arne Brutschy Mon, 15 Nov 2010 09:46:00 -0800

Hi,

> And `gmond -d 2` gives you no additional information?


No, at least nothing that helps me understand what's going on. It seems
to collect information, but only locally (?). From time to time it seems
to receive packets from other nodes, but this happens very rarely (about
once a minute). While it's running, I cannot terminate it with CTRL-C; I
have to kill it with kill -9. Usually it sits waiting with output similar
to this:

                metric 'cpu_wio' has value_threshold 10.000000
                metric 'proc_total' being collected now
                metric 'proc_total' has value_threshold 10.000000
                metric 'cpu_num' being collected now
                metric 'cpu_num' has value_threshold 10.000000
                metric 'cpu_speed' being collected now
                metric 'cpu_speed' has value_threshold 10.000000
                metric 'pkts_out' being collected now
         ********** pkts_out:  9779.381265
                metric 'pkts_out' has value_threshold 10.000000
                metric 'swap_free' being collected now
                metric 'swap_free' has value_threshold 10.000000
                metric 'ps' being collected now
                metric 'ps' has value_threshold 5.000000
                metric 'queue-state' being collected now
                metric 'queue-state' has value_threshold 5.000000
                sent message 'location' of length 64 with 0 errors
                sent message 'load_one' of length 56 with 0 errors
                sent message 'mem_total' of length 60 with 0 errors
                sent message 'cpu_intr' of length 56 with 0 errors
                sent message 'proc_run' of length 56 with 0 errors
                sent message 'load_five' of length 60 with 0 errors
                sent message 'disk_free' of length 64 with 0 errors
                sent message 'mem_cached' of length 60 with 0 errors
                sent message 'mtu' of length 52 with 0 errors
                sent message 'cpu_sintr' of length 60 with 0 errors
                sent message 'pkts_in' of length 56 with 0 errors
                sent message 'bytes_in' of length 56 with 0 errors
                sent message 'bytes_out' of length 60 with 0 errors
                sent message 'swap_total' of length 60 with 0 errors
                sent message 'mem_free' of length 56 with 0 errors
                sent message 'load_fifteen' of length 60 with 0 errors
                sent message 'boottime' of length 56 with 0 errors
                sent message 'cpu_idle' of length 56 with 0 errors
                sent message 'cpu_aidle' of length 60 with 0 errors
                sent message 'cpu_user' of length 56 with 0 errors
                sent message 'cpu_nice' of length 56 with 0 errors
                sent message 'sys_clock' of length 60 with 0 errors
                sent message 'mem_buffers' of length 60 with 0 errors
                sent message 'cpu_system' of length 60 with 0 errors
                sent message 'part_max_used' of length 64 with 0 errors
                sent message 'disk_total' of length 64 with 0 errors
                sent message 'heartbeat' of length 60 with 0 errors
                sent message 'mem_shared' of length 60 with 0 errors
                sent message 'machine_type' of length 64 with 0 errors
                sent message 'cpu_wio' of length 56 with 0 errors
                sent message 'proc_total' of length 60 with 0 errors
                sent message 'cpu_num' of length 56 with 0 errors
                sent message 'cpu_speed' of length 60 with 0 errors
                sent message 'pkts_out' of length 56 with 0 errors
                sent message 'swap_free' of length 60 with 0 errors
                sent message 'ps' of length 52 with 0 errors
                sent message 'queue-state' of length 68 with 0 errors
        Processing a metric value message from compute-2-7.local
        ***Allocating value packet for host--compute-2-7.local-- and metric --queue-job-3167478-- ****
        
                metric 'ps' being collected now
                metric 'ps' has value_threshold 5.000000
                metric 'queue-state' being collected now

This does not change even when I restart gmond on all nodes. There
should be a wave of updates after that, shouldn't there?
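
To put a number on how rarely packets arrive, the receive lines in the
debug output can be counted per sending host. The following is only a
minimal sketch: it assumes the "Processing a metric value message from
<host>" line format shown in the output above, and the function name
count_received is made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches the receive lines printed by `gmond -d 2`, as seen in the output above.
RECV_RE = re.compile(r"Processing a metric value message from (\S+)")


def count_received(lines):
    """Return a Counter mapping sender host -> number of received value messages."""
    counts = Counter()
    for line in lines:
        match = RECV_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Reads captured debug output from standard input and prints one line per host.
    for host, n in count_received(sys.stdin).most_common():
        print(f"{n:6d}  {host}")
```

Feeding it a capture of the debug output taken over a few minutes after
restarting the nodes would show whether the expected wave of updates
reaches this gmond at all, and from which hosts.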

> Any other system-level issues on the server?

Not that I am aware of. I am not sure about the Ganglia installation,
though. I am using the one that came with Cluster Rocks, and the problem
might come from that side. But I guess not, because nobody else using
Rocks seems to have had this problem before (and neither did I with an
older version of Rocks/Ganglia).

> Assuming you are using multicast, would it be possible for you to setup
> gmetad to poll _another_ gmond to see if the issue persists?

Er, in general yes, but how would I configure such a setup? Do you mean
that I set up two gmonds on the same machine, one running with the
current config and another one polling the first? Sorry, I don't really
get it...
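
One way to see what another gmond knows, independent of gmetad, is to
read its XML dump directly: a gmond with a tcp_accept_channel writes its
cluster state as XML to any client that connects and then closes the
connection. The sketch below assumes the default port 8649 and the
NAME and REPORTED attributes of the HOST elements; the function names
are hypothetical, and the host name in the usage part is simply the
compute node mentioned above.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Connect to a gmond TCP accept channel and read the XML dump until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after sending the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_reported(xml_bytes):
    """Return {host name: REPORTED timestamp} for every HOST element in the dump."""
    root = ET.fromstring(xml_bytes)
    return {h.get("NAME"): int(h.get("REPORTED")) for h in root.iter("HOST")}


if __name__ == "__main__":
    # Assumption: compute-2-7.local runs a gmond that accepts TCP connections.
    for name, reported in sorted(hosts_reported(fetch_gmond_xml("compute-2-7.local")).items()):
        print(name, reported)
```

If a compute node's gmond lists all hosts with recent REPORTED
timestamps while the one on the frontend does not, that would suggest
the problem is local to the frontend rather than to the multicast
traffic itself.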

> What OS and arch are you running Ganglia on?

Cluster Rocks 5.3, CentOS 5.3, Linux 2.6.18-164.11.1.el5_lustre.1.8.3,
i386

Cheers,
Arne


