I did the updates ... No joy....
I know I updated the web frontend to 2.5.5, but the footer says 2.5.4
(Actually, I pulled the tarball, exploded it... and checked the conf.php file...
<?php
# $Id: conf.php,v 1.16 2003/07/29 23:55:01 sacerdoti Exp $
#
# Gmetad-webfrontend version. Used to check for updates.
#
$majorversion = 2;
$minorversion = 5;
$microversion = 4;
So, they never updated the microversion ....
naughty...)
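
To check the reported version without reading the file by hand, a minimal Python sketch like this pulls the three version variables out of conf.php. It assumes they are assigned as plain integers, exactly as in the excerpt above; the function name frontend_version is made up for illustration and is not part of Ganglia.

```python
import re


def frontend_version(conf_text):
    """Return 'major.minor.micro' from the text of a webfrontend conf.php.

    Assumes assignments of the form `$majorversion = 2;` (hypothetical
    helper, based only on the excerpt quoted in this thread).
    """
    parts = []
    for name in ("majorversion", "minorversion", "microversion"):
        match = re.search(r"\$" + name + r"\s*=\s*(\d+)\s*;", conf_text)
        if match is None:
            raise ValueError("missing $" + name)
        parts.append(match.group(1))
    return ".".join(parts)


# Sample input copied from the conf.php excerpt above.
sample = """<?php
$majorversion = 2;
$minorversion = 5;
$microversion = 4;
"""

print(frontend_version(sample))  # prints 2.5.4
```

Running it against the conf.php from the 2.5.5 tarball would show whether the footer is simply echoing a stale microversion.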
So, I've got:
webfrontend 2.5.5-1
gmetad 2.5.6
gmond 2.5.6-1
The old gmonds.... do not have this issue...
What else did I change?
Here's a diff of gmond.conf from one of the old compute nodes ... vs a new one:
< # $Id: gmond.conf,v 1.3 2004/01/20 19:15:23 sacerdoti Exp $
---
> # $Id: gmond.conf,v 1.2 2002/09/19 00:37:18 sacerdoti Exp $
11c11
< name "K-Cluster"
---
> name "Linux Compute"
17c17
< owner "Schlumberger"
---
> owner "Denver WesternGeco SLB"
23a24
> latlong "N39.75 W104.87"
28a30
> url "http://ddclx01.denver.nam.slb.com/"
45c47
< # mcast_if eth1
---
> mcast_if eth0
69c71,73
< 1xx.1xx.147.179
---
> # 2.3.2.3 3.4.3.4 5.6.5.6
(note: the IPs are x'd out by me...)
> trusted_hosts 1xx.2xx.6.201 1xx.2xx.12.151 192.168.1.1
> #trusted_hosts 192.168.1.1
74c78
< # num_nodes 1024
---
> num_nodes 128
99c103
< #no_setuid on
---
> # no_setuid on
113,120c117
< # rpr - on temporarily ... till where gmetad server will live is decided.
< all_trusted on
< #
< # If you want dead nodes to "time out", enter a nonzero value here. If specified,
< # a host will be removed from our state if we have not heard from it in this
< # number of seconds.
< # default: 0 (immortal)
< host_dmax 3600
---
> # all_trusted on
OK, so num_nodes is different ... i.e. it defaults to 1024 ..
If this is truly a cluster metric (not a grid metric), that should not matter...
I added host_dmax, since I've never seen Ganglia display a host as down....
I think I need to start using deaf as well.
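
A plain diff mixes comment churn with real setting changes. A rough Python sketch along these lines compares only the active (uncommented) settings of two gmond.conf files. It assumes the one-setting-per-line "key value" layout seen in the diff above and treats everything after a '#' as a comment (so it would mis-handle a '#' inside a quoted value). The function names are hypothetical and the sample values are taken from the '<' and '>' sides of the diff.

```python
def active_settings(conf_text):
    """Map each uncommented setting name to its value (quotes stripped)."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments (simplification)
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip().strip('"') if len(parts) > 1 else ""
        settings[parts[0]] = value
    return settings


def compare(left_text, right_text):
    """Return {setting: (left_value, right_value)} for settings that differ.

    A value of None means the setting is absent or commented out on that side.
    """
    left, right = active_settings(left_text), active_settings(right_text)
    return {
        key: (left.get(key), right.get(key))
        for key in sorted(set(left) | set(right))
        if left.get(key) != right.get(key)
    }


# '<' side of the diff above (abridged).
left_conf = """name "K-Cluster"
# mcast_if eth1
# num_nodes 1024
all_trusted on
host_dmax 3600
"""

# '>' side of the diff above (abridged).
right_conf = """name "Linux Compute"
mcast_if eth0
num_nodes 128
# all_trusted on
"""

for key, (left, right) in compare(left_conf, right_conf).items():
    print(key, "left:", left, "right:", right)
```

Seen this way, the active differences in the abridged sample are name, mcast_if, num_nodes, all_trusted and host_dmax.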
Bernard Li wrote:
Hi Ron:
I would actually try to use consistent versions for both gmetad and
gmond (and the webfrontend too but I don't think it has been updated
recently).
I have tried to use mis-matching versions before and it seems okay, but
I guess it's best to keep things consistent to eliminate all the
possibilities...
Cheers,
Bernard
-----Original Message-----
From: Ron Reeder [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 17, 2004 12:58
To: Bernard Li
Cc: [email protected]
Subject: Re: [Ganglia-general] Jittery Displays - number of nodes changing erratically.
gmond was at
Bernard Li wrote:
Hey Ron:
Which version did you upgrade from?
gmond 2.5.1
BUT, ... making it more interesting ....
A co-worker installed a new Ganglia setup (server/clients) in England.
He's seeing the same thing... on different versions of server
backend/frontend httpd software.
Site      front  -  back
Denver    2.5.1  -  2.5.5
Gatwick   2.5.4  -  2.5.6
all with gmond 2.5.6-1 are seeing this "jittery" issue...
Ok, I'll upgrade the Denver center to latest - see if that
doesn't help.
I kept my Ganglia web server at: gmetad:
I have upgraded from a previous version without any problems...
2.5.4...?
Cheers,
Bernard
-----Original Message-----
From: Ron Reeder [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 17, 2004 11:37
To: Bernard Li
Cc: [email protected]
Subject: Re: [Ganglia-general] Jittery Displays - number of nodes changing erratically.
Yes,
I've been running Ganglia for well over a year... no problems after the
initial install.
I've upgraded several times... again, no biggie....
Bernard Li wrote:
Hi Ron:
Did you recently upgrade from an older version of Ganglia? This is
really odd behaviour...
Cheers,
Bernard
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ron Reeder
Sent: Thursday, June 17, 2004 11:15
To: [email protected]
Subject: [Ganglia-general] Jittery Displays - number of nodes changing erratically.
Sirs,
With the new gmond 2.5.6-1 we are getting 'jittery' displays, where the
number of nodes and number of CPUs varies wildly on the 'Overview
of <Cluster>' page.
The summed LOAD and MEM charts are particularly bad.
Yes, whenever I go to the page it always shows: 82 hosts
(164 CPUs) up and running, none down.
I do have the value:
host_dmax 3600
in gmond.conf
'Cause it seems that Ganglia _NEVER_ thinks hosts die....
(Maybe a separate problem)
How can the node/CPU lines graph as horrible zig-zags (not
horizontal lines, as they should be), yet the host count is always the
same?
The chart is in the attached GIF file.
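
One way to narrow down where the jitter comes from is to count hosts and CPUs directly from the XML that gmond serves, bypassing gmetad and the web frontend. The Python sketch below is only an illustration under stated assumptions: that gmond dumps its cluster state as XML to any client connecting on its XML TCP port (8649 is assumed as the default here), that each node appears as a HOST element, and that CPU counts are reported as a METRIC named "cpu_num" with a VAL attribute. The function names and the sample document are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=10.0):
    """Read the full XML dump from a gmond (port number is an assumption)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def count_hosts_and_cpus(xml_data):
    """Return (number of HOST elements, sum of their cpu_num metrics)."""
    root = ET.fromstring(xml_data)
    hosts = 0
    cpus = 0
    for host in root.iter("HOST"):
        hosts += 1
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "cpu_num":
                cpus += int(float(metric.get("VAL", "0")))
    return hosts, cpus


# Hand-written sample in the assumed shape; not real cluster output.
sample = """<GANGLIA_XML VERSION="2.5.6" SOURCE="gmond">
<CLUSTER NAME="K-Cluster">
<HOST NAME="node1"><METRIC NAME="cpu_num" VAL="2"/></HOST>
<HOST NAME="node2"><METRIC NAME="cpu_num" VAL="2"/></HOST>
</CLUSTER>
</GANGLIA_XML>"""

print(count_hosts_and_cpus(sample))
```

Calling count_hosts_and_cpus(fetch_xml(node)) in a loop against the same gmond that gmetad polls, and logging the result each time, would show whether the count already fluctuates at the gmond level or only in the summary graphs.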