from:"Martin Knoblauch"

Re: [Ganglia-developers] Re: [Ganglia-general] Ganglia issues I've been experiencing

2005-03-16 Thread Martin Knoblauch

--- Matt Massie [EMAIL PROTECTED] wrote:

 actually.  i just updated gmetad to allow custom RRAs to be defined.  i
 just dropped the code into CVS so if you use the CVS code (which will be
 released as 3.0.1 very soon)... you can specify
 
 RRAs RRA:AVERAGE:0.5:1:240 \
   RRA:AVERAGE:0.5:24:240 \
   RRA:AVERAGE:0.5:168:240 \
   RRA:AVERAGE:0.5:672:240 \
   RRA:AVERAGE:0.5:5760:370
 
 in gmetad.conf to alter the round-robin archive format.  this was a
 simple feature to add and i know it's in big demand ... no sense waiting
 until later to add it.
 
 forget everything that i wrote below... just use CVS for now or wait for
 3.0.1.  :)
 
 -matt
 
 
Matt,

 I assume the above settings are what we are using today?
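
For reference, the time span each archive covers follows from steps x rows x the base RRD step. Below is a minimal sketch of that arithmetic. It assumes a base step of 15 seconds, which is not stated anywhere in this thread, and the helper name rra_span_seconds is made up for illustration.

```python
# Sketch: how much history each RRA line from the quoted message retains.
# ASSUMPTION: the base RRD step is 15 seconds (not stated in the thread).
BASE_STEP_SECONDS = 15

RRAS = [
    "RRA:AVERAGE:0.5:1:240",
    "RRA:AVERAGE:0.5:24:240",
    "RRA:AVERAGE:0.5:168:240",
    "RRA:AVERAGE:0.5:672:240",
    "RRA:AVERAGE:0.5:5760:370",
]


def rra_span_seconds(rra, base_step=BASE_STEP_SECONDS):
    """Return the time span covered by one RRA definition.

    The format is RRA:CF:xff:steps:rows. Each row consolidates `steps`
    primary data points, and the archive keeps `rows` such rows.
    """
    _tag, _cf, _xff, steps, rows = rra.split(":")
    return int(steps) * int(rows) * base_step


if __name__ == "__main__":
    for rra in RRAS:
        hours = rra_span_seconds(rra) / 3600
        print(f"{rra:28s} {hours:10.1f} hours")
```

Under the 15-second assumption, the five archives cover one hour, one day, one week, four weeks and 370 days.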

Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[Ganglia-general] New ganglia-3.0.2 snapshot

2005-11-02 Thread Martin Knoblauch

Hi,

 the hopefully last snapshot of ganglia-3.0.2 has been uploaded to

http://www.knobisoft.de/ganglia/ganglia-3.0.2.200511021403.tar.gz

 Please test, especially on non-Linux-ia32 platforms. If no serious
regressions show up, this could be 3.0.2.

 Changes compared to 3.0.1 are:

Changes since 20051018:

- More compile fixes for MacOS/Tiger
- Fix references to www.rrdtools.org
- Fix umask screwup for new --pid-file option

Changes since 20051004:

- Bugzilla 72: --pid-file for gmetad/gmond
- Bugzilla 70: Fix Debian /dev2/ weirdness
- Bugzilla 68: gmond now honors location via commandline and config file.
- Bugzilla 49: Cleanup php for web-frontend
- Bugzilla 27: Let gmetad reconnect to the last good source
- New AIX metrics code

More changes:

- Fix 64-bit core-dumps for disk metrics on Linux
- Fixes for lots of compile-time warnings
- ...

Cheers
Martin



--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Gmetric Repository offline?

2005-11-03 Thread Martin Knoblauch

Hi Ken,

 I am afraid that we are victims of the MySQL changes at SF - which we
apparently ignored :-(

 Matt, could you contact them and ask about moving our stuff?

Cheers
Martin

--- Kenneth Young [EMAIL PROTECTED] wrote:

 Hi all,
 
 Opening browser to http://ganglia.info/gmetric/
 returns the following error.  Is the page temporarily
 offline?
 
 *Warning*: mysql_connect(): Can't connect to MySQL server on 'mysql' 
 (113) in */home/groups/g/ga/ganglia/htdocs/gmetric/header.php* on
 line *6*
 Could not connect to database
 
 Ken Young
 
 
 ---
 SF.Net email is sponsored by:
 Tame your development challenges with Apache's Geronimo App Server.
 Download
 it for free - -and be entered to win a 42 plasma tv or your very own
 Sony(tm)PSP.  Click here to play: http://sourceforge.net/geronimo.php
 ___
 Ganglia-general mailing list
 Ganglia-general@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/ganglia-general
 
 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] TCP/IP Bad Data

2005-11-03 Thread Martin Knoblauch

Hi,

 hmm. interesting. Was Bytes-Out the only metric showing problems
at that time? What about Packets-Out?

 Loss of metric data is not unheard of, but only one metric affected is
strange.

 What platform and version (gmond, gmetad, web-frontend) are you
running?

Cheers
Martin

--- G. Francisco Perin [EMAIL PROTECTED] wrote:

 I am having a strange issue with ganglia reporting (not reporting)
 network traffic on a high volume web site.  The graphs are reporting
 points of zero (0) data when SAR data is not showing the same
 information.  It's disconcerting because if Ganglia is reporting good
 information then I have a problem.  But all other indications are that
 things are fine.
 
 Here's an example of the SAR data for the interface:
 
 Time Rx Tx
 05:14:00 845.78 945.85
 05:15:00 722.51 816.59
 05:16:00 752.33 840.65
 05:17:00 796.62 886.42
 05:18:00 888.71 990.34
 05:19:00 802.17 891.59
 05:20:00 760.13 851.22
 05:21:00 797.95 908.55
 05:22:00 909.58 1009.56
 
 And attached is a graph from ganglia during the same time period.
 Notice there is a big drop in RX between 5:15-5:20?  
 
 Any ideas what might be causing the chart to do this?  Do you think I
 am
 looking at a real problem or something with ganglia reporting?
 
 --
 cp
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] problem with SummaryInfo

2005-11-04 Thread Martin Knoblauch

Hi Branimir,

 those servers look great. What are they? :-)

 Anyway, could you please post the two different gmond.conf files and
the gmetad.conf file?

 I have the impression that the machines in the two groups do not see
each other. At least one machine in each group should see the metrics
of its partner machines. In gmetad.conf you would use that machine as
data source. Basically you should only have two data sources in your
gmetad.conf

 Simple test. Log into one of the servers and do a telnet localhost
gmond-port. It should show you the data of all hosts in that group
(grep for HOST NAME). If it only shows its own data you have found
the problem.

Cheers
Martin

--- Branimir Ackovic [EMAIL PROTECTED] wrote:

 
 Hi,
 
 I configured Ganglia 3.0.1 to monitor Grid site with 4 servers and 8
 nodes. I 
 put it in two groups: AEGIS01-PHY-SCL Core Services and
 AEGIS01-PHY-SCL
 There is problem with summary report. I see only one node in each of
 this 
 sources. I also have problem with grid summary because it use source
 summary.
 
 You can see it on:
 http://se.phy.bg.ac.yu/site/ganglia
 
 How can I configure Ganglia to see everything properly?
 
 All servers have in /etc/gmond.conf:
 
 cluster {
   name = AEGIS01-PHY-SCL Core Services
 }
 
 and all nodes have in /etc/gmond.conf:
 cluster {
   name = AEGIS01-PHY-SCL
 }
 
 There is gmetad and web frontend on one of servers 
 (se.phy.bg.ac.yu/site/ganglia). In /etc/gmetad.conf I put:
 
 data_source AEGIS01-PHY-SCL Core Services1 ce.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL Core Services2 se.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL Core Services3 grid.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL Core Services4 rb.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL1 wn01.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL2 wn02.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL3 wn03.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL4 wn04.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL5 wn05.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL6 wn06.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL7 wn07.phy.bg.ac.yu
 data_source AEGIS01-PHY-SCL8 wn08.phy.bg.ac.yu
 
 and 
 
 gridname AEGIS01 PHY SCL
 
 -
 Branimir Ackovic
 E-mail: [EMAIL PROTECTED]
 Web: http://scl.phy.bg.ac.yu/
 
 Phone: +381 11 3160260, Ext. 152
 Fax: +381 11 3162190
 
 Scientific Computing Laboratory
 Institute of Physics, Belgrade
 Serbia and Montenegro
 -
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] problem with SummaryInfo

2005-11-07 Thread Martin Knoblauch

Hi Branimir,

 apparently Rick pushed you in the right direction already :-) Just a
few comments

Martin

--- Branimir Ackovic [EMAIL PROTECTED] wrote:

 
 Thank You Rick and Martin for quick response!
 
 I already tried the configuration that Rick suggested, but it doesn't
 work. In that configuration I see only one node per data_source
 (the last one). One week ago, Michael Chang helped me to solve the
 problem with this configuration:
 
 data_source AEGIS01-PHY-SCL1 147.91.83.201
 data_source AEGIS01-PHY-SCL2 147.91.83.202
 data_source AEGIS01-PHY-SCL3 147.91.83.203
 ...
 
 If I understand, Martin suggest that I need two machines with
 gmetad (one for each data_source). Now I have gmetad only on
 server with web frontend
 
 (se.phy.bg.ac.yu). 


 that is totally fine. You only need one gmetad running. Your problem
was that the nodes *within* your two *clusters* did not communicate
correctly. Michael Chang's setup allowed you to query each node
individually, but you lost the cluster concept that way.

 It is true that  the machines in the two groups do not see each
 other. Even in  same group. I tried:
 
 [EMAIL PROTECTED] root]# telnet localhost 8649 | grep grid
 Connection closed by foreign host.
 [EMAIL PROTECTED] root]#  
 
 Both machines ce and grid are in the same data_source with same
 gmond.conf files. As you said, Martin, I found the problem, but
 I don't found solution 
 for them. :(


 That was the most important step :-). Your gmond.conf files look
like a multicast setup, but apparently something went wrong. Possible
causes:

- no route for the multicast IP
- your switch does not like IGMP
- also, both of your clusters were talking on the same port. This can
be a problem with multicast.

 So, going unicast is the right way to go in my opinion. Advantages
are:

- your networking infrastructure will not screw you up
- less network traffic. In a working multicast network you will have
N*N messages going around. In a large cluster that can be a lot of
traffic just for Ganglia.
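
To put rough numbers on the N*N argument, here is a small sketch. It counts packet deliveries per reporting round under a simplifying assumption: every node sends one packet per round, multicast delivers it to all N listeners, and unicast delivers it only to the configured receivers. The function names are illustrative.

```python
# Sketch: compare metric deliveries per reporting round.
# ASSUMPTION: every node sends one packet per round. With multicast each
# packet is delivered to all N listeners; with unicast only to the
# configured receiver nodes.


def multicast_deliveries(nodes):
    """Each of the N nodes' packets reaches all N listeners."""
    return nodes * nodes


def unicast_deliveries(nodes, receivers=1):
    """Each of the N nodes' packets reaches only the receivers."""
    return nodes * receivers


if __name__ == "__main__":
    for n in (12, 100, 1000):
        print(n, multicast_deliveries(n), unicast_deliveries(n, receivers=2))
```

This is only a counting model. The real load on the wire also depends on how the switches replicate multicast frames.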

Cheers
Martin


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] windows gmond client

2005-11-07 Thread Martin Knoblauch

Hi Richard,


--- [EMAIL PROTECTED] wrote:

 All,
 
 against all probability, but for reasonable historical reasons, we
 run windows based HPC applications.

 What kind of HPC stuff is a financial institution running? Just
curious :-)

 
 If we were to cheat, and create a windows agent that only produced
 the XML via the tcp interface, and not the udp niceness, can anyone
 give me an idea of how this will scale? This obviously moves
 more work to gmetad. Will gmetad poop with 5 data sources, 100?


 Not knowing the Cygwin implementation at all, but what is wrong with
using the unicast TCP setup? Just select one or two nodes per *cluster*
to run gmond in TCP receive mode and let all other nodes send data to
them. Use the selected node(s) as data source for gmetad. Much better
network usage compared to the multicast mode, which produces traffic
going up with N*N. And you don't have to worry about switches blocking
IGMP traffic.

 5 data sources should be no problem for gmetad. I have no idea about
100 or more.

 Can someone suggest something clever to get windows node producing
 ganglia data in a lightweight way?

 This likely needs a native client.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] gmond configuration question

2005-11-07 Thread Martin Knoblauch

Hi Prakash,

 basically what you describe is the expected behaviour. Without the
extra routing information, the multicast packet will be sent through
the default gateway interface, which is eth0 for all three groups.
As a result group 2 and 3 end up disconnected from group 1.

 You should use a different extra route though. Do a:

% route add -host 239.2.11.71 dev eth1

That will keep the default routes for group 2 and 3, but all packets
from the gmond multicast group will go through eth1. This is, btw.,
in the FAQ.

 Another solution is to drop multicast and move to unicast
communication. Select one or two of your nodes as gmond receivers and
have the following directives in gmond.conf (you need 3.0.1 for that):

udp_send_channel {
  host = 192.168.2.X
  port = 8649
}
udp_send_channel {
  host = 192.168.2.Y
  port = 8649
}
udp_recv_channel {
  port = 8649
}


 The nodes X and Y will then have all information from the other nodes
and can be used as redundant data sources for gmetad.
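
If the receiver list changes often, the channel sections can be generated rather than edited by hand. This is a minimal sketch that only renders the directives shown above. The function name and the example addresses are hypothetical.

```python
# Sketch: render the unicast channel sections of gmond.conf for a list of
# receiver nodes. Only the directives quoted in this thread are emitted.
# ASSUMPTION: the receiver addresses below are placeholders.


def render_unicast_channels(receivers, port=8649):
    """Return one udp_send_channel per receiver plus one udp_recv_channel."""
    blocks = []
    for host in receivers:
        blocks.append(
            f"udp_send_channel {{\n  host = {host}\n  port = {port}\n}}"
        )
    blocks.append(f"udp_recv_channel {{\n  port = {port}\n}}")
    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses standing in for nodes X and Y.
    print(render_unicast_channels(["192.168.2.10", "192.168.2.11"]))
```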

Hope this helps
Martin

--- Prakash Velayutham [EMAIL PROTECTED] wrote:

 Hi,
 
 Could someone explain how my configuration directives should be with
 the 
 following setup?
 
 Total of 18 compute nodes
 
 5 compute nodes with eth0 connected to a switch. (192.168.2.*
 network)
 6 compute nodes with eth1 connected to this switch (192.168.2.*
 network) 
 and eth0 connected to a different switch (10.1.21.* network)
 7 compute nodes with eth1 connected to this switch (192.168.2.*
 network) 
 and eth0 connected to a different switch (10.1.74.* network)
 
 The routing table on each group of nodes looks like this
 
 Group 1:
 Destination     Gateway         Genmask         Flags Metric Ref Use Iface
 192.168.2.0     0.0.0.0         255.255.255.0   U     0      0   0   eth0
 169.254.0.0     0.0.0.0         255.255.0.0     U     0      0   0   eth0
 127.0.0.0       0.0.0.0         255.0.0.0       U     0      0   0   lo
 0.0.0.0         192.168.2.254   0.0.0.0         UG    0      0   0   eth0
 
 Group 2:
 Destination     Gateway         Genmask         Flags Metric Ref Use Iface
 192.168.2.0     0.0.0.0         255.255.255.0   U     0      0   0   eth1
 10.1.21.0       0.0.0.0         255.255.255.0   U     0      0   0   eth0
 169.254.0.0     0.0.0.0         255.255.0.0     U     0      0   0   eth0
 127.0.0.0       0.0.0.0         255.0.0.0       U     0      0   0   lo
 0.0.0.0         10.1.21.1       0.0.0.0         UG    0      0   0   eth0
 
 Group 3:
 Destination     Gateway         Genmask         Flags Metric Ref Use Iface
 10.1.74.0       0.0.0.0         255.255.255.0   U     0      0   0   eth0
 192.168.2.0     0.0.0.0         255.255.255.0   U     0      0   0   eth1
 169.254.0.0     0.0.0.0         255.255.0.0     U     0      0   0   eth0
 127.0.0.0       0.0.0.0         255.0.0.0       U     0      0   0   lo
 0.0.0.0         10.1.74.1       0.0.0.0         UG    0      0   0   eth0
 
 When I set the default configuration for all the nodes (without a 
 mcast_if directive), each of these groups of nodes only show up
 within 
 their subnet, so the collection agent only sees one group of nodes 
 (depending on which node is first in the data_source line for that 
 cluster).
 
 Later I set the configuration for the first group as default and
 changed 
 the configuration for the rest of the nodes by adding an mcast_if
 eth1 
 to the udp_send_channel and udp_recv_channel groups, but still
 the 
 result is the same.
 
 I get the desired result of all nodes multicasting to all the other
 nodes only when I add the following route to the tables of the nodes in
 group 2 & group 3. Is there a reason why, and is there a way around it?
 If I make this change to the routing table, I lose the ability to log in
 directly to a node.
 
 0.0.0.0         192.168.2.254   0.0.0.0         UG    0      0   0   eth1
 
 Hoping to get an answer to this rather intriguing issue.
 
 Thanks,
 Prakash
 
 
 



--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] gmond configuration question

2005-11-08 Thread Martin Knoblauch

Hi Prakash,

 please send gmond.conf (you are using the same file for all three
groups, I suppose) and gmetad.conf. Likely something simple. Unicast
is usually pretty simple to set up. Are you using version 3.0.1? 3.0.0 had
some problems.

Cheers
Martin

--- Prakash Velayutham [EMAIL PROTECTED] wrote:

 For some reason, only the route solution works for me. The unicast
 packets do not seem to reach the collection agent in the first group
 of
 nodes.
 
 The route solution works though, giving some relief.
 
 Thanks,
 Prakash
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[Ganglia-general] 3.0.2 is released

2005-11-08 Thread Martin Knoblauch

Hi,

 this is to notify you of the release of Ganglia 3.0.2. Below is the
description from SF.

 The homepage still needs to  be updated, but you can download the
tarball. Hopefully RPMs for some platforms will follow soon.

 If you find bugs with 3.0.2, please continue to use the bugzilla
service at:

http://bugzilla.ganglia.info/

Cheers
Martin

-

 The Ganglia Development Team is pleased to announce the release of
Ganglia 3.0.2 (Wilbur) which is available for immediate download from
http://ganglia.info/downloads.php.

 This release is mainly fixing bugs. For a detailed description of the
changes see the Changelog included in the tarball.

 Some of the highlights are:

- New AIX metrics code
- NetBSD support
- --pid-file option for gmond and gmetad
- Old gmond location statements are now handled correctly
- gmond --location now works correctly
- Compile fixes for MacOS Tiger
- Gmond no longer core-dumps on 64-bit Linux platforms
- cpu_wio is now reported correctly
- PHP fixes in the web-frontend
- ...

 The following Bugzilla entries are addressed: 27, 49, 54, 62, 63, 68,
70, 72.

 This release has been tested on the following platforms:

- Fedora FC4 / ia32
- SuSe 9.0 / x86_64
- RHEL3 / ia64
- Mac OS Tiger
- Solaris 2.8 / Sparc-64
- AIX 5.2, 5.3

Enjoy
The release team

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

RE: [Ganglia-general] windows gmond client

2005-11-10 Thread Martin Knoblauch


--- [EMAIL PROTECTED] wrote:

 Exactly.
 
 I should have been clearer. The default windows/cygwin client is
 neither correct enough (cygwin's fault) nor provides all the 
 metrics we want (in fact, because some of our farms are not just HPC
 farms, we want some other metrics as well). I remain grateful to
 whoever developed it, none-the-less.


 As I said, I do not know the Cygwin client very well.
 
 I am not a windows man, but we are looking at the possibility of
 developing a fully native (no cygwin) client ourselves. The reason
 for the TCP question is that my feeling was that it would be
 much easier to produce a native first-pass windows gmond client
 delivering TCP only, rather than all that clever UDP stuff as
 well.


 Not really liking Windows myself,  I believe the contribution of a
native client would be very welcome.
 
 But of course with the TCP route, I have fears of scaling. But there
 is a GEM in Martin's reply (and a Doh moment for me), in that I
 assumed that every node would have to be polled by a gmetad to get
 the cluster info. But you remind me this is not so, I can do the
 structural equivalent of the udp unicast to a head node using TCP
 to a head node, that gmetad then interrogates.
 
 Have I got this right guys?


 Unfortunately I think the answer is no. I made the mistake of somehow
associating gmond unicast with TCP, which is wrong. Communication
between the gmonds in a host group is always UDP. One or more clients
listen, while all push their data out (either multicast or unicast).

 But you are right that gmetad only needs to communicate with the heads
of the host groups. This communication is TCP.

 And the other thing for the community is asking whether anyone else
 out there is considering developing a native windows gmond.
 

 not me :-)

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Fwd: Solaris-first page works-selecting drop down buttons fails

2005-11-13 Thread Martin Knoblauch


--- michael chang [EMAIL PROTECTED] wrote:

 
   Because expat has no check target ? :-) Question is how to fix
 that.
  expat is one of the external packages and I do not want to mess
 with
  it if possible.
 
 Maybe ask if upstream will accept a blank check target, have a proper
 set of checks, or put in a ganglia-specific patch that returns true
 on a check call, I suppose...
 

 it is even simpler :-) The expat project has a check target since
1.95.2. Our version is just very old. I have checked in a dummy target
to make the process happy.

Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

RE: [Ganglia-general] Gmetad and rrd

2005-11-21 Thread Martin Knoblauch


--- [EMAIL PROTECTED] wrote:

 So then you're implying that gmetad has the 
 intelligence to create its own rrds databases?
 

 Not sure about the intelligence of gmetad (or any other computer
program :-), but it will create the rrds on the fly. There are only two
requirements:

- the root of the rrds tree (defined in gmetad.conf) has to exist
- gmetad needs write permission to it
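
Both requirements can be checked before starting gmetad. This is a minimal sketch, to be run as the user gmetad runs as. The function name and the example path are assumptions for illustration, so use the rrd root from your own gmetad.conf.

```python
# Sketch: verify the two requirements listed above for the rrds root.
# ASSUMPTIONS: run this as the same user gmetad runs as; the example path
# in the __main__ block is a placeholder, not taken from the thread.
import os


def rrd_root_problems(path):
    """Return a list of human-readable problems, empty if all is well."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path} is not writable by the current user")
    return problems


if __name__ == "__main__":
    for problem in rrd_root_problems("/var/lib/ganglia/rrds"):
        print("problem:", problem)
```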

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Grid with over 4TB mem

2005-11-29 Thread Martin Knoblauch

Hi Alex,

 do the gmonds themselves report the correct size for each host?

 Where do you get the problem? In gmetad, or in the webfrontend?
Those are separate pieces of software.
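
The wraparound described in the quoted message below can be reproduced with a few lines. This sketch assumes the total is accumulated in an unsigned 32-bit counter holding kilobytes, as the message suggests. That has not been confirmed against the gmetad source, and the function names are illustrative.

```python
# Sketch: summing per-host memory (in KB) in a 32-bit unsigned counter.
# ASSUMPTION: the summary is accumulated as a uint32 of kilobytes, as the
# quoted message suggests; this is not verified against gmetad's code.
UINT32_MODULUS = 2 ** 32


def summed_as_uint32(values_kb):
    """Add the values the way a wrapping uint32 accumulator would."""
    total = 0
    for value in values_kb:
        total = (total + value) % UINT32_MODULUS
    return total


def summed_exact(values_kb):
    """Add the values without any width limit."""
    return sum(values_kb)


if __name__ == "__main__":
    # 2000 hosts with 4 GB each, expressed in KB.
    hosts = [4 * 1024 * 1024] * 2000
    print("exact  :", summed_exact(hosts))
    print("uint32 :", summed_as_uint32(hosts))
```

With 2000 hosts of 4 GB each, the exact sum is 8388608000 KB while the wrapped counter shows 4093640704 KB, roughly half the real total.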

Cheers
Martin

--- Alex Balk [EMAIL PROTECTED] wrote:

 Hi all,
 
 
 I'm running Ganglia on a grid with thousands of hosts, each with at
 least 4GB RAM installed.
 
 Since aggregating the total amount of RAM exceeds 4294967296 (the max
 value of uint32), I'm getting incorrect data on the total memory in
 the
 grid.
 
 
 I've peeked at gmond and gmetad code and adding a uint64 doesn't seem
 trivial.
 
 I've also searched the Net for anyone that's possibly encountered
 this
 before and came up only with this:
 
 

http://sourceforge.net/mailarchive/forum.php?forum_id=9584&max_rows=25&style=nested&viewmonth=200312
 
 
 It mentions that Ganglia 3.x should solve the problem. I'm running
 3.0.2, but still experiencing it. I've also failed to find any method
 for changing collection of memory metrics from KB to MB, other than
 modifying the source code, which I'd rather avoid so as to keep as
 close
 to the original tree as possible.
 
 
 My questions are:
 
 1. Is there really a solution in Ganglia 3.x and if so, what is it?
 
 2. If not, are you aware of anyone who's implemented such a solution
 or
 documented the work needed, so I may carry on from there?
 
 
 Note that the gmetad collectors are running on a SuSE x86_64 machine
 and
 were compiled with 64bit libs. Ganglia is deployed in a hierarchy and
 reporting is done via unicast. In essence, this isolates the problem
 to
 the gmetads only, as the gmonds report on groups with less than 4TB
 RAM
 (but it may definitely surface there someday as well).
 
 
 Thanks,
 
 Alex
 
 
 
 
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Unicast issue

2005-11-29 Thread Martin Knoblauch

Markus,

 if you want unicast, I would leave out the bind thing. That is for
multicast, AFAIK.

telnet w.x.y.z 8649

Should give you a correct list of metrics.

Cheers
Martin

--- Markus Törnqvist [EMAIL PROTECTED] wrote:

 Hi!
 
 I'm experiencing the weirdest issue here with unicasting; not even
 the mail archives helped so I hope someone here can give me a hand.
 
 Shouldn't it suffice to have the config file look like this:
 udp_send_channel {
   host = w.x.y.z
   port = 8649
 }
 
 udp_recv_channel {
   bind = w2.x2.y2.z2
   port = 8649
 }
 
 for those parts?
 
 Nothing anywhere that points to multicasts?
 
 Right now, with that kind of configuration, I get an empty result
 set;
 <GANGLIA_XML VERSION="3.0.2" SOURCE="gmond">
 <CLUSTER NAME="unspecified" LOCALTIME="1133291540"
 OWNER="unspecified" LATLONG="unspecified" URL="unspecified">
 </CLUSTER>
 </GANGLIA_XML>
 Connection closed by foreign host.
 
 It's somewhat annoying because we can't use multicast really and even
 if
 we did it seems some very faux IPs are sent back, which may be
 another
 error on my part, but irrelevant if it's due to multicasting..
 
 Any help is highly appreciated, thanks!
 
 -- 
 mjt
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Unicast issue

2005-11-29 Thread Martin Knoblauch

Markus,

 that is still a multicast configuration. Remove both binds and the
mcast_join.

 Could you post the complete gmond.conf (IP-censored, if you must)?

Thanks
Martin

--- Markus Törnqvist [EMAIL PROTECTED] wrote:

 
 /* You can specify as many udp_recv_channels as you like as well. */
 udp_recv_channel {
   /*
   mcast_join = 239.2.11.71
   bind = 239.2.11.71
   bind = p.q.r.s
   */
   port = 8649
 }
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Unicast issue

2005-11-30 Thread Martin Knoblauch

Ramon, Markus,

 actually, the one below works fine for me. The same config file is used on
all gmond hosts in the cluster (actually pretty beautiful :-).

- host 172.17.17.103 receives the metrics from all participating
gmonds.
- all other hosts will report empty metrics if queried. If you want
them to report their own metrics, add a udp_send_channel for
localhost.
- host 172.17.33.108 is the only one allowed to query the TCP port.
This is the host where gmetad would be running (no gmond necessary on
this host). If you leave out the acl, all hosts may query the TCP
port.

 The bind in the udp_recv_channel may be needed if you have more than
one network interface and the traffic does not come in on the first one.
For the udp_send_channel, no bind should ever be *necessary*. But I am
really not sure about this.



udp_send_channel {
  host = 172.17.17.103
  port = 9649
}

udp_recv_channel {
  port = 9649
}

tcp_accept_channel {
  acl {
    default = deny
    access {
      ip = 172.17.33.108
      mask = 32
      action = allow
    }
  }
  port = 9649
}
-
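
To make the intent of the acl section concrete, here is a small model of it. This is a sketch of the semantics as described above, not gmond's actual code. It assumes the first matching access rule wins and the default applies otherwise, and the function name is made up.

```python
# Sketch: a model of the acl in the config above.
# ASSUMPTIONS: the first matching access rule decides, otherwise the
# default applies. This models the described behaviour only; it is not
# taken from gmond's implementation.
import ipaddress


def is_allowed(client_ip, rules, default="deny"):
    """Decide whether client_ip may query the TCP port.

    `rules` is a list of (ip, mask, action) tuples mirroring the
    access { ip, mask, action } entries.
    """
    addr = ipaddress.ip_address(client_ip)
    for ip, mask, action in rules:
        network = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
        if addr in network:
            return action == "allow"
    return default == "allow"


# Mirrors the single access entry in the config above.
RULES = [("172.17.33.108", 32, "allow")]

if __name__ == "__main__":
    for ip in ("172.17.33.108", "172.17.17.103"):
        print(ip, "allowed" if is_allowed(ip, RULES) else "denied")
```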

Cheers
Martin

--- Ramon Bastiaans [EMAIL PROTECTED] wrote:

 Actually, bind is needed to specify what local ip to bind to and
 listen 
 on in a unicast setup.
 mcast_join is used when listening to multicasting.
 
 However, why are you using 2 different IP addresses in the recv and
 send channel? This will never work.
 You need to set your send channel to the same ip/port as your recv
 channel.
 Else you are sending the information to 1 place and listening for
 that 
 information on another place.
 
 Kind regards,
 - Ramon.
 
 
 
 -- 
 ..
 | ing. Ramon Bastiaans   |
 | HPC - Systems Programmer   |
 ||
 | SARA - Computing and Networking Services   |
 | Kruislaan 415   PO Box 194613  |
 | 1098 SJ Amsterdam   1090 GP Amsterdam  |
 ||
 | Mail:  bastiaans ( a t ) sara ( d o t ) nl |
 | Web:   http://www.sara.nl/ |
 | Phone: +31 (0)20 592 80 19 |
 | Fax:   +31 (0)20 668 31 67 |
 `'
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Unicast issue

2005-11-30 Thread Martin Knoblauch

Hi,

 some more info:

- udp_send_channel does not have a bind attribute, just forget my
comment below. Looking at the code sometimes helps.
- udp_recv_channel: if you specify mcast_join and bind with
different IP addresses, no unicast processing will take place (from the
gmond.conf man page)

 And forget the comment about localhost. It is a bit more complicated
than that.

Martin
--- Martin Knoblauch [EMAIL PROTECTED] wrote:

 Ramon, Markus,
 
  actually, below one works fine for me. The same config file is used
 on
 all gmond-hosts in the cluster (actually pretty beautiful :-).
 
 - host 172.17.17.103 receives the metrics from all participating
 gmonds.
 - all other hosts will report empty metrics if queried. If you want
 them to report their own metrics, add a udp_send_channel for
 localhost.
 - host 172.17.33.108 is the only one allowed to query the TCP port.
 This is the host where gmetad would be running (no gmond necessary on
 this host). If you leave out the acl, all hosts may query the TCP
 port.
 
  The bind in the udp_recv_channel may be needed if you have more than
 one network interface and the traffic does not come in on the first one.
 For the udp_send_channel, no bind should ever be *necessary*. But I am
 really not sure about this.
 
 
 
 udp_send_channel {
   host = 172.17.17.103
   port = 9649
 }
 
 udp_recv_channel {
   port = 9649
 }
 
 tcp_accept_channel {
   acl {
     default = deny
     access {
       ip = 172.17.33.108
       mask = 32
       action = allow
     }
   }
   port = 9649
 }
 -
 
 Cheers
 Martin
 
 --- Ramon Bastiaans [EMAIL PROTECTED] wrote:
 
  Actually, bind is needed to specify what local ip to bind to and
  listen 
  on in a unicast setup.
  mcast_join is used when listening to multicasting.
  
  However, why are you using 2 different IP addresses in the recv and
  send channel? This will never work.
  You need to set your send channel to the same ip/port as your recv
  channel.
  Else you are sending the information to 1 place and listening for
  that 
  information on another place.
  
  Kind regards,
  - Ramon.
  
  Martin Knoblauch wrote:
  
  Markus,
  
   if you want unicast, I would leave out the bind thing. That is
  for
  multicast, AFAIK.
  
  telnet w.x.y.z 8649
  
  Should give you a correct list of metrics.
  
  Cheers
  Martin
  
  --- Markus Törnqvist [EMAIL PROTECTED] wrote:
  

  
  Hi!
  
  I'm experiencing the weirdest issue here with unicasting; not
 even
  the mail archives helped so I hope someone here can give me a
 hand.
  
  Shouldn't it suffice to have the config file look like this:
  udp_send_channel {
    host = w.x.y.z
    port = 8649
  }
  
  udp_recv_channel {
    bind = w2.x2.y2.z2
    port = 8649
  }
  
  for those parts?
  
  Nothing anywhere that points to multicasts?
  
  Right now, with that kind of configuration, I get an empty result
  set;
  <GANGLIA_XML VERSION="3.0.2" SOURCE="gmond">
  <CLUSTER NAME="unspecified" LOCALTIME="1133291540"
  OWNER="unspecified" LATLONG="unspecified" URL="unspecified">
  </CLUSTER>
  </GANGLIA_XML>
  Connection closed by foreign host.
  
  It's somewhat annoying because we can't use multicast really and
  even
  if
  we did it seems some very faux IPs are sent back, which may be
  another
  error on my part, but irrelevant if it's due to multicasting..
  
  Any help is highly appreciated, thanks!
  
  -- 
  mjt
  
  
  
  
  
  
  --
  Martin Knoblauch
  email: k n o b i AT knobisoft DOT de
  www:   http://www.knobisoft.de
  
  
  ___
  Ganglia-general mailing list
  Ganglia-general@lists.sourceforge.net
  https://lists.sourceforge.net/lists/listinfo/ganglia-general

  
  
  -- 
  ..
  | ing. Ramon Bastiaans   |
  | HPC - Systems Programmer   |
  ||
  | SARA - Computing and Networking Services   |
  | Kruislaan 415   PO Box 194613  |
  | 1098 SJ Amsterdam   1090 GP Amsterdam  |
  ||
  | Mail:  bastiaans ( a t ) sara ( d o t ) nl |
  | Web:   http://www.sara.nl/ |
  | Phone: +31 (0)20 592 80 19 |
  | Fax:   +31 (0)20 668 31 67 |
  `'
  
  
 
 
 --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www:   http://www.knobisoft.de

RE: [Ganglia-general] Unicast issue

2005-11-30 Thread Martin Knoblauch

Hi Lawrence,

 

--- [EMAIL PROTECTED] wrote:

 Hi:
 
 My gmetad host is a dual-NIC machine that runs gmond and serves
 as the head node of one cluster. Gmond runs on the worker nodes.
 I cannot get the web frontend to display statistics for the worker
 nodes.


 Do you get any error messages in your web server's log files? Is the
web server the same machine as the gmetad server?
 
 From the gmetad host I can successfully get output from: telnet node1
 8649, where node1 is a worker node.


This sounds good. gmetad is basically doing the same. 

 On the host running the web server, can you try to connect to gmetad?
There are two ports: the XML port (default 8651) and the interactive
port (default 8652) that the web frontend uses.

$ telnet gmetad-host 8651
$ telnet gmetad-host 8652
quit
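
The telnet check can also be scripted. The following is a minimal Python sketch, not part of ganglia: `fetch_xml` and `host_names` are hypothetical helper names, and the optional `request` argument is only a placeholder for whatever one would type on the interactive port. It assumes the daemon sends its XML and then closes the connection, as the telnet sessions in this thread suggest.

```python
import socket
import xml.etree.ElementTree as ET

def fetch_xml(host, port, request=None, timeout=5.0):
    """Connect like `telnet host port` and return the raw bytes sent back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if request is not None:
            # Only needed where the daemon waits for input first.
            sock.sendall(request.encode("ascii") + b"\n")
        while True:
            data = sock.recv(65536)
            if not data:          # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

def host_names(xml_bytes):
    """List the NAME attribute of every HOST element in a Ganglia XML dump."""
    root = ET.fromstring(xml_bytes)
    return [host.get("NAME") for host in root.iter("HOST")]
```

Calling `host_names(fetch_xml("gmetad-host", 8651))` on the web server host would then show which nodes gmetad actually knows about; an empty list points at the gmond/gmetad side rather than at the web frontend.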


 
 Here is my gmond.conf file:
 
 /* global variables */
 globals {
   mute = no
   deaf = no
   debug_level = 0
   setuid = yes
   user=nobody
   gexec = yes
   host_dmax = 3600
 }
 
 /* info about cluster  */
 cluster {
   name = X
   owner = 
   latlong = N37.0303 W76.34
   url=http://xxx.xxx.xxx/web;
 }
 
 /* info about host */
 host {
   location = 
 }
 
 /* channel to send multicast on mcast_channel:mcast_port */
 udp_send_channel {
   mcast_join = 239.2.11.71
   port = 8649
   ttl=1
 /*  mcast_if = eth1 */
 }
 
 /* channel to receive multicast from mcast_channel:mcast_port */
 udp_recv_channel {
   mcast_join = 239.2.11.71
   port = 8649
   bind = 239.2.11.71
 /*  mcast_if = eth1 */
 }
 
 /* channel to export xml on xml_port */
 tcp_accept_channel {
   port = 8649
 /* your trusted_hosts assuming ipv4 mask*/
 acl{
   default=deny
   access {
   ip=10.1.1.0
   mask = 24
   action = allow
   }
   access {
   ip=xxx.xxx.xxx.xxx
   mask = 32
   action = allow
   }
 }
 }
 ..
 ..
 ..
 
 Is it best to use unicast here? I don't understand why this won't
 work.
 Thanks
 

 Hard to say what is better. How big is your cluster? Multicast will likely
create more traffic than unicast. Also some switches create trouble for
multicast.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia fails after building oscar cluster.

2005-12-09 Thread Martin Knoblauch

Hi Satish,

 first of all: which version of ganglia?

Cheers
Martin

--- grid computing [EMAIL PROTECTED] wrote:

 Dear All,
 
  We are building an OSCAR cluster using OSCAR 4.0 on Red Hat 9.0. The
 installation goes through fine. Everything gets installed and the
 complete cluster works fine. But when we open the web browser and
 check ganglia, we only get the status graphs of the head node and
 not of the compute nodes.
 
 Can anyone help us out in this case?
 
 Regards,
 satish
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Rack,rank,plane

2005-12-15 Thread Martin Knoblauch

Stefan,

 plane = "Ebene" (German for level) in this case. Just consider it the
z-coordinate of the location.

Martin

--- Stefan Schustereit [EMAIL PROTECTED] wrote:

 Hello all together,
 
 since version 3.0.2 the location parameter is working again, and now
 I 
 want to use it in our gmond configurations.
 
 I know, what a rack is. Yes, I even know, what a rank is. Maybe, I
 come 
 out as an idiot, but what is a plane? All I could find out looking
 into 
 my dictionaries was:
 
 Airplane: fixed-wing aircraft
 
 Uhm, we have our hosts in a data center, not on the airport... is 
 anybody out there to light up my lack of knowledge?
 
 Thanks,
 Stefan
 
 
 -- 
 Mapsolute GmbH
 Stefan Schustereit
 Map24 Systems and Networks
 http://www.map24.com
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] gmond is not reporting network stats

2005-12-28 Thread Martin Knoblauch

Alexei,

 three questions:

- which version of ganglia/gmond are you running? If possible, please
try out 3.0.2.
- are you using the first or the second of the two NICs?
- how are your NICs named? The code drops everything starting with 'l'
or 'o'.
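
The naming rule in the last question can be written down as a tiny Python sketch. This only restates the rule as described in this message (names starting with 'l' or 'o' are dropped); it is not the libmetrics code, and the interface names in the loop are made-up examples, not taken from Alexei's systems.

```python
def is_counted_interface(name):
    """Restate the rule from the message: skip names starting with 'l' or 'o'."""
    return not name.startswith(("l", "o"))

# Hypothetical interface names, for illustration only.
for nic in ("lo0", "bge0", "e1000g0", "nge1"):
    print(nic, "counted" if is_counted_interface(nic) else "dropped")
```

If a NIC on the Solaris box happens to have a name beginning with one of those two letters, its traffic would be ignored under this rule, which is why the naming question matters.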

 Unfortunately, Solaris/AMD64 might not be very well tested.

 You can try to put some debug statements into the extract_if_data
routine.

Cheers
Martin
--- Alexei Rodriguez [EMAIL PROTECTED] wrote:

 Greetings. We have been running ganglia on a set of Linux systems and
 have not had any issues. We are now trying ganglia on Solaris 10 on
 x86 (AMD) systems and the cpu/memory reporting is accurate, but we
 are
 not getting any network interface information.
 
 The systems have 2 network interfaces but we only use 1 of them.
 
 Has anyone come across this problem? Any suggestions?
 
 thanks.
 
 Alexei
 



--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia Not showing all the Graphs Properly,

2006-01-03 Thread Martin Knoblauch

Jai,

 just as an experiment, could you change all the Image* functions in
that file to all-lowercase?

 e.g. ImageCreate -> imagecreate

 The man pages show them that way.

Cheers
Martin

--- Jai Rangi [EMAIL PROTECTED] wrote:

 Hello Martin,
 Here is the error message I am getting for pie chart,
 
 [client client_machine] PHP Fatal error:  Call to undefined function 
 ImageCreate() in /var/www/html/ganglia/pie.php on line 117, referer: 

http://server_name:/ganglia/?c=Linux%20Clusterm=r=hours=descendinghc=4
 
 Thank you so much for your help...
 Jai
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia Not showing all the Graphs Properly,

2006-01-04 Thread Martin Knoblauch

Jai,

 Ramon is right. You are at least missing php-gd. I could reproduce
your problem on my FC4 installation. I did not get the pie charts either.
Installing php-gd via yum and restarting Apache solved the problem. You
may also need gd-devel, which provides /usr/lib/libgd.so.

 Check with php -m or the following small php-web-script:

[EMAIL PROTECTED] html]# cat  /var/www/html/phpinfo.php
<?php

// Show all information, defaults to INFO_ALL
phpinfo();

?>

Both should show gd support in some way.

 You can forget my E-Mail regarding case-sensitivity. I just found out
that (unlike almost everything else) function names are not
case-sensitive. Weird decision. I know why I do not like PHP that much
...

Cheers
Martin
--- Ramon Bastiaans [EMAIL PROTECTED] wrote:

 That means you are still missing libgd and the php-gd extension, as I
 
 mailed before.
 
 - Ramon.
 
 Jai Rangi wrote:
 
  Hello Martin,
  Here is the error message I am getting for pie chart,
 
  [client client_machine] PHP Fatal error:  Call to undefined
 function 
  ImageCreate() in /var/www/html/ganglia/pie.php on line 117,
 referer: 
 

http://server_name:/ganglia/?c=Linux%20Clusterm=r=hours=descendinghc=4
 
 
 
  Thank you so much for your help...
  Jai
 
 
 
  Martin Knoblauch wrote:
 
  Dear Jai,
 
   good to hear that 3.0.2 fixed most of the problems for you. I am not
  sure about the PIE stuff, but there should be some error messages in
  the log files of your web server. They could give you a hint. And
  there still may be PHP bugs preventing 5.0 from working correctly. As I
  said, we want to know.
 
  Cheers
  Martin
 
  --- Jai Rangi [EMAIL PROTECTED] wrote:
 
   
 
  Thanks Martin,
  Upgrading to 3.0.2 worked just fine. Still missing PIE though,
 but I 
  guess I am missing some package for that...
 
  Thank you so much,
  -Jai
 
  Martin Knoblauch wrote:
  
 
 
 
  --
  Martin Knoblauch
  email: k n o b i AT knobisoft DOT de
  www:   http://www.knobisoft.de

 
 
 -- 
 There are really only three types of people:
 
   Those who make things happen,
those who watch things happen,
and those who say, What happened?
 
 ---
 ing. R. Bastiaans
 HPC  - Systems Programmer
 
 SARA - Computing and Networking Services
 Kruislaan 415  PO Box 194613
 1098 SJ Amsterdam  1090 GP Amsterdam
 
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

RE: [Ganglia-general] PHP front end: has anyone modified the load metric color / computation?

2006-01-04 Thread Martin Knoblauch

Alexei,

 Richard seems to be closer to the solution. The problem is the
definition of the function load_color in functions.php. Everything
above a load of 1.0 is considered to be a problem case. Same with the
function load_image. It would likely make sense to introduce a
scaling variable in conf.php (default 1.0) and work that into the two
functions. Can you play around a bit and show us the code that makes
you happy?

 The problem is that the threshold for high load is very subjective. On
an HPC machine everything above 1 (per CPU or core) is likely bad. For a
web/file/database server, this might be totally different.
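
The scaling idea can be sketched as follows. This is Python rather than the frontend's PHP, and it is only an illustration: the thresholds, the color names and the parameter name `load_scalar` are hypothetical and are not the values or names used in functions.php or conf.php.

```python
# Hypothetical thresholds and colors, for illustration only; the real
# values live in the web frontend's configuration.
LOAD_COLORS = [
    (1.00, "red"),
    (0.75, "orange"),
    (0.50, "yellow"),
    (0.25, "green"),
]
DEFAULT_COLOR = "blue"

def load_color(load_per_cpu, load_scalar=1.0):
    """Pick a color for a per-CPU load, after dividing by load_scalar."""
    if load_scalar <= 0:
        raise ValueError("load_scalar must be positive")
    scaled = load_per_cpu / load_scalar
    for threshold, color in LOAD_COLORS:
        if scaled > threshold:
            return color
    return DEFAULT_COLOR
```

With the default of 1.0 nothing changes. With a site-specific value such as 5.0, a load of 5 per CPU is treated the way a load of 1 was before, so the thresholds themselves stay untouched and upgrades only need to preserve one setting.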

Cheers
Martin

--- [EMAIL PROTECTED] wrote:

 Or you could hack the load value itself by dividing by 5 in
 cluster_view.php.
  
 regards,
 richard
  
 p.s.
 this is a bit yuk, but is certainly easy.
 
   -Original Message-
   From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 Alexei
 Rodriguez
   Sent: 04 January 2006 07:05
   To: ganglia-general@lists.sourceforge.net
   Subject: [Ganglia-general] PHP front end: has anyone modified
 the load metric color / computation?
   
   
   Greetings. First off, I want to say that ganglia rocks. It has
 been a very valuable tool in the short time we have had it deployed,
 and
 we are only using the very basic things.
   
   The load on our systems tends to be high (5.0 and above), on
 Solaris 10 systems (on AMD Opteron servers). The problem is that the
 graphs being generated are all of the same color (bright, bloody
 red).
 Given that all the systems have such high (relative) loads, I wanted
 to
 see what the best way of changing the PHP front end to reflect my
 local
 colors and load scheme.
   
   If I change $load_colors in php.conf, such that the number
 ranges are multiplied by 5x, would that work or is there a better
 way?
   
   I just want to make sure that the solution I implement does not
 make upgrades difficult :)
   
   
   thanks!
   
   
   Alexei
   
   
 
 
 


 


 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] PHP front end: has anyone modified the load metric color / computation?

2006-01-04 Thread Martin Knoblauch

--- Alexei Rodriguez [EMAIL PROTECTED] wrote:

 These changes accomplish what I was looking for. Thanks! Now I don't
 have a sea of red that my users ask me about ;)
 
 I do think this is a good knob to have. Thank you very much!
 
 Alexei
 
Alexei,

 good. This will be in 3.0.3.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] intermittent blanks in graphs

2006-01-24 Thread Martin Knoblauch


--- Ben Hartshorne [EMAIL PROTECTED] wrote:

 Hi,
 
 I have been running ganglia for most of the last year, quite happily.
 My hosts are configured to send unicast data to a single gmetad
 server.
 
 Recently, large portions of the cluster's graphs are empty.  A sample
 
 Any thoughts?  What logs should I be looking at?  


 just a thought - are your cluster nodes time-synched? Are they [still]
in-synch?
 
 
 [*] for those interested - I added an 8-hour and 3-day view; I find
 the
 8-hour view the most useful by far.  I also changed the size of the
 graphs to fit my 20 screen.  Finally, I added a Disk summary graph,
 in
 addition to the Load, CPU, Memory, and Network.  Is there any
 interest
 in patching these into the source?
 

 definitely. Could you post a diff -u patch, preferably against
3.0.2?

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia on Irix: gmond only

2006-01-24 Thread Martin Knoblauch

Luc,

 unfortunately, you need apr. That is why the code is shipped and built
with ganglia. apr is the Apache Portable Runtime library, which
is used all over the code. Just grep for apr_ in the gmond
directory.

 You do not mention what problems you have building it. Without that we
cannot help you.

Regards
Martin
PS: In which line did you add the fcntl-include for irix/metrics.c ?

--- Luc Gauthier [EMAIL PROTECTED] wrote:

 Hi all,
 
 I just downloaded the source of Ganglia v3.0.2 and I'm planning to
 set
 it up over a couple of machines we have here. One of these is a SGI
 box
 running Irix 6.5.27, and I only want to have gmond running on it. All
 the other stuff will be running on a linux machine.
 
 I tried to compile the source out-of-the-box but got a couple of
 errors.
 The first one was easily fixed by following a tip given by Rene
 Salmon
 on the ganglia-developers mailing list a couple of months ago:
 
 --
  I just tried compiling ganglia-3.0.1 on Irix 6.5.27.
  All we really want is gmond we will run gmetad, www, and other
 stuff
 on a better supported box running linux.
  So we did the configure and make for just gmond on the Irix box.
  The make failed
  so I added this line to
 ganglia-3.0.1/srclib/libmetrics/irix/metrics.c
#include <fcntl.h>
 --
 
 The make went a little further but broke when trying to compile 'apr'.
 Now I don't need apr. As I was saying, I only want gmond. Rene Salmon
 says, in the cited message So we did the configure and make for just
 gmond on the Irix box. I would like to do the same but unfortunately
 don't know how.
 
 Can anyone give me a hint ?
 
 Thanks in advance for your help and have a good day,
 Luc Gauthier
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] intermittent blanks in graphs

2006-01-25 Thread Martin Knoblauch

Hi Ben,

 see below. In any case, could you please open a case in bugzilla and
assign it to me?

Martin

--- Ben Hartshorne [EMAIL PROTECTED] wrote:

 
 Everyone,
 
 thanks very much for your suggestions.  I've replied to each below.
 
 
 On Tue, Jan 24, 2006 at 04:16:08AM -0800, Martin Knoblauch wrote:
   just a thought - are your cluster nodes time-synched? Are they
 [still]
  in-synch?
 
 to within a second or so.  I also have several gmetrics that are running
 at a 2-min interval, and they exhibit the same behavior.  I would be
 surprised to see them reporting the same second, 2 minutes apart...


 OK. That seems clean.

 [snip]
 
 
 On Tue, Jan 24, 2006 at 04:46:50PM -0500, Rick Mohr wrote:
  Also, you could use rrdtool to generate the exact same graph that is
  shown on the web page for one of these metrics and dump it straight
  into a file.  Then you could compare that with the image seen on the
  web page (to check for the unlikely event that the generated image is
  fine, but the web server is messing something up).
 
 hmm... that's a good suggestion.  
 
 Here's an excerpt from 'rrdtool dump':
 
 <!-- 2006-01-24 17:36:45 PST / 1138153005 --> <row><v> 9.315467e+00 </v></row>
 <!-- 2006-01-24 17:37:00 PST / 1138153020 --> <row><v> 8.80e+00 </v></row>
 <!-- 2006-01-24 17:37:15 PST / 1138153035 --> <row><v> 8.80e+00 </v></row>
 <!-- 2006-01-24 17:37:30 PST / 1138153050 --> <row><v> 8.80e+00 </v></row>
 <!-- 2006-01-24 17:37:45 PST / 1138153065 --> <row><v> 8.80e+00 </v></row>
 <!-- 2006-01-24 17:38:00 PST / 1138153080 --> <row><v> NaN </v></row>
 <!-- 2006-01-24 17:38:15 PST / 1138153095 --> <row><v> NaN </v></row>
 <!-- 2006-01-24 17:38:30 PST / 1138153110 --> <row><v> NaN </v></row>
 <!-- 2006-01-24 17:38:45 PST / 1138153125 --> <row><v> NaN </v></row>
 <!-- 2006-01-24 17:39:00 PST / 1138153140 --> <row><v> NaN </v></row>
 
 Correspondingly, in the graph seen through ganglia, the data ends about
 17:38.  I'm surprised it's registering these things every 15 seconds!
 I thought the period was slower than that (every min).
 
 I checked a few other rrds at different resolutions, and the NaN
 sections do correspond to the blank parts.
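
Checking many rrds by eye gets tedious, so the comparison could be scripted. Below is a minimal Python sketch that scans the text produced by `rrdtool dump` and lists the timestamps of NaN rows. It assumes the usual XML row layout with a date comment followed by a single `<v>` value per row (one data source per file); `nan_epochs` is a hypothetical helper name.

```python
import re

# One archive row: "<!-- date / epoch --> <row><v> value </v></row>"
ROW = re.compile(
    r"<!--\s*(?P<stamp>.+?)\s*/\s*(?P<epoch>\d+)\s*-->\s*"
    r"<row><v>\s*(?P<value>\S+)\s*</v></row>"
)

def nan_epochs(dump_text):
    """Return the epoch timestamps of single-value rows whose value is NaN."""
    return [
        int(match.group("epoch"))
        for match in ROW.finditer(dump_text)
        if match.group("value").lower() == "nan"
    ]
```

Feeding it the output of `rrdtool dump users.rrd` gives the gaps as plain numbers, which can then be lined up against the gmetad log timestamps.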
 
 So what does it mean?  This tells us that the data is not getting put
 into the rrds.  We know that the values are getting to the collector
 host, because clicking on the 'gmetric' portion of the website shows
 current data.  But that data is not making it into the RRD somehow...
 
 I thought maybe the RRDs had become corrupted somehow, so tried out
 moving the rrds out of place so ganglia would recreate them all.  The
 symptom was still in evidence.
 
 
 I don't see that error message, but while looking for it, I did see
 this
 error message:
 
 Jan 24 17:24:18 localhost /usr/sbin/gmetad[30443]: RRD_update
 (/var/lib/ganglia/rrds/production/raiden-8-db1/users.rrd): conversion
 of
 'min,' to float not complete: tail 'min,'
 
 This seems to relate to a recent change I made that I had forgotten
 about.  :)  I added the following line to my crontab:
 
 */2 * * * * /usr/bin/gmetric --name=users --value=`w | head -1 |
 awk '{print $6}'` --type=int16
 
 The purpose of this line is to create a graph representing the number
 of
 logged in users to the host.  it seems right to me - do any of you
 see a
 problem with this line?
 

 Not sure. What does the live users metric from gmond look like?
Definitely an interesting coincidence.

 In any case, we need to look into how gmetad operates with rrdtool.
Unfortunately, I am more the gmond guy.

 Most important, we need to find out what triggers the behaviour.
Thanks for your patience.

 
 
 In the course of this investigation, I have come across another strange
 happening.  Some of the metrics seem to be ... off.  I have no idea if
 these things are related. I was surprised to notice that many of my
 servers show excessive time in the CPU_report graph, as if all their
 time were spent in CPU Wait.  That didn't seem right and also didn't jibe
 with the output of vmstat.  Looking at the individual metrics that make
 up the cpu_report, I see:
 
 * cpu_aidle: 1388
 * cpu_idle: 66.00
 * cpu_nice: 0.00
 * cpu_system: 2.30
 * cpu_user: 31.70
 * cpu_wio: 1388
 
 All 6 of these metrics are supposed to be percentages.  What's up with
 1,388?  Both cpu_aidle and cpu_wio are linearly decreasing graphs with
 the same slope (and same current value).  They look to be the same back
 into the shown history, but it's hard to be exact.  This seems to be the
 case (with different current values) on a number of hosts.  
 
 Two .pngs of hosts exhibiting this behavior are at
 http://cryptio.net/~ben/ganglia/host_report.png and
 http://cryptio.net/~ben/ganglia/host_report2.png
 
 Note that these stats are all created since I moved the old files out
 of
 place earlier today, so there is no chance of left over corruption.  
 
 Are my hosts dying?  restarting gmond on the host seems to have no
 effect.
 
 Would it be possible to create this kind of error by upgrading the
 server to gmetad 3.0.2

Re: [Ganglia-general] intermittent blanks in graphs

2006-01-25 Thread Martin Knoblauch


--- Martin Knoblauch [EMAIL PROTECTED] wrote:

  error message:
  
  Jan 24 17:24:18 localhost /usr/sbin/gmetad[30443]: RRD_update
  (/var/lib/ganglia/rrds/production/raiden-8-db1/users.rrd):
 conversion
  of
  'min,' to float not complete: tail 'min,'
  
  This seems to relate to a recent change I made that I had forgotten
  about.  :)  I added the following line to my crontab:
  
  */2 * * * * /usr/bin/gmetric --name=users --value=`w | head -1 |
  awk '{print $6}'` --type=int16
  
  The purpose of this line is to create a graph representing the
 number
  of
  logged in users to the host.  it seems right to me - do any of you
  see a
  problem with this line?
  
 
  actually, on my system (FC4) your command results in:
 
 $ w | head -1 | awk '{print $6}'
 users,
 $
 
  which is not really what you want to put into that metric :-)
 Apparently yours seems to report "min,", which would be $4 on my
 system. The number of users would be $5. Maybe different versions of
 procps?
 
  Hmm. Weird. Just played around with the setting of LANG and now the
 command reports "load" instead of "users,". Really weird.
 

 ha !!! The format of the w output changes with the uptime. The
position of the #users definitely flows around. Guess you need to work
on the awk. You need to look for users and take the token before
that.
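
The "look for users and take the token before it" approach can be sketched in Python as below. The two sample lines are made-up examples of how the first line of `w` shifts with uptime, not output captured from Ben's hosts, and `users_from_w_header` is a hypothetical name.

```python
def users_from_w_header(line):
    """Return the user count from the first line of `w`, or None if not found.

    Instead of relying on a fixed field position, find the token
    "user"/"users" and take the number directly in front of it.
    """
    tokens = line.replace(",", " ").split()
    for i, tok in enumerate(tokens):
        if tok in ("user", "users") and i > 0 and tokens[i - 1].isdigit():
            return int(tokens[i - 1])
    return None

# Made-up header lines: the field position of the count differs.
samples = [
    " 17:24:18 up 5 min,  3 users,  load average: 0.10, 0.20, 0.30",
    " 17:24:18 up 12 days,  3:05,  1 user,  load average: 0.00, 0.00, 0.00",
]
for line in samples:
    print(users_from_w_header(line))
```

Returning None when no count is found lets the caller skip the gmetric call entirely instead of pushing a non-number into the stream.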

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia on Irix: gmond only

2006-01-25 Thread Martin Knoblauch

Hi Luc,

 yes, the native IRIX make does not support --version. Actually, using
gmake is the right thing to do.

 Your toolchain seems older than mine (automake-1.9.5, autoconf-2.59,
libtool-1.5.20), but newer than the recommended (1.6.3, 2.53, 1.4.2).
What is the version of libtool?

 In any case, what worries me are the syntax errors from configure.
Maybe you can check what they are about.

 Configuring expat ...
 
 configure: loading cache
 /home/master/shared/ganglia_src/ganglia-3.0.2/config.cache
 ./configure[1347]: syntax error at line 157 : `(' unexpected
 
 Configuring apr ...
 
 configure: loading cache
 /home/master/shared/ganglia_src/ganglia-3.0.2/config.cache
 ./configure[1398]: syntax error at line 157 : `(' unexpected
 
 Configuring libconfuse ...
 
 configure: loading cache
 /home/master/shared/ganglia_src/ganglia-3.0.2/config.cache
 ./configure[1428]: syntax error at line 157 : `(' unexpected

 Also, could you produce a list of the files with those wrong paths
(after configure)?

 And no, IRIX 6.5.24m vs. 6.5.27m should not make a difference.

Thanks
Martin

--- Luc Gauthier [EMAIL PROTECTED] wrote:

 Hi Martin,
 
 Quite surprisingly, I was unable to determine the version of 'make'
 that
 is installed on the machine. There is indeed no option or way to get
 the
 version. So I guess we could describe it as the version of make that
 comes with Irix 6.5.24m.
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] intermittent blanks in graphs

2006-01-25 Thread Martin Knoblauch


--- Ben Hartshorne [EMAIL PROTECTED] wrote:

 
 Jan 24 17:24:18 localhost /usr/sbin/gmetad[30443]: RRD_update
 (/var/lib/ganglia/rrds/production/raiden-8-db1/users.rrd): conversion
 of
 'min,' to float not complete: tail 'min,'
 
 This seems to relate to a recent change I made that I had forgotten
 about.  :)  I added the following line to my crontab:
 
 */2 * * * * /usr/bin/gmetric --name=users --value=`w | head -1 |
 awk '{print $6}'` --type=int16
 

 OK, as I discovered before, your command can put funny things like
"min," into the metrics stream. Unfortunately, gmetric or gmond are
stupid enough to accept that.

 I can now kind of reproduce your problem by inserting the following
into the stream:

gmetric --name=users --type=int16 --value=min,

 This appears then in both the gmond and gmetad XML. As a result, the
report graphs on my cluster view show the gaps. As soon as I insert a
number into the stream, the graphs work fine.

 But - I only see the gaps in the cluster overview. The node displays
are not affected (both in the cluster overview and on the node pages).

 Seems we need to make gmetric or gmond more robust against junk. Or we
need to see what the problem in the web interface is. Or both :-)
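
Until the tools are more robust, a wrapper around gmetric could refuse junk before it is sent. A minimal Python sketch of such a check follows; it is not the check that gmetric or gmond implement, and `is_clean_number` is a hypothetical name.

```python
import math

def is_clean_number(text):
    """Return True only if text parses as a finite number."""
    try:
        value = float(text)
    except ValueError:
        return False
    # float() also accepts "nan" and "inf"; reject those as well.
    return math.isfinite(value)

for candidate in ("3", "8.8", "min,", "users,"):
    print(repr(candidate), is_clean_number(candidate))
```

A cron job would then call gmetric only when the check passes, so a value like "min," never reaches the metrics stream.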

Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

RE: [Ganglia-general] intermittent blanks in graphs

2006-01-25 Thread Martin Knoblauch

As you wish. You are old fashioned :-)

Martin

--- [EMAIL PROTECTED] wrote:

 Call me old fashioned, but:
 
 who | wc -l | awk '{print $1}'
 
 strikes me as safer
 
 regards,
 richard
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] gmond not multicasting to other nodes

2006-01-25 Thread Martin Knoblauch

 
 value_threshold = 1.0 
   } 
   metric { 
 name = cpu_sintr 
 value_threshold = 1.0 
   } 
   */ 
 } 
 
 collection_group { 
   collect_every = 20 
   time_threshold = 90 
   /* Load Averages */ 
   metric { 
 name = load_one 
 value_threshold = 1.0 
   } 
   metric { 
 name = load_five 
 value_threshold = 1.0 
   } 
   metric { 
 name = load_fifteen 
 value_threshold = 1.0 
   }
 } 
 
 /* This group collects the number of running and total processes */ 
 collection_group { 
   collect_every = 80 
   time_threshold = 950 
   metric { 
 name = proc_run 
 value_threshold = 1.0 
   } 
   metric { 
 name = proc_total 
 value_threshold = 1.0 
   } 
 }
 
 /* This collection group grabs the volatile memory metrics every 40
 secs and 
sends them at least every 180 secs.  This time_threshold can be
 increased 
significantly to reduce unneeded network traffic. */ 
 collection_group { 
   collect_every = 40 
   time_threshold = 180 
   metric { 
 name = mem_free 
 value_threshold = 1024.0 
   } 
   metric { 
 name = mem_shared 
 value_threshold = 1024.0 
   } 
   metric { 
 name = mem_buffers 
 value_threshold = 1024.0 
   } 
   metric { 
 name = mem_cached 
 value_threshold = 1024.0 
   } 
   metric { 
 name = swap_free 
 value_threshold = 1024.0 
   } 
 } 
 
 collection_group { 
   collect_every = 40 
   time_threshold = 300 
   metric { 
 name = bytes_out 
 value_threshold = 4096 
   } 
   metric { 
 name = bytes_in 
 value_threshold = 4096 
   } 
   metric { 
 name = pkts_in 
 value_threshold = 256 
   } 
   metric { 
 name = pkts_out 
 value_threshold = 256 
   } 
 }
 
 /* Different than 2.5.x default since the old config made no sense */
 
 collection_group { 
   collect_every = 1800 
   time_threshold = 3600 
   metric { 
 name = disk_total 
 value_threshold = 1.0 
   } 
 }
 
 collection_group { 
   collect_every = 40 
   time_threshold = 180 
   metric { 
 name = disk_free 
 value_threshold = 1.0 
   } 
   metric { 
 name = part_max_used 
 value_threshold = 1.0 
   } 
 }
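
How collect_every, time_threshold and value_threshold interact in the collection groups quoted here can be sketched roughly as follows. This Python is a simplified reading of the comment in the config ("grabs ... every 40 secs and sends them at least every 180 secs") and treats value_threshold as an absolute change that triggers an early send; gmond itself applies these settings per collection group and may differ in detail. `should_send` is a hypothetical name.

```python
def should_send(now, last_sent, value, last_sent_value,
                time_threshold, value_threshold):
    """Decide, at collection time, whether a metric should be sent.

    Simplified rule: send when time_threshold seconds have passed since
    the last send, or when the value moved by more than value_threshold.
    """
    if now - last_sent >= time_threshold:
        return True
    return abs(value - last_sent_value) > value_threshold

# mem_free style settings from the config: time_threshold 180, threshold 1024.
print(should_send(40, 0, 5000.0, 5000.0, 180, 1024.0))   # unchanged, too early
print(should_send(40, 0, 7000.0, 5000.0, 180, 1024.0))   # large change
print(should_send(200, 0, 5000.0, 5000.0, 180, 1024.0))  # time limit reached
```

Raising time_threshold therefore reduces traffic for stable metrics, while value_threshold keeps sudden changes visible quickly.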
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] intermittent blanks in graphs

2006-01-25 Thread Martin Knoblauch

Hi Ben,

 just for your info: in 3.0.3 gmetric will have a check to prevent
non-numbers from being inserted into the XML stream.

 In the meantime, the patch below may help you. I will discuss it on
the developers list.

Martin

--- Martin Knoblauch [EMAIL PROTECTED] wrote:

 Ben,
 
  are you able to rebuild gmetad with the following quick fix? This
 seems to solve it for me:
 
 --- rrd_helpers.c-orig  2006-01-25 16:14:16.0 +0100
 +++ rrd_helpers.c   2006-01-25 16:10:27.0 +0100
 @@ -54,7 +54,7 @@
       {
          err_msg("RRD_update (%s): %s", rrd, rrd_get_error());
          pthread_mutex_unlock( &rrd_mutex );
 -        return 1;
 +        return 0;
       }
    /* debug_msg("Updated rrd %s with value %s", rrd, val); */
    pthread_mutex_unlock( &rrd_mutex );
 
 
 --- Martin Knoblauch [EMAIL PROTECTED] wrote:
 
  
  
  --- Ben Hartshorne [EMAIL PROTECTED] wrote:
  
   
   Jan 24 17:24:18 localhost /usr/sbin/gmetad[30443]: RRD_update
   (/var/lib/ganglia/rrds/production/raiden-8-db1/users.rrd):
  conversion
   of
   'min,' to float not complete: tail 'min,'
   
   This seems to relate to a recent change I made that I had
 forgotten
   about.  :)  I added the following line to my crontab:
   
   */2 * * * * /usr/bin/gmetric --name=users --value=`w | head -1
 |
   awk '{print $6}'` --type=int16
   
  
   OK, as I discovered before, your command can put funny things like
  "min," into the metrics stream. Unfortunately, gmetric or gmond are
  stupid enough to accept that.
  
   I can now kind of reproduce your problem by inserting the
 following
  into the stream:
  
  gmetric --name=users --type=int16 --value=min,
  
   This appears then in both the gmond and gmetad XML. As a result,
 the
  report graphs on my cluster view show the gaps. As soon as I
 insert
  a
  number into the stream, the graphs work fine.
  
   But - I only see the gaps in the cluster overview. The node
 displays
  are not affected (both in the cluster overview and on the node
  pages).
  
   Seems we need to make gmetric or gmond more robust against junk.
 Or
  we
  need to see what the problem in the web interface is. Or both :-)
  
  Martin
  
  --
  Martin Knoblauch
  email: k n o b i AT knobisoft DOT de
  www:   http://www.knobisoft.de
  
  
  
  
 
 
 --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www:   http://www.knobisoft.de
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] 2 clusters in same subnet

2006-01-30 Thread Martin Knoblauch


--- regatta [EMAIL PROTECTED] wrote:

 Hi everyone
 
 I have one comment about ganglia document and one question, my
 comment
 is that there is no REALLY document about how to use/ ganglia (please
 don't ask me to read http://ganglia.info/docs/, it's the worse
 document I every saw, it suppose that you are expert in ganglia)


 you may have a point here. Why not become an expert and write new
docs? :-)
 
 Now my question :)  :
 
 I have two clusters in the same subnet (each cluster has 24 nodes) ,
 now why they are the same subnet, this is different subject but they
 must be :)
 
 now how can I configure one node in each cluster to run gmetad and
 the
 php-web to display the 2 clusters as 2 clusters or grids
 
 what I did is that I installed gmond in all nodes (in both clusters),
 I changed /etc/gmod.conf in cluster A to :
 cluster {
   name = Cluster A
 }
 
 
 and in cluster B
 
 cluster {
   name = Cluster B
 }
 
 
 but when I go to gmetad I find it sometime it collect them all
 together or it put some node in A to be B and some B to A !!
 
 Any help ?
 

 You need to separate the ports on which your clusters multicast. The
default is 8649. Select another port (e.g. 8648) for your second cluster.

 Then you need to define two data sources in gmetad.conf (you only need
one gmetad for both clusters).

data_source "cluster 1" node_in_cluster_1:8649
data_source "cluster 2" node_in_cluster_2:8648

 That should do the trick.
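
 As a rough sketch, the two lines could also be generated by a small
script. This assumes the usual gmetad.conf form of a quoted cluster name
followed by host:port entries; the helper name data_source_line and the
node names are made up for illustration.

```python
def data_source_line(name, endpoints):
    """Render one gmetad.conf data_source line.

    name      -- cluster name; quoted so that a name with spaces stays one token
    endpoints -- list of (host, port) pairs of gmonds in that cluster
    """
    if '"' in name:
        raise ValueError("cluster name must not contain a double quote")
    if not endpoints:
        raise ValueError("at least one host:port entry is required")
    hosts = " ".join(f"{host}:{port}" for host, port in endpoints)
    return f'data_source "{name}" {hosts}'


if __name__ == "__main__":
    # Hypothetical node names; each cluster uses its own port.
    print(data_source_line("Cluster A", [("node_in_cluster_a", 8649)]))
    print(data_source_line("Cluster B", [("node_in_cluster_b", 8648)]))
```

 The port given here has to match the port the gmonds of that cluster
are actually configured to use.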

Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Pointers on architecting a largescale ganglia setup??

2006-01-31 Thread Martin Knoblauch


--- Joel Krauska [EMAIL PROTECTED] wrote:

 Rick Mohr wrote:
  The unicast approach does save on gmond memory usage as you
 mentioned.  
  It's up to each site to determine just how much memory the metrics
 will 
  take up, and if it is considered a significant amount.  (But it can
 get 
  somewhat big on a large cluster like mine with a bunch of added
 metrics.)
 
 Can you share any code you've written for additional metrics?


 just in case you did not know:

 http://ganglia.sourceforge.net/gmetric/

 Everyone is invited to contribute to the repository.

Cheers
Martin


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Pointers on architecting a largescale ganglia setup??

2006-01-31 Thread Martin Knoblauch

Joel,

 gmetric (at least in 3.0.x) takes a -c argument where you can
specify the path to gmond.conf. gmetric will then use any transport
defined for gmond. Simple, isn't it?

Martin

--- Joel Krauska [EMAIL PROTECTED] wrote:

 Martin Knoblauch wrote:
   just in case you did not know:
  
   http://ganglia.sourceforge.net/gmetric/
 
 Hadn't known about this -- thanks.
 
 Question:
 I just went an covnerted to using UDP unicasts.
 
 The gmetric man page seems to imply that it only supports the
 multicast 
 comm method.  Is there a way for gmetric just to report to the local
 gmond?
 
 DESCRIPTION
 The Ganglia Metric Client (gmetric) announces a metric value
 to 
 all Ganglia Monitoring Daemons (gmonds) that are listening on the 
 cluster multicast channel.
 
 I'll likely figure this out soon, but I thought I'd bring it up.
 
 --joel
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[Ganglia-general] gmond binary for hp-ux-11.11/hppa

2006-02-01 Thread Martin Knoblauch

Hi,

 anybody who could provide me with a 3.0.2 gmond executable for the
following arch:

HP-UX hdsdm3 B.11.11 U 9000/800

 Unfortunately the systems I want to look at have no decent
development environment.

TIA
Martin


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Solaris 9 compile of 3.02

2006-02-02 Thread Martin Knoblauch

Hi Russell,

 gcc (3.X onward) is the only compiler that we (I) recommend on any
platform. And we never promised anything else :-)

Cheers
Martin

--- Russell Nordquist [EMAIL PROTECTED] wrote:

 I am having problems compiling ganglia 3.02 on Sparc Solaris 9
 machine
 using Forte 7 (an older version of Sun Studio). This is the error I
 get:
 
 [EMAIL PROTECTED]:~/ganglia-3.0.2$make
 make  all-recursive
 Making all in srclib
 Making all in libmetrics
 make  all-recursive
 Making all in solaris
 source='metrics.c' object='metrics.lo' libtool=yes \
 DEPDIR=.deps depmode=none /bin/bash ../build/depcomp \
 /bin/bash ../libtool --tag=CC --mode=compile cc -DHAVE_CONFIG_H  -I.
 -I.
 -I.. -I.. -I../lib -g -D__STDC__ -D_POSIX_C_SOURCE=199506L
 -DHAVE_STRERROR -c -o metrics.lo metrics.c
  cc -DHAVE_CONFIG_H -I. -I. -I.. -I.. -I../lib -g -D__STDC__
 -D_POSIX_C_SOURCE=199506L -DHAVE_STRERROR -c metrics.c  -KPIC -DPIC
 -o
 .libs/metrics.o
 command line: warning: macro redefined: __STDC__
 /usr/include/sys/resource.h, line 126: incomplete struct/union/enum
 timeval: ru_utime
 ../unpifi.h, line 34: syntax error before or at: u_char
 ../unpifi.h, line 34: cannot recover from previous errors
 cc: acomp failed for metrics.c
 *** Error code 1
 make: Fatal error: Command failed for target `metrics.lo'
 Current working directory
 /home/russelln/ganglia-3.0.2/srclib/libmetrics/solaris
 *** Error code 1
 continues
 
 googleing has lead me to the same error for others (including
 sunfreeware) without a posted solution. I have not tried with gcc
 since
 it is not part of our local builds.
 
 Has anyone successfully compiled ganglia for Solaris 9? If so what
 compiler did you use.
 
 thanks
 russell
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Solaris 9 compile of 3.02

2006-02-02 Thread Martin Knoblauch

Russell,

 personally I can only report successful builds on Solaris 8 (64-bit
Sparc). I know of at least one guy who was able to build successfully on
Solaris 10 (AMD-64).

 I see no reason why Solaris 9 should be a problem.

Martin

--- Russell Nordquist [EMAIL PROTECTED] wrote:

 Ok. I can understand that. Any reports of successful Solaris 9
 compiles?
  with gcc?
 
 russell
 
 On 2/2/06 3:43 PM, Martin Knoblauch wrote:
  Hi Russel,
  
   gcc (3.X onward) is the only compiler that we (I) recommend on any
  platform. And we never promised anything else :-)
  
  Cheers
  Martin
  


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia on Irix: gmond only

2006-02-14 Thread Martin Knoblauch

Hi Luc,

 good to know you are happy now. As I do not remember exactly what I
told you back then, would you be willing to do a final experiment? Just
remove all config.cache files from the tree before running configure
(with the original configure script), and/or do a make distclean
first.

 Could be a sed problem. Mine is newer (4.1.4), but we never had a
requirement on the sed-version.

Cheers
Martin

--- Luc Gauthier [EMAIL PROTECTED] wrote:

 Hi Martin,
 
 I finally had time to come back to my problem compiling ganglia on
 Irix...
 
 Thanks to your hint, I managed to have everything working ! If you
 remember, you suggested I should investigate those error messages I
 got
 when running the 'configure' script:
 


 Configuring expat ...
 
 configure: loading
 cache /home/master/shared/lgauthie/ganglia/ganglia-3.0.2/config.cache
 ./configure[1347]: syntax error at line 157 : `(' unexpected
 
 Configuring apr ...
 
 configure: loading
 cache /home/master/shared/lgauthie/ganglia/ganglia-3.0.2/config.cache
 ./configure[1398]: syntax error at line 157 : `(' unexpected
 
 Configuring libconfuse ...
 
 configure: loading
 cache /home/master/shared/lgauthie/ganglia/ganglia-3.0.2/config.cache
 ./configure[1428]: syntax error at line 157 : `(' unexpected
 


 
 Those error messages did not stop the 'configure' script from running
 but I had those strange error messages (hard links not adjusted to
 our
 directory architecture) when gmake'ing:
 gmake[3]: *** No rule to make target
 `/home/scratch/ganglia-cvs/ganglia-
 post-2_5_7/monitor-core/srclib/apr/build/apr_rules.mk'.  Stop.
 
 So I decided to take a look at line 157 of the config.cache file that
 was giving the errors at configuration time and here it is:
 test ${lt_cv_sys_global_symbol_to_c_name_address+set} = set ||
 lt_cv_sys_global_symbol_to_c_name_address='sed -n -e '\''s/^\'^: \([^
 \'^ ]*\) $/  {\\1\, (lt_ptr) 0},/p'\'' -e '\''s/^\'^[BCDEGRST] \([^
 \'^ ]*\) \([^\'^ ]*\)$/  {\2, (lt_ptr) \\2},/p'\'
 
 Unfortunately, I must admit I am not advanced enough to rapidly see
 where is that unexpected '('. And since I did not have enough time to
 investigate it, I decided to refrain 'expat', 'apr' and 'libconfuse'
 from using the cache file at configuration time. To do so, I simply
 modified three lines in the 'configure' file (line number between
 square
 brackets):
 


 [2119] cd srclib/expat && ./configure --cache-file=$ganglia_popdir/config.cache
 became
 [2119] cd srclib/expat && ./configure
 
 [2123] cd srclib/apr && ./configure --cache-file=$ganglia_popdir/config.cache
 became
 [2123] cd srclib/apr && ./configure
 
 [2127] cd srclib/confuse && ./configure --cache-file=$ganglia_popdir/config.cache --disable-nls
 became
 [2127] cd srclib/confuse && ./configure --disable-nls
 


 
 So those three modules do the whole configuration step from scratch.
 Of
 course, it takes more time and this is not the best way to do the job
 but in the end, the gmake step goes all the way to the end and I get
 the
 binaries I've been waiting for ! :) Now that I have them, I can go on
 and try to set up ganglia on our network. I'll come back to the list
 if
 I have problems there.
 
 Thanks again for your help and if it can help, here is a summary of
 the
 tools I have on the machine I compiled ganglia on :
 - OS: IRIX 6.5.24m
 - sed: GNU sed version 4.0.7
 - test: test (GNU sh-utils) 2.0
 - automake: automake (GNU automake) 1.7.5
 - autoconf: autoconf (GNU Autoconf) 2.57
 - gmake: GNU Make 3.80
 - install: install (fileutils) 4.1
 
 
 Best regards,
 Luc Gauthier
 
 
 
 
 
 
 On Wednesday, 25 January 2006 at 06:19 -0800, Martin Knoblauch wrote:
 
  Hi Luc,
  
   yes, the native IRIX does not support --version. Actually,
 using
  gmake is the right thing to do.
  
   Your toolchain seems older than mine (automake-1.9.5,
 autoconf-2.59,
  libtool-1.5.20), but newer than the recommended (1.6.3, 2.53,
 1.4.2).
  What is the version of libtool?
  
   In any case, what worries me are the syntax errors from configure.
  Maybe you can check what they are about.
  
   Configuring expat ...
   
   configure: loading cache
   /home/master/shared/ganglia_src/ganglia-3.0.2/config.cache
   ./configure[1347]: syntax error at line 157 : `(' unexpected
   
   Configuring apr ...
   
   configure: loading cache
   /home/master/shared/ganglia_src/ganglia-3.0.2/config.cache
   ./configure[1398]: syntax error at line 157 : `(' unexpected
   
   Configuring libconfuse ...
   
   configure: loading cache
   /home/master/shared/ganglia_src/ganglia-3.0.2/config.cache
   ./configure[1428]: syntax error at line 157 : `(' unexpected
  
   Also, could you reproduce a list of files with those

Re: [Ganglia-general] Using Ganglia to monitor JVM based services and DB servers.

2006-02-15 Thread Martin Knoblauch

Dear Miguel,


--- José Miguel Pereira Tavares [EMAIL PROTECTED] wrote:

 
   Hi all!
 
   As far as I could find out gmond has a set of metrics built-in at 
 compile time (a rather convenient set most of the time). If more 
 information regarding the node or the software running on that node
 is 
 necessary then gmetric can be used to publish that metric.
   Hoping that the previous paragraph affirmation is correct I would
 like 
 to ask around some question, though I will also provide some possible
 
 answers/thoughts about them:
 
   1. Doesn't using gmetric (forking a process) consume a bit too much
 of 
 the system resources?


 This really depends on:

a) the resources of your system
b) the number of new metrics that you want to insert into the XML
stream
c) the frequency at which you call gmetric.

 I personally would not worry too much. At least not before measuring
the impact on a live system :-)

 In any case, if gmetric is too heavy for you, you could always
integrate your metrics into the metrics reported by gmond. This is
not trivial, but not impossible. The drawback is that it may make that
gmond incompatible with the standard one.
 
 Another solution would be to look at the gmetric source and write
your own version that collects all interesting metrics and submits them
together. That way you would reduce the number of forks. Seems you
already contemplated this.

   2. I need to monitor a JVM profile. Has anyone tried something like 
 similar? Any thoughts or ideas on best way to achieve this with 
 Ganglia?
 
   3. I also need to monitor some database services... thoughts and
 ideas 
 most welcome.


 for both 2 and 3 - you need to retrieve the metrics and feed them
into the stream. OK, probably not what you wanted to hear :-)

Cheers
Martin 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] No RRDs Created On MacOSX

2006-02-21 Thread Martin Knoblauch


--- Mike Walker [EMAIL PROTECTED] wrote:

 However, if I run gstat -a  I do get the data I would expect.  But  
 when I run anything with gmetric I get nothing (no errors no  
 output).  Of course I might be doing gmetric wrong, so here is  
 what I tried.
 
  'gmetric -n mem_free -v mem_free -t uint32'
 
Mike,

 this could be a real killer. Up to and including 3.0.2, gmetad has a
bug that makes it stop recording metrics for a host if an integer or
floating-point metric has a value that does not represent a number.
Unfortunately, gmetric is not very picky about the strings that get
passed via -v. The next release (3.0.3, no planned date) will have a fix
that makes gmetric check whether the -v string translates into a number.
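
 Until then, a wrapper around gmetric could do a similar check itself.
The following is only a sketch of that idea, not the check that went
into gmetric: the function names are made up, only the syntax of the
value is tested (not the range of the type), and the type names are the
ones gmetric accepts for --type.

```python
import re

# gmetric's numeric type names; "string" is handled separately.
INTEGER_TYPES = {"int8", "uint8", "int16", "uint16", "int32", "uint32"}
FLOAT_TYPES = {"float", "double"}

_INT_RE = re.compile(r"[+-]?[0-9]+")
_FLOAT_RE = re.compile(r"[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")


def is_valid_metric_value(value, metric_type):
    """Return True if value looks acceptable for the given metric type."""
    if metric_type == "string":
        return True
    text = value.strip()
    if metric_type in INTEGER_TYPES:
        return _INT_RE.fullmatch(text) is not None
    if metric_type in FLOAT_TYPES:
        return _FLOAT_RE.fullmatch(text) is not None
    return False  # unknown type name


def gmetric_argv(name, value, metric_type):
    """Build a gmetric command line, refusing non-numeric junk."""
    if not is_valid_metric_value(value, metric_type):
        raise ValueError(f"refusing to send {value!r} as {metric_type}")
    sent = value if metric_type == "string" else value.strip()
    return ["gmetric", f"--name={name}", f"--value={sent}",
            f"--type={metric_type}"]
```

 With this, a call like gmetric_argv("mem_free", "mem_free", "uint32")
raises an error instead of putting a non-number into the stream. The
returned list can be handed to subprocess.run().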

 In the meantime, you could try the following fix to gmetad:

--- rrd_helpers.c-orig  2006-01-25 16:14:16.0 +0100
+++ rrd_helpers.c   2006-01-25 16:10:27.0 +0100
@@ -54,7 +54,7 @@
       {
          err_msg("RRD_update (%s): %s", rrd, rrd_get_error());
          pthread_mutex_unlock( &rrd_mutex );
-         return 1;
+         return 0;
       }
    /* debug_msg("Updated rrd %s with value %s", rrd, val); */
    pthread_mutex_unlock( &rrd_mutex );

 If it fixes your problems, please report back.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Missing stats on Irix

2006-02-21 Thread Martin Knoblauch


--- Alex Balk [EMAIL PROTECTED] wrote:

 Network stats are also missing on HPUX.
 
 Not hardly a dying species...

 yeah, but I would not call it growing like fungus either :-)

 If only I had the time, the system and a working gcc environment for
HP-UX. At least I would be able to compile 3.0.2. I still need a binary
for a HPPA machine under 11.11.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia truncating larger status messages

2006-02-22 Thread Martin Knoblauch

Hi Ian,

 you beat me to this advice :-)

 Chris - Please assign the bug to me. Not that I know how to fix this
short term, as upgrading the whole of apr might be a challenge. I
could imagine a way to specify a different apr-location with
configure.

Cheers
Martin

--- Ian Cunningham [EMAIL PROTECTED] wrote:

 Chris,
 
 Thanks for the notice. Please file a bug with your work around here:
 http://bugzilla.ganglia.info/
 
 Thanks again,
 Ian
 
 Chris Black wrote:
 
  Hello list,
 
  We're using Ganglia at the University of Michigan to monitor
 cluster  
  nodes, and we found an issue with 3.0.2.  When sending status  
  messages from gmond to gmetad, messages over ~66600 bytes would be 
 
  truncated and the trailing /GANGLIA tag (among a few others at
 the  
  end) would be missing, and the gmetad host would mark that client
 as  
  missing.
 
  We found the problem to be in version 0.9.5 of the Apache Portable 
 
  Runtime (APR) that shipped with Ganglia 3.0.2.  Upgrading to the  
  newest APR (0.9.7) fixed the problem.
 
  We used the following procedure to correct the problem on Mac OS X 
 
  Server 10.4.4 Buid 8G32:
 
  1) untar the ganglia sources
  2) cd into the ganglia-3.0.2/srclib directory
  3) remove the 'apr' directory
  4) download the 0.9.7 sources of apr into this directory  
  (ganglia-3.0.2/srclib)
  5) untar the apr sources
  6) rename the resulting apr-0.9.7 directory to apr (or create a
 symlink)
  7) move up one directory to ganglia-3.0.2
  8) build/install as normal
 
  Hopefully this will be of assistance to anyone seeing a similar
 problem.
 
  Chris Black
  LSA-IT
  University of Michigan
 
 
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[Ganglia-general] First prerelease of ganglia-3.0.3 ready for testing

2006-02-23 Thread Martin Knoblauch

Hi friends of Ganglia,

 please find the first test drop of the upcoming ganglia-3.0.3 release
at:

http://www.knobisoft.de/ganglia/ganglia-3.0.3.200602231926-apr0.9.7.tar.gz

 This is again planned to be a minor bug-fix release which is supposed
to be compatible with earlier 3.0.X releases of ganglia.

 So far, the differences against 3.0.2 are minimal:

- minor fixes to the documentation
- make gmetric more robust against illegal numeric values, which
could cause gmetad to stop recording complete nodes.
- fix the libconfuse.spec file (Copyright -> License, Swedish locales).
- fix make check. Expat would not know how to do it.
- AIX: fix proc_total, proc_run, swap_free and swap_total. Implement
mem_cached.
- introduce a scaling factor for the load -> colour-code transformation
in the web-frontend. The default of 1.0 is only good for HPC nodes.
Fileservers and similar would go red too early.
- replace apr with version 0.9.7. This is supposed to fix some
problems with large chunks of XML being truncated. In fact this is the
biggest change in this release and needs testing !!!
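
 One way to test for the truncation mentioned in the last item is to
pull the XML dump from a gmond and see whether it is complete. The
sketch below assumes gmond answers on its default TCP port 8649 and that
the dump ends with the closing tag of the GANGLIA_XML root element. The
function names and the host name are hypothetical.

```python
import socket


def read_gmond_xml(host, port=8649, timeout=10.0):
    """Read the complete XML dump that gmond sends on its TCP port."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def looks_complete(xml_bytes):
    """True if the dump ends with the closing root tag."""
    return xml_bytes.rstrip().endswith(b"</GANGLIA_XML>")


if __name__ == "__main__":
    dump = read_gmond_xml("node_in_cluster")  # hypothetical host name
    print(len(dump), "bytes,", "complete" if looks_complete(dump) else "TRUNCATED")
```

 The check is most useful on clusters that are large enough to produce
a dump well above 64 KB.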

 So, please download and test. Especially on the non-Linux platforms.

Thanks
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] gmond install problems

2006-03-08 Thread Martin Knoblauch

Dan,

 you actually need to rebuild gmond on your box. Sorry.

 Either get the tarball, or get the source RPM and rebuild from that.

Martin

--- Dan Roberts [EMAIL PROTECTED] wrote:

 Hello All
 
 
 How do I get around this error without going to glibc 2.3.3 as
 suggested?
 
 sudo rpm -Uvh ganglia-3.0.2-1/ganglia-gmond-3.0.2-1.i386.rpm
 Preparing...   
 ###
 [100%]
1:ganglia-gmond 
 ###
 [100%]
 Starting GANGLIA gmond: /usr/sbin/gmond: relocation
 error: /usr/sbin/gmond: symbol sys_siglist, version GLIBC_2.3.3 not
 defined in file libc.so.6 with link time reference
 [FAILED]
 
  rpm -qa | grep glibc | sort
 glibc-2.3.2-27.9.7
 glibc-common-2.3.2-27.9.7
 glibc-devel-2.3.2-27.9.7
 glibc-kernheaders-2.4-8.10
 glibc-profile-2.3.2-27.9.7
 glibc-utils-2.3.2-27.9.7
 
 
 I have another system which supports the same version gmond using a
 slightly different version of glibc as shown below..
 How can I get the above system working correctly without upgrading to
 glibc 2.3.3?!
 I noted that my working system below has the glibc-headers rpm
 installed
 while my failing system doesn't.  Might this be the problem?  If YEs,
 could someone point me to the location of the rpm which I could
 download.  I couldn't find it on the net.
 Thanks for any help!
 Dan
 
 
 rpm -qa | grep glibc
 glibc-headers-2.3.2-95.20
 glibc-kernheaders-2.4-8.34
 glibc-2.3.2-95.20
 glibc-common-2.3.2-95.20
 glibc-utils-2.3.2-95.20
 glibc-devel-2.3.2-95.20
 compat-glibc-7.x-2.2.4.32.6
 glibc-profile-2.3.2-95.20
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Changing node name

2006-03-12 Thread Martin Knoblauch

Richard,

 that is on my todo list. Finding the time is the issue ...

Martin

--- Richard Lefebvre [EMAIL PROTECTED] wrote:

 Is there a way to set the nodename in gmond.conf? Instead of using 
 reverse hostname lookup using the IP. I'm running ganglia 3.0.1 on an
 
 Cray XD1 and the IP gmond uses is the external on instead of the 
 internal one. The external IP has no hostname associated with it is
 is 
 given at random.
 
 Richard
 
 
 
 ---
 This SF.Net email is sponsored by xPML, a groundbreaking scripting
 language
 that extends applications into web and mobile media. Attend the live
 webcast
 and join the prime developer group breaking into this new coding
 territory!

http://sel.as-us.falkag.net/sel?cmd=lnkkid=110944bid=241720dat=121642
 ___
 Ganglia-general mailing list
 Ganglia-general@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/ganglia-general
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[Ganglia-general] Upgrade to apr-0.9.7

2006-03-29 Thread Martin Knoblauch

Hi,

 everyone monitoring ganglia-cvs will by now have seen that I have
upgraded the apr sources within the ganglia CVS tree to version 0.9.7.
This was done to fix some reported problems with the old version.

 So, if you are using CVS sources to build ganglia, please do a

cvs update -Pd

 or just do a new checkout.

 The new tree builds fine on my FC4 notebook, including RPM building. I
plan to do an April Fools' tarball release very soon. Please check in
anything you think is valuable.

Cheers
Martin
PS: Sorry for the many notification mails on ganglia-cvs.

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] gmond unreliable on one cluster, must be constantly restarted

2006-03-29 Thread Martin Knoblauch

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Re: gmetad not updating RRD's/hosts that are proper in gmond XML

2006-03-29 Thread Martin Knoblauch

 
  DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="proc_total" VAL="128" TYPE="uint32" UNITS="" TN="8" TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="mem_free" VAL="1328356" TYPE="uint32" UNITS="KB" TN="8" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="mem_shared" VAL="0" TYPE="uint32" UNITS="KB" TN="8" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="mem_buffers" VAL="199232" TYPE="uint32" UNITS="KB" TN="8" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="mem_cached" VAL="4569200" TYPE="uint32" UNITS="KB" TN="8" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="swap_free" VAL="2101964" TYPE="uint32" UNITS="KB" TN="8" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="gexec" VAL="ON" TYPE="string" UNITS="" TN="188" TMAX="300" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="bytes_out" VAL="6066.85" TYPE="float" UNITS="bytes/sec" TN="8" TMAX="300" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="bytes_in" VAL="203006.30" TYPE="float" UNITS="bytes/sec" TN="8" TMAX="300" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="numthreads" VAL="2" TYPE="int8" UNITS="" TN="324" TMAX="60" DMAX="0" SLOPE="both" SOURCE="gmetric"/>
  <METRIC NAME="numjobs" VAL="2" TYPE="int8" UNITS="" TN="324" TMAX="60" DMAX="0" SLOPE="both" SOURCE="gmetric"/>
  </HOST>
  
  
  Good host:
  
  gmond:
    Processing a Ganglia_message from goodhost
  gmetad:
    Updating host goodhost, metric numjobs
    server_thread() received request /Opteron_Production-Desktop_Droid_Cluster/goodhost from 127.0.0.1
  XML:
  <HOST NAME="goodhost" IP="10.73.16.225" REPORTED="1143682838" TN="1" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1143137198">
  <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs" TN="838" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="disk_total" VAL="71.047" TYPE="double" UNITS="GB" TN="2039" TMAX="1200" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="disk_free" VAL="46.667" TYPE="double" UNITS="GB" TN="178" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="cpu_speed" VAL="2411" TYPE="uint32" UNITS="MHz" TN="838" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="part_max_used" VAL="70.5" TYPE="float" UNITS="" TN="178" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="mem_total" VAL="8147640" TYPE="uint32" UNITS="KB" TN="838" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="swap_total" VAL="2104504" TYPE="uint32" UNITS="KB" TN="838" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="boottime" VAL="1142553979" TYPE="uint32" UNITS="s" TN="838" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="machine_type" VAL="x86_64" TYPE="string" UNITS="" TN="838" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="os_name" VAL="Linux" TYPE="string" UNITS="" TN="838" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="os_release" VAL="2.6.13.4_K8+NUMA+NV" TYPE="string" UNITS="" TN="838" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="cpu_user" VAL="73.1" TYPE="float" UNITS="%" TN="8" TMAX="90" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="cpu_system" VAL="3.9" TYPE="float" UNITS="%" TN="8" TMAX="90" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="load_one" VAL="1.99" TYPE="float" UNITS="" TN="9" TMAX="70" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="proc_run" VAL="2" TYPE="uint32" UNITS="" TN="149" TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="proc_total" VAL="156" TYPE="uint32" UNITS="" TN="149" TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="mem_free" VAL="2359176" TYPE="uint32" UNITS="KB" TN="28" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="mem_shared" VAL="0" TYPE="uint32" UNITS="KB" TN="28" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="mem_buffers" VAL="36384" TYPE="uint32" UNITS="KB" TN="28" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="mem_cached" VAL="4162056" TYPE="uint32" UNITS="KB" TN="28" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="swap_free" VAL="1786428" TYPE="uint32" UNITS="KB" TN="28" TMAX="180" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="gexec" VAL="ON" TYPE="string" UNITS="" TN="229" TMAX="300" DMAX="0" SLOPE="zero" SOURCE="gmond"/>
  <METRIC NAME="bytes_out" VAL="305162.19" TYPE="float" UNITS="bytes/sec" TN="28" TMAX="300" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="bytes_in" VAL="40802.30" TYPE="float" UNITS="bytes/sec" TN="28" TMAX="300" DMAX="0" SLOPE="both" SOURCE="gmond"/>
  <METRIC NAME="numthreads" VAL="1" TYPE="int8" UNITS="" TN="844" TMAX="60" DMAX="0" SLOPE="both" SOURCE="gmetric"/>
  <METRIC NAME="numjobs" VAL="1" TYPE="int8" UNITS="" TN="844" TMAX="60" DMAX="0" SLOPE="both" SOURCE="gmetric"/>
  </HOST>
  
  
 
 
 
 
 


--
Martin Knoblauch
email: k n o b i

RE: [Ganglia-general] Re: gmetad not updating RRD's/hosts that are proper in gmond XML

2006-03-30 Thread Martin Knoblauch

Eli,

 OK, the messages are coming from RRDtool. They are just telling you
that you tried to update the same metric with exactly the same
timestamp.

 Do you see any messages prefixed RRD_create in your logfiles? 

 The problem is that if one of the rrd_updates fails, gmetad stops
working on anything.

 Do you have a chance to rebuild gmetad with the following patch? It is
against current CVS, but should apply against 3.0.2. If it helps, all
hosts (metrics) except the one causing problems should be OK. It might
not be the real solution, but may help us to track it down.

[gmetad]$ diff -udp rrd_helpers.c rrd_helpers.c-new
--- rrd_helpers.c   2005-03-15 19:11:33.0 +0100
+++ rrd_helpers.c-new   2006-03-30 11:28:26.0 +0200
@@ -54,7 +54,7 @@ RRD_update( char *rrd, const char *sum,
       {
          err_msg("RRD_update (%s): %s", rrd, rrd_get_error());
          pthread_mutex_unlock( &rrd_mutex );
-         return 1;
+         return 0;
       }
    /* debug_msg("Updated rrd %s with value %s", rrd, val); */
    pthread_mutex_unlock( &rrd_mutex );


 In addition,  do you see any messages prefixed RRD_create in your
logfiles? You should, as some of the RRD files seem to be missing.

Cheers
Martin



--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

RE: [Ganglia-general] Re: gmetad not updating RRD's/hosts that are proper in gmond XML

2006-03-30 Thread Martin Knoblauch

Richard, [adding ganglia-developers for comments]

 pretty good explanation of what is likely happening, or what can go
wrong. I sent Eli a patch I found useful a while ago, but which is not
in CVS yet (because I fixed the root-problem of the illegal updates).
This should prevent gmetad from ignoring all hosts/metrics if just one
of them is corrupt. Somewhere in the code we go nuts on an error
return.

[gmetad]$ diff -udp rrd_helpers.c rrd_helpers.c-new
--- rrd_helpers.c   2005-03-15 19:11:33.0 +0100
+++ rrd_helpers.c-new   2006-03-30 11:28:26.0 +0200
@@ -54,7 +54,7 @@ RRD_update( char *rrd, const char *sum,
       {
          err_msg("RRD_update (%s): %s", rrd, rrd_get_error());
          pthread_mutex_unlock( &rrd_mutex );
-         return 1;
+         return 0;
       }
    /* debug_msg("Updated rrd %s with value %s", rrd, val); */
    pthread_mutex_unlock( &rrd_mutex );
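
 The duplicate-hostname situation Richard describes in the quoted mail
below can be spotted in a gmond XML dump. The following is a minimal
sketch, assuming the dump uses HOST elements with NAME and IP
attributes as in the XML posted in this thread. The function name and
the sample data are made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def duplicate_hosts(xml_text):
    """Map each host NAME that appears with more than one IP to its IPs."""
    ips_by_name = defaultdict(set)
    root = ET.fromstring(xml_text)
    for host in root.iter("HOST"):
        ips_by_name[host.get("NAME")].add(host.get("IP"))
    return {name: sorted(ips)
            for name, ips in ips_by_name.items() if len(ips) > 1}


# Made-up sample: node1 shows up under two different addresses.
SAMPLE = """<GANGLIA_XML VERSION="3.0.2" SOURCE="gmond">
<CLUSTER NAME="example">
<HOST NAME="node1" IP="10.0.0.1"/>
<HOST NAME="node1" IP="10.0.0.7"/>
<HOST NAME="node2" IP="10.0.0.2"/>
</CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    for name, ips in duplicate_hosts(SAMPLE).items():
        print(name, "is reported from", ", ".join(ips))
```

 Any name listed by this check would get two updates to the same RRD
files within one polling pass, which is the case that triggers the
update error.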


--- [EMAIL PROTECTED] wrote:

 Eli,
 
 Martin is most surely right. If you are running an unpatched 3.0.2,
 let me share with you the many ways it can all go wrong.
 
 gmond generates the hostnames found in the XML stream by reverse DNS
 lookup only. Its internal structures treat every different IP address
 it sees as a different host, regardless of what the reverse DNS entry
 is.
 
 So, if you have
 1) Incorrect reverse DNS entries such that 2 different hosts reverse map
    to the same hostname,
 2) Or 2 NICs on a host that are not teamed (i.e. 2 different addresses)
    and the routing allows packets to exit either NIC, hence either source
    address may be used,
 3) Or a DHCP lease renewal that results in a host changing IP addresses,
 
 then what will happen is that the XML stream from the cluster will
 contain 2 (or more) entries with different IP addrs, but the same name.
 Even in the DHCP case, when only 1 source address is used at a time,
 gmond will keep the old IP address entry until a timeout, even though it
 is not being updated. So dups arise again.
 
 Now unfortunately, gmetad only uses the HOSTNAME for the RRD files and
 its own processing. So if there is a duplicated hostname in the XML
 stream, it will update the RRDs after parsing the first entry, and then
 again after parsing the second. As these 2 updates to the same RRD files
 will occur in less than one second, this results in an RRD update error.
 
 On unpatched 3.0.2, this then causes THE ENTIRE PROCESSING OF THE
 CLUSTER TO BE ABANDONED.
 So some hosts get updated, some not, and the cluster view does not get
 updated.
 If you patch this particular issue, you will still get double processing
 for duped hosts, which can result in them erroneously being reported as
 down (for example).
 
 phew.
 long mail.
 
 - richard
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 Martin
 Knoblauch
 Sent: 30 March 2006 08:05
 To: Eli Stair
 Cc: Ganglia-general@lists.sourceforge.net
 Subject: Re: [Ganglia-general] Re: gmetad not updating RRD's/hosts
 that
 are proper in gmond XML
 
 
 Eli,
 
  yup. That could definitely cause problems. Do you see anything in the
 /var/log/messages of the gmetad host?
 
  Hmm. You may have to restart *all* gmonds, as well as the gmetad. This
 is something that I usually do when my ganglia setup was hosed somehow.
 Definitely the case for multicast clusters. Not really sure about
 unicast.
 
  And yes - this is not optimal.
 
 --- Eli Stair [EMAIL PROTECTED] wrote:
 
  
  The only issue I can find at all with this config is that the new
  hosts have been deployed by someone with two PTR records: both the
  proper one pointing to the A hostname, as well as all having an
  improper PTR -> linux.FQDN.
  
  Is there a potential that gmetad is doing a lookup of both the forward
  and reverse entries for a host before populating it?  Unfortunately
  removing the invalid entry for a host and restarting gmetad as well as
  the gmond aggregator and the host did not resolve it.
  
  /eli
  
  Eli Stair wrote:
   
   My installation started having an issue yesterday afternoon that I have
   yet to explain or remedy.  One cluster that I have unicasting has
   started losing hosts... the directory entries on disk never get
   created for newly deployed hosts, and gmond reports receiving messages
   for the host (and outputs metrics) but gmetad does not report an
   updating host message, and never creates the RRD's even though the
   host is up.
   
   The critical problem is that the report graphs for this cluster have
   stopped being updated as well, which nix'es my ability to view cluster
   load/job level... in addition to not being able to alert on the RRD
   values for the individual hosts that are malfunctioning.  Those hosts
   that are good continue to update their metric RRD's properly, their
   host reports are populated etc.  The bad ones I cannot explain...
   
   The two questions, if anyone has insight:
   
   1) What is causing

RE: [Ganglia-general] Re: gmetad not updating RRD's/hosts that are proper in gmond XML

2006-03-30 Thread Martin Knoblauch

Eli,

 if the patch helps, I am inclined to put it into 3.0.3 (if CVS is working
again :-( ).

Martin

--- Eli Stair [EMAIL PROTECTED] wrote:

 
 Richard, Martin, et al:
 
 Thanks for all your assistance describing the workings and why it is
 going wrong... the glomming together of all the host XML and the
 organization to disk of it has been quite a black box to me.  
 
 You explain how this can occur on an unpatched 3.0.2; is the
 recommended patch the one Martin posted, or is there something else
 suggested?  I'll give his a shot, and if it isn't successful try the
 CVS build.  I've been trying to wait for 3.0.3 before making any more
 changes than just PHP interface stuff.
 
 Cheers,
 
 /eli
 

[Ganglia-general] Ganglia 3.0.3 released

2006-04-17 Thread Martin Knoblauch

Hi,

 for those who do not track the SF release system, I have today
released version 3.0.3 of Ganglia. The home page will be changed
accordingly.

 Files can be downloaded from the SourceForge site. Source is available
as tarball and SRPM. Binary RPMs have been built for RedHat FC4/i386.

 Development of version 3.0.4 is now open.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia 3.0.2 on Solaris 9

2006-04-18 Thread Martin Knoblauch

Hi Aravindh,

 a few questions:

- which gcc are you using (gcc --version)? I can build 3.0.2 and 3.0.3
on Solaris 8 with gcc-3.3.1.
- which make (gnu-make is recommended)?
- On 64-bit platforms configure using:  CC="gcc -m64" ./configure

 Oh yes, please try 3.0.3. Released yesterday :-)

Martin

--- [EMAIL PROTECTED] wrote:

 Hi all,
  
 I am getting the following error while installing Ganglia 3.0.2 on
 Solaris 9 machine.  The error is as follows:
  
 #PosixConnector#make
 make  all-recursive
 Making all in srclib
 Making all in libmetrics
 make  all-recursive
 Making all in solaris
 if /bin/bash ../libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H 
 -I.
 -I. -I..   -I/tmp/rrdtool/lb/include
 -I/tmp/rrdtool/lb/include/libart-2.0
 -I/tmp/rrdtool/lb/include/freetype2
 -I/tmp/rrdtool/lb/include/libpng  -I.. -I../lib -O3 -D__EXTENSIONS__
 -D_POSIX_C_SOURCE=199506L -DHAVE_STRERROR -MT metrics.lo -MD -MP -MF
 .deps/metrics.Tpo -c -o metrics.lo metrics.c; \
 then mv -f .deps/metrics.Tpo .deps/metrics.Plo; else rm -f
 .deps/metrics.Tpo; exit 1; fi
 mkdir .libs
  gcc -DHAVE_CONFIG_H -I. -I. -I.. -I/tmp/rrdtool/lb/include
 -I/tmp/rrdtool/lb/include/libart-2.0
 -I/tmp/rrdtool/lb/include/freetype2
 -I/tmp/rrdtool/lb/include/libpng -I.. -I../lib -O3 -D__EXTENSIONS__
 -D_POSIX_C_SOURCE=199506L -DHAVE_STRERROR -MT metrics.lo -MD -MP -MF
 .deps/metrics.Tpo -c metrics.c  -fPIC -DPIC -o .libs/metrics.o
 metrics.c:167: error: static declaration of 'ncpus' follows
 non-static
 declaration
 /usr/include/sys/cpuvar.h:351: error: previous declaration of 'ncpus'
 was here
 metrics.c: In function 'percentages':
 metrics.c:306: warning: pointer targets in assignment differ in
 signedness
 *** Error code 1
 make: Fatal error: Command failed for target `metrics.lo'
 Current working directory
 /opt/ganglia/ganglia-3.0.2/srclib/libmetrics/solaris
 *** Error code 1
 make: Fatal error: Command failed for target `all-recursive'
 Current working directory
 /opt/ganglia/ganglia-3.0.2/srclib/libmetrics
 *** Error code 1
 make: Fatal error: Command failed for target `all'
 Current working directory
 /opt/ganglia/ganglia-3.0.2/srclib/libmetrics
 *** Error code 1
 make: Fatal error: Command failed for target `all-recursive'
 Current working directory /opt/ganglia/ganglia-3.0.2/srclib
 *** Error code 1
 make: Fatal error: Command failed for target `all-recursive'
 Current working directory /opt/ganglia/ganglia-3.0.2
 *** Error code 1
 make: Fatal error: Command failed for target `all'
 #PosixConnector#pwd
 /opt/ganglia/ganglia-3.0.2
  
 But the same thing is working fine on Linux. Any idea of how to solve
 this, or any hints that would get me out of this?
 
  
 Thanks and Regards
  
 ARAVINDH VARADHARAJU (WTO1 - E-Enabling)
 Project Engineer
 Tel   : +91- 80- 2852 0408 Extn.1053
 Mobile : +91- 99860 17606
  
 A Smile can take you MILES..! Keep Smiling and Have a Nice Day
  
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

RE: [Ganglia-general] Nodes Reported as Dead

2006-04-18 Thread Martin Knoblauch

 as
 dead, even though they are not. Doing a `telnet computer 8649` gives
 the
 appropriate data. Get Fresh Data will usually change out which
 nodes
 are dead and given a 30min cycle most will have switched.
 
 2) Even though this has been running for many hours, some of the
 alive
 nodes report inaccuracies. Like one node for example Last heartbeat
 received -209998 seconds ago Uptime -975 days, 16:27:49
 Swap: Using 0.0 of -100Mb
 Booted: January 1, 1970
 The inaccuracies change every so often and it will report correctly
 for
 a while. Most of those I don't care about but I think it may be a
 related problem.
 
 3) The dead nodes are almost all spot on with their stats, and if
 you
 go to the node view and click the Get Fresh Data the Load and CPU
 Utilization do update in sync even though its reported as dead.
 
 
 Maybe I missed the keywords, but I was not able to find anything
 quite
 like this in the email archive. I would be very grateful if anyone
 has
 any clues as to what maybe going on.
 
 Thank you for your time,
 Chris Stackpole
 
 
 ---
 This SF.Net email is sponsored by xPML, a groundbreaking scripting
 language that extends applications into web and mobile media. Attend
 the
 live webcast and join the prime developer group breaking into this
 new
 coding territory!
 http://sel.as-us.falkag.net/sel?cmd=kkid0944bid$1720dat1642
 ___
 Ganglia-general mailing list Ganglia-general@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/ganglia-general
 
 


 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] unable to write XML tree info

2006-04-19 Thread Martin Knoblauch

James,

 cool. No need to be sorry. This is actually valuable information, as
this may hit others as well.

 How did you find out, and where exactly is the php_value located in
the config files?

Thanks
Martin

--- James Trater [EMAIL PROTECTED] wrote:

 I figured it out. I had assumed that it was a gmetad problem, but it
 turned out to be a problem with PHP - specifically the amount of
 memory that PHP is allowed to allocate. I put this in my apache
 config
 file for ganglia:
 
 php_value memory_limit 32M
 
 and it works fine now. Sorry!
 
 Jim
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Fwd: compile ganglia 3.0.3 on SLES9 x86_64

2006-05-17 Thread Martin Knoblauch

Hi Bernard,

--- Bernd Wenger [EMAIL PROTECTED] wrote:


/usr/lib64/gcc-lib/x86_64-suse-linux/3.3.3/../../../../x86_64-suse-linux/bin/ld:
 cannot find -lpng
 collect2: ld returned 1 exit status
 make[2]: *** [gmetad] Error 1
 make[2]: Leaving directory `/tmp/ganglia-3.0.3/gmetad'
 make[1]: *** [all-recursive] Error 1
 make[1]: Leaving directory `/tmp/ganglia-3.0.3'
 make: *** [all] Error 2
 

 you need to check whether you have libpng-devel installed.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

RE: [Ganglia-general] Fwd: compile ganglia 3.0.3 on SLES9 x86_64

2006-05-17 Thread Martin Knoblauch

Bernard,

 in addition to making libpng-devel a requirement for RPM builds, we
also should/need-to check in configure. Apparently configure is a
bit sloppy wrt. building gmetad.

  Any autoconf takers? :-)

Martin

--- Bernard Li [EMAIL PROTECTED] wrote:

 Hi Bernd:
 
 I had issues building on SLES9 x64 due to an issue with lib vs lib64
 but I don't think that's your problem.  It said that it cannot find
 -lpng - do you need to install something like libpng-devel on SLES?
 
 If that's a requirement to build on SLES, I could update the spec
 file after we migrate our code repository from CVS -> SVN.
 
 Cheers,
 
 Bernard
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Multicast issue on systems with multiple interfaces

2006-05-18 Thread Martin Knoblauch

Vladimir,

 I am afraid this is broken since 3.0.0 (or when we moved to apr). Matt
wanted to look into it.

Martin

--- Vladimir Vuksan [EMAIL PROTECTED] wrote:


  We just upgraded our Ganglia cluster to 3.0.3 from 2.5.7. All of the
 systems have dual network interfaces. Most of the network traffic goes
 over eth1, whereas the control messages etc. go over eth0. In 2.5.7 we
 specified mcast_if to be eth0 and that worked well. In 3.0.3, even though
 eth0 is specified, multicast traffic goes over eth1. The only way we have
 been able to resolve it is to add a manual route for 239.0.0.0. Any clues
 about this?
 
 Vladimir


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia 3.0.3 compilation on AIX 5.2

2006-05-23 Thread Martin Knoblauch

Hi Knut,

 there is supposed to be a README.AIX file in the 3.0.3 distribution.
This explains a few things.

 Basically, building with xlc is not supported. There are a few hints
on how to do it under 2.)

 And you absolutely need to build non-shared. That is where most likely
your core-dump comes from. Explained under 1)

Cheers
Martin

--- Knut Hellebø [EMAIL PROTECTED] wrote:

 Regards,
 
 I'm trying to compile Ganglia 3.0.3 on an AIX 5.2 box using the
 native
 IBM compiler and have encountered two problems compiling and one
 fatal
 when running gmond.
 
 Compilation problems:
 
 1. The compilation breaks on the file ./srclib/confuse/src/lexer.c at
 line 786 which stems from the lex file lexer.l line 82:
 
 #line 82 "lexer.l"
 cfg->line++; /* keep track of line number */
  YY_BREAK
 
 saying "undeclared identifier cfg". I put in a "cfg_t *cfg;" declaration
 in line 696 and then the compilation proceeds.
 
 2. Also, I need to use the -qcpluscmt switch allowing C++ comment
 style or else the compilation bombs in gmond.c
 
 3. Running gmond always crashes with a SIGSEGV. The trace shows that the
 crash occurs when opening the /etc/gmond.conf file. A dbx session on the
 core file shows the crash seems to be related to the parser file fix I
 did in item 1 above. Here's the backtrace:
 
 (dbx) where
 cfg_yylex() at 0x1000af28
 cfg_parse_internal() at 0x1000821c
 cfg_parse_fp() at 0x1000a5a0
 cfg_parse() at 0x1000a684
 Ganglia_gmond_config_create() at 0x10006d58
 process_configuration_file() at 0x100036dc
 main() at 0x14b4
 
 What's up here ?
 -- 
 
   
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] What's the meaning of Cached memory and Buffered memory

2006-05-23 Thread Martin Knoblauch

Hi Yongsheng,

 the meaning of cached/buffered depends on the architecture. If you are
on Linux, "cached" is the amount of memory that is used for the page
cache, which usually means the pages used to speed up IO operations. It
will not go down unless there is pressure for memory from other
applications.

  "buffered" (on Linux) counts the pages used for filesystem meta-data
(like directories).
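
On Linux both numbers come from /proc/meminfo (the "Cached:" and
"Buffers:" lines). Here is a minimal sketch of pulling them out of that
text, assuming the usual "Key:   value kB" layout. The helper name
meminfo_kb is made up here, and this is not the gmond metric code.

```c
#include <stdio.h>
#include <string.h>

/* Look up a "Key:   value kB" line in /proc/meminfo-style text.
 * The key must start at the beginning of a line, so "Cached" does not
 * match the "SwapCached" line.
 * Returns the value in kB, or -1 if the key is not found. */
static long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p && *p) {
        if (strncmp(p, key, klen) == 0 && p[klen] == ':') {
            long kb;
            if (sscanf(p + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        p = strchr(p, '\n');      /* advance to the next line */
        if (p)
            p++;
    }
    return -1;
}
```

To use it on a live system, read /proc/meminfo into a buffer (fopen/fread,
NUL-terminate it) and call meminfo_kb(buf, "Cached") and
meminfo_kb(buf, "Buffers").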

Cheers
Martin


--- Zhao, Yongsheng [EMAIL PROTECTED] wrote:

 
 Hello, 
 
 When my application is running, the Memory cached is going up all
 the way
 to the top. And it does not return when the application is done. Any
 one know
 what is the Memory cached exactly, also what is Memory buffered?
 Thanks.
 
 Yongsheng
 
 -
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

RE: [Ganglia-general] What's the meaning of Cached memory and Buffered memory

2006-05-23 Thread Martin Knoblauch

Hi Yongsheng,

 the only sure way to get it down is a reboot. Another way may be to
unmount/mount all filesystems (which does not work for / :-)

 But there is no need to worry about cached. It will go away
automatically if an application wants the memory. Oh - you could write
an application that mallocs lots of memory (as much as you have). This
will push away the cache. On exit, the application memory is freed.
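
A minimal sketch of such an application, under the assumption that
touching every allocated byte is what forces the kernel to hand over real
pages (and so to shrink the cache). The function name push_out_cache is
made up. Be careful with the size: asking for more than the machine has
can make it swap heavily or wake the OOM killer.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate `mb` megabytes in 1 MB chunks and write to every byte so the
 * kernel really has to back the pages with memory. Returns the number of
 * megabytes that were successfully allocated and touched; everything is
 * freed again before returning. */
static size_t push_out_cache(size_t mb)
{
    const size_t chunk = 1024 * 1024;
    char **blocks = calloc(mb ? mb : 1, sizeof *blocks);
    size_t got = 0;

    if (blocks == NULL)
        return 0;

    for (got = 0; got < mb; got++) {
        blocks[got] = malloc(chunk);
        if (blocks[got] == NULL)
            break;
        memset(blocks[got], 0xA5, chunk);   /* touch the pages */
    }
    for (size_t i = 0; i < got; i++)
        free(blocks[i]);
    free(blocks);
    return got;
}
```

Called from a small main() with the amount of RAM in megabytes, this does
what is described above: the memory is claimed, the cache gives way, and
the memory is released when the function returns.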

 But as I said, everything is fine.

Cheers
Martin

--- Zhao, Yongsheng [EMAIL PROTECTED] wrote:

 Hello, Martin:
  
 Thanks for the answer. We are on Linux. Are there commands or utilities
 which can reset the cached memory to its original value? Thanks.
  
 Yongsheng
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] New issue with hosts reporting

2006-06-06 Thread Martin Knoblauch

Hi Mark,

 you have configured a tcp_accept_channel for each of your two clusters'
master gmonds?

 Then you may need to define an acl for your gmetad server. Something
like:

tcp_accept_channel {
  port = 8649
  acl {
    default = "deny"
    access {
      ip = ip-of-the-gmetad-server
      mask = 32
      action = "allow"
    }
  }
}
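
For what "mask = 32" means: the mask is a prefix length, so 32 matches
exactly one address, while e.g. 16 would match a whole /16 network. A
minimal sketch of such a prefix match for IPv4 follows. It is an
illustration only, not gmond's ACL code, and the name acl_match is made up.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Return 1 if dotted-quad `addr` falls inside `net`/`bits`, 0 if not,
 * -1 if an address does not parse or `bits` is out of range. */
static int acl_match(const char *addr, const char *net, int bits)
{
    struct in_addr a, n;
    uint32_t mask;

    if (bits < 0 || bits > 32)
        return -1;
    if (inet_pton(AF_INET, addr, &a) != 1 || inet_pton(AF_INET, net, &n) != 1)
        return -1;

    /* bits == 0 is special-cased: shifting a 32-bit value by 32 is undefined */
    mask = bits == 0 ? 0 : htonl(0xFFFFFFFFu << (32 - bits));
    return (a.s_addr & mask) == (n.s_addr & mask);
}
```

With mask 32, only the gmetad server's own address passes; every other
source falls through to the "deny" default.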

Cheers
Martin

--- Mark Haney [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 David Zaltron wrote:
   Probably you have a gmond configuration on each node that multicasts
  the cluster status to every node.
 
  For example, if you have a configuration like this in the nodes:
 
  -
  cluster {
 name = dummy_cluster
  }
 
  udp_send_channel {
 mcast_join = 239.2.11.71
 port = 8649
  }
 
  udp_recv_channel {
 mcast_join = 239.2.11.71
 port = 8649
 bind = 239.2.11.71
  }
  
 
  This means that every node knows it belongs to the dummy_cluster, and
  every gmond can return the status of the entire cluster when asked on
  the default TCP port 8649, because it knows about every other node
  (they all talk to each other on the same multicast channel).
 
  You can solve this by unicasting the traffic to the node itself:
 
  
  udp_send_channel {
 host = hostname of 127.0.0.1
 port = 8649
  }
 
  udp_recv_channel {
 port = 8649
  }
  ---
 
  In this way you can simulate a cluster of a single node,
 monitoring in
  reality the single node.
 
 Okay, I did that and that /sort of/ fixed it, except for now I do not
 see the nodes in my web interface.  Keep in mind the web interface is
 running on a completely separate box that's not either newton or
 winterstar.  So, how do I get the node showing up in the web
 interface now?
 
 (And David, I apologize for sending to you and not the list, my
 fingers
 got ahead of me today.)
 
 
 
 
 - --
 Fere libenter homines id quod volunt credunt.
 
 Mark Haney
 Sr. Systems Administrator
 ERC Broadband
 (828) 350-2415
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.2.2 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
 
 iD8DBQFEhDXZYQhnfRtc0AIRAj07AJwNaTsNHM02oJaznXnO0qECZEPZUwCfa6JR
 0rLX5KWkRW9MjL/5/J/Igj0=
 =iIJp
 -END PGP SIGNATURE-
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia History

2006-06-06 Thread Martin Knoblauch

Adam,

 that is unexpected. The RRDs are supposed to keep one year (the
default) of history.

Martin

--- Adam Brust [EMAIL PROTECTED] wrote:

 I recently had to reboot the Front End of my cluster... upon the reboot,
 my Ganglia history is gone... Ganglia is only keeping data from the time
 of the reboot... it was nearly a year's worth of history... can anyone
 offer any suggestions?
 
 thanks,
 
 -adam
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia History

2006-06-09 Thread Martin Knoblauch

Adam,

 do you still have those error messages? And: which version of the
web-frontend are you using? We fixed quite a few of the php messages in
3.0.3.

Martin

--- Adam Brust [EMAIL PROTECTED] wrote:

 At the beginning of the month, ganglia/php were producing massive
 amounts of httpd errors which filled up my / partition, causing the
 machine to crash... since then, I believe my ganglia history has been
 affected... I tried to restore from the three tar files located in
 /var/lib/ganglia/archives/ and each one only had about a week's worth of
 history... I was able to restore from an earlier backup, which has my
 previous history, although now I am missing roughly these last three
 weeks.  Also, I'm not certain if the problem is corrected now... I don't
 know if I'll lose this history again upon a reboot.
 
 -adam
 
 Martin Knoblauch wrote:
 
 Adam,
 
  that sounds OK. Do you see any messages in either /var/log/messages
 or
 in your webservers log files?
 
 Martin
 
 --- Adam Brust [EMAIL PROTECTED] wrote:
 
   
 
 Ian,  Thanks for your reply.
 
 My rrd files appear to be in the default /var/lib/ganglia directory, I
 could not find any other instances of them.  gmetad is running as
 nobody and the rrds are owned by nobody... do you know if that's the
 correct user/permissions?
 
 thanks,
 
 adam
 
 Ian Cunningham wrote:
 
 Look at where gmetad is storing the rrd files now. You can find it in
 your gmetad.conf under rrd_rootdir. Maybe you didn't specify it for
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] not showing all hosts

2006-07-13 Thread Martin Knoblauch



--- Ian Cunningham [EMAIL PROTECTED] wrote:

 
 Solution B:
 increase the Time To Live or ttl on the gmond multicast packets.
 This assumes that multicast packets can get from one vlan to the
 other.

 The configuration option used to be available in the 2.x codebase,
 but I don't see it in 3.0.x code. I think it would be mcast_ttl
 but I can't say if that will work or not.
 

 it is ttl in the udp_send_channel section. It will be used, if
mcast_join is set.
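
At the socket level a multicast TTL is the IP_MULTICAST_TTL option. The
following minimal sketch uses plain BSD sockets to set it and read it
back. It is not gmond's code (gmond 3.0.x goes through apr), and the
function name mcast_ttl_roundtrip is made up for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Create a UDP socket, set its multicast TTL and read the value back.
 * Returns the TTL the kernel reports, or -1 on any error. */
static int mcast_ttl_roundtrip(int ttl)
{
    unsigned char want = (unsigned char)ttl, got = 0;
    socklen_t len = sizeof got;
    int result = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &want, sizeof want) == 0 &&
        getsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &got, &len) == 0)
        result = got;
    close(fd);
    return result;
}
```

A TTL of 1 keeps multicast packets on the local subnet; whether a larger
value actually gets them into the other vlan still depends on the routers
forwarding multicast.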

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] changed ip

2006-07-24 Thread Martin Knoblauch

Hi Toney,

 my first guess would be that you are:

a) using multicast and
b) your default gateway goes via eth0
c) your compute nodes are on the 192.168.180.x network

 After the change the MC packets are still expected via eth0, but come
in from eth1.

 Try adding this from the documentation:

mcast_if=eth1 in your headnode's gmond.conf and

route add -host 239.2.11.71 dev eth1

Hope this helps
Martin

--- toney samuel [EMAIL PROTECTED] wrote:

 I have a 4 node cluster. my head node has got two gigabit card and
 infiniband card my cluster ip is
 
  eth0  192.168.180.17/255.255.252.0
 ipoib0 192.168.0.1/255.255.255.0
 
 I have installed ganglia with this configuration. ganglia was working
 properly.
 
 later i changed my network configuration to this
 
 eth0  192.168.1.1/255.255.255.0
 eth1  192.168.180.17/255.255.252.0
 ipoib0 192.168.0.1/255.255.255.0
 
 
 Now i can't see any information in my web page
 
 Pls guide how to resolve this issue.
 
 Regards.


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Troubles linking: Linux (SUSE 9.3) on Itanium (ia64, Altix)

2006-08-09 Thread Martin Knoblauch

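On a RedHat-ish distro you would need to check that the RPMs for libpng
*and* libpng-devel are installed. The runtime package only ships the
versioned libpng.so.3; the unversioned libpng.so that the linker looks for
with -lpng comes with the -devel package. Not sure about the package names
on SuSE though.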

Martin

--- Ryurick Marius Hristev [EMAIL PROTECTED] wrote:

 Hello,
 
 I was trying to compile the ganglia package (rpm version) on the
 following system:
 
 SuSE 9.3 (Linux) running on Itaniums (ia64, SGI Altix )
 
  and I am getting this error:
 
 gcc -O0 -I../lib -I../gmond -I../srclib/expat/lib/ -g -O2 -Wall
 -D_REENTRANT -o gmetad gmetad.o cmdline.o data_thread.o server.o
 process_xml.o rrd_helpers.o conf.o type_hash.o xml_hash.o cleanup.o 
 ../lib/.libs/libganglia.a /usr/lib/librrd.a -lpng -lz -lm
 ../srclib/expat/lib/.libs/libexpat.a -ldl -lresolv -lnsl -lpthread
 

/usr/lib/gcc-lib/ia64-suse-linux/3.3.3/../../../../ia64-suse-linux/bin/ld:
 cannot find -lpng
 
 but I do have a /usr/lib/libpng.so.3
 
 Are there any known quirks with respect to my OS/Distro and
 CPU/Machine ? (I am new to this one, apologies if I missed something
 obvious).
 
 TIA
 
 Cheers,
 -- 
 Ryurick M. Hristev -- Systems Administrator (Unix)
 University of Queensland -- ITS Dept.
 mailto: [EMAIL PROTECTED]
 the greatest hacking experience: hack your own mind -- me
 
 

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Obtaining Immediate Interval Data From Ganglia

2006-08-11 Thread Martin Knoblauch

Correct. The code below limits the sampling rate for the cpu*, load*, mem*
and net* graphs. Setting the thresholds to 0 will give you 1 second
accuracy, or nice furry graphs as Richard said (actually the furriness is
what the original authors wanted to prevent :-). Personally I doubt
that sampling load* and mem* at that rate makes sense. cpu* and net* may
make sense.

 Richard, yes please file a report. Unfortunately I spoke too soon when
I mentioned that we should get rid of the intervals altogether. The reason
is that we need to compute differences for the cpu* and net* metrics (they
are rates after all). If we want to have sub-second sampling rates, we
need to use gettimeofday() instead of time().
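
The sub-second point can be sketched roughly as follows. time() only has
one-second resolution, so two reads within the same second cannot be told
apart, whereas gettimeofday() reports microseconds. The helper names
sample_time, elapsed_seconds and counter_rate are made up for this
illustration and are not part of gmond; this is only a sketch of how a rate
could be derived from two counter samples and their timestamps.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper (not gmond code): take a timestamp.
 * Returns 0 on success, like gettimeofday() itself. */
static int sample_time(struct timeval *tv)
{
    assert(tv != NULL);
    return gettimeofday(tv, NULL);
}

/* Hypothetical helper: seconds between two gettimeofday() samples,
 * with microsecond resolution. */
static double elapsed_seconds(const struct timeval *start,
                              const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) / 1e6;
}

/* Hypothetical helper: rate per second from two samples of an
 * increasing counter such as those in /proc/net/dev.
 * Returns 0 when no time has passed or the counter went backwards. */
static double counter_rate(unsigned long prev, unsigned long curr, double dt)
{
    if (dt <= 0.0 || curr < prev)
        return 0.0;
    return (double)(curr - prev) / dt;
}
```

A caller would take one sample_time() per read of the proc file and feed
the two timestamps to elapsed_seconds() before dividing the counter delta.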

--- [EMAIL PROTECTED] wrote:

 If you do want to do fast polling on the Linux or cygwin gmond, I found
 some hardwired code in there which effectively limits the polling rate
 for some metrics no matter what you put in the config files. (Sorry
 martin, have not raised a bug report yet). Anyway:
  the code below is in the cygwin and linux metric.c files.
  
  
  typedef struct {
    uint32_t last_read;
    uint32_t thresh;
    char *name;
    char buffer[BUFFSIZE];
  } timely_file;
  
  timely_file proc_stat    = { 0, 15, "/proc/stat" };
  timely_file proc_loadavg = { 0, 15, "/proc/loadavg" };
  timely_file proc_meminfo = { 0, 30, "/proc/meminfo" };
  timely_file proc_net_dev = { 0, 30, "/proc/net/dev" };
  
  char *update_file(timely_file *tf)
  {
    int now,rval;
    now = time(0);
    if(now - tf->last_read > tf->thresh) {
      rval = slurpfile(tf->name, tf->buffer, BUFFSIZE);
      if(rval == SYNAPSE_FAILURE) {
        err_msg("update_file() got an error from slurpfile() reading %s",
                tf->name);
        return (char *)SYNAPSE_FAILURE;
      }
      else tf->last_read = now;
    }
    return tf->buffer;
  }
  
 
 I have set those timeout values zero, which works well and gives
 me nice spiky furry graphs.
 
 - richard


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] monitoring

2006-08-25 Thread Martin Knoblauch


 Nagios?

Cheers
Martin

--- Dirk Roessler [EMAIL PROTECTED] wrote:

 Does anyone know of an easy-to-install and easy-to-use solution for
 monitoring and sending email notifications about down nodes and health
 state on a Linux HPC cluster?
 
 Dirk
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia scaling testing?

2006-08-25 Thread Martin Knoblauch

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] First Snapshot for 3.0.4

2006-08-28 Thread Martin Knoblauch

--- Bernard Li [EMAIL PROTECTED] wrote:

   It is the first release after moving from CVS to SVN. Changes
  compared to 3.0.3 are:
  
  - Fix bz #110 by allowing higher sampling rates for cpu/net/load/mem
    in Linux/Cygwin. Likely needs similar changes in other platforms.
  - Add Yemi's Host-Spoofing patch (bz #99)
  - Fix bz #77 (Diskless NFS Root not treated correctly)
  - Compile fixes for IRIX (bz #73/79)
  - Fix locking problems in gmetad (bz #56)
  - Fix incorrect writing of RRDs (bz #105)
  - Increases the number of rows in newly created RRAs (bz #33)
  - Better handling of bonding interfaces in Linux (bz #102/104)
  - Fix for network metrics overrun by Andreas Schoenfeld in AIX
  - SVN related cleanups in distribution targets
  - Take some of the proposed AIX changes from Michael Perzl. The real
    stuff will come in 3.1.x
 
 I would also add:
 
 - Better RPM support for SUSE Linux 10.0/10.1 x86 and x86_64
 
 Cheers,
 
 Bernard
 

 Oops. Sorry. Yes, the list is not necessarily complete. I should also
have mentioned the generated ChangeLog, which gives some more info.

Martin


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Problem with metrics

2006-09-20 Thread Martin Knoblauch



--- Ben Hartshorne [EMAIL PROTECTED] wrote:

 On Tue, Sep 19, 2006 at 03:11:26PM +0200, Rafal Masztalerz wrote:
  Hi
  I added some new metrics for my ganglia software using the gmetric
  command.  When I run the webpage without parameters:
  http://computer/ganglia/ everything seems to be ok and I can choose
  my new metrics.
  
  But when I try to do other things on this page, for example, when I
  choose some metric (bytes_out), my new metrics are missing from the
  new/refreshed page.
 

  http://computer/ganglia/?m=bytes_out&r=hour&s=descending&c=comp&h=&sh=1&hc=4
  
 
 Rafael,
 
 Be careful that your metric only sends numbers.  In some versions of
 ganglia, if your script that reports the gmetric accidentally sends
 letters instead, Bad Things(tm) happen.  I wrote a script to parse the
 output of 'who' to count the number of logged in users, but I did it
 wrong.  Occasionally it got a word instead of a number.  This caused
 unexplained metric-loss throughout my ganglia installation.
 
 A newer version of gmetric fixed this problem, but it is a good place
 to start looking.
 
 I'm sorry, but I don't remember what versions are affected.
 
 -ben
 
 -- 
 Ben Hartshorne
 email: [EMAIL PROTECTED]
 http://ben.hartshorne.net
 

 The fix for the gmetric bug went in on 25-Jan-2006. So, it should be
in 3.0.3.
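
For installations that still run an affected gmetric, a reporting script or
wrapper could check the value before sending it. The following is only a
minimal sketch of such a guard; is_plain_number is a hypothetical helper
and not part of Ganglia, and it deliberately accepts only plain decimal
numbers (optional sign, digits, optional fraction).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical guard (not Ganglia code): return 1 if `s` is a plain
 * decimal number such as "42", "-0.5" or "3.14", else 0.
 * Surrounding whitespace and trailing newlines are NOT accepted, so the
 * caller should strip them first. */
static int is_plain_number(const char *s)
{
    int digits = 0;

    assert(s != NULL);
    if (*s == '+' || *s == '-')
        s++;
    while (isdigit((unsigned char)*s)) {
        s++;
        digits++;
    }
    if (*s == '.') {
        s++;
        while (isdigit((unsigned char)*s)) {
            s++;
            digits++;
        }
    }
    /* At least one digit, and nothing left over after the number. */
    return digits > 0 && *s == '\0';
}
```

With a check like this, a word coming out of a broken 'who' parser would be
rejected before it ever reaches gmetric.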

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[Ganglia-general] New/Last Snapshot for 3.0.4

2006-09-24 Thread Martin Knoblauch

Hi,

 please have a look at the 2nd 3.0.4 snapshot located at:

http://www.knobisoft.de/ganglia/ganglia-3.0.4.200609241751.tar.gz

 This snapshot contains the following changes compared to the last one:

- fixup of the corrupted JPG images
- move libmetrics to top-level in order to prepare removal of
external sources in 3.1
- fix a stray debug message going to STDOUT instead of STDERR
- fix two stupid HP-UX syntax errors reported ages ago

 The full list of Changes is in the ChangeLog. There has not been a lot
of feedback since the first snapshot. If nothing serious comes out
during the next week, I will push out 3.0.4.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] [Ganglia-developers] Ganglia 3.0.4 released

2006-12-26 Thread Martin Knoblauch


--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:

 On Mon, Dec 25, 2006 at 02:32:30AM -0800, Martin Knoblauch wrote:
  Ho ho ho,
  
  Santa just released version 3.0.4 of Ganglia. This is mainly a bugfix
  release. See the ChangeLog in the tarball for a complete list of
  changes.
 
 thanks Santa, and I got to be the first kid that went to the sourceforge
 tree for the nicely wrapped package :) which was far nicer than that Wii
 that Matt is probably still waiting to get a hold of.
 
 since I was running tests on the last SVN anyway, I got some more
 platforms where gmond/gmetric (and therefore libmetrics) were tested (*):
 
 * Gentoo Linux 2006.1 (amd64), Fedora Core 6 (i386)
 * Solaris 9 (sparc), Solaris 10 (i386, amd64 and sparc)
 * NetBSD 2.0.2 (i386), NetBSD 3.0 (i386), NetBSD 3.1 (i386, amd64)
 * FreeBSD 6.1 (amd64)
 
Hi Carlo,

 thanks for the feedback. Could you just tell us which toolchains were
used on the non-Linux platforms? Especially which compiler?

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia+OpenBSD?

2006-12-26 Thread Martin Knoblauch

Jason,

 apparently configure fails to realize that you are on OpenBSD, which
is not supported currently. The "unknown" part is telling.

 In order to support OpenBSD one needs to fix the recognition process
in configure and add OpenBSD-specific metrics code to libmetrics.

 So I am afraid that it is not as easy as you believe.

 Btw. what is the output of config/config.guess?

Cheers
Martin

--- Jason Faulkner [EMAIL PROTECTED] wrote:

 Anybody have even a direction to point me in? I'm at my wits end.
 
 Jason Faulkner wrote:
  I've been trying all morning (about 5 hours now, heh) to get
 Ganglia 
  3.0.3 to compile on OpenBSD to no avail.
 
  Here's the error it spits at me:
 
  ./configure --prefix=/opt ran without a hitch, but when I said
 make...
 
  /bin/sh ../libtool --tag=CC --mode=link /usr/bin/gcc -I.. -I. 
  -I../srclib/expat/lib/ -I../srclib/libmetrics/
 -I../srclib/apr/include/ 
  -I../srclib/apr/include/arch/unix/ -I../srclib/confuse/src -g -O2 
  -Wall-o libganglia.la -rpath /opt/lib -version-info 0:0:0 
 -release 
  3.0.3  -export-dynamic become_a_nobody.lo debug_msg.lo 
 daemon_init.lo 
  file.lo dotconf.lo error.lo ganglia.lo hash.lo  inetaddr.lo
 llist.lo 
  my_inet_ntop.lo rdwr.lo readdir.lo tcp.lo  protocol_xdr.lo
 apr_net.lo 
  libgmond.lo  -lkvm -lresolv -lpthread
 
  *** Warning: linker path does not have real file for library
 -lresolv.
  *** I have the capability to make that library automatically link
 in when
  *** you link to this library.  But I can only do this if you have a
  *** shared version of the library, which you do not appear to have
  *** because I did check the linker path looking for a file starting
  *** with libresolv and none of the candidates passed a file format
 test
  *** using a regex pattern. Last file checked: /usr/lib//libresolv.a
  *** The inter-library dependencies that have been dropped here will
 be
  *** automatically added whenever a program is linked with this
 library
  *** or is declared to -dlopen it.
  /usr/bin/gcc -shared  -fPIC -DPIC -o .libs/libganglia-3.0.3.so.0.0 
 
  .libs/become_a_nobody.o .libs/debug_msg.o .libs/daemon_init.o 
  .libs/file.o .libs/dotconf.o .libs/error.o .libs/ganglia.o
 .libs/hash.o 
  .libs/inetaddr.o .libs/llist.o .libs/my_inet_ntop.o .libs/rdwr.o 
  .libs/readdir.o .libs/tcp.o .libs/protocol_xdr.o .libs/apr_net.o 
  .libs/libgmond.o  -lkvm -lpthread
  (cd .libs && rm -f libganglia.so.0.0 && ln -s libganglia-3.0.3.so.0.0 libganglia.so.0.0)
  ar cru .libs/libganglia.a  become_a_nobody.o debug_msg.o
 daemon_init.o 
  file.o dotconf.o error.o ganglia.o hash.o inetaddr.o llist.o 
  my_inet_ntop.o rdwr.o readdir.o tcp.o protocol_xdr.o apr_net.o
 libgmond.o
  ranlib .libs/libganglia.a
  creating libganglia.la
  (cd .libs && rm -f libganglia.la && ln -s ../libganglia.la libganglia.la)
  Making all in srclib
  Making all in libmetrics
  make  all-recursive
  Making all in unknown
  /bin/sh: cd: /usr/src/ganglia-3.0.3/srclib/libmetrics/unknown - No such file or directory
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3/srclib/libmetrics (line 342 of Makefile).
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3/srclib/libmetrics (line 204 of Makefile).
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3/srclib (line 243 of Makefile).
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3 (line 332 of Makefile).
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3 (line 214 of Makefile).
 
 
 
 
  This is on OpenBSD 3.8.
 

 
 
 -- 
 Jason Faulkner
 Systems Manager
 Broadwick Corporation
 (919) 459-2509
 
 

 https://lists.sourceforge.net/lists/listinfo/ganglia-general
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia+OpenBSD?

2006-12-26 Thread Martin Knoblauch


--- Jason Faulkner [EMAIL PROTECTED] wrote:

 
  http://j.oldos.org/configguess.txt

 I feel less than smart.
 
 You wanted this, didn't you:


:-)
 
 [EMAIL PROTECTED]:/usr/src/ganglia-3.0.3/config$ ./config.guess
 i386-unknown-openbsd3.8
 

 I guess this explains the "unknown". But from the other follow-ups there
seems to be hope for you.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia+OpenBSD?

2006-12-27 Thread Martin Knoblauch


--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:

 On Tue, Dec 26, 2006 at 02:38:01PM -0500, Jason Faulkner wrote:
  Ooops -- sent first email directly to Martin instead of list.
  
  Martin Knoblauch wrote:
   Jason,
  
apparently configure fails to realize that you are on OpenBSD,
 which
   is not supported currently. The unknown part is telling.
  
 
  I thought that might be the case.
  
   In order to support OpenBSD one needs to fix the recognition
 process
   in configure and add OpenBSD-specific metrics code to
 libmetrics.
 
  I'm confused though, according to this page: 
  http://sourceforge.net/projects/ganglia/ ganglia runs on all
 openbsd 
  platforms. I was going on the, apparently false, presumption that
 this 
  meant the libmetrics code already existed for openbsd.
 
 not in 3.0.4, but I have a rough version that will hopefully be merged
 for 3.0.5 and that so far compiles and works (not all metrics though) on
 the hosts I have to test:
 
   OpenBSD 3.7 (i386)
   OpenBSD 4.0 (i386 and amd64)
 
  IANAP, but if there's anything I can do to help get this working on
  OpenBSD, let me know.
 
 what versions/arch are you interested in? would you be able to deploy
 test snapshots of ganglia on them?
 
 Carlo
 
Carlo,

 I see no problem with adding OpenBSD support in 3.0.5. Just go ahead and
check it in once you are satisfied with your stuff.

 Just out of curiosity: how similar are the BSD flavours? We already
have NetBSD and FreeBSD support in.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] gmond problem on SLES 10 x64 with floats

2006-12-27 Thread Martin Knoblauch

Hi Ludovic,

 do you happen to have some strange/unusual setting of your locale
(LANG variable and friends) when you start the gmond executable?

 The output definitely looks broken. Could you please file a bug on
bugzilla?

Cheers
Martin
--- Ludovic Drolez [EMAIL PROTECTED] wrote:

 Hi !
 
 I installed the official Ganglia RPM on a SLES 10 x64. My graphs are
 really strange, and the percentage values show random characters. I've
 just found that the problem is in gmond, which sends random strings in
 the XML dialog. I've tried to recompile gmond, but I still have the same
 problem.
 
 Here's some of the strace output:
 
 =
 accept(6, {sa_family=AF_INET, sin_port=htons(43998), sin_addr=inet_addr("127.0.0.1")}, [17179869200]) = 9
 write(9, "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"yes\"?>\n<!DOCTYPE GANGLIA_XML [\n   <!ELEMENT G"..., 2328) = 2328
 write(9, "<GANGLIA_XML VERSION=\"3.0.3\" SOURCE=\"gmond\">\n", 45) = 45
 write(9, "<CLUSTER NAME=\"cluster\" LOCALTIME=\"1166087533\" OWNER=\"unspecified\" LATLONG=\"unspecified\" URL=\"unspe"..., 108) = 108
 write(9, "<HOST NAME=\"master.localdomain\" IP=\"192.168.0.106\" REPORTED=\"1166087527\" TN=\"5\" TMAX=\"20\" DMAX=\"0\" "..., 150) = 150
 write(9, "<METRIC NAME=\"disk_total\" VAL=\"1A.\332\326\260\" TYPE=\"double\" UNITS=\"GB\" TN=\"1500\" TMAX=\"1200\" DMAX=\"0\" SLOP"..., 125) = 125
 write(9, "<METRIC NAME=\"cpu_speed\" VAL=\"2993\" TYPE=\"uint32\" UNITS=\"MHz\" TN=\"300\" TMAX=\"1200\" DMAX=\"0\" SLOPE=\""..., 122) = 122
 write(9, "<METRIC NAME=\"part_max_used\" VAL=\"7y.\n\" TYPE=\"float\" UNITS=\"\" TN=\"60\" TMAX=\"180\" DMAX=\"0\" SLOPE=\"bo"..., 120) = 120
 write(9, "<METRIC NAME=\"swap_total\" VAL=\"4194296\" TYPE=\"uint32\" UNITS=\"KB\" TN=\"300\" TMAX=\"1200\" DMAX=\"0\" SLOP"..., 125) = 125
 write(9, "<METRIC NAME=\"os_name\" VAL=\"Linux\" TYPE=\"string\" UNITS=\"\" TN=\"300\" TMAX=\"1200\" DMAX=\"0\" SLOPE=\"zero"..., 118) = 118
 write(9, "<METRIC NAME=\"cpu_user\" VAL=\"2.F\" TYPE=\"float\" UNITS=\"%\" TN=\"20\" TMAX=\"90\" DMAX=\"0\" SLOPE=\"both\" SO"..., 114) = 114
 write(9, "<METRIC NAME=\"cpu_system\" VAL=\"3.0\" TYPE=\"float\" UNITS=\"%\" TN=\"20\" TMAX=\"90\" DMAX=\"0\" SLOPE=\"both\" "..., 116) = 116
 =
 
 As you can see, there's garbage for disk_total, part_max_used,
 cpu_user...
 So all values of type float or double, are not properly converted.
 The SLES runs under Qemu.
 
 I've also added some printfs in host_metric_value and here's what I get:
 on the left the float as converted by apr_*, on the right the same value
 from printf("%f") !!!
 
 VALUE =2.G= =2.343750=
 VALUE =2.G= =2.343750=
 VALUE =9.Ö= =93.487236=
 VALUE =0.6o= =0.64=
 VALUE =0.1;= =0.119600=
 VALUE =0.00= =0.000311=
 VALUE =0.0= =0.00=
 VALUE =0.0= =0.00=
 VALUE =9.ê= =95.312500=
 VALUE =0.9= =0.94=
 VALUE =0.4Y= =0.42=
 VALUE =0.1;= =0.113054=
 VALUE =0.00= =0.000536=
 
 
 Any ideas ?
 
 Cheers,
 
 -- 
 Ludovic DROLEZ  Linbox / FreeALter Soft
 www.linbox.com www.linbox.org
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)

2006-12-27 Thread Martin Knoblauch

Hi Jarod,

 thanks. Your and Ben's input was really useful for detecting patterns
in 2.6 based configurations.

 What I now need is the output from 2.4 based configs. Only multi-core
and/or HT-enabled systems actually.

Thanks and have a good New Year 2007
Martin
--- Jarod Wilson [EMAIL PROTECTED] wrote:

 On Friday 22 December 2006 11:05, Martin Knoblauch wrote:
  Hi Folks,
 
   in order to fix bz#84 for Linux, I would like to collect some data
  from different system configurations. Could you please create the file
  cpu.grep and execute the cat/grep chain below.
 
   Please report the results together with uname -a output and which
  distro you are running.
 
  # more cpu.grep
  processor
  vendor
  model name
  physical id
  siblings
  core id
  cpu cores
  # cat /proc/cpuinfo  | grep -f cpu.grep
 
 Here's the data from my Fedora Core 6 workstation in the office, since
 it's fairly interesting for this specific topic. It's a dual-socket,
 dual-core Xeon system with hyperthreading turned on, so two sockets,
 four cores, eight logical cpus...
 
 Linux xavier.boston.redhat.com 2.6.18-1.2849.fc6 #1 SMP Fri Nov 10 12:34:46 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
 
 processor   : 0
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 0
 siblings: 4
 core id : 0
 cpu cores   : 2
 processor   : 1
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 1
 siblings: 4
 core id : 0
 cpu cores   : 2
 processor   : 2
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 0
 siblings: 4
 core id : 1
 cpu cores   : 2
 processor   : 3
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 1
 siblings: 4
 core id : 1
 cpu cores   : 2
 processor   : 4
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 0
 siblings: 4
 core id : 0
 cpu cores   : 2
 processor   : 5
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 1
 siblings: 4
 core id : 0
 cpu cores   : 2
 processor   : 6
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 0
 siblings: 4
 core id : 1
 cpu cores   : 2
 processor   : 7
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 1
 siblings: 4
 core id : 1
 cpu cores   : 2
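
The listing above can be reduced to socket, core and thread counts by
counting the distinct "physical id" values and reading "cpu cores" and
"siblings". The following is only a sketch of that idea and not the fix
that went into libmetrics; cpu_topology, field_value and count_cpus are
hypothetical names, and the fallback for systems without "physical id"
lines is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PACKAGES 64

/* Hypothetical result type for this sketch; not a Ganglia structure. */
typedef struct {
    int logical;          /* number of "processor" entries           */
    int packages;         /* distinct "physical id" values (sockets) */
    int cores_per_pkg;    /* last "cpu cores" value seen             */
    int siblings_per_pkg; /* last "siblings" value seen              */
} cpu_topology;

/* If the line at `line` starts with `key`, return the integer after the
 * first ':' on that line; otherwise return -1. */
static int field_value(const char *line, const char *key)
{
    size_t n = strlen(key);
    const char *colon;

    while (*line == ' ' || *line == '\t')
        line++;
    if (strncmp(line, key, n) != 0)
        return -1;
    colon = memchr(line, ':', strcspn(line, "\n"));
    return colon != NULL ? atoi(colon + 1) : -1;
}

/* Summarise /proc/cpuinfo-style text held in a NUL-terminated buffer. */
static cpu_topology count_cpus(const char *text)
{
    cpu_topology t = { 0, 0, 1, 1 };
    int seen[MAX_PACKAGES] = { 0 };
    const char *line = text;
    int v;

    assert(text != NULL);
    while (line != NULL && *line != '\0') {
        if (field_value(line, "processor") >= 0) {
            t.logical++;
        } else if ((v = field_value(line, "physical id")) >= 0) {
            if (v < MAX_PACKAGES && !seen[v]) {
                seen[v] = 1;
                t.packages++;
            }
        } else if ((v = field_value(line, "siblings")) > 0) {
            t.siblings_per_pkg = v;
        } else if ((v = field_value(line, "cpu cores")) > 0) {
            t.cores_per_pkg = v;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    /* Assumption: without "physical id" lines, count each logical CPU
     * as its own package. */
    if (t.packages == 0)
        t.packages = t.logical;
    return t;
}
```

For the system above this gives 8 logical CPUs and 2 packages with 2 cores
and 4 siblings each, i.e. packages * cores_per_pkg = 4 cores and
siblings_per_pkg / cores_per_pkg = 2 threads per core.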
 
 
 -- 
 Jarod Wilson
 [EMAIL PROTECTED]
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia+OpenBSD?

2006-12-27 Thread Martin Knoblauch


--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:

 On Wed, Dec 27, 2006 at 12:38:00AM -0800, Martin Knoblauch wrote:
 
   I see no problem to add OpenBSD support in 3.0.5. Just go on and
 check
  it in once you are satisfied with your stuff.
 
 checked it in already in revision 697.


 saw it.
 
 
   Just out of curiosity: how similar are the BSD flavours? We
 already
  have NetBSD and FreeBSD support in.
 
 I used NetBSD as a base for my port (as it is the closest), sadly
 they are not similar enough to just work with the other source,
 as you can see by the diff.


 Understood. Btw. you should check the use of the strings NetBSD /
FreeBSD in your patch :-)

 DragonflyBSD will most likely be closer to FreeBSD and the same for
 MacOS X (AKA Darwin), but I have no interest in adding those yet
 (DragonFlyBSD could be an interesting option for clusters, but
 I've heard of no one using it in a cluster yet).
 

 You realize that we already have a Darwin port? Although I do not know
the quality/completeness of the metrics code.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Compatibility mode for gmetad?

2007-01-03 Thread Martin Knoblauch


--- Jason Faulkner [EMAIL PROTECTED] wrote:

 I'm curious about how possible or difficult it would be to make gmetad
 backwards compatible -- i.e. where I could leave my 2.5.x gmond
 installations alone, and install 3.x gmetad on my main server (and be
 able to collect stats despite having a heterogeneous 2.5.x and 3.x
 environment). This would allow me to (hopefully) live-migrate my ganglia
 install up to the new version.
 
 -- 
 Jason Faulkner
 Systems Manager
 Broadwick Corporation
 (919) 459-2509
 
Hi Jason,

 although we bumped the major number in the 2.5.x -> 3.0 transition, we
took care not to introduce incompatible changes to the core metrics
framework. In short, I see no reason why a 3.0.4 gmetad should not be
able to query 2.5.x gmond data.

 It should even be possible to have a 3.0.4 gmond listen to older
gmonds. Of course, you are limited to multicast until you have replaced
all gmonds.

 Just try it out.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Windows port issues

2007-01-04 Thread Martin Knoblauch


--- Vladimir Vuksan [EMAIL PROTECTED] wrote:

 matt massie wrote:
  you need to install the cygwin sunrpc package which is not
 installed by
  default during the cygwin install...

 That was it.
 
 I still wasn't able to compile 3.0.4 (xdr_create? can't be found),
 however 3.0.3 compiles with no problem.


 could you be more specific on the error message? Is it compile time,
or link time? There is no such thing as xdr_create. Maybe
xdrmem_create.
 
 Who is the person that packaged it initially? I ask since 3.0.3 corrects
 the Wait CPU issue, i.e. instead of showing 100% idle it shows 100% Wait CPU.
 
 Also it may be nice to include gmetric.
 

 Hmm. What package are you referring to? There is no official Windows
(cygwin) binary distribution.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Compatibility mode for gmetad?

2007-01-04 Thread Martin Knoblauch


--- Jason Faulkner [EMAIL PROTECTED] wrote:

 Martin Knoblauch wrote:
  --- Jason Faulkner [EMAIL PROTECTED] wrote:
 

  I'm curious about how possible or difficult it would be to make
  gmetad 
  backwards compatible -- i.e. where I could leave my 2.5.x gmond 
  installations alone, and install 3.x gmetad on my main server (and
 be
 
  able to collect stats despite having a heterogeneous 2.5.x and 3.x
 
  environment). This would allow me to (hopefully) live-migrate my
  ganglia 
  install up to the new version.
 
  -- 
  Jason Faulkner
  Systems Manager
  Broadwick Corporation
  (919) 459-2509
 
  
  Hi Jason,
 
   although we bumped the major number in the 2.5.x -> 3.0 transition, we
  took care to not introduce incompatible changes to the core metrics
  framework. In short, I see no reason why a 3.0.4 gmetad should not be
  able to query 2.5.x gmond data.
 
   It should even be possible to have a 3.0.4 gmond listen to older
  gmonds. Of course, you are limited to multicast until you have
 replaced
  all gmonds.

 Jan  3 23:12:07 intranet1 ./gmetad[25006]: RRD_update (/var/lib/ganglia/rrds/Dev Login Servers/__SummaryInfo__/part_max_used.rrd): illegal attempt to update using time 1167883927 when last update time is 1167883927 (minimum one second step)
 
 I've been receiving repeated errors like this attempting to use a
 3.0.x 
 gmetad with a 2.5.7 gmond. The times are synced perfectly to a local
 NTP 
 server, so I'm sure that's not the issue.
 

 Not an NTP issue, you are most likely right. The message says that
the timestamp of the update for the metric in question did not change
from the previous update of the same RRD.

 Does this only happen on part_max_used, or are other metrics showing
up as well? part_max_used likely changes very slowly, which might be
an indicator. It is also interesting to note that in your example the
metric is not a host metric, but a summary metric.

 Does it prevent useful operation of the 3.0.x gmetad together with
2.5.7 gmonds? Or is it just annoying?
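
The rule behind the message is that an RRD update must carry a timestamp
strictly later than the previous update of the same file. As a rough
illustration only, and not a description of what gmetad does internally, a
caller could skip such duplicate updates like this; rrd_state and
rrd_update_allowed are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-RRD bookkeeping for this sketch; not a gmetad type. */
typedef struct {
    time_t last_update;
} rrd_state;

/* Return 1 and record `now` if an update with timestamp `now` would be
 * strictly later than the previous one, 0 if it should be skipped. */
static int rrd_update_allowed(rrd_state *st, time_t now)
{
    assert(st != NULL);
    if (now <= st->last_update)
        return 0;
    st->last_update = now;
    return 1;
}
```

With the timestamps from the log above, a second update at 1167883927 would
be skipped and the next one at 1167883928 would go through.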

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Windows port issues

2007-01-04 Thread Martin Knoblauch


--- Vladimir [EMAIL PROTECTED] wrote:

 Martin Knoblauch wrote:
   could you be more specific on the error message? Is it compile time,
  or link time? There is no such thing as xdr_create. Maybe
  xdrmem_create.
 Sorry I should have been more precise. It is a linking error. Here is
 the log
 
 gmond.o: In function `Ganglia_collection_group_send':
 /ganglia-3.0.4/gmond/gmond.c:1633: undefined reference to `_xdrmem_create'
 gmond.o: In function `main':
 /ganglia-3.0.4/gmond/gmond.c:897: undefined reference to `_xdrmem_create'
 /ganglia-3.0.4/gmond/gmond.c:828: undefined reference to `_xdr_free'
 /ganglia-3.0.4/gmond/gmond.c:912: undefined reference to `_xdr_free'
 ../lib/.libs/libganglia.a(libgmond.o): In function `Ganglia_gmetric_send':
 /ganglia-3.0.4/lib/libgmond.c:695: undefined reference to `_xdrmem_create'
 ../lib/.libs/libganglia.a(libgmond.o): In function `Ganglia_gmetric_send_spoof':
 /ganglia-3.0.4/lib/libgmond.c:748: undefined reference to `_xdrmem_create'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_value_types':
 /ganglia-3.0.4/lib/protocol_xdr.c:13: undefined reference to `_xdr_enum'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_gmetric_message':
 /ganglia-3.0.4/lib/protocol_xdr.c:23: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:25: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:27: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:29: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:31: undefined reference to `_xdr_u_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:33: undefined reference to `_xdr_u_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:35: undefined reference to `_xdr_u_int'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_spoof_header':
 /ganglia-3.0.4/lib/protocol_xdr.c:45: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:47: undefined reference to `_xdr_string'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_message_formats':
 /ganglia-3.0.4/lib/protocol_xdr.c:69: undefined reference to `_xdr_enum'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_message':
 /ganglia-3.0.4/lib/protocol_xdr.c:116: undefined reference to `_xdr_u_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:124: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:151: undefined reference to `_xdr_float'
 /ganglia-3.0.4/lib/protocol_xdr.c:156: undefined reference to `_xdr_double'
 /ganglia-3.0.4/lib/protocol_xdr.c:95: undefined reference to `_xdr_u_short'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function `xdr_Ganglia_25metric':
 /ganglia-3.0.4/lib/protocol_xdr.c:170: undefined reference to `_xdr_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:172: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:174: undefined reference to `_xdr_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:178: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:180: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:182: undefined reference to `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:184: undefined reference to `_xdr_int'
 collect2: ld returned 1 exit status
 make[3]: *** [gmond.exe] Error 1
 make[2]: *** [all-recursive] Error 1
 make[1]: *** [all-recursive] Error 1
 make: *** [all] Error 2
 

 OK, it seems ld is unable to find any of the xdr functions. Maybe
someone removed a library from the library list. Under Linux those
functions are in libc, though.

 
   Hmm. What package are you refering to? There is no official
 windows
  (cygwin) binary distribution.

 Perhaps it is unofficial but it is on SourceForge e.g.
 

http://downloads.sourceforge.net/ganglia/ganglia-3.0.0-setup.exe?modtime=1107790662&big_mirror=0
 

 Ah. I forgot about this one. And I do not recall who donated the work.
I am adding the developers list. Apparently, the installer was never
updated after the initial release.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Two similar linux hosts provides different metrics

2007-01-11 Thread Martin Knoblauch

Hi Vitaly,

 what does ps axl show on both hosts, as that is basically what gmond
looks at? If it is already different there, the problem is not
ganglia related. (OK, I see you already checked ...)

 What are the load averages according to uptime?

Cheers
Martin


--- Vitaly Karasik [EMAIL PROTECTED] wrote:

   Hi,
 
 I have a weird problem - two linux hosts with similar configuration
 provide very different metrics about  number of running processes -
 one
 shows about 2, and second about 20-40 (I speak about concentrated
 load
 graph at top right.) 
 proc_total is different too - 171 vs. 350 (BTW,  ps -ef |wc == 61 on
 both boxes)
 
 Both machines are RHEL3 kernel 2.4.21-37.ELsmp with
 ganglia-gmond-3.0.3-1 installed from RPM.
 
 Any ideas?
 Thanks,
 Vitaly
 
  
 

-
 Take Surveys. Earn Cash. Influence the Future of IT
 Join SourceForge.net's Techsay panel and you'll get the chance to
 share your
 opinions on IT  business topics through brief surveys - and earn
 cash

http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
 ___
 Ganglia-general mailing list
 Ganglia-general@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/ganglia-general
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Two similar linux hosts provides different metrics

2007-01-15 Thread Martin Knoblauch

Hi Vitaly,

 where do you see the invalid numbers:

a) in the gmond XML Stream (telnet/nc to the gmond XML port)
b) in the XML Stream from gmetad (telnet/nc to the gmetad XML port)
c) only in the web-frontend

Cheers
Martin
--- Vitaly Karasik [EMAIL PROTECTED] wrote:

 NON-BUSY HOST:
 # ps axl|wc
      61     862    5865
 # uptime
  08:54:55  up 204 days,  2:00,  1 user,  load average: 0.00, 0.00, 0.00
 
 BUSY HOST
  ]# ps axl|wc
      62     877    5977
  ]# uptime
  08:55:18  up 31 days, 16:30,  1 user,  load average: 0.04, 0.01, 0.00
  
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] PBS Queue visualisation

2007-01-16 Thread Martin Knoblauch

Adam,

 look at the report/compound graphs in web/graph.php. They should
basically do what you want.

Cheers
Martin
--- Adam Gray [EMAIL PROTECTED] wrote:

 I'm running ganglia on a cluster managed with OpenPBS. I have made a
 few
 extra metrics for monitoring CPU temp and batch system jobs on each
 node. I was wondering how I could go about making a sort of cluster
 queue usage graph. Each queue would pile on top of each other the
 number
 of nodes it is using.
 
 E.g. if queue1 was using 24 of 124 available nodes, and queue2 was
 using
 96, there would be a section at the bottom 20% and a different
 colored
 section on the next 75%, and the top 5% would be empty.
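
The stacking arithmetic described above can be sketched as follows. This is only an illustration of the percentages, not code from the Ganglia web frontend; the function name `stack_bands` and the queue names are made up for the example.

```python
from typing import Dict, List, Tuple


def stack_bands(nodes_in_use: Dict[str, int],
                total_nodes: int) -> List[Tuple[str, float, float]]:
    """Return (queue, bottom %, top %) bands, stacked in insertion order."""
    bands = []
    bottom = 0.0
    for queue, used in nodes_in_use.items():
        # Each queue occupies a band whose height is its share of all nodes.
        top = bottom + 100.0 * used / total_nodes
        bands.append((queue, bottom, top))
        bottom = top
    return bands


if __name__ == "__main__":
    for queue, bottom, top in stack_bands({"queue1": 24, "queue2": 96}, 124):
        print(f"{queue}: {bottom:.1f}% to {top:.1f}%")
```

With the numbers from the message, queue1 covers roughly the bottom 19.4%, queue2 the next 77.4%, and about 3.2% stays empty, so the 20/75/5 figures quoted above are rounded.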
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] XML error: no element found at 1

2007-01-16 Thread Martin Knoblauch

Ashutok,

 you need to send a query if you use port 8562 (the web interface
does). What happens if you do telnet localhost 8561? That should give
you the complete gmetad XML stream.

 Is the rrdroot directory writable by the owner of the gmetad
process? It should belong to e.g. nobody. This is a common mistake.

cheers
Martin
--- Ashutosh Mahajan [EMAIL PROTECTED] wrote:

 hello everyone,
We are having problems installing ganglia version 3.0.4 with
 rrdtool-1.2.15.
 we can successfully do make, make install. gstat -a also seems to
 work.
 telnet localhost 8649 seems to throw out correct XML file. However,
 gmetad
 seems to be having some problems.
 
 telnet localhost 8652 seems to hang forever with the message:
 Trying 127.0.0.1...
 Connected to localhost.
 Escape character is '^]'.
 
 if i access ganglia through the web, i get this message after a long 
 
 long time:
 There was an error collecting ganglia data (192.168.1.1:8652): XML
 error: no
 element found at 1
 
 rrd_rootdir also remains empty. what could be wrong? i can provide
 more
 details if necessary.
 
 thanks in advance.
 -- 
 Regards
 Ashutosh
 www.lehigh.edu/~asm4
 
 
 
 
 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Two similar linux hosts provides different metrics

2007-01-16 Thread Martin Knoblauch

Vitaly,

 in this case try to run gmond with a debug level higher than 2.
Maybe this sheds some light on it.

 Or, you could add debug statements to the proc_run_func and
proc_total_func code.

 But: first of all show us the output of cat /proc/loadavg on both
nodes.

cheers
Martin
--- Vitaly Karasik [EMAIL PROTECTED] wrote:

 It seems like we have different numbers in gmond:
 
 <HOST NAME="5.5.5.5" IP="5.5.5.5" REPORTED="1168934873" TN="2" TMAX="20"
 DMAX="0" LOCATION="unspecified" GMOND_STARTED="1166534354">
 ..
 <METRIC NAME="proc_total" VAL="185" TYPE="uint32" UNITS="" TN="229"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
 ..
 <METRIC NAME="proc_run" VAL="0" TYPE="uint32" UNITS="" TN="229"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
 
 
 <HOST NAME="5.5.5.6" IP="5.5.5.6" REPORTED="1168934871" TN="3" TMAX="20"
 DMAX="0" LOCATION="unspecified" GMOND_STARTED="1166534349">
 
 <METRIC NAME="proc_run" VAL="15" TYPE="uint32" UNITS="" TN="68"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
 
 <METRIC NAME="proc_total" VAL="439" TYPE="uint32" UNITS="" TN="68"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
 
 Thanks,
 Vitaly
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Two similar linux hosts provides different metrics

2007-01-16 Thread Martin Knoblauch

Vitaly,

 gmond on Linux just interprets the fourth field of /proc/loadavg. The
number in front of the slash is the number of running processes, the
number following the slash is the total number of processes.
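
As a rough illustration of the rule above, the fourth field can be split like this. It is a sketch in Python, not gmond's actual C code, and the function name `parse_loadavg_procs` is made up for the example.

```python
from typing import Tuple


def parse_loadavg_procs(line: str) -> Tuple[int, int]:
    """Return (running, total) from one line of /proc/loadavg.

    The fourth whitespace-separated field has the form "running/total".
    On Linux both numbers count kernel scheduling entities, so threads
    are included and "total" can be larger than `ps -ef | wc -l`.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"unexpected /proc/loadavg format: {line!r}")
    running, total = fields[3].split("/")
    return int(running), int(total)


if __name__ == "__main__":
    # Sample lines as posted for the two hosts in this thread.
    for sample in ("0.04 0.06 0.01 1/185 10512",
                   "1.03 1.01 1.00 2/441 19965"):
        print(parse_loadavg_procs(sample))
```

For the two sample lines this gives (1, 185) and (2, 441), which are the proc_run/proc_total pairs one would expect gmond to report at that moment.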

Cheers
Martin
 
--- Vitaly Karasik [EMAIL PROTECTED] wrote:

 .5:
  cat /proc/loadavg
 0.04 0.06 0.01 1/185 10512
 
 .6:  cat /proc/loadavg
 1.03 1.01 1.00 2/441 19965 
 
 Oops! I think I'm starting to understand - the number of processes on
 both machines is the same, but the number of threads is different.
 Probably gmond counts threads, not processes:
 
 .5: ps -ef|wc
  64
  ps -efm|wc
 187
 
 .6:
   ps -ef|wc
  62 
   ps -efm|wc
 441   
 
 

Re: [Ganglia-general] XML error: no element found at 1

2007-01-16 Thread Martin Knoblauch

Hi Ashutosh,

 sorry for the wrong port. I meant of course 8651.
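
For anyone who prefers a script over telnet, the same check can be sketched in Python. This assumes the usual gmetad setup where port 8651 dumps the whole XML tree on connect and port 8652 waits for a query line first. The helper name `read_xml_dump` and the example query string are assumptions for illustration, not part of Ganglia.

```python
import socket


def read_xml_dump(host: str, port: int, query: bytes = b"",
                  timeout: float = 10.0) -> bytes:
    """Connect, optionally send a query line, and read until the peer closes.

    Raises socket.timeout if the daemon accepts the connection but never
    sends anything, which is the "hangs forever" symptom seen with telnet.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if query:
            sock.sendall(query)
        while True:
            data = sock.recv(65536)
            if not data:
                break  # peer closed the connection: the dump is complete
            chunks.append(data)
    return b"".join(chunks)
```

A call such as `read_xml_dump("localhost", 8651)` should return the full stream, while the interactive port would need something like `query=b"/\n"` (the exact query form is an assumption here). A timeout on 8651 would point at gmetad itself rather than at the web frontend.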

 You could try to run gmetad with a high debug level. This could help
to track down the problem.

 Also, could you please post the gmetad.conf file?

Cheers
Martin
--- Ashutosh Mahajan [EMAIL PROTECTED] wrote:

 Quoting Martin Knoblauch [EMAIL PROTECTED]:
 
  Ashutok,
 
   you need to do a query if you use port 8562 (the web interface
  does). What happens if you do telnet localhost 8561. That should
 give
  you the complete gmetad XML stream.
 
 
 thanks for the prompt reply.
 you meant 8651, rather than 8561?
 [EMAIL PROTECTED] ~]$ telnet localhost 8651
 Trying 127.0.0.1...
 Connected to localhost.
 Escape character is '^]'.
 
 seems to hang forever there.
 
 
   Is the rrdroot directory writable to the owner of the gmetad
  process? It should belong to e.g. nobody. This is a common
 mistake.
 
 
 yeah. it is writable.
 
 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] mcast_ttl in 3.0 gmond.conf

2007-03-14 Thread Martin Knoblauch


--- Ian Cunningham [EMAIL PROTECTED] wrote:

 Gil,
 
 Gilad Raphaelli wrote:
  Hello,
 
I'm having a problem increasing gmond's multicast packet ttl. 
 I've tried putting mcast_ttl on a line of its own and inside the
 global { } and udp_send_channel {} directives and always get
 gmond.conf parsing errors when trying to start gmond-3.0.4.  Any
 pointers on where mcast_ttl can be set?
 
  The error message is:
 
  gmond.conf:200: no such option 'mcast_ttl'
 
  Finally, mcast_ttl doesn't appear in gmond -t - has this
 functionality been removed altogether?
 
  Thanks,
 
  Gil
 I no longer use multicast so I am not sure it works, but from looking
 at the source code, it looks like it was changed to 'ttl' under
 'udp_send_channel'.
 

 which is even correctly documented in the shipping tarball. We should
update the stuff on the web page though ...

Cheers
Martin


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Help! I have a petabyte/s network

2007-03-28 Thread Martin Knoblauch

David,

 as far as I remember, the AIX metrics code had an overflow/wrap-around
problem prior to 3.0.4. Maybe the fixes are not thorough enough.

 The packets/sec are of course less affected.
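
To show how a wrap-around problem of this kind can produce absurd rates, here is a small sketch in Python. It is not the AIX metrics code. The function names and the plausibility threshold are assumptions made for the example, and it does not claim to reproduce what actually happened on the host in question.

```python
COUNTER_BITS = 64
MODULUS = 1 << COUNTER_BITS


def naive_rate(prev: int, cur: int, seconds: float) -> float:
    """Rate from two counter samples using unsigned 64-bit subtraction.

    If the counter appears to go backwards (reset, stale sample, ...),
    the unsigned difference wraps to a value close to 2**64.
    """
    return ((cur - prev) % MODULUS) / seconds


def guarded_rate(prev: int, cur: int, seconds: float,
                 max_rate: float) -> float:
    """Same calculation, but report 0.0 when the result is implausible."""
    rate = naive_rate(prev, cur, seconds)
    return rate if rate <= max_rate else 0.0


if __name__ == "__main__":
    # A counter that drops by 1000 between two samples taken 360 s apart.
    print(f"{naive_rate(1_000_000, 999_000, 360.0):.3e}")
    # 1.25e9 bytes/s is roughly a 10 Gbit/s link, used here as a sanity cap.
    print(guarded_rate(1_000_000, 999_000, 360.0, max_rate=1.25e9))
```

A single backwards step over a 360 second interval yields a rate on the order of 5e16 bytes/s, which is why byte counters that wrap or reset tend to show up as isolated petabyte-sized spikes rather than as a slow drift.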

Cheers
Martin

--- David Wong [EMAIL PROTECTED] wrote:

 Ganglia is reporting that I'm pushing up to 200 Petabytes/s through
 my
 network.  Nobody tell the network admin!
 
 I'm running Ganglia 3.0.4 with the Power5 add-ons on AIX5.3
 
 Bytes in and out statistics generally appear to have the right
 values.
 However at random times, I get spikes in the petabytes/s range.
 
 Here's a dump of the bytes_in database.  At first, I suspected
 perhaps
 these coincide with some counters getting reset, but they don't occur
 at
 regular intervals.
 
 <!-- 2007-03-27 20:42:00 GMT / 1175028120 --> <row><v> 1.9268390706e+05 </v></row>
 <!-- 2007-03-27 20:48:00 GMT / 1175028480 --> <row><v> 1.5833184975e+05 </v></row>
 <!-- 2007-03-27 20:54:00 GMT / 1175028840 --> <row><v> 1.6838302753e+05 </v></row>
 <!-- 2007-03-27 21:00:00 GMT / 1175029200 --> <row><v> 1.3766069592e+05 </v></row>
 <!-- 2007-03-27 21:06:00 GMT / 1175029560 --> <row><v> 2.1711888414e+05 </v></row>
 <!-- 2007-03-27 21:12:00 GMT / 1175029920 --> <row><v> 4.9959709273e+16 </v></row>
 <!-- 2007-03-27 21:18:00 GMT / 1175030280 --> <row><v> 1.7401339783e+05 </v></row>
 <!-- 2007-03-27 21:24:00 GMT / 1175030640 --> <row><v> 2.0955720861e+05 </v></row>
 <!-- 2007-03-27 21:30:00 GMT / 1175031000 --> <row><v> 1.9032255300e+05 </v></row>
 <!-- 2007-03-27 21:36:00 GMT / 1175031360 --> <row><v> 1.9162727036e+05 </v></row>
 <!-- 2007-03-27 21:42:00 GMT / 1175031720 --> <row><v> 1.2703790825e+05 </v></row>
 
 Can anyone shed light on what might be happening?  Any pointers for
 debugging?
 
 David Wong
 Senior Systems Engineer
 Management Dynamics, Inc.
 Phone: 201-804-6127
 [EMAIL PROTECTED]
 
 
 

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Help! I have a petabyte/s network

2007-03-29 Thread Martin Knoblauch

David,

 good catch. I will have to look at it for a bit.

Cheers
Martin
--- David Wong [EMAIL PROTECTED] wrote:

 I don't write much code nowadays, so I'm going to need a lot of help
 with this.
 
 I dug through the ganglia code and I found this interesting tidbit in
 libmetrics/aix/metrics.c which may be indicative of the problem.
 
 There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes,
 but
 the types of the two variables are different.
 
 net_stat::ibytes is a double: 
 
 struct net_stat{
   double ipackets;
   double opackets;
   double ibytes;
   double obytes;
 } cur_net_stat;
 
 and we have *ninfo declared here:
 
 perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;
 
 libperfstat.h has perfstat_netinterface_total_t::ibytes as
 u_longlong_t.
 
 Does this code try to do what I think it is doing, i.e. assign an
 unsigned 64 bit integer to a signed 64bit integer?
 
 I'm willing to test the code if someone who's more adept at coding
 and
 building will take on the challenge.
 
 It looks to me that the type mismatch will have to be fixed in a few
 places, such as CALC_NETSTAT, and we'll have to add an unsigned long
 long to g_val_t too.  Those are the ones I can see so far.
 
 David Wong
 Senior Systems Engineer
 Management Dynamics, Inc.
 Phone: 201-804-6127
 [EMAIL PROTECTED]
 
 
 


--
Martin Knoblauch
email: k n o b

Re: [Ganglia-general] gmetad patch to contact random data_source hosts

2007-03-29 Thread Martin Knoblauch

Tim,

 your diff command looks a bit surprising to me. The revision number
looks like CVS to me, and we have been on SVN for quite some time.

 Which version of Ganglia have you checked out?

Cheers
Martin
--- Witham, Timothy D [EMAIL PROTECTED] wrote:

 Hi,
 
 I just had a situation where the first host in a gmetad data_source
 accepts the connection but offers no data, like this:
 
   poll() timeout for [clustername] data source after 0 bytes read
 
 Gmetad always tries the sources in order and so it just keeps getting
 stuck on this first one, and losing the data for the entire cluster.
 
 Here is a quick patch that tries random hosts from the list instead,
 and solved my problem.  It is not careful to make sure it tried them
 all, but if it fails it will just try again next time.  If someone
 wants to fix it to try all the sources in a random order, that would
 be fine.  Perhaps this could be included in the next release unless
 someone knows a good reason to always try the sources in order.
 
 Thanks!
 
 -8-
 diff -c -r1.1.1.1 data_thread.c
 *** data_thread.c 19 Mar 2007 18:52:32 -  1.1.1.1
 --- data_thread.c 28 Mar 2007 18:12:08 -
 ***************
 *** 18,24 ****
   void *
   data_thread ( void *arg )
   {
 !    int i, sleep_time, bytes_read, rval;
      data_source_list_t *d = (data_source_list_t *)arg;
      g_inet_addr *addr;
      g_tcp_socket *sock=0;
 --- 18,24 ----
   void *
   data_thread ( void *arg )
   {
 !    int i, j, sleep_time, bytes_read, rval;
      data_source_list_t *d = (data_source_list_t *)arg;
      g_inet_addr *addr;
      g_tcp_socket *sock=0;
 ***************
 *** 60,75 ****
         if(d->last_good_index >= 0)
           sock = g_tcp_socket_new ( d->sources[d->last_good_index] );
   
 !       /* If there was no good connection last time or the above connect failed then try each host in the list. */
         if(!sock)
           {
 !           for(i=0; i < d->num_sources; i++)
               {
 !               /* Find first viable source in list. */
 !               sock = g_tcp_socket_new ( d->sources[i] );
                 if( sock )
                   {
 !                   d->last_good_index = i;
                     break;
                   }
               }
 --- 60,80 ----
         if(d->last_good_index >= 0)
           sock = g_tcp_socket_new ( d->sources[d->last_good_index] );
   
 !       /* If there was no good connection last time or the above
 !          connect failed then try random hosts in the list.  We try
 !          random ones in case someone is accepting the connection
 !          but refusing to provide any data; we don't want to get
 !          stuck with a non-working host. */
         if(!sock)
           {
 !           for(i=0; i < d->num_sources * 2; i++)
               {
 !               /* Find random viable source in list. */
 !               j = d->num_sources * (rand() / (RAND_MAX - 1.0));
 !               sock = g_tcp_socket_new ( d->sources[j] );
                 if( sock )
                   {
 !                   d->last_good_index = j;
                     break;
                   }
               }
 -8--
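
The follow-up idea mentioned with the patch, trying all the sources in a random order, could look roughly like the sketch below. It is written in Python for brevity and is not gmetad code. `connect_random_order` and the `connect` callback are hypothetical names standing in for the loop around g_tcp_socket_new. Shuffling the indices also avoids two things the patch as posted seems to allow: picking the same host several times, and an index equal to num_sources when rand() returns RAND_MAX - 1 or RAND_MAX.

```python
import random
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")
C = TypeVar("C")


def connect_random_order(
    sources: Sequence[S],
    connect: Callable[[S], Optional[C]],
    rng: Optional[random.Random] = None,
) -> Tuple[int, Optional[C]]:
    """Try every source exactly once, in random order.

    Returns (index, connection) for the first source whose connect()
    returns something other than None, or (-1, None) if all of them fail.
    """
    rng = rng or random.Random()
    order = list(range(len(sources)))
    rng.shuffle(order)  # every index appears once and is always in range
    for idx in order:
        conn = connect(sources[idx])
        if conn is not None:
            return idx, conn
    return -1, None
```

The returned index plays the role of last_good_index in the patch. Whether a source that accepts the connection but sends no data should also be skipped on the next poll is a separate question that this sketch does not address.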
 
 -- 
 [EMAIL PROTECTED]; I don't speak for Intel or anyone.
 

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Gmetad and web frontend on different machines.

2007-03-29 Thread Martin Knoblauch

Richard,

 depending on the cluster size, writing the RRDs via NFS might turn out
to be a huge bottleneck.

Cheers
Martin
--- [EMAIL PROTECTED] wrote:

 Saundry,
  
 It sort of looks like you can, but actually you can't.
 gmetad writes to rrd databases as local files,
 and the web frontend (php) reads the rrd databases as local files
 (actually it invokes rrdtool itself).
  
 I imagine you could separate the two using NFS filesystems,
 but I have not tried this.
 
 kind regards,
 
 Richard Grevis 
 Production Architecture 
 Barclays Capital, Canary Wharf, London, E14 4BB 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 saundrya mishra
 Sent: 29 March 2007 14:30
 To: ganglia-general@lists.sourceforge.net
 Subject: [Ganglia-general] Gmetad and web frontend on different
 machines.
 
 
 
   Hi There,
   
   I am new to Ganglia. Can we have gmetad and web frontend for a
 cluster to be running on two different machines?? If yes, then how is
 it
 possible since i read in the configuration file of the web frontend
 that
 the RRDTool databases  need to be local to be read? 
   
   Greetings,
   Saundrya.
   
 
 


 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de
