Re: [Ganglia-general] aix/linux ganglia web server question.

2016-08-01 Thread Khrist Hansen
Yes, that is how we do it.

 

Collector gmond and gmetad processes are running on Linux along with the web
server.
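
The platforms need not match because gmetad and the web front end only consume the XML that gmond serves over TCP, whatever OS produced it. A minimal sketch, assuming a gmond reachable on the default TCP port 8649 and hosts that report the standard os_name metric; the helper names are hypothetical:

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host, port=8649, timeout=5.0):
    """Read the complete XML dump gmond writes on its TCP accept channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_os(xml_bytes):
    """Map each HOST name to the value of its os_name metric, if present."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        os_name = "unknown"
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "os_name":
                os_name = metric.get("VAL", "unknown")
                break
        result[host.get("NAME")] = os_name
    return result
```

Calling hosts_by_os(read_gmond_xml("collector-host")) against a Linux collector should list AIX and Linux nodes side by side; "collector-host" is a placeholder.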

 

 

From: Spatola, Pat [mailto:pspat...@wrberkley.com] 
Sent: Monday, August 01, 2016 11:59 AM
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] aix/linux ganglia web server question.

 

Does the hardware/os of the web server need to match the hardware/os of the
gmond clients? I have aix lpars running ganglia but would prefer to have
them reporting into a web server running on linux. Is this setup supported?
Are there any modifications I need to make to the ganglia web server setup
to get this working?

 


Pat Spatola - Sr. Systems Engineer

 




101 Bellevue Parkway, Wilmington DE 19809
Phone: (302) 439-2006 | Cell: (610) 952-0064
Email: pspat...@wrberkley.com
Website:   www.bts.wrberkley.com

 



--
___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general


Re: [Ganglia-general] Is there a way to Display deltas for metrics

2014-08-04 Thread Khrist Hansen
I use gmetric from a script scheduled to run every minute via cron.

Check 'man gmetric' for syntax, and there are example scripts here:

http://www.perzl.org/ganglia/devicespecific.html


Hope that helps,

Khrist

-Original Message-
From: Silver, Jonathan [mailto:jonathan.sil...@unify.com] 
Sent: Monday, August 04, 2014 10:18 AM
To: Ganglia
Subject: [Ganglia-general] Is there a way to Display deltas for metrics

Some of the metrics that we are collecting are totals since the system
rebooted. 
That is not very interesting and it becomes a huge number that shows
(percentage-wise) little change over time. 

Is there some way to change ganglia to display the change in this metric
instead of its value, without changing the collector itself?

Thanks,
Jonathan 






Re: [Ganglia-general] Is there a way to Display deltas for metrics

2014-08-04 Thread Khrist Hansen
Correct, gmetric will then send your desired metric value and its attributes
to gmond, which gmetad then polls.

If the OS or application only provides the metric as a counter but you want
it visualized as a gauge, then it is on you to calculate the delta *and*
rate of occurrence over time.  The example scripts from Dr. Perzl are a good
starting point for both, though I had to tweak the algorithm to fit my
needs.
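
A minimal sketch of that calculation, assuming the cron script keeps the previous sample somewhere and then publishes the result with gmetric. The function names and the example metric name are hypothetical; only the gmetric options --name, --value, --type and --units are taken from the tool itself:

```python
def counter_to_rate(prev_value, prev_time, curr_value, curr_time):
    """Return (delta, rate_per_second) between two samples of a counter.

    Returns None when no meaningful rate exists: no time has elapsed, or the
    counter went backwards (for example after a reboot reset it).
    """
    elapsed = curr_time - prev_time
    delta = curr_value - prev_value
    if elapsed <= 0 or delta < 0:
        return None
    return delta, delta / elapsed


def gmetric_command(name, value, units):
    """Build the argument list for publishing one value with gmetric."""
    return ["gmetric", "--name", name, "--value", str(value),
            "--type", "double", "--units", units]
```

For two samples taken 60 seconds apart, counter_to_rate(1000, 0, 1600, 60) gives a delta of 600 and a rate of 10.0 per second; the rate can then be passed to subprocess.run(gmetric_command("io_ops_rate", 10.0, "ops/s")).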


-Original Message-
From: Silver, Jonathan [mailto:jonathan.sil...@unify.com] 
Sent: Monday, August 04, 2014 10:44 AM
To: Khrist Hansen; 'Ganglia'
Subject: RE: [Ganglia-general] Is there a way to Display deltas for metrics

Thanks, but gmetric (I thought) was for the collection of metrics into
ganglia. 
The metrics are already being collected but they are the sum since day 1. 

I guess that I could create a new metric by running a local script, get the
latest 2 values from rrds, compute the difference and set that as the new
value for the new metric (added using gmetric), but that is a lot of
overhead. 

Did I misunderstand something?


-Original Message-
From: Khrist Hansen [mailto:khrist.han...@gmail.com] 
Sent: Monday, August 04, 2014 11:39 AM
To: Silver, Jonathan; 'Ganglia'
Subject: RE: [Ganglia-general] Is there a way to Display deltas for metrics

I use gmetric from a script scheduled to run every minute via cron.

Check 'man gmetric' for syntax, and there are example scripts here:

http://www.perzl.org/ganglia/devicespecific.html


Hope that helps,

Khrist

-Original Message-
From: Silver, Jonathan [mailto:jonathan.sil...@unify.com]
Sent: Monday, August 04, 2014 10:18 AM
To: Ganglia
Subject: [Ganglia-general] Is there a way to Display deltas for metrics

Some of the metrics that we are collecting are totals since the system
rebooted. 
That is not very interesting and it becomes a huge number that shows
(percentage-wise) little change over time. 

Is there some way to change ganglia to display the change in this metric
instead of its value, without changing the collector itself?

Thanks,
Jonathan 






Re: [Ganglia-general] Ganglia web csv import

2014-05-13 Thread Khrist Hansen
I don’t think that can be done very easily.  You would probably be better off 
graphing in Excel.

 

If you added the CSV stream from Ganglia into Excel as an external data source, 
you should only need to refresh the external data source to update the graph.

 

 

From: yanqing huang [mailto:yanqinghuang1...@gmail.com] 
Sent: Monday, May 12, 2014 10:34 PM
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] Ganglia web csv import

 

 Does anybody know how to import CSV data into Ganglia web for drawing graphs?  Thanks.



Re: [Ganglia-general] How to show mean value of multicurve in Ganglia aggregate graphs

2014-05-06 Thread Khrist Hansen
I don't think you will find this functionality in ganglia, but you can
import the CSV stream into Excel as an external data source and do it there.
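
The same averaging can also be scripted instead of done in Excel. A minimal sketch, assuming the exported CSV has a header row, a timestamp in the first column and one column per node; that layout and the function name are assumptions, not something confirmed in this thread:

```python
import csv
import io
import math


def mean_per_timestamp(csv_text):
    """Return [(timestamp, mean)] averaging the node columns of each row.

    Cells that are empty, non-numeric or NaN are skipped, so a node with a
    missing sample does not distort the mean for that timestamp.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    result = []
    for row in reader:
        if not row:
            continue
        values = []
        for cell in row[1:]:
            try:
                value = float(cell)
            except ValueError:
                continue
            if not math.isnan(value):
                values.append(value)
        if values:
            result.append((row[0], sum(values) / len(values)))
    return result
```

Feeding it the CSV text of an aggregate cpu_idle graph would yield one arithmetic mean per timestamp, which can then be plotted or compared between environments.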
On May 6, 2014 8:01 PM, yanqing huang yanqinghuang1...@gmail.com wrote:

 Hi all,

 I have used Ganglia web to build aggregate graphs of the same metric from
 all nodes. My question is how to get the mean value of the aggregated
 metric (such as cpu_idle) across all nodes. I am trying to compare
 performance in different environments, so showing the arithmetic mean of
 all the curves would be very useful.




Re: [Ganglia-general] Ganglia 4.x architecture planning

2014-03-28 Thread Khrist Hansen
Happy AIX Ganglia user here thanks to all of Dr. Perzl’s generous efforts!

 

:)

 

 

From: Alexander Karner [mailto:a...@de.ibm.com] 
Sent: Friday, March 28, 2014 3:07 AM
To: Daniel Pocock
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Ganglia 4.x architecture planning

 

Hi! 

I think we should continue to put an emphasis on portability: 
Ganglia is not only used in Linux environments but also on AIX, HP-UX,
Solaris etc. 
This includes both, gmond and gmetad (+webserver). 

Personally, I'd suggest checking which tools are available on those
platforms. For the DB this could be, for example, PostgreSQL.

How would the current architectural overview match a Grid of Grids
environment? Would this allow us to search for a system on the central
gmetad WebUI and then jump to the specific grid server?
And: would this also meet actual requirements for partition mobility?

Mit freundlichen Grüßen / Kind regards 

Alexander Karner 





From: Daniel Pocock dan...@pocock.com.au
To: ganglia-general@lists.sourceforge.net
Date: 27.03.2014 21:08
Subject: [Ganglia-general] Ganglia 4.x architecture planning

  _  






I made up a rough diagram about how Ganglia 4.x could look:

 
https://raw.githubusercontent.com/ganglia/monitor-core/master/doc/planning/ganglia-4.x.png

The biggest change is the introduction of MongoDB

Instead of having the gmetad serve up an XML every time somebody asks to
see the web page, the gmetad will just store current values into MongoDB.

This means that web frameworks (like PHP) can query the data from
MongoDB, which is much more horizontally scalable and more suited to
serving this data.  For large sites where many users access the web
reports, this will be very useful.

MongoDB is also a backend for rsyslog daemon now and could potentially
be a Nagios backend, so it would be a great way to unify monitoring data.

The introduction of RabbitMQ is an optional dependency.  It would allow
users to send commands from the web interface.

One of the motivations for this work is the Google Summer of Code
projects.  Each student can potentially work on a different part of this
puzzle and at the end of the year we could launch it as Ganglia 4.0 if
people like it.

Regards,

Daniel





Re: [Ganglia-general] multicast not working

2014-03-21 Thread Khrist Hansen
Yes, that is correct insofar as I know.  Even though your nodes are on the
same subnet, your private cloud’s network config may still be preventing
multicast in some way.

 

Assuming your OS is some flavor of Linux, I would try one of the freely
available tools to test multicast routing between node1 and your other
nodes.

 

A quick google search led me to these tools:

* mz

* ssmping

* iperf



http://serverfault.com/questions/211482/tools-to-test-multicast-routing
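
If none of those tools is installed, a rough check can be scripted. A minimal sketch, assuming Ganglia's default multicast group 239.2.11.71 and a spare UDP port (18649 here is a hypothetical choice, picked so the probe does not reach gmond's own port 8649); the function names are likewise made up for illustration:

```python
import socket
import struct

GROUP = "239.2.11.71"  # Ganglia's default multicast group
TEST_PORT = 18649      # hypothetical spare UDP port, not gmond's 8649


def membership_request(group, interface="0.0.0.0"):
    """Pack the ip_mreq structure used to join a multicast group."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface))


def wait_for_probe(group=GROUP, port=TEST_PORT, timeout=10.0):
    """Join the group; return the sender's IP, or None if nothing arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)
        _, sender = sock.recvfrom(1024)
        return sender[0]
    except socket.timeout:
        return None
    finally:
        sock.close()


def send_probe(group=GROUP, port=TEST_PORT, ttl=1):
    """Send one small datagram to the group; ttl=1 stays on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(b"mcast-probe", (group, port))
    finally:
        sock.close()
```

Run wait_for_probe() on node2, then send_probe() on node1. If node2 times out and returns None even though both nodes share a subnet, multicast is probably being filtered somewhere in between.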

 

 

 

From: Cristovao Jose Domingues Cordeiro [mailto:cristovao.corde...@cern.ch] 
Sent: Friday, March 21, 2014 4:23 AM
To: Khrist Hansen; ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] multicast not working

 

Hi Khrist,

well that makes sense indeed. 
All my VM's are running in a private cloud infrastructure so I have no
control over their network characteristics. 
But still, for instance, in gmetad I am gathering the data from node1, and
node1's IP is in the same subnet as node2:

 - node1 IP= x.x.x.169
 - node2 IP= x.x.x.96

but in Ganglia frontend, I only get information from node1!!

Nevertheless, do I have to do any extra configuration to get multicast
working? I did nothing to the gmonds: I just installed ganglia and
ganglia-gmond and left gmond.conf at its defaults. Is this correct?

 

Cumprimentos / Best regards,
Cristóvão José Domingues Cordeiro

  _  

From: Khrist Hansen [khrist.han...@gmail.com]
Sent: 20 March 2014 23:07
To: Cristovao Jose Domingues Cordeiro; ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] multicast not working

I had the same problem due to multicast packets not being routed between
subnets.

 

Are your nodes on different subnets/VLANs?

 

There are some tools out there that will test multicast connectivity across
subnets.  All that comes to mind for the moment is mping on AIX, but I know
there are more for the various OS platforms.

 

I had to use unicast in the end, but I am still working on my network
engineering team to enable multicast routing between subnets.

 

Hope that helps,


Khrist Hansen

 

 

From: Cristovao Jose Domingues Cordeiro [mailto:cristovao.corde...@cern.ch] 
Sent: Thursday, March 20, 2014 10:42 AM
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] multicast not working

 

Hi,

I'm trying to set the simplest of clusters with mcast. Basically I have one
gmetad.conf where I just did:
data_source unspecified node1

then, I have in one cloud, node1, node2, node3, node4 and node5.
All of these have default gmond installations with no changes in gmond.conf.

The problem is that in my frontend, I only get node1 and not the others.
Weren't they supposed to talk to each other and share all of each
other's metrics?

Thanks

 

Cumprimentos / Best regards,
Cristóvão José Domingues Cordeiro



Re: [Ganglia-general] multicast not working

2014-03-20 Thread Khrist Hansen
I had the same problem due to multicast packets not being routed between
subnets.

 

Are your nodes on different subnets/VLANs?

 

There are some tools out there that will test multicast connectivity across
subnets.  All that comes to mind for the moment is mping on AIX, but I know
there are more for the various OS platforms.

 

I had to use unicast in the end, but I am still working on my network
engineering team to enable multicast routing between subnets.

 

Hope that helps,


Khrist Hansen

 

 

From: Cristovao Jose Domingues Cordeiro [mailto:cristovao.corde...@cern.ch] 
Sent: Thursday, March 20, 2014 10:42 AM
To: ganglia-general@lists.sourceforge.net
Subject: [Ganglia-general] multicast not working

 

Hi,

I'm trying to set the simplest of clusters with mcast. Basically I have one
gmetad.conf where I just did:
data_source unspecified node1

then, I have in one cloud, node1, node2, node3, node4 and node5.
All of these have default gmond installations with no changes in gmond.conf.

The problem is that in my frontend, I only get node1 and not the others.
Weren't they supposed to talk to each other and share all of each
other's metrics?

Thanks

 

Cumprimentos / Best regards,
Cristóvão José Domingues Cordeiro



Re: [Ganglia-general] [Ganglia-developers] Gmetad Platform Poll

2013-12-16 Thread Khrist Hansen
I second this motion.  :)


-Original Message-
From: Bernard Li [mailto:bern...@vanhpc.org] 
Sent: Monday, December 16, 2013 2:04 PM
To: Michael Perzl
Cc: ganglia-develop...@lists.sourceforge.net; Morten Torstensen;
ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] [Ganglia-developers] Gmetad Platform Poll

Hi Michael:

Any chance we can try to merge your AIX specific patches into upstream code?
Aren't you getting a bit tired managing them separately? ;-)

Thanks,

Bernard

On Wed, Dec 11, 2013 at 3:17 PM, Michael Perzl mich...@perzl.org wrote:
 I can certainly test and verify this on a variety of different AIX levels.

 Regards,
 Michael

 On 12/11/2013 07:27 PM, Devon H. O'Dell wrote:
 Thanks. I think the work I'm doing should work with AIX on POWER.
 Would anybody with a builder be able to test and verify this?

 2013/12/11 Morten Torstensen morten.torsten...@evry.com:
 We are using ganglia for aix on power, and possibly linux on power too
in the close future.

 We use binaries from Michael Perzl, http://www.perzl.org/ganglia/


 Best regards
 Morten Torstensen
 Chief Solution Architect, BA Nordic Open Systems Future Proof 
 Service Development morten.torsten...@evry.com M +47 46819584

 -Original Message-
 From: Devon H. O'Dell [mailto:devon.od...@gmail.com]
 Sent: Wednesday, 11 December, 2013 16:49
 To: ganglia-develop...@lists.sourceforge.net; 
 ganglia-general@lists.sourceforge.net
 Subject: [Ganglia-general] Gmetad Platform Poll

 Hi all,

 I'm intending to continue working on performance improvements for 
 gmetad. I'm curious if anybody uses gmetad on architectures that are
 not:

   * ARM
   * PPC
   * PPC64
   * SPARCv9
   * i386
   * amd64

 or on systems that are not:

   * Linux
   * ${any}BSD
   * Solaris

 (I'd also be interested in hearing if people are using gmond on 
 architectures other than those mentioned above; less interested 
 about the operating systems for that one.)

 Kind regards,

 --dho

 

Re: [Ganglia-general] Insane negative values for cpu_idle and cpu_wio when node is CPU bound

2013-10-03 Thread Khrist Hansen
This appears to be an issue with Mr. Perzl's updated libperfstat code
borrowed from IBM's perfstat_cpu_total example.

ftp://www.oss4aix.org/ganglia/RPMs-3.3.7/src/ganglia-3.3.7-aix.patch

http://publib.boulder.ibm.com/infocenter/aix/v6r1/topic/com.ibm.aix.prftools/doc/prftools/idprftools_perfstat_glob_cpu.htm

When calculating wio and idle, the code performs a divide operation with
(dlt_lcpu_wait + dlt_lcpu_idle) as the divisor.  If the server is CPU bound,
i.e. usr+sys=100%, then both dlt_lcpu_wait and dlt_lcpu_idle will be zero,
and the division will occur with zero as the divisor.

This should be a fairly simple fix, and I am attempting to contact Mr. Perzl
to that effect.
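
The guard could look roughly like the following. This is a sketch of the idea in Python rather than the actual C patch, and it assumes the division is used to share some amount of time between wait and idle in proportion to the two deltas; the function name and the `unused` parameter are hypothetical:

```python
def split_unused(unused, dlt_lcpu_wait, dlt_lcpu_idle):
    """Share `unused` between wait and idle in proportion to their deltas.

    When both deltas are zero (the node is fully CPU bound) there is nothing
    to apportion, so return zeros instead of dividing by zero.
    """
    total = dlt_lcpu_wait + dlt_lcpu_idle
    if total == 0:
        return 0.0, 0.0
    return (unused * dlt_lcpu_wait / total,
            unused * dlt_lcpu_idle / total)
```

In C, a floating-point division by zero does not raise an error but yields NaN or infinity, which could explain the enormous values once the result is converted and reported.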


-Original Message-
From: Khrist Hansen [mailto:khrist.han...@gmail.com] 
Sent: Wednesday, October 02, 2013 6:18 PM
To: ganglia-general@lists.sourceforge.net
Subject: RE: Insane negative values for cpu_idle and cpu_wio when node is
CPU bound

Here is another example from gstat:
 CPUs (Procs/Total) [ 1, 5, 15min] [  User,  Nice, System, Idle, Wio]
8 (8/  122) [  4.59,  2.04,  1.35] [  99.8,   0.0, 0.2,-67062349824.0,-67062349824.0] OFF

Looking at the source code for AIX metrics
(https://github.com/ganglia/monitor-core/blob/master/libmetrics/aix/metrics.c),
it appears that negative values should be converted to 0.  This is
either not happening or the metrics are somehow being modified after the
fact.

g_val_t
cpu_wio_func ( void )
{
   g_val_t val;

   get_cpuinfo();
   val.f = CALC_CPUINFO(wait);

   /* clamp negative results to zero */
   if(val.f < 0) val.f = 0.0;
   return val;
}

g_val_t
cpu_idle_func ( void )
{
   g_val_t val;

   get_cpuinfo();
   val.f = CALC_CPUINFO(idle);

   /* clamp negative results to zero */
   if(val.f < 0) val.f = 0.0;
   return val;
}


From: K. Hansen
Sent: Wednesday, October 02, 2013 4:50 PM
To: ganglia-general@lists.sourceforge.net
Subject: Insane negative values for cpu_idle and cpu_wio when node is CPU
bound

Environment:
AIX 6.1 TL7 SP7
gmond 3.6.0 (from http://www.perzl.org/ganglia/)

I noticed that a particular node would send insanely high negative values
for cpu_idle and cpu_wait metrics when cpu_user + cpu_system were near 100%,
i.e. the node is completely CPU bound.  The result is major skewing of the
node's cpu_idle and cpu_wio graphs so that no true positive values are
visible, and the cpu_report graphs for the node, cluster, and grid become
corrupted.

Here is an example of what I am talking about:  http://imgur.com/a/aIzyU

I am able to replicate this behavior on any AIX node by running the
following command to generate CPU load:

perl -e 'while (--$ARGV[0] and fork) {}; while () {}' 8

Where the last digit is the number of threads available to the server.  For
example, if a server has 2 POWER7 vCPU, then it has 8 threads (logical CPU)
due to 4-way simultaneous multithreading (SMT).

Has anyone else experienced this on AIX or Linux?

Thanks!






[Ganglia-general] Insane negative values for cpu_idle and cpu_wio when node is CPU bound

2013-10-02 Thread Khrist Hansen
Environment:

AIX 6.1 TL7 SP7

gmond 3.6.0 (from http://www.perzl.org/ganglia/)

 

I noticed that a particular node would send insanely high negative values
for cpu_idle and cpu_wait metrics when cpu_user + cpu_system were near 100%,
i.e. the node is completely CPU bound.  The result is major skewing of the
node's cpu_idle and cpu_wio graphs so that no true positive values are
visible, and the cpu_report graphs for the node, cluster, and grid become
corrupted.

 

Here is an example of what I am talking about:  http://imgur.com/a/aIzyU

 

I am able to replicate this behavior on any AIX node by running the
following command to generate CPU load:

 

perl -e 'while (--$ARGV[0] and fork) {}; while () {}' 8

 

Where the last digit is the number of threads available to the server.  For
example, if a server has 2 POWER7 vCPU, then it has 8 threads (logical CPU)
due to 4-way simultaneous multithreading (SMT).

 

Has anyone else experienced this on AIX or Linux?

 

Thanks!

 



Re: [Ganglia-general] Insane negative values for cpu_idle and cpu_wio when node is CPU bound

2013-10-02 Thread Khrist Hansen
Here is another example from gstat:
 CPUs (Procs/Total) [ 1, 5, 15min] [  User,  Nice, System, Idle, Wio]
8 (8/  122) [  4.59,  2.04,  1.35] [  99.8,   0.0, 0.2,-67062349824.0,-67062349824.0] OFF

Looking at the source code for AIX metrics
(https://github.com/ganglia/monitor-core/blob/master/libmetrics/aix/metrics.c),
it appears that negative values should be converted to 0.  This is
either not happening or the metrics are somehow being modified after the
fact.

g_val_t
cpu_wio_func ( void )
{
   g_val_t val;

   get_cpuinfo();
   val.f = CALC_CPUINFO(wait);

   /* clamp negative results to zero */
   if(val.f < 0) val.f = 0.0;
   return val;
}

g_val_t
cpu_idle_func ( void )
{
   g_val_t val;

   get_cpuinfo();
   val.f = CALC_CPUINFO(idle);

   /* clamp negative results to zero */
   if(val.f < 0) val.f = 0.0;
   return val;
}


From: K. Hansen
Sent: Wednesday, October 02, 2013 4:50 PM
To: ganglia-general@lists.sourceforge.net
Subject: Insane negative values for cpu_idle and cpu_wio when node is CPU
bound

Environment:
AIX 6.1 TL7 SP7
gmond 3.6.0 (from http://www.perzl.org/ganglia/)

I noticed that a particular node would send insanely high negative values
for cpu_idle and cpu_wait metrics when cpu_user + cpu_system were near 100%,
i.e. the node is completely CPU bound.  The result is major skewing of the
node's cpu_idle and cpu_wio graphs so that no true positive values are
visible, and the cpu_report graphs for the node, cluster, and grid become
corrupted.

Here is an example of what I am talking about:  http://imgur.com/a/aIzyU

I am able to replicate this behavior on any AIX node by running the
following command to generate CPU load:

perl -e 'while (--$ARGV[0] and fork) {}; while () {}' 8

Where the last digit is the number of threads available to the server.  For
example, if a server has 2 POWER7 vCPU, then it has 8 threads (logical CPU)
due to 4-way simultaneous multithreading (SMT).

Has anyone else experienced this on AIX or Linux?

Thanks!


