from:"Martin Knoblauch"

[Ganglia-general] Application Monitoring with Ganglia

2017-08-29 Thread Martin Knoblauch

Hi,

 I am afraid I know the answer, but just to be sure... I am monitoring a
bunch of Linux servers running one or more JVMs with Ganglia. In order to
get some insight into the resource usage of the JVMs we use "jmxtrans" to
retrieve the metrics and spoof them to a Ganglia aggregator. This works
fine with one JVM, but gives trouble with two or more. The reason is that
the metrics have the same names in every JVM.

 So I had the idea to group the metrics of each JVM into separate metric
groups JVM1, JVM2, JVM3 ... The problem is that this still does not seem to
work. What I want is

HostX
-JVM1
--Metric1
--Metric2
--Metric3
-JVM2
--Metric1
--Metric2
--Metric3
-JVM3
--Metric1
--Metric2
--Metric3

 That is nine metrics in three groups. But I only see three metrics and
they "jump" from group to group. It works fine if I make the metric names
unique. So it seems there is a uniqueness requirement at the metric level.

 It would be really nice if that requirement could be restricted to the
group level. Any chance?
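
A minimal sketch of the workaround mentioned above (making the metric names unique), assuming the JVM label is simply prefixed to each name. The function names and the sample data are hypothetical and are not part of jmxtrans or Ganglia:

```python
from typing import Dict, Tuple


def qualify_metric_name(jvm_label: str, metric_name: str) -> str:
    """Prefix the metric name with the JVM label, e.g. 'JVM1.Metric1'."""
    return f"{jvm_label}.{metric_name}"


def flatten_metrics(
    metrics_by_jvm: Dict[str, Dict[str, float]]
) -> Dict[str, Tuple[str, float]]:
    """Map each qualified metric name to (group, value)."""
    flat: Dict[str, Tuple[str, float]] = {}
    for jvm_label, metrics in metrics_by_jvm.items():
        for metric_name, value in metrics.items():
            flat[qualify_metric_name(jvm_label, metric_name)] = (jvm_label, value)
    return flat


# Hypothetical sample: three JVMs on HostX reporting the same three metric names.
sample = {
    f"JVM{n}": {"Metric1": 1.0 * n, "Metric2": 2.0 * n, "Metric3": 3.0 * n}
    for n in (1, 2, 3)
}
flat = flatten_metrics(sample)
print(len(flat))  # number of distinct metric names after prefixing
```

With the prefix in place every metric name is unique per host, so the group label becomes purely cosmetic.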

Thanks
Martin
-- 
----------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general

Re: [Ganglia-general] High SystemCPU usage, low UserCPU usage

2015-10-14 Thread Martin Knoblauch

Dear Khan,

 as Vladimir said, "System CPU" is spent in the kernel on I/O, Interrupts,
memory management. Just out of curiosity: what Linux are you (is your
customer) running, which kernel version and what is the uptime?

I ask because I recently faced a similar issue on servers running
SLES11/SP2 (kernel 3.0.58-0.6.2-default). Those were used for Tomcat (Java)
processes, not HPC. They started to max out all CPUs at 100%, with 75%
solid "red" (system CPU). But that happened only after some days of uptime.

It turned out that in our situation turning the half-baked (at least in
that kernel) "Transparent Huge Pages" feature off (or to voluntary mode)
solved the problem:

# echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
# echo madvise >  /sys/kernel/mm/transparent_hugepage/defrag

# cat /sys/kernel/mm/transparent_hugepage/{enabled,defrag}
always [madvise] never
always [madvise] never

Doing that is pretty much without risk and can be done/reverted at any
time. It may cost a bit of performance in systems with lots of memory, but
I personally think it is overrated for general usage.
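
For checking the setting on many hosts, here is a small sketch that extracts the active mode from the sysfs text shown above (the bracketed word). The helper names are hypothetical; the paths are the ones from the commands above and only exist on kernels with THP support:

```python
import re
from pathlib import Path
from typing import Dict

THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_thp_setting(sysfs_text: str) -> str:
    """Return the bracketed (active) choice, e.g. 'madvise'."""
    match = re.search(r"\[(\w+)\]", sysfs_text)
    if match is None:
        raise ValueError(f"no active setting found in {sysfs_text!r}")
    return match.group(1)


def read_thp_settings(base: Path = THP_DIR) -> Dict[str, str]:
    """Read 'enabled' and 'defrag' below base and return their active modes."""
    return {
        name: active_thp_setting((base / name).read_text())
        for name in ("enabled", "defrag")
    }


print(active_thp_setting("always [madvise] never"))
```

Calling read_thp_settings() on a host then gives a dictionary that can be compared against the expected mode.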

As I said, not sure it applies to your situation, but comes from a real
world high throughput environment.

Cheers
Martin

On Tue, Oct 13, 2015 at 7:49 PM, Kamran Khan <kam...@pssclabs.com> wrote:

> Hi All,
>
>
> This isn't a problem with Ganglia, but I was hoping I might get a little
> advice on what I am seeing.  I have a customer who is running ls-dyna
> applications, and he is noticing something odd.  He is noticing his jobs
> being bogged down and not running at their full capacity.  He looked at the
> Ganglia web interface and saw that "System CPU" was at 100%, while "User
> CPU" was at like 20%.  What processes does the "System CPU" refer to?  What
> tools can I use to track what might be pushing the "System CPU" to 100%?
> There are times when the "User CPU" goes up to 100%, which is what he
> wants, but then at times it spikes down to 20% ish and the "System CPU"
> stays up around 100%.
>
>
> Any advice is greatly appreciated.  If you need me to send output, I
> certainly can.  Just let me know what to run.
>
>
> Please let me know.
>
>
> Thanks.
> --
> Kamran Khan
> PSSC Labs
> HPC Software / Technical Engineer
>
>

-- 
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

Re: [Ganglia-general] Monitoring CTX switches and memory fragmentation

2015-05-05 Thread Martin Knoblauch

Hi Vladimir,

 is the CTX stuff already in a released version? I may need to tell the end
customer to upgrade.

Cheers
Martin

On Tue, May 5, 2015 at 4:12 PM, Vladimir Vuksan vli...@veus.hr wrote:

  I have written one for memory fragmentation. You can find it here


 https://github.com/ganglia/gmond_python_modules/tree/master/system/mem_fragmentation

 Context stuff is now in the monitor-core master


 https://github.com/ganglia/monitor-core/blob/master/gmond/python_modules/cpu/cpu_stats.py

 Vladimir


 On 05/05/2015 02:49 AM, Martin Knoblauch wrote:

  Hi friends,

   short question: does Ganglia provide monitor agents for context switches
 and memory fragmentation (e.g. listing contents of /proc/buddyinfo)? I
 want to avoid double work, should they exist officially?

  Cheers
 Martin
   --
   --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www: http://www.knobisoft.de






-- 
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

[Ganglia-general] Monitoring CTX switches and memory fragmentation

2015-05-05 Thread Martin Knoblauch

Hi friends,

 short question: does Ganglia provide monitor agents for context switches
and memory fragmentation (e.g. listing contents of /proc/buddyinfo)? I
want to avoid double work, should they exist officially?
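
To illustrate what such a fragmentation agent would have to read, here is a minimal sketch that parses /proc/buddyinfo-style text. It assumes the usual layout "Node N, zone NAME count0 count1 ..." where column k is the number of free blocks of order k. The function names and the sample line are hypothetical, and this is not the Ganglia module:

```python
from typing import Dict, List, Tuple


def parse_buddyinfo(text: str) -> Dict[Tuple[int, str], List[int]]:
    """Map (node, zone) to the list of free-block counts per order."""
    result: Dict[Tuple[int, str], List[int]] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue  # skip anything that does not look like a zone line
        node = int(parts[1].rstrip(","))
        zone = parts[3]
        result[(node, zone)] = [int(count) for count in parts[4:]]
    return result


def free_pages_by_order(counts: List[int]) -> List[int]:
    """Convert block counts to page counts (a block of order k has 2**k pages)."""
    return [count * (2 ** order) for order, count in enumerate(counts)]


# Made-up sample line in the assumed format.
sample = "Node 0, zone   Normal   12   7   3   0   1   0   0   0   0   0   2"
info = parse_buddyinfo(sample)
print(info[(0, "Normal")])
print(free_pages_by_order(info[(0, "Normal")]))
```

Few free blocks in the high orders combined with many in the low orders is the usual sign of fragmentation.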

Cheers
Martin
 --
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

[Ganglia-general] Combining metrics from several RRD files

2014-01-31 Thread Martin Knoblauch

Hi friends,

 hope somebody already had this problem and solved it. I have a cluster
where we monitor the status (size, used, free) of several filesystems using
Ganglia. It all looks great in the browser, but now the customer wants to
have those data sets combined into one. In order not to lose the data we
have, I want to combine those into one RRD. All the source RRDs have
identical structure (RRAs) and timestamps.

Any solution? Ideas?

Cheers
Martin


-- 
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

Re: [Ganglia-general] Combining metrics from several RRD files

2014-01-31 Thread Martin Knoblauch

Hi Arnau,

not completely :-) I actually want to extract the data from the RRD files
and combine them into one, adding up the values. The good thing is, I found
out about rrdtool xport. It does what I want for the extraction. Now I just
need to do the summing up.
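
A rough sketch of the summing step, assuming the exported series are already aligned on the same timestamps (as stated for the source RRDs) and that unknown values arrive as NaN. The function names are hypothetical; this is plain Python and does not call rrdtool:

```python
import math
from typing import List, Sequence


def sum_ignoring_nan(values: Sequence[float]) -> float:
    """Sum the known values; return NaN only if every value is unknown."""
    present = [value for value in values if not math.isnan(value)]
    return sum(present) if present else math.nan


def sum_series(*series: Sequence[float]) -> List[float]:
    """Add up several aligned series row by row."""
    return [sum_ignoring_nan(row) for row in zip(*series)]


# Made-up numbers for two filesystems over three time steps.
fs_a = [10.0, math.nan, math.nan]
fs_b = [5.0, 7.0, math.nan]
print(sum_series(fs_a, fs_b))
```

Treating a single missing value as zero keeps one silent filesystem from blanking the combined row.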

Cheers
Martin

On Fri, Jan 31, 2014 at 11:21 AM, Arnau Bria listsar...@gmail.com wrote:

On Fri, 31 Jan 2014 10:37:19 +0100
Martin Knoblauch wrote:

Hi friends,
Hi,

hope somebody already had this problem and solved it. So I have a
cluster were we monitor the status (size, used, free) for several
filesystems using Ganglia. Looks all great in the browser, but now
the customer wants to have those data sets combined into one. In
order to not loose the data we have, I want to combine those into one
RRD. All the source RRDs have identical structure (RRAs) and
timestamps.

Any solution? Ideas?

If I've understood you properly:

1.-) use the Aggregate Graphs from ganglia's web.
2.-) create a custom graph and add it to one host :
quick google search:

http://sourceforge.net/mailarchive/forum.php?thread_name=503E2A47.6020705%40gmail.com&forum_name=ganglia-general

3.-) as they are RRDs you can mix them using your own script (bash,
perl, python)

HTH,

Cheers
Martin
Arnau


--
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de

Re: [Ganglia-general] Combining metrics from several RRD files

2014-01-31 Thread Martin Knoblauch

Hi Vladimir,

thanks. That is also an option. What I came up with is the following RRD
magic. It combines 3 metrics from 4 filesystems, removes the NaNs and
computes the percentage used accurately:

rrdtool xport --start now-366d --end now-1d \
DEF:t000=vault_000_total.rrd:sum:AVERAGE \
DEF:t001=vault_001_total.rrd:sum:AVERAGE \
DEF:t002=vault_002_total.rrd:sum:AVERAGE \
DEF:t003=vault_003_total.rrd:sum:AVERAGE \
CDEF:total=t000,t001,ADDNAN,t002,ADDNAN,t003,ADDNAN,1.09951E+12,/ \
DEF:u000=vault_000_used.rrd:sum:AVERAGE \
DEF:u001=vault_001_used.rrd:sum:AVERAGE \
DEF:u002=vault_002_used.rrd:sum:AVERAGE \
DEF:u003=vault_003_used.rrd:sum:AVERAGE \
CDEF:used=u000,u001,ADDNAN,u002,ADDNAN,u003,ADDNAN,1.09951E+12,/ \
DEF:a000=vault_000_avail.rrd:sum:AVERAGE \
DEF:a001=vault_001_avail.rrd:sum:AVERAGE \
DEF:a002=vault_002_avail.rrd:sum:AVERAGE \
DEF:a003=vault_003_avail.rrd:sum:AVERAGE \
CDEF:avail=a000,a001,ADDNAN,a002,ADDNAN,a003,ADDNAN,1.09951E+12,/ \
CDEF:pctc=total,avail,-,total,/ \
XPORT:total:"Total (TB)" XPORT:used:"Used (TB)" XPORT:avail:"Avail (TB)" \
XPORT:pctc:"PCT used (%)"

RRDTOOL is cool :-)
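
For more than a handful of filesystems the argument list gets tedious to type. Here is a sketch that generates the DEF and CDEF arguments for one metric in the same pattern as above. The function name and its parameters are hypothetical; only the argument strings themselves follow the command shown:

```python
from typing import List


def build_series_args(
    prefix: str, kind: str, short: str, count: int, divisor: str = "1.09951E+12"
) -> List[str]:
    """Return the DEF lines for `count` RRD files plus one CDEF that sums them.

    The CDEF chains ADDNAN over all series and divides by `divisor`.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    names = [f"{short}{index:03d}" for index in range(count)]
    defs = [
        f"DEF:{name}={prefix}_{index:03d}_{kind}.rrd:sum:AVERAGE"
        for index, name in enumerate(names)
    ]
    rpn = names[0]
    for name in names[1:]:
        rpn += f",{name},ADDNAN"
    return defs + [f"CDEF:{kind}={rpn},{divisor},/"]


for argument in build_series_args("vault", "total", "t", 4):
    print(argument)
```

The returned list can be appended to the rrdtool xport call, for example via subprocess, which also avoids the shell quoting of the XPORT legends.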

Cheers
Martin

On Fri, Jan 31, 2014 at 3:26 PM, Vladimir Vuksan vli...@veus.hr wrote:

Another alternative is to use the CSV or JSON export from the web UI, e.g.

http://blog.vuksan.com/2012/04/06/

It will also export all values from aggregate graphs, so you can do
the summing


Vladimir

Re: [Ganglia-general] Java/JMX plugin for Ganglia 3.1.x

2012-09-17 Thread Martin Knoblauch

Hi Daniel,

 JMXetric is one of the options I am considering. The other is JMXtrans. Both
are now using gmetric4j.

- JMXetric has the advantage that I can instrument the tomcat directly and send
to the local gmond, without any spoofing. The disadvantage is that it changes
the application and needs a lot of testing before production use.
- JMXtrans has the advantage that it is external to the application. The beauty
is that one *could* have a central JMX aggregator which would spoof the data to
the aggregating gmonds. Unfortunately there seems to be a problem with
spoofing, gmetric4j and the 3.1 wire format. It seems this is just not supported.
Alternatively one could of course run local JMXtrans instances on every tomcat
host. Not that nice ...

 That brings me back to my question on the developers list: what is the story
with gmetric4j vs. spoofing?

Cheers

Martin 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



 From: Daniel Pocock dan...@pocock.com.au
To: ganglia-general@lists.sourceforge.net 
Sent: Sunday, September 16, 2012 8:51 PM
Subject: Re: [Ganglia-general] Java/JMX plugin for Ganglia 3.1.x
 


Have you looked at JMXetric?

The latest code is in the main community github now

  https://github.com/ganglia/jmxetric

It originated here:

  http://code.google.com/p/jmxetric/

but I have recently split the JMX stuff, so that non-JMX users can just
use it as gmetric4j.  So for JMX, you use gmetric4j + jmxetric together.




On 16/09/12 15:02, Martin Knoblauch wrote:
 Hi Peter,
 
  thanks. Unfortunately, due to the situation at the customer site I am bound
to 3.1.x. But I will remember this.
 
 Cheers
 
 Martin 
 
 --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www:  http://www.knobisoft.de
 
 
 
 From: Peter Phaal peter.ph...@gmail.com
 To: Martin Knoblauch kn...@knobisoft.de 
 Cc: ganglia general ganglia-general@lists.sourceforge.net 
 Sent: Saturday, September 15, 2012 12:57 AM
 Subject: Re: [Ganglia-general] Java/JMX plugin for Ganglia 3.1.x

 Martin,

 If you can upgrade to the latest Ganglia release you could use sFlow
 to monitor your Tomcat servers. The jmx-sflow-agent exports standard
 JVM metrics, or the tomcat-sflow-valve can export the JVM metrics as
 well as HTTP counters and transactions.

 http://host-sflow.sourceforge.net/relatedlinks.php

 Cheers,
 Peter

 On Thu, Sep 13, 2012 at 5:43 AM, Martin Knoblauch kn...@knobisoft.de 
 wrote:
 Hi,

   as part of a larger tomcat deployment I need to monitor several tomcat
 instances and want to add the measured data to a Ganglia setup. I already
 found JMXtrans which seems a cool solution, but it uses host spoofing and
 I am not sure it is what I really want. Needs some real investigating.

   What I would love to have is a Gmond plugin that can simply add
 the measured metrics to the system metrics. Has anybody already done such a
 plugin or is working on it? I could provide testing, feedback and maybe
 help.

 Cheers
 Martin
 --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www: http://www.knobisoft.de


Re: [Ganglia-general] Java/JMX plugin for Ganglia 3.1.x

2012-09-16 Thread Martin Knoblauch

Hi Peter,

 thanks. Unfortunately, due to the situation at the customer site I am bound to
3.1.x. But I will remember this.

Cheers

Martin 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



 From: Peter Phaal peter.ph...@gmail.com
To: Martin Knoblauch kn...@knobisoft.de 
Cc: ganglia general ganglia-general@lists.sourceforge.net 
Sent: Saturday, September 15, 2012 12:57 AM
Subject: Re: [Ganglia-general] Java/JMX plugin for Ganglia 3.1.x
 
Martin,

If you can upgrade to the latest Ganglia release you could use sFlow
to monitor your Tomcat servers. The jmx-sflow-agent exports standard
JVM metrics, or the tomcat-sflow-valve can export the JVM metrics as
well as HTTP counters and transactions.

http://host-sflow.sourceforge.net/relatedlinks.php

Cheers,
Peter

On Thu, Sep 13, 2012 at 5:43 AM, Martin Knoblauch kn...@knobisoft.de wrote:
 Hi,

  as part of a larger tomcat deployment I need to monitor several tomcat
 instances and want to add the measured data to a Ganglia setup. I already
 found JMXtrans which seems a cool solution, but it uses host spoofing and
 I am not sure it is what I really want. Needs some real investigating.

  What I would love to have is a Gmond plugin that can simply add
 the measured metrics to the system metrics. Has anybody already done such a
 plugin or is working on it? I could provide testing, feedback and maybe
 help.

 Cheers
 Martin
 --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www: http://www.knobisoft.de





[Ganglia-general] Java/JMX plugin for Ganglia 3.1.x

2012-09-13 Thread Martin Knoblauch

Hi,

 as part of a larger tomcat deployment I need to monitor several tomcat 
instances and want to add the measured data to a Ganglia setup. I already found 
JMXtrans which seems a cool solution, but it uses host spoofing and I am not 
sure it is what I really want. Needs some real investigating.


 What I would love to have is a Gmond plugin that can simply add the
measured metrics to the system metrics. Has anybody already done such a plugin
or is working on it? I could provide testing, feedback and maybe help.

Cheers

Martin 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia gmond memory leak?

2012-02-27 Thread Martin Knoblauch

Hi Aidan,

 for what it is worth, I cannot reproduce the growing memory consumption on a 
small 3.2.0 grid using only standard metrics in unicast mode. Running now for a 
few hours. Will check again tomorrow.

Cheers

Martin 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



 From: Aidan Wong aidanw...@attinteractive.com
To: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com; 
ganglia-general ganglia-general@lists.sourceforge.net 
Sent: Thursday, February 23, 2012 8:34 AM
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?
 

I've restarted the gmond process and memory usage drops until gmond hogs 
memory over time.  Any Ganglia contributors who may want to chime in on this 
memory leak issue?  I'm on Ganglia 3.2.0.  Are there any improvements on 
version 3.3.1 addressing this issue?


Thanks

From: Ave-Lallemant, Nathan P nathan.p.ave-lallem...@efleets.com
Date: Wed, 22 Feb 2012 16:31:58 -0600
To: Aidan Wong aidanw...@attinteractive.com, ganglia-general 
ganglia-general@lists.sourceforge.net
Subject: RE: Ganglia gmond memory leak?



 
I have seen the same behavior in my environment but do not have a solution.
 
 
Nathan


 
From:Aidan Wong [mailto:aidanw...@attinteractive.com] 
Sent: Wednesday, February 22, 2012 4:10 PM
To: ganglia-general
Subject: [Ganglia-general] Ganglia gmond memory leak?
 
Hi it looks like my install of gmond version 3.2.0 is leaking memory.   The 
amount of resident used memory that the process uses, gets up pretty high and 
keeps increasing.
 
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     18647  0.0  9.9 2965464 1836268 ?     Ss   Jan14  11:24 
/home/t/hadoop-ganglia-client/sbin/gmond -c 
/home/t/hadoop-ganglia-client/gmond.conf -p 
/home/t/hadoop-ganglia-client/logs/gmond.pid
 
Is this a bug?  Can anyone suggest a solution?
 
Thank you


Re: [Ganglia-general] Ganglia gmond memory leak?

2012-02-23 Thread Martin Knoblauch

Hi Aidan,

 if possible for you, I would suggest running the gmond in foreground under 
the control of valgrind or a similar tool. Send us the report generated by 
the tool.

Cheers

Martin 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de




Re: [Ganglia-general] Ganglia gmond memory leak?

2012-02-23 Thread Martin Knoblauch

Hi Jesse,

 but in that case the memory footprint of gmond would approach a maximum 
after some time - correct? Aidan did not say whether it grows forever or goes 
asymptotic. Aidan?
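
One way to answer that question is to sample the resident set size at regular intervals and look at the increments. This is a sketch under the assumption that /proc/<pid>/status contains a "VmRSS: <n> kB" line, as on current Linux kernels; the function names and the sample numbers are hypothetical:

```python
from typing import List


def parse_vmrss_kb(status_text: str) -> int:
    """Extract the VmRSS value in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def increments(samples: List[int]) -> List[int]:
    """Differences between consecutive samples taken at a fixed interval."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


# Made-up samples: shrinking increments suggest the footprint is levelling off,
# roughly constant increments suggest it keeps growing.
print(increments([100_000, 150_000, 170_000, 175_000]))
print(increments([100_000, 150_000, 200_000, 250_000]))
```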

 
Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



 From: Jesse Becker haw...@gmail.com
To: Aidan Wong aidanw...@attinteractive.com 
Cc: ganglia-general ganglia-general@lists.sourceforge.net 
Sent: Thursday, February 23, 2012 2:36 PM
Subject: Re: [Ganglia-general] Ganglia gmond memory leak?
 
How many metrics are you monitoring?  gmond must allocate memory for
each metric, from each host.  If you are using multicast, each gmond
instance will get metrics from all other instances.

If you run gmond in isolation--no traffic to/from other gmond
instances--does memory usage still go up?

On Wed, Feb 22, 2012 at 17:10, Aidan Wong aidanw...@attinteractive.com wrote:
 Hi it looks like my install of gmond version 3.2.0 is leaking memory.   The
 amount of resident used memory that the process uses, gets up pretty high
 and keeps increasing.

 USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
 root     18647  0.0  9.9 2965464 1836268 ?     Ss   Jan14  11:24
 /home/t/hadoop-ganglia-client/sbin/gmond -c
 /home/t/hadoop-ganglia-client/gmond.conf -p
 /home/t/hadoop-ganglia-client/logs/gmond.pid

 Is this a bug?  Can anyone suggest a solution?

 Thank you





-- 
Jesse Becker


[Ganglia-general] Looking for 3.1.7 binaries/rpms for RHEL-5.x on IA64

2011-12-20 Thread Martin Knoblauch

Hi folks,

 does someone have those available? A species on the extinction list, I know,
but a customer has a bunch of those.

Thanks in advance

Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[Ganglia-general] Gmond not reporting some metrics (3.1.7 unicast running on RHEL-6.1)

2011-11-23 Thread Martin Knoblauch

Hi,

 while setting up a new cluster, I came across the following problem:

a) Headnode RHEL-6.1 (x86_64, ESX VM, yum up-to-date) with gmetad/gmond 3.1.7
RPMs from EPEL
b) Gmond node RHEL-6.1 (x86_64, real hardware, not up-to-date for customer
reasons) with the 3.1.7 RPM from EPEL, on a different network


 Unicast setup, with both gmonds reporting to themselves and to each other. 
Multicast not possible due to Switch/Router refusing to do multicast.

 The gmond-only node fails to report bytes in, bytes out, load (besides 
load-1), memory and cpu metrics. Under debug I see that it is monitoring those 
metrics, but not sending, although there should be changes beyond the 
thresholds. The node with gmond/gmetad works great.

 Any ideas? I saw some similar reports with RHEL-5.5, but no conclusinon.

 If needed, I can produce config files tomorrow.


Cheers

Martin 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] revisiting bogus spikes

2011-07-18 Thread Martin Knoblauch

Hi David,

 this is kind of helpful. What seems to happen is that the bytes-in counter 
(rbi) for your network card either wraps around completely or goes 
backwards by about 20-210 MB between two calls to update_ifdata. This would 
definitely lead to PB spikes.
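
The arithmetic behind such a spike can be sketched in a few lines. This is a Python paraphrase, not gmond's C code: `counter_delta` is a hypothetical helper that mirrors the wrap branch of update_ifdata as it is quoted elsewhere in this thread (`ULONG_MAX - ns->rbi + rbi`), and it assumes a 64-bit build where ULONG_MAX is 2^64 - 1. Fed the two rbi values from the first "Overflow in rbi" debug line, a counter that stepped back by about 210 MB turns into an increment of roughly 1.8e19 bytes:

```python
# Python paraphrase of the wrap handling in update_ifdata (not the C source).
# Assumption: gmond is a 64-bit build, so ULONG_MAX is 2**64 - 1.
ULONG_MAX_64 = 2**64 - 1


def counter_delta(prev, cur, ulong_max=ULONG_MAX_64):
    """Return the increment between two samples, treating a decrease as one wrap."""
    if cur >= prev:
        return cur - prev
    # Counter went backwards: assume it wrapped once at ulong_max.
    return ulong_max - prev + cur


# rbi values taken from the first "Overflow in rbi" debug line in the report.
prev_rbi, cur_rbi = 910239662712, 910029125551
print("backwards step in bytes:", prev_rbi - cur_rbi)
print("increment after wrap handling:", counter_delta(prev_rbi, cur_rbi))
```

Whether the real counter wrapped or merely jumped backwards cannot be told from these two numbers alone; the sketch only shows what the wrap assumption does to them.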


 If I recall correctly, this is a bit different from the case that made me 
write that REMOVE_BOGUS_SPIKES thing. There the bogus numbers were much more 
erratic. I modelled the thresholds in the #ifdef REMOVE_BOGUS_SPIKES section:

    if ((l_bin > 1.0e13) || (l_bout > 1.0e13) ||
        (l_pin > 1.0e8)  || (l_pout > 1.0e8)) {


 They might not be adequate for your scenario. You may need to add a few more 
debug statements to find the right values. Without actually having such a system 
at hand I cannot do much more.
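
To experiment with other limits before touching the C code, the check can be mimicked offline. The following is a hedged Python sketch, assuming the comparisons in that #ifdef section are greater-than tests against 1.0e13 (bytes) and 1.0e8 (packets). `looks_bogus` and the sample values are hypothetical names and numbers used for illustration, not part of gmond:

```python
# Limits as quoted from the #ifdef REMOVE_BOGUS_SPIKES section.
BYTES_LIMIT = 1.0e13
PKTS_LIMIT = 1.0e8


def looks_bogus(l_bin, l_bout, l_pin, l_pout,
                bytes_limit=BYTES_LIMIT, pkts_limit=PKTS_LIMIT):
    """True if any increment exceeds its limit (assumed '>' comparison)."""
    return (l_bin > bytes_limit or l_bout > bytes_limit or
            l_pin > pkts_limit or l_pout > pkts_limit)


# Hypothetical increments (bytes in, bytes out, packets in, packets out).
samples = {
    "normal traffic": (2.1e8, 1.0e8, 250.0, 166.0),
    "wrapped 64-bit delta": (1.8e19, 1.0e8, 250.0, 166.0),
    "large but below limit": (5.0e12, 1.0e8, 250.0, 166.0),
}
for label, values in samples.items():
    print(f"{label}: bogus={looks_bogus(*values)}")
```

Passing smaller `bytes_limit` or `pkts_limit` values shows which bogus samples a tighter threshold would catch and which legitimate bursts it would start to drop.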


Cheers

Martin 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



From: David Lee david.yi@gmail.com
To: ganglia-general@lists.sourceforge.net
Sent: Monday, July 18, 2011 8:41 AM
Subject: [Ganglia-general] revisiting bogus spikes


I wanted to add to the original thread regarding bogus spikes in network 
graphs, which were suspected to be caused by broadcom NICs that ship with many 
of the HP Proliant series servers today. We're running HP BL460G6, with vmware 
ESXi 4.1u1 hypervisors, and RHEL5.3 x64 guests. Using gmond-3.2 built off of 
the ganglia-3.2.0 source rpm, we're seeing the network spikes as well (PB 
range). 


Running in debug=10, I've found entries like this:


update_ifdata(BO) - Overflow in rbi: 910239662712 -> 910029125551
 ** bytes_out:  234956.359375
        metric 'bytes_out' has value_threshold 4096.00
        metric 'bytes_in' being collected now
 ** bytes_in:  461075631262662656.00
        metric 'bytes_in' has value_threshold 4096.00
        metric 'pkts_in' being collected now
 ** pkts_in:  251.174362
        metric 'pkts_in' has value_threshold 256.00
        metric 'pkts_out' being collected now
 ** pkts_out:  166.366455
        metric 'pkts_out' has value_threshold 256.00






update_ifdata(BO) - Overflow in rbi: 916309233232 -> 916289211909
 ** bytes_out:  375413.312500
        metric 'bytes_out' has value_threshold 4096.00
        metric 'bytes_in' being collected now
 ** bytes_in:  461094494759026688.00
        metric 'bytes_in' has value_threshold 4096.00
        metric 'pkts_in' being collected now
 ** pkts_in:  498.569885
        metric 'pkts_in' has value_threshold 256.00
        metric 'pkts_out' being collected now
 ** pkts_out:  303.376251
        metric 'pkts_out' has value_threshold 256.00




Kernel 2.6.18-128.el5 #1




I was not able to find any other obvious error messages related to interface 
metrics. We are seeing this across all of our Proliant series servers.


Thanks
DL





Re: [Ganglia-general] revisiting bogus spikes

2011-06-22 Thread Martin Knoblauch

Hi Patrick,

 it would be *really* important to see the debug messages that are part of the 
network metric code on Linux. That way we would see what the counters are when 
the spikes happen. This could provide more insight.


 As for making my/the REMOVE_BOGUS_SPIKES the default, I have my doubts. At least in 
its current form it is modelled very strictly on the failure mode I experienced 
back in 200x. It also has some smoothing/levelling effect on the data that 
might not be welcome, especially with interfaces faster than 1G.

Cheers

Martin 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de


- Original Message -
 From: Patrick Gilbert pdgilb...@gmail.com
 To: Ganglia-general@lists.sourceforge.net
 Cc: 
 Sent: Wednesday, June 22, 2011 2:36 AM
 Subject: Re: [Ganglia-general] revisiting bogus spikes
 
 So, to add to some of the data I've read here:
 
 I'm also experiencing this issue on VMX3 clusters with
 para-virtualization enabled. Seems odd that an OS that has no real
 knowledge of the physical network hardware would also exhibit the
 spiking issue. Has anyone else experienced this?
 
 To be fair, the underlying hardware does contain the Broadcom NICs.
 
 Also on this same topic, will the REMOVE_BOGUS_SPIKES flag be a
 default flag on future releases? Can anyone confirm this works (so I
 don't have to recompile :)?
 
 Thanks,
 
 Patrick Gilbert
 

Re: [Ganglia-general] revisiting bogus spikes

2011-04-28 Thread Martin Knoblauch

Hi Cameron, [adding the developers list]

 OK:

1) we write the unmodified data in line 233 to capture the raw counters. That 
is what we are using in line 227 for the comparison
2) ns is created and returned by hash_lookup
3) The ULONG_MAX logic in line 231 is there because we need to ensure that the 
result is always positive. Needed because the variables are unsigned.
4) update_ifdata is called once by metric_init and then every time one of 
the byte/pkts_in/out collectors fires

 Now this does not solve your problem ... Question: do you see any of the debug 
messages that update_ifdata should emit when something unusual happens? 
That should help to get an idea of what the interface counters on your 
machine(s) look like. Look in /var/log/messages, or just start gmond noninteractive.

 Hmm. Another question: do you compile gmond in 64-bit or 32-bit mode? The 
ULONG_MAX logic may/will fail in 32-bit mode, if the kernel is 64-bit. It could 
even be that the interface counters on 32-bit kernels are written as 64-bit 
values.
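
Why the counter width matters can be illustrated with a small sketch. Note that this uses plain modular arithmetic rather than gmond's ULONG_MAX expression, so it is a stand-in for the idea and not the actual code; `wrapped_delta` and the sample values are hypothetical:

```python
def wrapped_delta(prev, cur, counter_bits):
    """Increment between two samples of a counter that wraps at 2**counter_bits."""
    return (cur - prev) % (2 ** counter_bits)


# Hypothetical samples of a 32-bit counter just before and just after a wrap.
prev, cur = 4294967000, 704

print("assuming a 32-bit counter:", wrapped_delta(prev, cur, 32))
print("assuming a 64-bit counter:", wrapped_delta(prev, cur, 64))
```

With the matching width the increment comes out as a plausible 1000 bytes. Assuming 64 bits for the same two samples gives a value close to 2^64, which is the kind of number that shows up as a spike.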

Hope this helps

Martin 
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de



From: Cameron L. Spitzer cspit...@nvidia.com
To: ganglia-general@lists.sourceforge.net 
ganglia-general@lists.sourceforge.net
Sent: Thu, April 28, 2011 3:21:04 AM
Subject: [Ganglia-general] revisiting bogus spikes


Once again I've been asked to make Ganglia usable on Linux hosts with the 
Broadcom NIC with the 32-bit byte counters.
E.g., HP Proliant 580 G5, a rather popular machine where Ganglia doesn't work 
out of the box.

So I'm trying to understand ganglia-3.1.7/libmetrics/linux/metrics.c again.

In update_ifdata(), we parse /proc/net/dev for the current bytes and packets 
in and out.
There's a structure ns (declared where?) of type net_dev_stats, representing 
the previous sample?
I'm not sure exactly what ns represents.

There's a sanity check at line 227,  if ( rbi >= ns->rbi ),  for whether the 
counter went up or down.  If it went down, we assume the counter rolled 
around, guess the value is negative, and invert it at line 231:
l_bytes_in += ULONG_MAX - ns->rbi + rbi;
(I don't understand how that is supposed to work.)
Then, regardless of whether the sample passed or failed the sanity check, it's 
saved in the ns structure.
Line 233, ns->rpi = rpi;

After the parsing is all done, and the crazy value is in ns, an optional 
reasonableness test (REMOVE_BOGUS_SPIKES)
returns early if any of the numbers are extremely large.  Otherwise it updates 
the static running counts and then returns.
On our HP 580G5s, defining REMOVE_BOGUS_SPIKES had no effect.  The network 
traffic graphs become useless within a minute of starting gmond.

The part I don't understand is when the line 227 check fails, we put the 
known-bad data in ns anyway.

I'd appreciate it if someone familiar with update_ifdata() could explain its 
logic.  When is this routine called?
(I can see modules/network/mod_net.c calls it via bytes_in_func(), but I 
haven't figured out when net_metric_handler()
is called.  Maybe that would explain how bogus data in ns doesn't matter.)
Is there any way to keep way out-of-scale data out of these graphs?
Thanks for any help.

-Cameron in Los Gatos






 

[Ganglia-general] Fw: Network bytes spikes

2011-03-30 Thread Martin Knoblauch

forgot the list ...

 --
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de



- Forwarded Message 
From: Martin Knoblauch kn...@knobisoft.de
To: Bostjan Skufca bost...@a2o.si
Sent: Wed, March 30, 2011 11:42:12 AM
Subject: Re: [Ganglia-general] Network bytes spikes


Hi Bostjan,

 yes, the REMOVE_BOGUS_SPIKES workaround is *supposed* to work. It did for me, 
when I wrote it :-)

Cheers

Martin 
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de




From: Bostjan Skufca bost...@a2o.si
To: Vladimir Vuksan vli...@veus.hr
Cc: ganglia-general ganglia-general@lists.sourceforge.net
Sent: Tue, March 29, 2011 9:25:28 PM
Subject: Re: [Ganglia-general] Network bytes spikes

That really seems to be the case. Off the top of my head, it seems that 
I only see this on HP DL3x0 with Broadcom Corporation NetXtreme II BCM5708 
Gigabit Ethernet (rev 12) interfaces. I've found some threads...

Anyway, does this really work? There is something in the code which eliminates 
values of 1e13 and bigger, or so it seems...

make CPPFLAGS=-DREMOVE_BOGUS_SPIKES

b.



On 29 March 2011 20:30, Vladimir Vuksan vli...@veus.hr wrote:


I see it all the time :-(. According to Bernard this is due to problem
with some of the Broadcom cards. Perhaps Bernard can offer more insight.


On Tue, 29 Mar 2011 20:23:31 +0200, Bostjan Skufca bost...@a2o.si wrote:
 Hi,

 occasionally I notice huge spikes in network graphs in ganglia (petabytes
 per second or so). Not sure whether those are caused by gmond restarts or
 network interface byte counter overflows or something else.
 Is someone else also seeing similar behaviour? Running latest ganglia
 (3.1.7).

 b.


Re: [Ganglia-general] Network bytes spikes

2011-03-30 Thread Martin Knoblauch

Hi Cameron,

 there are two problems:

a) overflow. 32-bit counters will not last very long on 1 Gbit or faster. They 
should not report PB spikes though.
b) some BCM (Broadcom) adapters on Linux-64 had/have a really bad HW bug reporting bogus 
counters every now and then. That is supposed to be fixed by 
REMOVE_BOGUS_SPIKES, but only on Linux. But no guarantees. It worked for me on 
3.0.7.
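
How long a 32-bit byte counter lasts can be estimated with simple arithmetic. The sketch below is an idealised calculation that assumes the link runs at full line rate and that every bit on the wire is counted; `seconds_to_wrap` is a hypothetical helper, not part of Ganglia:

```python
def seconds_to_wrap(counter_bits, link_bits_per_second):
    """Seconds until a byte counter of the given width wraps at sustained line rate."""
    bytes_per_second = link_bits_per_second / 8
    return 2 ** counter_bits / bytes_per_second


for gbit in (1, 10):
    seconds = seconds_to_wrap(32, gbit * 1e9)
    print(f"{gbit} Gbit/s: 32-bit counter wraps after about {seconds:.1f} s")
```

On a saturated 1 Gbit link this comes out at roughly half a minute, so more than one wrap between two samples is possible if the collection interval is longer than that.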

 Cheers

 Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de



From: Cameron Spitzer cspit...@nvidia.com
To: Bostjan Skufca bost...@a2o.si
Cc: ganglia-general ganglia-general@lists.sourceforge.net
Sent: Tue, March 29, 2011 11:01:24 PM
Subject: Re: [Ganglia-general] Network bytes spikes


CPPFLAGS=-DREMOVE_BOGUS_SPIKES
had no effect in my installation.
We eventually found a patch in a non-ganglia forum somewhere, but I can't find 
it now.
It basically added input sanity checking.

The problem is a 32-bit counter on a 1 Gbps NIC can overflow in less than 
gmond's sampling interval.
When it overflows, ganglia treats the small negative number as a very large 
positive.
This is a known ganglia bug.  It's been around since 2003.  You just have to 
live with it, or try to fix it yourself.

-Cameron



Bostjan Skufca wrote: 
That really seems to be the case. Off the top of my head, it seems that 
I only see this on HP DL3x0 with Broadcom Corporation NetXtreme II BCM5708 
Gigabit Ethernet (rev 12) interfaces. I've found some threads...

Anyway, does this really work? There is something in the code which eliminates 
values of 1e13 and bigger, or so it seems...

make CPPFLAGS=-DREMOVE_BOGUS_SPIKES

b.



On 29 March 2011 20:30, Vladimir Vuksan vli...@veus.hr wrote:


I see it all the time :-(. According to Bernard this is due to problem
with some of the Broadcom cards. Perhaps Bernard can offer more insight.


On Tue, 29 Mar 2011 20:23:31 +0200, Bostjan Skufca bost...@a2o.si wrote:
 Hi,

 occasionally I notice huge spikes in network graphs in ganglia (petabytes
 per second or so). Not sure whether those are caused by gmond restarts or
 network interface byte counter overflows or something else.
 Is someone else also seeing similar behaviour? Running latest ganglia
 (3.1.7).

 b.





 

Re: [Ganglia-general] Multicast/Unicast Poll

2011-01-13 Thread Martin Knoblauch

- Original Message 

 From: Seth Graham set...@fnal.gov
 To: Jesse Becker haw...@gmail.com
 Cc: Ganglia Mailing List ganglia-general@lists.sourceforge.net
 Sent: Wed, January 12, 2011 10:31:49 PM
 Subject: Re: [Ganglia-general] Multicast/Unicast Poll

 On Jan 12, 2011, at 3:12 PM, Jesse Becker wrote:

  In light of the recent discussions over metadata and unicast vs.
  multicast, we (meaning Bernard) have created a poll on
  http://ganglia.info/ to try and gauge the use of each. Please let us
  know if you use multicast, unicast, or both in your environments.

  If you have any comments about using one or the other, 

 We used multicast for a long time because it's certainly easy, and ganglia 
 is something multicast is well suited for.

 But as the years rolled on, firewalls got involved, people became concerned 
 about memory and network usage, and subnet privacy was eroding. We started 
 getting other departments' machines mixed in with our machines, and this 
 caused all kinds of confusion on both sides.

 Migrating to unicast eliminated the firewall issues, means only a select few 
 machines have to keep metrics in memory, and no more cross talk with other 
 groups. I never saw any solid evidence that ganglia was putting an unfair 
 load on systems, but it was easier to reconfigure than fight it.

 So the reasons to switch were mostly political.

 Basically my reasons for using unicast are very much the same.

 For new installations I will always use UC today. For old installations I am 
moving from MC to UC if the situation allows.

Cheers
Martin


Re: [Ganglia-general] last-N-hours view

2010-12-29 Thread Martin Knoblauch

Hi Ryan,

 it works as designed :-) Your new intervals do not have proper samples in the 
RRD database, so the graphs are blown up from the day interval.

 You need to tell gmetad to generate samples for the two and three hour 
intervals. Something like this in gmetad.conf should do, although I am no 
specialist.

RRAs RRA:AVERAGE:0.5:1:244 RRA:AVERAGE:0.5:24:244 \ 
  RRA:AVERAGE:0.5:2:244 RRA:AVERAGE:0.5:3:244 \ 
  RRA:AVERAGE:0.5:168:244 RRA:AVERAGE:0.5:672:244 \
  RRA:AVERAGE:0.5:5760:374


 Warning: you need to remove (or move) your old data first. Or you need some 
rrd-magic to add the new intervals to the old database.
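
To see which time range each archive in such an RRAs line covers, the definitions can be decoded with a few lines of Python. This is a rough sketch: it assumes a base RRD step of 15 seconds (check your own RRD files, the value may differ), and `rra_coverage` is a hypothetical helper, not a Ganglia or RRDtool function:

```python
STEP_SECONDS = 15  # assumption: base RRD step in seconds; verify against your setup


def rra_coverage(rra, step=STEP_SECONDS):
    """Return (seconds per row, total seconds covered) for 'RRA:CF:xff:steps:rows'."""
    _tag, _cf, _xff, steps, rows = rra.split(":")
    resolution = int(steps) * step
    return resolution, resolution * int(rows)


RRAS = [
    "RRA:AVERAGE:0.5:1:244", "RRA:AVERAGE:0.5:24:244",
    "RRA:AVERAGE:0.5:2:244", "RRA:AVERAGE:0.5:3:244",
    "RRA:AVERAGE:0.5:168:244", "RRA:AVERAGE:0.5:672:244",
    "RRA:AVERAGE:0.5:5760:374",
]
for rra in RRAS:
    resolution, span = rra_coverage(rra)
    print(f"{rra}: one row per {resolution} s, covers {span / 3600:.1f} h")
```

Under that assumption the two added archives keep 244 rows at 30 s and 45 s resolution, which is what gives the two and three hour views the same number of points as the hour view.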

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www: http://www.knobisoft.de



From: 朱韬 ryanzhu...@163.com
To: ganglia-general@lists.sourceforge.net
Sent: Wed, December 29, 2010 8:36:38 AM
Subject: [Ganglia-general] last-N-hours view

Hi guys:
   I encountered the problem that my job lasted for a few hours, while ganglia 
does not support a last-N-hours view. 


So I tried to add two view modes to conf.php as follows:
$time_ranges = array(
   'hour'=>3600,
   'twohours'=>7200,
   'threehours'=>10800,
   'day'=>86400,
   'week'=>604800,
   'month'=>2419200,
   'year'=>31449600
);
  But it does not work as it should. The resolution of the added modes 
is much lower than that of the original ones.
Is there any other code to be modified?
 Thank you

   

 ryan zhu



Re: [Ganglia-general] restarting the gmond collector node causes no data to be reported

2010-11-18 Thread Martin Knoblauch

From: Cameron L. Spitzer cspit...@nvidia.com
To: Bernard Li bern...@vanhpc.org
Cc: Louis Coilliot louis.coill...@think.fr; 
ganglia-general@lists.sourceforge.net ganglia-general@lists.sourceforge.net
Sent: Wed, November 17, 2010 10:36:00 PM
Subject: Re: [Ganglia-general] restarting the gmond collector node causes no 
data to be reported

Just out of curiosity, I followed the link in Bernard's message.
I didn't find anything related to Russell's question.
I followed the link to Current Release Notes, and searched the page for 
send_metadata_interval, which is cheating,
because I would only have Russell's question if I didn't know about 
send_metadata_interval.

Then I followed the link to Ganglia FAQs.
Someone who already understood Ganglia pretty well might make the connection 
between
Russell's question "... no metrics are reported anymore" and the FAQ 
"Sometimes graphs don't show up for hosts".
I doubt a newcomer would see it.  That's unclear.

 Definitely, one of the not-so-strong points of Ganglia is documentation. 
Frankly, I have been using it for quite some years now, but this behavior/option was new to 
me.

Cheers
Martin


Re: [Ganglia-general] restarting the gmond collector node causes no data to be reported

2010-11-18 Thread Martin Knoblauch

Hi Bernard,

- Original Message 

 From: Bernard Li bern...@vanhpc.org
 To: Louis Coilliot louis.coill...@think.fr
 Cc: ganglia-general@lists.sourceforge.net
 Sent: Wed, November 17, 2010 9:16:22 PM
 Subject: Re: [Ganglia-general] restarting the gmond collector node causes no 
data to be reported
 
 Hello:
 
 This is actually documented in both the release notes and the FAQs  in our 
Wiki:
 
 http://sourceforge.net/apps/trac/ganglia/wiki
 
 Please  let us know if anything is unclear.
 
 Thanks,
 
 Bernard

 besides that this is really unclear and difficult to find, we may want to 
consider a different default for unicast mode. It is always better to not let 
people run into foreseeable problems.

Cheers
Martin

 
 On Wed,  Nov 17, 2010 at 1:14 PM, Louis Coilliot louis.coill...@think.fr  
wrote:
  Hello, this behaviour is reported from time to time with unicast  :)
 
  Use:
  send_metadata_interval = 600
 
   (600, for example)
 
  on the gmond.conf for your  nodes.
 
  The metrics should get back after a  while.
 
  Louis
 


Re: [Ganglia-general] restarting the gmond collector node causes no data to be reported

2010-11-18 Thread Martin Knoblauch

- Original Message 

 From: Kostas Georgiou k.georg...@atreides.org.uk
 To: ganglia-general@lists.sourceforge.net
 Sent: Thu, November 18, 2010 11:57:29 AM
 Subject: Re: [Ganglia-general] restarting the gmond collector node causes no 
data to be reported
 
 On Thu, Nov 18, 2010 at 02:44:13AM -0800, Martin Knoblauch wrote:
 
   besides that this is really unclear and difficult to find, we may want to 
   consider a different default for unicast mode. It is always better to not 
   let people run into foreseeable problems.
 
 You can get the same problems with multicast as well, what is the

 Does this really happen in MC mode? I would call that a bug then.

 reasoning for the send_metadata_interval=0  default?
 

 Can't answer that one.

cheers
Martin



[Ganglia-general] Fw: How can gmetad be configured for 2 clusters?

2010-11-12 Thread Martin Knoblauch

sorry, forgot the list ...

 --
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Forwarded Message 
 From: Martin Knoblauch kn...@knobisoft.de
 To: Whit Blauvelt w...@transpect.com
 Sent: Fri, November 12, 2010 5:35:44 PM
 Subject: Re: [Ganglia-general] How can gmetad be configured for 2 clusters?
 
 Hi Whit,
 
  let me guess, all of your machines are running multicast, and all are on 
 the same port? As a result, every gmond will have the complete information for 
 all 8 nodes. That is what you see. Try telnet 192.168.19 8649 and you will see 
 the info of all eight nodes.
 
  In order to separate the two clusters, they need to run on different ports.
 
  In addition: when you list more than one node on the data_source, this does 
 not define the cluster. It just adds failover capability. gmetad will only talk 
 to one of the hosts at a time. If that fails, it will try the next on the list.
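
The failover behaviour described above can be pictured with a small Python sketch. It is an illustration of the idea only, not gmetad's implementation: `fetch_gmond_xml` and `poll_first_available` are hypothetical names, and the sketch assumes gmond dumps its XML on TCP port 8649 as in the telnet test.

```python
import socket


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump a gmond sends on its TCP port (same data as the telnet test)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def poll_first_available(hosts, fetch=fetch_gmond_xml):
    """Try each host in list order and return (host, xml) from the first that answers."""
    failures = []
    for host in hosts:
        try:
            return host, fetch(host)
        except OSError as exc:  # refused, unreachable, timed out, ...
            failures.append(f"{host}: {exc}")
    raise ConnectionError("no data source answered: " + "; ".join(failures))
```

Only one host per data_source line is ever consulted in a given poll, so listing hosts from two clusters on one line does not merge or split anything; it only changes which gmond gets asked.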
 
 Hope this helps a bit
 
 Martin 
 --
 Martin  Knoblauch
 email: k n o b i AT knobisoft DOT de
 www:  http://www.knobisoft.de
 
 
 
 - Original Message 
   From: Whit Blauvelt w...@transpect.com
  To: ganglia-general@lists.sourceforge.net
   Sent: Fri, November 12, 2010 4:53:49 PM
  Subject: [Ganglia-general] How  can gmetad be configured for 2 clusters?
  
  Hi,
  
   Although I've looked through the docs, I must not be looking in the right
  place. We've added a second cluster, and want to track it as a separate
  entity from the first. What intuitively seems likely to work doesn't
  accomplish that. I've tried:
  
  - Defining two clusters like this in gmetad.conf:
  
  data_source Cluster1 localhost 192.168.19 192.168.1.32 192.168.1.16
  data_source Cluster2 192.168.1.24 192.168.1.8 192.168.1.5 192.168.1.6
  
  - And defining the cluster name in each gmond.conf:
  
  cluster {
    name = Cluster1
    owner = unspecified
    latlong = unspecified
    url = unspecified
  }
  
  The result? The Web front end gives a choice of GridCluster1 or Cluster2,
  but either choice shows all 8 machines in both clusters. (The only
  difference is that under Cluster1 the Linux members all have their names
  shown in the listing, while under Cluster2 the Linux members are shown just
  by IPs - while the OSX show their names in both cases - but this isn't the
  show stopper here.)
  
  No doubt the right solution is as simple and obvious as the wrong one I've
  tried. But what is it? All examples I've found assume a single cluster.
  
   Thanks,
  Whit
  
   

Re: [Ganglia-general] How can gmetad be configured for 2 clusters?

2010-11-12 Thread Martin Knoblauch

- Original Message 

 From: David Birdsong david.birds...@gmail.com
 To: Whit Blauvelt w...@transpect.com
 Cc: Martin Knoblauch kn...@knobisoft.de; 
ganglia-general@lists.sourceforge.net
 Sent: Fri, November 12, 2010 9:56:26 PM
 Subject: Re: [Ganglia-general] How can gmetad be configured for 2 clusters?

 On Fri, Nov 12, 2010 at 9:19 AM, Whit Blauvelt w...@transpect.com wrote:
  On Fri, Nov 12, 2010 at 08:35:44AM -0800, Martin Knoblauch wrote:

   In order to separate the two clusters, they need to run on different 
   ports.

   In addition: when you list more than one node on the data_source, this 
   does not define the cluster. It just adds failover capability. gmetad will 
   only talk to one of the hosts at a time. If that fails, it will try the 
   next on the list.

  Thanks Martin. That was the whole trick. I was making the assumption that
  gmetad, being meta, would be the gatherer of data from the nodes.
  Understanding that the gmonds go ahead and consolidate that changes the
  picture entirely. As my five-year-old sometimes says, "Silly me."

  Whit

 While I can't argue against something that clearly fixed this for you,
 this doesn't sound correct and it would be nice to hear this
 clarified.

 Sure every host would have info about every other host, but each
 host's xml tree should have all the nodes nested in their
 corresponding cluster tags. Gmetad could hit any host and pick up
 info about both clusters on any host, but it should know to distribute
 the updates from the xml stream to the correct clusters and not 'cross
 pollinate' the two.

 As far as I know, every gmond just puts all the information it has inside its 
own cluster tags. It does not care about the cluster tags it receives from 
other gmonds. It has always been the task of gmetad to build up the correct XML 
for the complete grid. Therefore it is vital that the gmond configuration for 
multiple clusters is correct.
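
What this means for whoever reads the XML can be shown with a short Python sketch. The sample document is hand-written for illustration and much shorter than real gmond output; the GANGLIA_XML, CLUSTER and HOST element names and the NAME and IP attributes follow the layout of gmond's XML dump as far as I know it, while the host names and `hosts_by_cluster` are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hand-written sample: a gmond configured as "Cluster1" that also heard
# packets from two hosts whose own gmond.conf says "Cluster2".
SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="Cluster1" OWNER="unspecified">
<HOST NAME="node-a" IP="192.168.1.32"/>
<HOST NAME="node-b" IP="192.168.1.16"/>
<HOST NAME="node-c" IP="192.168.1.24"/>
<HOST NAME="node-d" IP="192.168.1.8"/>
</CLUSTER>
</GANGLIA_XML>"""


def hosts_by_cluster(xml_text):
    """Map each CLUSTER name in a gmond XML dump to the host names listed under it."""
    root = ET.fromstring(xml_text)
    return {
        cluster.get("NAME"): [host.get("NAME") for host in cluster.findall("HOST")]
        for cluster in root.iter("CLUSTER")
    }


print(hosts_by_cluster(SAMPLE))
```

All four hosts appear under the answering gmond's own cluster name, so nothing in the dump lets a reader sort them back into two clusters.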

 One could argue that this behaviour of gmond needs improvement. One solution 
could be that it aggregates only data coming from its own cluster. On the other 
hand, the cluster tag is just optional. What should a gmond without such a 
tag do about data from tagged gmonds? I still favor correct configuration. In any 
case, I am adding ganglia developers to CC.

 But the confusion shows that documentation might be lacking ...

Cheers
Martin


[Ganglia-general] Fw: How can gmetad be configured for 2 clusters?

2010-11-12 Thread Martin Knoblauch

 really adding the developers ...

- Forwarded Message 

 From: Martin Knoblauch kn...@knobisoft.de
 To: David Birdsong david.birds...@gmail.com; Whit Blauvelt 
w...@transpect.com
 Cc: ganglia-general@lists.sourceforge.net
 Sent: Sat, November 13, 2010 8:34:43 AM
 Subject: Re: [Ganglia-general] How can gmetad be configured for 2 clusters?

 - Original Message 

  From: David Birdsong david.birds...@gmail.com
   To: Whit Blauvelt w...@transpect.com
  Cc: Martin  Knoblauch kn...@knobisoft.de; 
 ganglia-general@lists.sourceforge.net
   Sent: Fri, November 12, 2010 9:56:26 PM
  Subject: Re: [Ganglia-general]  How can gmetad be configured for 2 clusters?

  On Fri, Nov 12, 2010 at 9:19 AM, Whit Blauvelt w...@transpect.com wrote:
   On Fri, Nov 12, 2010 at 08:35:44AM -0800, Martin Knoblauch wrote:

    In order to separate the two clusters, they need to run on different
    ports.

    In addition: when you list more than one node on the data_source, this
    does not define the cluster. It just adds failover capability. gmetad
    will only talk to one of the hosts at a time. If that fails, it will
    try the next on the list.

   Thanks Martin. That was the whole trick. I was making the assumption that
   gmetad, being meta, would be the gatherer of data from the nodes.
   Understanding that the gmonds go ahead and consolidate that changes the
   picture entirely. As my five-year-old sometimes says, Silly me.

   Whit

  While I can't argue against something that clearly fixed this for you,
  this doesn't sound correct and it would be nice to hear this
  clarified.

  Sure every host would have info about every other host, but each
  host's xml tree should have all the nodes nested in their
  corresponding cluster tags. Gmetad could hit any host and pick up
  info about both clusters on any host, but it should know to distribute
  the updates from the xml stream to the correct clusters and not 'cross
  pollinate' the two.

 As far as I know, every gmond just puts all the information it has inside
 its own cluster tags. It does not care about the cluster tags it receives
 from other gmonds. It has always been the task of gmetad to build up the
 correct XML for the complete grid. Therefore it is vital that the gmond
 configuration for multiple clusters is correct.

 One could argue that this behaviour of gmond needs improvement. One
 solution could be that it aggregates only data coming from its own cluster.
 On the other hand, the cluster tag is just optional. What should a gmond
 without such a tag do about data from tagged gmonds? I still favor correct
 configuration. In any case, I am adding the ganglia developers to CC.

 But the confusion shows that the documentation might be lacking ...

 Cheers
 Martin


Re: [Ganglia-general] gmetad only reads from one node of each data_source

2010-10-25 Thread Martin Knoblauch

Hi Marc,

the output of telnet seems to indicate that your "gmond"s indeed only see
their own data. Kind of strange. I have to admit that I have not used MC
(multicast) configurations for quite some time. UC (unicast) is so much
cleaner in my opinion.

Questions:
a) how many network interfaces do the nodes have?
b) if more than one, to which interface is the MC address bound? If not the
   first, you may want to play with "mcast_if".

Output of "ifconfig -a" and "netstat -rn" would be useful.

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

From: Joan Marc Riera marc.ri...@barcelonamedia.org
To: Martin Knoblauch kn...@knobisoft.de
Cc: "ganglia-general@lists.sourceforge.net" ganglia-general@lists.sourceforge.net
Sent: Sat, October 23, 2010 7:17:08 PM
Subject: Re: [Ganglia-general] gmetad only reads from one node of each data_source



  
  

Hi,

I have restarted everything, for sure.

These are the outputs from the telnet:
node01: http://paste.ubuntu.com/518811/
node02: http://paste.ubuntu.com/518812/

I've done the following to get some output.
On node01, launch: (/usr/sbin/gmond --debug=10 2>&1 ) >
/hpcdrive/homemarc.riera/node01.gmond.debug
This is the complete output:
http://paste.ubuntu.com/518824/
On node02, launch: (/usr/sbin/gmond --debug=10 2>&1 ) >
/hpcdrive/homemarc.riera/node02.gmond.debug
This is the complete output:
http://paste.ubuntu.com/518825/
Restart gmetad on the ganglia server.
Ctrl-C on node01.
Ctrl-C on node02.

I've looked at both logs and still don't get what's wrong. Shame on me.

Meanwhile Ron, another user on the list, suggested that I change
something in my gmond.conf:
udp_recv_channel {
  family = inet4
  port = 8649
}

I've tried it, without success. Maybe something else should be changed.





On 10/22/2010 02:27 PM, Martin Knoblauch wrote:

  
  Hi
Marc,
  
at first sight, the configs for node01 and node02 look identical and
correct. Have the "gmonds" on all nodes been restarted after the
changes (just to be sure :-)? What do you get from: "telnet node01
8649" and "telnet node02 8649"?
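
The telnet check can also be scripted. The sketch below assumes that gmond answers a plain TCP connect on port 8649 with its XML dump and then closes the connection, and that the dump uses CLUSTER and HOST elements carrying a NAME attribute. The function names are made up for this example.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host, port=8649, timeout=5.0):
    """Read the whole XML dump that gmond sends on connect (as bytes)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_data):
    """Map each CLUSTER name in the dump to the HOST names it contains."""
    root = ET.fromstring(xml_data)
    return {
        cluster.get("NAME"): [h.get("NAME") for h in cluster.iter("HOST")]
        for cluster in root.iter("CLUSTER")
    }


# Usage (needs a reachable gmond, so it is left as a comment):
#   print(hosts_by_cluster(read_gmond_xml("node01")))
```

If node01 lists only itself while the cluster has ten nodes, the gmonds are not hearing each other, which is the symptom discussed in this thread.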
  
Oh, which version of gmetad/gmond are you running?
  
Cheers
  Martin 
  
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
  www: http://www.knobisoft.de
  
  
  

From:
Joan Marc Riera marc.ri...@barcelonamedia.org
To: Martin Knoblauch
kn...@knobisoft.de
Cc:
"ganglia-general@lists.sourceforge.net"
ganglia-general@lists.sourceforge.net
Sent: Fri, October
22, 2010 12:51:55 PM
Subject: Re:
[Ganglia-general] gmetad only reads from one node of each data_source

Sorry, I think my response has been discarded because of the
attachments. I am sending it again with my conf files on pastebin. Sorry to
bother.

My gmond conf has only minor changes. I'm happy to share them .

I link(pastebin) to 3 files, gmond from node01 , node02 and nodegpu01. 
node01: http://pastebin.com/wa9mmT3h
node02: http://pastebin.com/ZtwsqnNp
nodegpu01 :http://pastebin.com/3ztHULwd


As far as I remember, the only changes I made are the name and owner, depending
on the cluster group, and the udp send and recv channels, which are different
for each cluster group.


Thanks.

On 10/22/2010 12:30 PM, Martin Knoblauch wrote:

  Hi
Joan,
  
what you describe sounds fine with regard to "gmetad". "gmetad" will
only talk to one node per data_source. If that node fails and you have
more than one node listed, it will [try to] fail over to the next
available node. So far, everything is working as expected.
  
Your problem is that apparently each of node01..10 only "knows" its
own metrics. Nodes listed on the data_source line need to know the
metrics of all nodes in the respective cluster. So it is more a problem
with the configuration of your "gmond" services. Care to share the
configuration of one of the nodes?
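
The failover behaviour described above can be sketched as follows. This is not gmetad's actual code, only an illustration of "ask the first node that answers"; poll_data_source and the fetch callback are hypothetical names.

```python
def poll_data_source(hosts, fetch):
    """Return (host, data) from the first host that answers.

    hosts is the node list of one data_source line, in order.
    fetch(host) returns that node's XML dump or raises OSError.
    """
    for host in hosts:
        try:
            return host, fetch(host)
        except OSError:
            continue  # node is down: fall through to the next one
    return None, None


def fake_fetch(host):
    """Stand-in for a network read: node01 is down, the others answer."""
    if host == "node01":
        raise ConnectionRefusedError(host)
    return "<xml from %s/>" % host


print(poll_data_source(["node01", "node02", "node03"], fake_fetch))
```

Because only one node is ever asked, that node's dump has to contain the metrics of the whole cluster. If each gmond knows only itself, the cluster view shrinks to whichever node happens to answer.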
  
Cheers
  Martin 
  
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
  www: http://www.knobisoft.de
  
  
  

From:
Joan Marc Riera marc.ri...@barcelonamedia.org
To:
ganglia-general@lists.sourceforge.net
Sent: Fri,
October
22, 2010 11:50:05 AM
Subject:
[Ganglia-general] gmetad only reads from one node of each data_source

Hello,

I have gmetad with following conf running :
r...@fbmsgga01:/var/lib/ganglia# cat /etc/ganglia/gmetad.conf |grep -v ^# |grep -v ^$
data_source "CPU cluster" node01 node02 node03 node04 node05 node06 node07 node08 node09 node10
data_source "GPU cluster" nodegpu01
gridname "FBM"
r...@fbmsgga01:/var/lib/ganglia#



All nodes and gmetad server are on the same vlan.

I onl

Re: [Ganglia-general] gmetad only reads from one node of each data_source

2010-10-22 Thread Martin Knoblauch

Hi Marc,

at first sight, the configs for node01 and node02 look identical and correct.
Have the "gmonds" on all nodes been restarted after the changes (just to be
sure :-)? What do you get from: "telnet node01 8649" and "telnet node02 8649"?

Oh, which version of gmetad/gmond are you running?

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

From: Joan Marc Riera marc.ri...@barcelonamedia.org
To: Martin Knoblauch kn...@knobisoft.de
Cc: "ganglia-general@lists.sourceforge.net" ganglia-general@lists.sourceforge.net
Sent: Fri, October 22, 2010 12:51:55 PM
Subject: Re: [Ganglia-general] gmetad only reads from one node of each data_source



  

Sorry, I think my response has been discarded because of the
attachments. I am sending it again with my conf files on pastebin. Sorry to
bother.

My gmond conf has only minor changes. I'm happy to share them .

I link(pastebin) to 3 files, gmond from node01 , node02 and nodegpu01. 
node01: http://pastebin.com/wa9mmT3h
node02: http://pastebin.com/ZtwsqnNp
nodegpu01 :http://pastebin.com/3ztHULwd


As far as I remember, the only changes I made are the name and owner, depending
on the cluster group, and the udp send and recv channels, which are different
for each cluster group.


Thanks.

On 10/22/2010 12:30 PM, Martin Knoblauch wrote:

  
  Hi
Joan,
  
what you describe sounds fine with regard to "gmetad". "gmetad" will
only talk to one node per data_source. If that node fails and you have
more than one node listed, it will [try to] fail over to the next
available node. So far, everything is working as expected.
  
Your problem is that apparently each of node01..10 only "knows" its
own metrics. Nodes listed on the data_source line need to know the
metrics of all nodes in the respective cluster. So it is more a problem
with the configuration of your "gmond" services. Care to share the
configuration of one of the nodes?
  
Cheers
  Martin 
  
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
  www: http://www.knobisoft.de
  
  
  

From:
Joan Marc Riera marc.ri...@barcelonamedia.org
To:
ganglia-general@lists.sourceforge.net
Sent: Fri, October
22, 2010 11:50:05 AM
Subject:
[Ganglia-general] gmetad only reads from one node of each data_source

Hello,

I have gmetad with following conf running :
r...@fbmsgga01:/var/lib/ganglia# cat /etc/ganglia/gmetad.conf |grep -v ^# |grep -v ^$
data_source "CPU cluster" node01 node02 node03 node04 node05 node06 node07 node08 node09 node10
data_source "GPU cluster" nodegpu01
gridname "FBM"
r...@fbmsgga01:/var/lib/ganglia#



All nodes and gmetad server are on the same vlan.

I only receive nodegpu01 and node01 info, but if I stop gmond on node01
I start receiving from node02. If I stop node02 I start receiving from
node03, and so on.

I do not understand what is happening; everything was working fine
until yesterday, when I restarted the gmetad host.

data from nodegpu01 is being received and plotted fine. 


What is going on here?


Thanks.

Marc

-- 


Joan Marc Riera Duocastella

Barcelona Media - Centre d'Innovació
Av. Diagonal, 177, planta 9 08018 - BARCELONA
Telèfon +34 93 238 14 00 Fax +34 93 309 31 88
www.barcelonamedia.org



  
  


-- 
 


Joan Marc Riera Duocastella

Barcelona Media - Centre d'Innovació
Av. Diagonal, 177, planta 9 08018 - BARCELONA
Telèfon +34 93 238 14 00 Fax +34 93 309 31 88
www.barcelonamedia.org



Re: [Ganglia-general] Running multiple gmonds on the same server

2010-10-15 Thread Martin Knoblauch

Hi Anton,

 are you using a multicast or unicast setup? Unicast should work just fine. At
least it did in 3.0.x. For multicast you *may* also need to run on distinct
mc-addresses in addition to the distinct ports, but I never tested that.

Cheers

Martin 
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
 From: David Birdsong david.birds...@gmail.com
 To: Anton Yurchenko ayurche...@gmail.com
 Cc: ganglia-general@lists.sourceforge.net
 Sent: Fri, October 15, 2010 1:25:11 AM
 Subject: Re: [Ganglia-general] Running multiple gmonds on the same server
 
 I'm not there anymore, but I think it was 3.1.2.
 
 On Thu, Oct 14, 2010 at 4:23 PM, Anton Yurchenko ayurche...@gmail.com wrote:

  Well that is good to know :)
  What version of ganglia are you running?

  Thanks!

  On 10/14/2010 4:21 PM, David Birdsong wrote:

   FYI, we did exactly this for ~4-5 clusters at my last installation.
   It worked fine.

   On Thu, Oct 14, 2010 at 4:16 PM, Anton Yurchenko ayurche...@gmail.com wrote:

    Hi all,

    I am trying to consolidate all the gmond aggregation nodes for 3
    clusters that we have on a pair of servers.
    I tried to have gmond for each cluster run on its own set of ports, but
    it's not working very well.
    In the ganglia UI for the clusters I can see the number of hosts is
    correct, but none of the other metrics are showing.
    Is this not the right approach for running gmond for multiple clusters?

    Thanks!
    Anton
 
 
   

Re: [Ganglia-general] Running multiple gmonds on the same server

2010-10-15 Thread Martin Knoblauch

Somehow Anton got lost ...

- Original Message 
 From: Martin Knoblauch kn...@knobisoft.de
 To: David Birdsong david.birds...@gmail.com
 Cc: David Birdsong david.birds...@gmail.com; ganglia general 
ganglia-general@lists.sourceforge.net
 Sent: Fri, October 15, 2010 9:31:41 AM
 Subject: Re: [Ganglia-general] Running multiple gmonds on the same server

 Hi Anton,

  are you using a multicast or unicast setup? Unicast should work just fine.
 At least it did in 3.0.x. For multicast you *may* also need to run on distinct
 mc-addresses in addition to the distinct ports, but I never tested that.

 Cheers

 Martin 
 --
 Martin  Knoblauch
 email: k n o b i AT knobisoft DOT de
 www:  http://www.knobisoft.de

 - Original Message 
   From: David Birdsong david.birds...@gmail.com
   To: Anton Yurchenko ayurche...@gmail.com
  Cc: ganglia-general@lists.sourceforge.net
   Sent: Fri, October 15, 2010 1:25:11 AM
  Subject: Re: [Ganglia-general]  Running multiple gmonds on the same server

  I'm not there  anymore, but I think it was 3.1.2.

  On Thu, Oct 14, 2010 at 4:23 PM, Anton Yurchenko ayurche...@gmail.com wrote:

   Well that is good to know :)
   What version of ganglia are you running?

   Thanks!

   On 10/14/2010 4:21 PM, David Birdsong wrote:

    FYI, we did exactly this for ~4-5 clusters at my last installation.
    It worked fine.

    On Thu, Oct 14, 2010 at 4:16 PM, Anton Yurchenko ayurche...@gmail.com wrote:

     Hi all,

     I am trying to consolidate all the gmond aggregation nodes for 3
     clusters that we have on a pair of servers.
     I tried to have gmond for each cluster run on its own set of ports, but
     it's not working very well.
     In the ganglia UI for the clusters I can see the number of hosts is
     correct, but none of the other metrics are showing.
     Is this not the right approach for running gmond for multiple clusters?

     Thanks!
     Anton


Re: [Ganglia-general] Does Ganglia measure itself?

2010-09-21 Thread Martin Knoblauch

Hi Weston,

 gmond just looks at the low-level counters provided by the OS and has no 
awareness about its own resource usage. So, it will collect cpu-usage including 
its own cycles.

Does this answer your question?

 
Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
 From: Stevens, Weston J weston.j.stev...@boeing.com
 To: ganglia-general@lists.sourceforge.net 
ganglia-general@lists.sourceforge.net
 Sent: Mon, September 20, 2010 8:20:21 PM
 Subject: [Ganglia-general] Does Ganglia measure itself?
 
 For instance, if gmetad and gmond are using a few percent of CPU, would this  
show up on the CPU usage graph? Or does it ignore itself and only count  
everything else?  Thanks
 
 

Re: [Ganglia-general] Multiple clusters with unicast

2010-07-16 Thread Martin Knoblauch

Jonathan,

  I do it this way:

- run the gmonds on each cluster on a dedicated port (one port per cluster)
- let them cast their messages to a dedicated aggregator gmond for each
  cluster
- let gmetad query those aggregators on their dedicated ports

 If you want to have one host in different clusters, you can run two gmonds on
that host, each on a different port. I never did that, but it should work.
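
As a sketch of the last step, the gmetad side of this layout is one data_source line per cluster, each pointing at that cluster's aggregator and port. The helper below only builds those lines; the cluster names, the host name agg1 and the port numbers are invented for the example.

```python
def data_source_lines(clusters):
    """Build gmetad.conf data_source lines.

    clusters maps a cluster name to (aggregator_hosts, port).
    """
    lines = []
    for name, (hosts, port) in clusters.items():
        targets = " ".join("%s:%d" % (host, port) for host in hosts)
        lines.append('data_source "%s" %s' % (name, targets))
    return lines


layout = {
    "app servers": (["agg1"], 8649),
    "db servers": (["agg1"], 8650),
}
for line in data_source_lines(layout):
    print(line)
```

The gmonds of each cluster would then send to, and the aggregator gmond listen on, the matching port, so the two clusters never share a channel.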

Cheers

Martin 
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
 From: Jonathan Weiss j...@innerewut.de
 To: ganglia-general@lists.sourceforge.net
 Sent: Fri, July 16, 2010 11:50:08 AM
 Subject: [Ganglia-general] Multiple clusters with unicast
 
 Cheers,
 
 
 I'm using Ganglia with unicast on EC2 (so there is no chance for
 multicast). I have a typical web-app with load balancers, app servers
 and database servers.
 Everything is working fine as one Ganglia cluster with unicast by
 having all local gmonds using udp_send to send to one monitoring
 server running gmond and gmetad.

 My problem is now that I would like to list the different roles in my
 cluster in Ganglia, so that I get a CPU overview for all app-servers
 separated from the CPU report for the DB servers. I've tried doing
 this by setting a different cluster name in the local gmonds.
 But it looks like whatever cluster name I have in the gmond of the
 monitoring server is overriding this, so I end up having only one
 cluster.

 Is there a way of doing this without having gmetad query all gmonds?

 BTW can one host be in multiple clusters? So if I have a server that
 is an app-server and a memcached server, could I have it listed in both
 clusters?
 
 Regards,
 Jonathan
 
 -- 
 Jonathan Weiss
 http://blog.innerewut.de
 http://twitter.com/jweiss
 

Re: [Ganglia-general] bytes_in (and bytes_out): instantaneous or averaged?

2010-07-07 Thread Martin Knoblauch



 --
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
 From: David Barnes david.g.bar...@gmail.com
 To: ganglia-general@lists.sourceforge.net
 Sent: Wed, July 7, 2010 1:59:13 AM
 Subject: [Ganglia-general] bytes_in (and bytes_out): instantaneous or 
 averaged?
 
 Hi all,

 I am planning to use historical archives of ganglia data for our
 cluster to document its utilisation history and guide our next
 upgrade.

 I would like to understand the bytes_in and bytes_out metrics a bit
 better. Are they instantaneous, or average, measurements?

 Ie. say my gmond polling time is 5 seconds. If the following happens,
 with nothing else going on of significance:

 @ t = 0 second, gmond polls metrics (poll0)
 @ t = 1 second, 1Mbyte transferred in (effectively instantly)
 @ t = 3s, 5Mbyte transferred out (effectively instantly)
 @ t = 5s, gmond polls metrics (poll1)

 What is going to be stored in bytes_in and bytes_out for poll1?

 Will it be the *average* (integrated) throughput:

 bytes_in: 1Mbyte / 5s = 200kByte/s = bytes_in = 20
 bytes_out: 5Mbyte / 5s = 1000kBytes/s = bytes_out = 100

 Or will it be the instantaneous throughput measured at the time of
 poll1, ie. both bytes_in and bytes_out = 0 because there is no
 instantaneous activity?

 Another way of asking the same question: is it valid to deduce
 long-term (aggregate) data transfer volumes from the rates expressed
 by bytes_in and bytes_out?

 Thanks very much in advance - David Barnes.
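
The arithmetic behind the question can be written out as a small sketch. It does not settle which reading gmond implements; it only shows that under the averaged reading the stored rates can be integrated back into volumes, while under the instantaneous reading they cannot. It assumes 1 Mbyte = 1,000,000 bytes, which matches the 200 kByte/s figure in the question. The function names are invented for this example.

```python
def average_rate(bytes_transferred, interval_s):
    """Averaged reading: bytes moved during the interval, per second."""
    return bytes_transferred / interval_s


def volume_from_rates(rates, interval_s):
    """Integrate per-interval rates (bytes/s) back into a byte count."""
    return sum(rate * interval_s for rate in rates)


POLL_INTERVAL_S = 5
bytes_in_rate = average_rate(1_000_000, POLL_INTERVAL_S)   # 1 Mbyte in
bytes_out_rate = average_rate(5_000_000, POLL_INTERVAL_S)  # 5 Mbyte out

# Averaged reading: the poll1 sample recovers the transferred volume.
print(volume_from_rates([bytes_in_rate], POLL_INTERVAL_S))

# Instantaneous reading: poll1 would store 0, so the burst is invisible.
print(volume_from_rates([0.0], POLL_INTERVAL_S))
```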


Re: [Ganglia-general] gmetad xml output is incomplete sometimes

2010-06-29 Thread Martin Knoblauch

Hi Miguel,

 good to know, that age hasn't stopped my memory from working :-)

 Maybe this asks for documentation.

Cheers

 Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
 From: Miguel A. Díaz Corchero miguelangel.d...@ciemat.es
 To: Martin Knoblauch kn...@knobisoft.de
 Cc: Bernard Li bern...@vanhpc.org; ganglia-general@lists.sourceforge.net 
 ganglia-general@lists.sourceforge.net
 Sent: Tue, June 29, 2010 8:17:51 AM
 Subject: Re: [Ganglia-general] gmetad xml output is incomplete sometimes
 
 Thanks Martin. Your solution solves my problem.


On Mon, 28-06-2010 at 03:15 -0700, Martin Knoblauch wrote:
 Hi Miguel,

 just to rule that out: check the data_source lines in your gmetad.conf
 to make sure that gmetad is not querying its own XML port. That could
 result in incomplete/broken XML. And yes, we have seen it before :-)

 Cheers
 Martin

 --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www:   http://www.knobisoft.de
 
 
 

Re: [Ganglia-general] gmetad xml output is incomplete sometimes

2010-06-28 Thread Martin Knoblauch

Hi Miguel,

 just to rule that out: check the data_source lines in your gmetad.conf to make 
sure that gmetad is not querying its own XML port. That could result in 
incomplete/broken XML. And yes, we have seen it before :-)
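
A quick way to run that check is sketched below. It assumes gmetad serves its XML on port 8651, which to my knowledge is the default xml_port, and that data_source targets are written as host or host:port. The function name and the own_names parameter are made up for this example, and IPv6 literals are not handled.

```python
import shlex


def self_querying_sources(conf_text, own_names, xml_port=8651):
    """Return (source_name, target) pairs that point gmetad at itself."""
    flagged = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("data_source"):
            continue
        tokens = shlex.split(line)  # keeps "quoted names" together
        if len(tokens) < 3:
            continue
        name, targets = tokens[1], tokens[2:]
        for target in targets:
            host, _, port = target.partition(":")
            # Entries without an explicit port are skipped on purpose.
            if port and host in own_names and int(port) == xml_port:
                flagged.append((name, target))
    return flagged


conf = '''
data_source "CPU cluster" node01 node02
data_source "loop" localhost:8651
'''
print(self_querying_sources(conf, {"localhost", "127.0.0.1"}))
```

Any pair it reports is a data_source that makes gmetad read its own output.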

 Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
 From: Miguel A. Díaz Corchero miguelangel.d...@ciemat.es
 To: Bernard Li bern...@vanhpc.org
 Cc: ganglia-general@lists.sourceforge.net 
 ganglia-general@lists.sourceforge.net
 Sent: Mon, June 28, 2010 8:27:50 AM
 Subject: Re: [Ganglia-general] gmetad xml output is incomplete sometimes
 
 Hi Bernard.

 Now, I'm only monitoring 5 hosts.
 - 2/5 are switches and only have 3 metrics. To do that I'm using 3
   gmetric calls every minute.
 - 3/5 are hosts with the default metrics and default time values.

 The problem appears in both cases: switches and hosts.

 Looking at the debug mode of gmetad, I noticed 3 events (updating,
 writing, clearing). Maybe those events are related to my problem
 (perhaps the clearing event).

 Thanks,
 Miguel.

 On Fri, 25-06-2010 at 10:57 -0700, Bernard Li wrote:
  Hi Miguel:

  How many hosts and metrics are you monitoring with your gmetad?

  Cheers,

  Bernard

  2010/6/25 Miguel A. miguelangel.d...@ciemat.es:

   Hi.

   I'm getting the XML output from gmetad and saving it in a file.
   Sometimes, the output XML has more machines than others. For example,
   at 2 p.m. the xml output is

   <grid>
     <cluster1>
       host 1
       host 2
       host 3
     </cluster1>
   </grid>

   And one minute later, the xml output is (for example)

   <grid>
     <cluster1>
       host 1
     </cluster1>
   </grid>

   But another minute later, the xml output is (for example)

   <grid>
     <cluster1>
       host 1
       host 2
       host 3
     </cluster1>
   </grid>

   I have checked that the hosts were running and they were ok. I think
   gmetad only shows updated data, but I'm not sure. Do you know why gmetad
   occasionally shows some piece of data and not all of them?

   Regards

   Miguel.
 
  
  
 
 
 
 


Re: [Ganglia-general] Ganglia Cluster grouping issues...

2010-03-10 Thread Martin Knoblauch

From: Nitin Bharadwaj west.ni...@gmail.com
To: Ofer Inbar c...@a.org
Cc: ganglia-general@lists.sourceforge.net
Sent: Wed, March 10, 2010 10:30:32 AM
Subject: Re: [Ganglia-general] Ganglia Cluster grouping issues...

Kool! I did just that, but one additional thing was needed (wiping out the RRDs 
alone didn't help at all):

additional lines in gmond.conf for cluster-B

trusted_hosts = IP Address of gmetad
all_trusted = on

Now, IT WORKS!! THANKS A LOT FOLKS!! REALLY APPRECIATE YOUR PATIENCE AND 
TIME!! :-)

 Good that it works. Did you have to make similar changes for cluster-H? I am a 
bit surprised, that those lines are necessary. What is the IP address of your 
gmetad host?

Cheers
Martin


Re: [Ganglia-general] Ganglia Cluster grouping issues...

2010-03-09 Thread Martin Knoblauch

- Original Message 

 From: Nitin Bharadwaj nitin.bharad...@mkhoj.com
 To: ganglia-general@lists.sourceforge.net
 Sent: Tue, March 9, 2010 10:03:03 AM
 Subject: [Ganglia-general] Ganglia Cluster grouping issues...

 Hi,

 I have a scenario as described below (it might be a silly one, I'm not used
 to the configs of Ganglia yet):

 I have h1-h7 hosts, which need to go Under Cluster-H in Ganglia and
 similarly, b1-b3, which need to go to Cluster-B. Now, here is what my
 gmond.conf (for both host groups) and gmetad.conf look like:

 h1-h7 gmond.conf:

 name Cluster-H (remaining default)

 b1-b3 gmond.conf:

 name Cluster-B (remaining default)

 gmetad.conf:

 data_source Cluster-B b1 b2 b3
 data_source Cluster-H h1 h2 h3 h4 h5 h6 h7

 Now, Whatever I do, I see all these 10 hosts (h1-h7 and b1-b3) under
 both Cluster-H and Cluster-B. How do I get this resolved? Any help will
 be greatly appreciated.

 Thanks,
 Nitin

Hi Nitin,

 your scenario is not silly at all. My guess is that all of your hosts operate 
their gmonds on the same UDP channel. You need to use different ports for b1-b3 
and h1-h7. Let's say you change b1-b3 to port 9649 (in gmond.conf); your gmetad 
configuration should then look like:

data_source Cluster-B b1:9649 b2:9649 b3:9649
data_source Cluster-H h1 h2 h3 h4 h5 h6 h7

Btw. it is sufficient to name just one of the hosts on the data_source line. 
The others are only queried if the first one fails.
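
The data_source syntax used above can be illustrated with a small Python sketch. It is not part of Ganglia; the helper name parse_data_source is made up for this example, and it assumes that a host without an explicit port falls back to gmond's default port 8649 and that an optional numeric polling interval may follow the source name. IPv6 addresses are not handled.

```python
import shlex

DEFAULT_GMOND_PORT = 8649  # assumed default when no port is given


def parse_data_source(line):
    """Split one gmetad.conf data_source line into (name, [(host, port), ...])."""
    fields = shlex.split(line)
    if len(fields) < 3 or fields[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name, rest = fields[1], fields[2:]
    # An optional polling interval (a plain integer) may follow the name.
    if rest and rest[0].isdigit():
        rest = rest[1:]
    sources = []
    for item in rest:
        host, sep, port = item.partition(":")
        sources.append((host, int(port) if sep else DEFAULT_GMOND_PORT))
    return name, sources


if __name__ == "__main__":
    print(parse_data_source("data_source Cluster-B b1:9649 b2:9649 b3:9649"))
    print(parse_data_source("data_source Cluster-H h1 h2 h3 h4 h5 h6 h7"))
```

Running it on the two lines above makes the port split visible: every Cluster-B entry carries 9649, every Cluster-H entry the default.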

Cheers
Martin


Re: [Ganglia-general] web front end is receiving cut-off XML

2010-03-03 Thread Martin Knoblauch

From: Maes, Richard rm...@ciena.com
To: Bernard Li bern...@vanhpc.org
Cc: ganglia-general@lists.sourceforge.net
Sent: Wed, March 3, 2010 1:04:42 AM
Subject: Re: [Ganglia-general] web front end is receiving cut-off XML

Bernard, my bad for the poor information.  I’m using
ports 8649 and 8651.

From gmetad.conf from my concentrator
data_source wagrid waxgridqm.ciena.com:8651

 me thinks above should be 8649. As it is now, gmetad is querying itself.
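
A quick way to spot this kind of loop is to compare the ports on the data_source lines with gmetad's own ports. The following Python sketch is only an illustration; find_self_polls is a hypothetical helper, and it assumes the gmetad defaults of 8651 (xml_port) and 8652 (interactive_port) when those options are not set in the file.

```python
import shlex


def find_self_polls(conf_text, local_hosts):
    """Return (source_name, host, port) entries that point at gmetad's own ports."""
    gmetad_ports = {"xml_port": 8651, "interactive_port": 8652}  # assumed defaults
    entries = []
    for raw in conf_text.splitlines():
        fields = shlex.split(raw, comments=True)
        if not fields:
            continue
        if fields[0] in gmetad_ports and len(fields) > 1:
            gmetad_ports[fields[0]] = int(fields[1])
        elif fields[0] == "data_source":
            for item in fields[2:]:
                if item.isdigit():  # optional polling interval
                    continue
                host, sep, port = item.partition(":")
                entries.append((fields[1], host, int(port) if sep else 8649))
    own = set(gmetad_ports.values())
    return [e for e in entries if e[1] in local_hosts and e[2] in own]


if __name__ == "__main__":
    conf = "data_source wagrid waxgridqm.ciena.com:8651\ngridname wagrid\nxml_port 8651\n"
    print(find_self_polls(conf, {"waxgridqm.ciena.com"}))
```

With the configuration quoted above the function reports the wagrid entry; after changing the port to 8649 the list is empty.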

gridname wagrid
xml_port 8651

From my gmond.conf file that I use across all clients and my
concentrator.
/* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
udp_send_channel {
  host = waxgridqm.ciena.com
  port = 8649
  ttl = 1
}

/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
  port = 8649
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */

tcp_accept_channel {
  port = 8649
}

From:Bernard Li
[mailto:bern...@vanhpc.org] 
Sent: Tuesday, March 02, 2010 12:47 PM
To: Maes, Richard
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] web front end is receiving cut-off XML

Hi Richard:

On Fri, Feb 26, 2010 at 10:44 AM, Maes, Richard rm...@ciena.com wrote:

I have been having a problem with my web front end 3.1.2 where many of my hosts
do or don’t show up in the web GUI.

What OS are you running?

If I do a telnet localhost 8650 or 8651, I get full uncorrupted XML output with a
message at the bottom that says “Connection closed by foreign host.”

Did you mean 8651 and 8652?  8650 is not a standard
Ganglia port.

Can you please post the data_source line in your gmetad.conf file?

One thing you can do to troubleshoot the problem is lower the number of hosts
in your cluster and see if the situation changes.  Also, try to see if you
can isolate a host that could potentially be causing this issue?

Cheers,

Bernard


Re: [Ganglia-general] gmond memory leaks

2010-03-03 Thread Martin Knoblauch



 --
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
 From: Scott Dworkis s...@mylife.com
 To: Martin Knoblauch kn...@knobisoft.de
 Cc: ganglia-general@lists.sourceforge.net
 Sent: Wed, March 3, 2010 5:21:32 AM
 Subject: Re: [Ganglia-general] gmond memory leaks
 
 finally had some time to do a few attempts at valgrind... so far it 
 doesn't seem to be telling me much... the numbers it reports are in the 
 megabyte and not gigabyte range that i'm seeing.
 
 after a couple hours of valgrind i see:
 
 ==31952== LEAK SUMMARY:
 ==31952==    definitely lost: 532 bytes in 23 blocks.
 ==31952==    indirectly lost: 271 bytes in 16 blocks.
 ==31952==      possibly lost: 13,872 bytes in 30 blocks.
 ==31952==    still reachable: 1,626,182 bytes in 2,188 blocks.
 ==31952==         suppressed: 0 bytes in 0 blocks.
 ==31952== Reachable blocks (those to which a pointer was found) are not 
 shown.
 ==31952== To see them, rerun with: --leak-check=full --show-reachable=yes
 
 this doesn't grow much even after valgrinding overnight
 
 ==24957== LEAK SUMMARY:
 ==24957==    definitely lost: 2,404 bytes in 179 blocks.
 ==24957==    indirectly lost: 271 bytes in 16 blocks.
 ==24957==      possibly lost: 13,872 bytes in 30 blocks.
 ==24957==    still reachable: 1,626,182 bytes in 2,188 blocks.
 ==24957==         suppressed: 0 bytes in 0 blocks.
 
 in fact most of these numbers are identical, so they must be fixed losses 
 in terms of valgrind accounting.
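
To compare two such reports without reading them line by line, something like the following Python sketch could be used. It is only an illustration; the function names are made up here, and the regular expression assumes the "N bytes in M blocks" wording shown in the summaries above.

```python
import re

LEAK_LINE = re.compile(
    r"==\d+==\s*(definitely lost|indirectly lost|possibly lost|"
    r"still reachable|suppressed):\s*([\d,]+) bytes in ([\d,]+) blocks")


def leak_summary(text):
    """Map each LEAK SUMMARY category to its byte count."""
    return {m.group(1): int(m.group(2).replace(",", ""))
            for m in LEAK_LINE.finditer(text)}


def growth(before, after):
    """Bytes gained per category between two summaries (changed categories only)."""
    return {k: after[k] - before[k] for k in after
            if k in before and after[k] != before[k]}


if __name__ == "__main__":
    short_run = ("==31952==    definitely lost: 532 bytes in 23 blocks.\n"
                 "==31952==      possibly lost: 13,872 bytes in 30 blocks.\n")
    long_run = ("==24957==    definitely lost: 2,404 bytes in 179 blocks.\n"
                "==24957==      possibly lost: 13,872 bytes in 30 blocks.\n")
    print(growth(leak_summary(short_run), leak_summary(long_run)))
```

Applied to the two runs above, only "definitely lost" changes, by 1,872 bytes, which is far too little to explain growth in the gigabyte range.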


 Did you try --leak-check=full --show-reachable=yes? I believe that is 
supposed to show all allocations. It might produce a lot of output, but as far 
as I can see you are able to reproduce the problem early.

 this does not really reflect the growth of my gmond process (running under 
 valgrind here, so reported as memcheck), which i tracked with 5 minute 
 samples of top for an hour, shows a linear leak of over 1GB during that 
 period:
 
 (s...@admin3:16:43:/home/admin/monitoring/scripts) while [ 1 ];do top -n 1 | grep mem;sleep 300;done
 24957 nobody    20   0 5623m 3.5g 3648 R   80 11.1 121:49.98 memcheck
 24957 nobody    20   0 5753m 3.6g 3648 R   76 11.4 126:43.25 memcheck
 24957 nobody    20   0 5948m 3.7g 3652 R  101 11.8 131:36.26 memcheck
 24957 nobody    20   0 6108m 3.8g 3652 R   99 12.1 136:29.35 memcheck
 24957 nobody    20   0 6267m 3.9g 3652 R   97 12.4 141:17.02 memcheck
 24957 nobody    20   0 6436m 4.0g 3652 R   97 12.7 146:07.58 memcheck
 24957 nobody    20   0 6547m 4.1g 3652 R   63 13.0 150:56.74 memcheck
 24957 nobody    20   0 6707m 4.2g 3652 R   99 13.3 155:47.88 memcheck
 24957 nobody    20   0 6917m 4.3g 3652 R   99 13.7 160:40.30 memcheck
 24957 nobody    20   0 7055m 4.4g 3652 R   97 14.0 165:28.40 memcheck
 24957 nobody    20   0 7201m 4.5g 3652 R  101 14.3 170:20.32 memcheck
 24957 nobody    20   0 7340m 4.6g 3652 R   99 14.6 175:08.75 memcheck
 
 if i understand valgrind right, it's only orphaned data that's counted as 
 lost... perhaps some structure is not orphaned but bloating?
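
The growth rate in the top samples above can be put into a number with a short Python sketch. This is an illustration only: the function names are hypothetical, the regular expression assumes the VIRT and RES columns directly precede SHR and the process state as in the output above, and "g" is taken to mean 1024 MB.

```python
import re

# VIRT and RES are the two size fields directly before SHR and the state letter.
SIZES = re.compile(r"(\d+(?:\.\d+)?)([mg])\s+(\d+(?:\.\d+)?)([mg])\s+\d+\s+[RSDZT]\s")


def to_mb(value, unit):
    """Convert a top size field such as 3.5g or 5623m to megabytes."""
    return float(value) * (1024 if unit == "g" else 1)


def resident_growth_mb_per_hour(top_lines, interval_s):
    """Average RES growth in MB/hour across equally spaced top samples."""
    res = []
    for line in top_lines:
        m = SIZES.search(line)
        if m:
            res.append(to_mb(m.group(3), m.group(4)))
    if len(res) < 2:
        raise ValueError("need at least two samples")
    elapsed_h = (len(res) - 1) * interval_s / 3600.0
    return (res[-1] - res[0]) / elapsed_h


if __name__ == "__main__":
    samples = [
        "24957 nobody    20   0 5623m 3.5g 3648 R   80 11.1 121:49.98 memcheck",
        "24957 nobody    20   0 7340m 4.6g 3652 R   99 14.6 175:08.75 memcheck",
    ]
    # First and last of the twelve samples, 55 minutes apart.
    print(resident_growth_mb_per_hour(samples, 55 * 60))
```

For the twelve samples above (3.5g to 4.6g over eleven 5-minute intervals) this comes out at roughly 1.2 GB per hour of resident memory.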
 
 one other accidental observation, i have a job that generates 70k metrics 
 every 5 minutes (a few dozen for every port on each of our switches)... 
 these are all spoof ip metrics.  this job had been accidentally disabled 
 for a few days and i noticed that the leak virtually stopped.  i can play 
 some more with various parameters of this job and see if i find anything 
 more... could be the spoof thing is coincidental but Rick Cobb also 
 mentioned his leak seemed to be spoof related.  i'll also see if sending 
 heartbeats for the spoof ips helps anything.


 spoofing might indeed be a hint.

Martin
 -scott
 
  Message: 2
  Date: Thu, 18 Feb 2010 07:15:33 -0800 (PST)
  From: Martin Knoblauch 
  Subject: Re: [Ganglia-general] gmond memory leaks
  To: Scott Dworkis 
  Cc: ganglia-general@lists.sourceforge.net
  Message-ID: 880015.28351...@web113306.mail.gq1.yahoo.com
  Content-Type: text/plain; charset=us-ascii
 
  - Original Message 
 
  From: Scott Dworkis 
  To: Martin Knoblauch 
  Cc: ganglia-general@lists.sourceforge.net
  Sent: Wed, February 17, 2010 8:32:32 PM
  Subject: Re: [Ganglia-general] gmond memory leaks
 
  3.1.2 on gentoo (that solaris must be a sourceforge ad?).  i have zero
  experience with valgrind... i'll have a look but a smidge of guidance
  would be appreciated.  :)
 
 
  Just  get valgrind and run the leaking gmond under its control. gmond 
 should be configured to not run in background. After some time interrupt it 
 and 
 you will get a report of valgrinds findings.
 
  For example, a simple program leaking 8x1MB will produce:
 
  [mknob...@l6g0223j ~]$ valgrind  ./memeat
  ==13647== Memcheck, a memory error detector.
  ==13647== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
  ==13647== Using LibVEX rev 1658, a library for dynamic binary translation.
  ==13647== Copyright (C) 2004-2006, and GNU GPL'd

Re: [Ganglia-general] [Ganglia-developers] Ganglia 3.1.7 ready for testing

2010-03-02 Thread Martin Knoblauch

- Original Message 

 From: Daniel Pocock dan...@pocock.com.au
 To: kn...@knobisoft.de
 Cc: ganglia-develop...@lists.sourceforge.net; 
 ganglia-general@lists.sourceforge.net 
 ganglia-general@lists.sourceforge.net
 Sent: Tue, March 2, 2010 12:23:32 PM
 Subject: Re: [Ganglia-developers] [Ganglia-general] Ganglia 3.1.7 ready for 
 testing

 Thanks to those who provided feedback - any objections to making 3.1.7
 generally available?  I would like to make it GA within the next 1-2
 days now.

 unless there is a [severe] regression compared to 3.1.2 - just let it escape. 
You know, the perfect is the enemy of the good.

Cheers
Martin

 Michael Perzl wrote:
  I have successfully compiled and tested 3.1.7 on
  - AIX 5.1 ML04
  - AIX 5.3 ML00
  - AIX 5.3 TL07
  - AIX 6.1 TL03

  Regards,
  Michael

  On 02/22/2010 12:15 PM, Daniel Pocock wrote:

  Just a reminder - any feedback is welcome, or feel free to discuss 3.1.7
  on IRC

  It would be good to have positive confirmation of which platforms this
  has been tested on, so far, I have tested
  - Debian lenny,
  - RHEL3/4/5,
  - CentOS 5,
  - Solaris 8 and
  - Cygwin.

  and Brad has done some testing on SLES10

  Regards,

  Daniel

  Daniel Pocock wrote:

  I've tagged 3.1.7 and built a tarball:

   http://ganglia.info/testing/ganglia-3.1.7.tar.gz

  The md5sum for 3.1.7 is: 6aa5e2109c2cc8007a6def0799cf1b4c

  Since 3.1.6, only two things have changed and may need to be tested
  again by those who tested 3.1.6:
  - the build system (support for commas in CFLAGS)
  - the multicpu module - percentages reported differently

  This is not confirmation that the release is in GA status - a further
  notification will be sent when the testing period has elapsed without
  any serious defect.  Users are invited to test the tarball and submit
  feedback.

  Please do not commit on branches/monitor-core-3.1 until after 3.1.7
  goes GA, in case further tweaks are needed to facilitate a successful
  release.

  Below are the release notes from the STATUS file.  Other documentation
  has also changed since 3.1.2 and should be reviewed:

  GANGLIA 3.1 STATUS:   -*-text-*-
  Last modified at [$Date: 2010-02-17 11:01:08 + (Wed, 17 Feb 2010) $]

  The current version of this file can be found at:

 *

 http://ganglia.svn.sourceforge.net/svnroot/ganglia/branches/monitor-core-3.1/STATUS

  Release history:

   3.1.7 : Tagged: Feb 17, 2010
   3.1.6 : Tagged: Feb  4, 2010 (not released for GA)
   3.1.5(hargrave)   : Tagged: Nov 24, 2009 (not released for GA)
   3.1.4(hargrave)   : Tagged: Oct 26, 2009 (not released for GA)
   3.1.3(avenger): Tagged: Sep 19, 2009 (not released for GA)
   3.1.2(langley): Released: Feb 17, 2009
   3.1.1(wien)   : Released: Sep 10, 2008
   3.1.0(amelia) : Released: Jul 30, 2008

  Contributors looking for a mission:

 * Just do an egrep on TODO, XXX or FIXME in the source.
 * Review the bug database at: http://bugzilla.ganglia.info/
 * Open bugs in the bug database.
 * Implement a feature from the wishlist at:
  http://sourceforge.net/apps/trac/ganglia/wiki/ganglia_wish-list

  CURRENT RELEASE NOTES:
 (Please update this area with a brief description of bug fixes and
  enhancements that have been backported for the current release)

 Note: 3.1.3, 3.1.4, 3.1.5 and 3.1.6 never became GA, therefore,
 the release notes for all of them are combined below.

 3.1.7:

 * Fix build support for RHEL5/issue with commas in CFLAGS
 * multicpu module: show CPU utilization as a value between 0-100% for
   each core

 3.1.6:

 * Merge commit 1966 from trunk to fix contrib/removespikes.pl
 * Bootstrapping with Debian 5.0 (lenny) versions of autotools for
   this and future releases.

 http://www.mail-archive.com/ganglia-develop...@lists.sourceforge.net/msg05352.html

 http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg04688.html
 * Require user to explicitly specify sysconfdir when building from source,
   due to the fact that the old behavior was not consistent with the
   documented behavior.
 * Configuration files and scripts are now created during the install phase
   rather than during configure.   This allows values such as @sysconfdir@
   to be used in the template configuration files.
 * Abolish the use of release names - only release numbers will be used
   to distinguish versions in future
 * libmetrics: workaround system header conflict in DFBSD= 2.4 (BUG245)
 * Use PCRE regex matching to configure metrics using the name_match directive
 * rrdcached support
 * gmetad now uses apr and the sleep intervals between polls are randomized
   in a way that supports shorter polling intervals
 * FreeBSD support: fixes for crashes

Re: [Ganglia-general] replaced a host, new host not seen

2010-02-28 Thread Martin Knoblauch

 Original Message 

 From: Rick Cobb rc...@quantcast.com
 To: Cameron Spitzer cspit...@nvidia.com
 Cc: ganglia-general@lists.sourceforge.net 
 ganglia-general@lists.sourceforge.net
 Sent: Sat, February 27, 2010 4:03:05 AM
 Subject: Re: [Ganglia-general] replaced a host, new host not seen

 Well, one cause of the confusion is your /etc/ganglia/gmetad.conf data_source 
 entry.  It should *only* have the address of gmonds  that collect all metrics 
 for a cluster, and only one of your gmonds is doing that.

 Correct, listing gmonds that do not have all the information is the way to 
disaster.

 The Ganglia architecture can be very confusing. A 'gmond' has 3 tasks, and all
 but one of yours are only doing one of them:
 * Measure things about the local host and send them to the 'udp_send_channel'.

 which in case of multicast means send to every gmond that cares (is 
listening). In the case of unicast, it sends to *all* udp_send_channels. 
This is what I usually do: have two servers acting as headnodes for the 
monitoring. All monitoring clients have two udp_send_channels, sending their 
data to the two headnodes.

 I call these gmonds collectors, as they collect the data in the first place. 
And I made a mistake in my reply :-(

 * Receive measurements from any gmond (even itself) or gmetric on the
 'udp_recv_channel' and put them in a local datastructure, which is basically a
 set (hash) of hosts with a set of current metrics per host. This is the step
 that resolves addresses to names.
 * Answer requests from gmetad for the whole cluster's metrics. (It does this on
 the tcp_accept_channel).  Gmond just serializes the whole metrics datastructure
 into an XML document as the reply.

 In my usual setup, these two functionalities reside on the headnode gmonds, 
which I call aggregators.

 If you have all your gmonds sending to one unicast address, only one of your
 gmonds *has* all the metrics for that cluster.  That's what Martin called
 designated as a collector.   In that case, your data_source line should only

 Actually I wanted to write aggregator for these gmonds. 

 include that gmond (host). Adding the others can only cause problems -- if the
 first gmond fails, your gmetad will contact the second one in the list, and that
 won't actually have any metrics on it, since no one (including itself) is
 sending it any.  All your nodes will (gradually as timeouts expire) appear to be
 down.  'gmond' will expire hosts if your gmond.conf has a non-zero 'host_dmax'
 entry (see http://linux.die.net/man/5/gmond.conf, among others).

 'gmetad' is an entirely different beast from gmond; sometimes I think it was
 written by a completely different team.  It polls your gmonds, writes the
 numeric metric values to RRDtool files, and responds to queries for (subsets of)
 metrics so front-ends can present them.  It has *no* relationship with your
 udp_send_channel or udp_recv_channel; also, it has almost no (AFAIK)
 relationship to your network infrastructure -- it doesn't reverse-lookup
 addresses, for example.

 On the other hand, it does combine all the metrics for a cluster into a
 long-term in memory data structure, and then combines those into a single
 'grid-level' datastructure.  In gmetad, metrics (including a last-heard-from
 metric ('RECORDED') for a host) can expire, but hosts just go 'down'; they never
 go away.

 So: if you haven't set a host_dmax, you have to stop gmetad, restart every gmond
 that the gmetad could talk to (i.e., everything on the data_source line), start
 gmetad.  In your case, there's only one gmond that gmetad should talk to, so
 simplify your life by removing the rest from your data_source line.  I'd set
 host_dmax, too, but that's a matter of taste.
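
The per-host structure that gmond serializes can be inspected with a few lines of Python. This is a minimal sketch, not Ganglia code: the function names are invented for the example, and it assumes the HOST elements carry NAME and TN (seconds since the host was last heard from) attributes, as in gmond 3.1 XML output.

```python
import xml.etree.ElementTree as ET


def host_ages(xml_text):
    """Return {host name: seconds since gmond last heard from it}."""
    root = ET.fromstring(xml_text)
    return {h.get("NAME"): int(h.get("TN", "0")) for h in root.iter("HOST")}


def silent_hosts(xml_text, max_age_s):
    """Hosts the answering gmond has not heard from within max_age_s seconds."""
    return sorted(n for n, tn in host_ages(xml_text).items() if tn > max_age_s)


if __name__ == "__main__":
    sample = ('<GANGLIA_XML VERSION="3.1.2" SOURCE="gmond">'
              '<CLUSTER NAME="c">'
              '<HOST NAME="a" TN="5"/><HOST NAME="b" TN="400"/>'
              '</CLUSTER></GANGLIA_XML>')
    print(silent_hosts(sample, 60))
```

A host that keeps showing up with a large TN and is never dropped is a hint that host_dmax is zero on the gmond that produced the dump.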

 -- ReC
 On Feb 26, 2010, at 12:22 PM, Cameron Spitzer wrote:

  I was able to remove the dead host (that isn't really dead) from the overview
  display.
  I had to kill all gmond's everywhere, and the gmetad.
  Then I removed the rrd files for the dead host from gmetad's rrds directory,
  and the rrd directory itself.
  Then I removed the dead host's IP address from gmetad.conf.
  Then I brought up all the gmonds (except the dead one) and then the gmetad.
  Apparently, these steps will have to be added to our failover procedure.

  Martin Knoblauch wrote:

  ...

   Also, just to better understand the situation, what is the exact setup? Is
   one of the gmonds designated as a collector? Or do all gmonds carry all
   metrics from all hosts? Which gmond is queried by gmetad (snippet from
   config file)? You should telnet/nc to that gmond and check whether it has
   current metrics from B.

  I don't know what designated as a collector means.

 s/collector/aggregator/ and see above.

  Nor do I know how to control which gmonds carry all metrics from which 
  hosts.  
 There is only one udp_send_channel
  in gmond.conf, and the host

Re: [Ganglia-general] replaced a host, new host not seen

2010-02-26 Thread Martin Knoblauch

From: Ramon Bastiaans ramon.bastia...@sara.nl
To: Cameron Spitzer cspit...@nvidia.com
Cc: ganglia-general@lists.sourceforge.net 
ganglia-general@lists.sourceforge.net
Sent: Fri, February 26, 2010 9:14:58 AM
Subject: Re: [Ganglia-general] replaced a host, new host not seen

On 02/26/2010 02:46 AM, Cameron Spitzer wrote:

Bernard Li wrote:

Same hostname too I presume?  On gmetad, your hosts show up with
hostnames, correct?

Yes, same hostname.

Is it perhaps showing up in the gmetad/web by its IP address instead
of its hostname? That might indicate a DNS/hostname issue.

Also make sure the newly replaced gmond host is not set to mute in
the gmond.conf

Telnet from the master to the new host gives an XML document, same as
the old one.

What I would test is telnet (or nc) from master to _another_ host and
make sure that it has metrics from the new host.

I don't understand that at all.  Host A is running gmetad.
Host B (gmond)  is not getting graphed, even though it sends XML.
Hosts C through W are working fine.

How would telnet from A to C tell me what's wrong with B?

When using multicast, all other gmond's contain the information of the
other gmond's. Since you are using unicast that is not the case here.

Why would host C know anything about host B?
Should any gmond host have information about all the other gmond hosts?
In any case, the telnet output is the same from B and from C.
There is no reference to any hosts in it.

Are you using multicast (default) or unicast?

Unicast.

Is the route from gmond host B to gmetad host A set correctly? Perhaps
the gmond traffic is getting sent over the wrong interface.

When in doubt I tend to use tcpdump myself to verify the traffic is
getting sent.

 Also, just to better understand the situation, what is the exact setup? Is one 
of the gmonds designated as a collector? Or do all gmonds carry all metrics 
from all hosts? Which gmond is queried by gmetad (snippet from config 
file)? You should telnet/nc to that gmond and check whether it has current 
metrics from B.
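
The telnet/nc check can also be scripted. The Python sketch below is only an illustration with made-up function names; it assumes the gmond answers on the default tcp_accept_channel port 8649, sends its XML dump and then closes the connection, and that HOST and METRIC elements carry NAME and TN attributes.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=10.0):
    """Read the complete XML dump that gmond sends on its tcp_accept_channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_for_host(xml_bytes, hostname):
    """Return {metric name: TN} for one host, or None if the host is absent."""
    root = ET.fromstring(xml_bytes)
    for host in root.iter("HOST"):
        if host.get("NAME") == hostname:
            return {m.get("NAME"): int(m.get("TN", "0"))
                    for m in host.iter("METRIC")}
    return None


if __name__ == "__main__":
    # "aggregator.example.com" and "B" are placeholders for the queried gmond
    # and the host that is missing from the graphs.
    dump = fetch_gmond_xml("aggregator.example.com")
    print(metrics_for_host(dump, "B"))
```

If the result is None, the queried gmond has never received anything from B; if the TN values are large, B has stopped sending.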

Cheers
Martin


Re: [Ganglia-general] gmond memory leaks

2010-02-18 Thread Martin Knoblauch

- Original Message 

 From: Scott Dworkis s...@mylife.com
 To: Martin Knoblauch kn...@knobisoft.de
 Cc: ganglia-general@lists.sourceforge.net
 Sent: Wed, February 17, 2010 8:32:32 PM
 Subject: Re: [Ganglia-general] gmond memory leaks

 3.1.2 on gentoo (that solaris must be a sourceforge ad?).  i have zero 
 experience with valgrind... i'll have a look but a smidge of guidance 
 would be appreciated.  :)

 Just  get valgrind and run the leaking gmond under its control. gmond 
should be configured to not run in background. After some time interrupt it and 
you will get a report of valgrinds findings.

For example, a simple program leaking 8x1MB will produce:

[mknob...@l6g0223j ~]$ valgrind  ./memeat
==13647== Memcheck, a memory error detector.
==13647== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==13647== Using LibVEX rev 1658, a library for dynamic binary translation.
==13647== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==13647== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==13647== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==13647== For more details, rerun with: -v
==13647==
^C
==13647==
==13647== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5 from 1)
==13647== malloc/free: in use at exit: 8,000,000 bytes in 8 blocks.
==13647== malloc/free: 8 allocs, 0 frees, 8,000,000 bytes allocated.
==13647== For counts of detected errors, rerun with: -v
==13647== searching for pointers to 8 not-freed blocks.
==13647== checked 66,440 bytes.
==13647==
==13647== LEAK SUMMARY:
==13647==    definitely lost: 8,000,000 bytes in 8 blocks.
==13647==      possibly lost: 0 bytes in 0 blocks.
==13647==    still reachable: 0 bytes in 0 blocks.
==13647==         suppressed: 0 bytes in 0 blocks.
==13647== Use --leak-check=full to see details of leaked memory.

 If you use --leak-check=full, it will tell you where the leaking memory was 
allocated. gmond needs to be compiled with debug info (-g).

 A few questions. 

 - What is your setup? I assume quite a few hosts monitoring (collectors) 
metrics and one aggregating the results.
 - Which of the gmonds leak? The collectors, the aggregator or both?

Cheers
Martin

 yeah 150k metrics is a lot... i have an interest in scaling this thing. 
 i'll post another thread bout things i've done to scale so far that seem 
 to be working well.

 On Wed, 17 Feb 2010, Martin Knoblauch wrote:

  Hi Scott,

  which version of Ganglia and which operating environment do you have (guessing
  Solaris from your signature :-)? Any chance that you could run valgrind or
  equivalent on your setup? 10GB/day is a lot, as is 150k metrics.

  Cheers
  Martin
  --
  Martin Knoblauch
  email: k n o b i AT knobisoft DOT de
  www:  http://www.knobisoft.de

  - Original Message 
  From: Scott Dworkis 
  To: ganglia-general@lists.sourceforge.net
  Sent: Wed, February 17, 2010 3:08:26 AM
  Subject: [Ganglia-general] gmond memory leaks

  (sorry if this is a repost... i tried previously without having first
  subscribed to the list, and fear i got lost somewhere along the moderation
  path)

  hi all - i am seeing gmond leak about 10GB/day on about 150k metrics
  collected.  it seemed like things worsened when i added dmax to all my
  custom metrics, but maybe it was always bad.  is this a known issue?

  sorry if it is already known... i couldn't see that there was a good way
  to search the forums or if there is a bug tracker to search.

  -scott


[Ganglia-general] Fw: any workaround for the bogus spikes problem?

2010-02-03 Thread Martin Knoblauch

forgot the list ...

- Forwarded Message 

 From: Martin Knoblauch kn...@knobisoft.de
 To: Cameron Spitzer cspit...@nvidia.com
 Sent: Wed, February 3, 2010 11:48:10 AM
 Subject: Re: [Ganglia-general] any workaround for the bogus spikes problem?

 From: Cameron Spitzer 
 To: kn...@knobisoft.de
 Cc: ganglia-general@lists.sourceforge.net 

 Sent: Tue, February 2, 2010 6:49:52 PM
 Subject: Re: [Ganglia-general] any workaround for the bogus spikes problem?

 Martin Knoblauch wrote:

 We're trying to use Ganglia to monitor some HP DL580-G5 machines.
 We're using a 64-bit linux-2.6.16.

 which version of Ganglia?

 ganglia-3.1.2

 The network traffic information is polluted with phantom 20 PB traffic 
 spikes.

 I tried lowering the silliness threshold from 1e13 and 1e8 to 4.0e9 and
 3.0e6,
 and I cranked the collect_every on that group from 40 (seconds?) to 5.
 Now I get exabyte peaks instead of petabyte peaks.

  what kind of NIC do you have (1GB, 10 GB)? Which hardware and driver is 
 loaded? What is the average network throughput you see?

 It's the 1 Gbps NIC on the server motherboard, BCM5708 Rev 12.
 dmesg says, Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.5.5b
 (January 31, 2007).

 BCM sounds familiar. Which distro are you using, which kernel?

 I found an ifdef for REMOVE_BOGUS_SPIKES in libmetrics/linux/metrics.c
 Defining it has no effect. 
  Maybe you can add some debugging output and check whether that stuff is 
 triggered at all. Maybe the thresholds are not good anymore.

 Some hints about how to do that would help.  I've tried adding
 err_msg() calls and
 I can't find where the messages go.  They're not in any of the syslog
 channels.
 I don't understand the structure of libmetrics/linux/metrics.c well
 enough to guess
 where it would make sense to open a new log file.

 If daemonized, messages go to syslog. If run in foreground, they go to stderr.

 Just try running the gmond with -d 1 in foreground. You should already get 
 some output in the overflow case.

  And btw. that code does not *remove* bogus spikes from the RRD database. It 
 just is supposed to prevent their generation.

 I realize that.  With each hack to libmetrics/linux/metrics.c, I've
 been stopping gmetad and removing all the
 corrupted rrd files.  I don't know how to edit an rrd file.

 The contrib directory in trunk has the actual removespikes.pl file from 
 the 
 RRD source repository. Useful for updating databases that you do not want to 
 throw away.

 Can anyone tell me the unit of measure which applies to l_bin and l_bout 
 in that file?
 Is it bytes per second, bytes per collect_every, bytes per time_threshold?

  Not completely sure.

 It would be really great if the authors of libmetrics/linux/metrics.c
 would document it.

 Looking at the code, it is per second:

  /*
  ** Compute timediff. Check for bogus delta-t
  */
  float t = timediff(proc_net_dev.last_read,stamp);
  if ( t < proc_net_dev.thresh) {
    err_msg("update_ifdata(%s) - Dubious delta-t: %f", caller, t);
    return;
  }
  stamp = proc_net_dev.last_read;

  /*
  ** Compute rates in local variables
  */
  l_bin = l_bytes_in / t;
  l_bout = l_bytes_out / t;
  l_pin = l_pkts_in / t;
  l_pout = l_pkts_out / t;
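
 To illustrate why the unit is "per second", here is a minimal Python sketch
 of the same computation. It is not the libmetrics code: the function name,
 the tuple layout and the threshold value are assumptions made for the
 example, and it assumes the delta-t check rejects intervals that are
 shorter than a threshold.

```python
def compute_rates(prev, curr, min_dt=1.0):
    """Return (bytes_in/s, bytes_out/s) between two samples.

    prev and curr are (timestamp_seconds, bytes_in, bytes_out) tuples.
    min_dt stands in for proc_net_dev.thresh; its value here is assumed.
    Returns None when the time difference looks dubious.
    """
    t = curr[0] - prev[0]
    if t < min_dt:
        return None  # dubious delta-t, skip this sample
    # Dividing the counter deltas by elapsed seconds gives per-second rates.
    return ((curr[1] - prev[1]) / t, (curr[2] - prev[2]) / t)


prev = (100.0, 1_000_000, 500_000)
curr = (120.0, 3_000_000, 900_000)
print(compute_rates(prev, curr))
```

 Because the deltas are divided by the elapsed time, the result does not
 depend on collect_every or time_threshold.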

 Cheers
 Martin


Re: [Ganglia-general] any workaround for the bogus spikes problem?

2010-02-02 Thread Martin Knoblauch

- Original Message 

 From: Cameron Spitzer cspit...@nvidia.com
 To: ganglia-general@lists.sourceforge.net
 Sent: Tue, February 2, 2010 12:41:46 AM
 Subject: [Ganglia-general] any workaround for the bogus spikes problem?

Hi Cameron,

 We're trying to use Ganglia to monitor some HP DL580-G5 machines.
 We're using a 64-bit linux-2.6.16.

which version of Ganglia?

 The network traffic information is polluted with phantom 20 PB traffic 
 spikes.

 what kind of NIC do you have (1GB, 10 GB)? Which hardware and driver is 
loaded? What is the average network throughput you see?

 I found an ifdef for REMOVE_BOGUS_SPIKES in libmetrics/linux/metrics.c
 Defining it has no effect.  I see in the archive this problem has been 
 around for years.
 Has anyone solved this problem?

 I am kind of surprised that it does not help. When I wrote that hack a few 
years ago for 3.0.X, it worked perfectly. I was fighting a driver bug that 
caused spurious overruns of the driver counters.

 Maybe you can add some debugging output and check whether that stuff is 
triggered at all. Maybe the thresholds are not good anymore.

 And btw. that code does not *remove* bogus spikes from the RRD database. It 
just is supposed to prevent their generation.

 Can anyone tell me the unit of measure which applies to l_bin and l_bout 
 in that file?
 Is it bytes per second, bytes per collect_every, bytes per time_threshold?

 Not completely sure.

Cheers
Martin


Re: [Ganglia-general] Line width in small report graphs

2009-09-17 Thread Martin Knoblauch

+1 looks really better

 --
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
 From: Bernard Li bern...@vanhpc.org
 To: Jesse Becker haw...@gmail.com
 Cc: Ganglia Mailing List ganglia-general@lists.sourceforge.net
 Sent: Wednesday, September 16, 2009 8:31:23 PM
 Subject: Re: [Ganglia-general] Line width in small report graphs
 
 +1 as long as there isn't a compelling reason otherwise ;-)
 
 Cheers,
 
 Bernard
 
 On Wed, Sep 16, 2009 at 11:25 AM, Jesse Becker wrote:
  Right now, the {load,packet,network}_report graphs are all hard-coded
  to use LINE2 for several of the metrics.  This looks quite nice on
  the larger graph sizes (i.e. 'medium' and 'large'), but doesn't look
  quite so good on smaller sizes.  I'd like to change this to LINE1, but
  only for the small graph sizes.
 
  Now, with LINE2:
   http://bayimg.com/PAdpBAACo
 
  Proposed, with LINE1:
   http://bayimg.com/paDpdAAcO
 
  Comments?
 
 
  --
  Jesse Becker
 

Re: [Ganglia-general] gmetric fails when disk is unwriteable?

2008-11-26 Thread Martin Knoblauch

- Original Message 

 From: Carlo Marcelo Arenas Belon [EMAIL PROTECTED]
 To: Ofer Inbar [EMAIL PROTECTED]
 Cc: ganglia-general@lists.sourceforge.net
 Sent: Tuesday, November 25, 2008 9:49:22 AM
 Subject: Re: [Ganglia-general] gmetric fails when disk is unwriteable?

 On Fri, Nov 21, 2008 at 11:33:05PM -0500, Ofer Inbar wrote:

  What's the dependency that causes gmetric to require that the
  filesystem the CWD is on be writeable?

 as explained by Brad it is not the CWD that needs to be writeable but a
 TMPDIR (which for root can also be the current directory) and that is
 detected by APR.

 Recent Linux (since around kernel 2.4.16) requires a ramdrive mounted in
 /dev/shm, so one way to workaround this problem is to define :

   TMPDIR=/dev/shm

 Is TMPDIR only used for the include file handler, or also for other stuff? I 
would not want us to fill memory with lots of unexpected data.

Cheers
Martin


Re: [Ganglia-general] crazy network graph spikes

2008-11-20 Thread Martin Knoblauch



  
  There is currently no resolution to this issue.  3.1.x does not fix
  this problem, however you could work around it by doing this:
 
 It would be nice to have the option to supply max values, so any data
 bigger than that max gets discarded :-) for specific metrics, of course :-)
 
 

 definitely. That would be a useful addition.
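
 As a rough illustration of the idea (not an existing Ganglia option), a
 per-metric maximum could be applied like this. The function name and the
 limit of 125,000,000 bytes/s (the line rate of a 1 Gbit link) are
 assumptions for the example.

```python
def discard_above_max(samples, max_value):
    """Replace samples larger than max_value with None (unknown)."""
    return [None if v is not None and v > max_value else v for v in samples]


# Assumed limit: a 1 Gbit/s link cannot carry more than 125e6 bytes/s.
GIGABIT_BYTES_PER_SECOND = 125_000_000

readings = [1.2e6, 2.0e16, 9.0e5]
print(discard_above_max(readings, GIGABIT_BYTES_PER_SECOND))
```

 The bogus petabyte-range reading is dropped while plausible values pass
 through unchanged.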

Martin



Re: [Ganglia-general] Question about the bytes_in and bytes_out reports

2008-09-25 Thread Martin Knoblauch

- Original Message 

 From: Bryan Duxbury [EMAIL PROTECTED]
 To: ganglia-general@lists.sourceforge.net
 Sent: Thursday, September 25, 2008 3:47:48 AM
 Subject: [Ganglia-general] Question about the bytes_in and bytes_out reports

 Hey all,

 I'm a new user of Ganglia. Right now I'm running it on a 7-machine  
 cluster of Centos 5 boxes. Everything appears to be working pretty  
 well, except for the bytes_in and bytes_out graph. It appears to  
 always be zero, no matter how much traffic there is. I think I read  
 in some mailing list thread somewhere that this has to do with having  
 gigabit ethernet on the machines.

 Is this a known issue? Does anyone know what the proper path to a fix  
 would be?

 Thanks,
 Bryan Duxbury

Hi Bryan,

 in short, GigaBit NICs should not cause such problems. We need some more 
information:

- which version of Ganglia are you running? 3.1.x or 3.0.x?
- are the pkts_in pkts_out graphs showing anything useful?

Thanks
Martin


Re: [Ganglia-general] Anyone experience petabyte peaks in network metric in ganglia 3.x.y ?

2008-09-10 Thread Martin Knoblauch

- Original Message 

 From: Witham, Timothy D [EMAIL PROTECTED]
 To: Escobio, Roger  [EMAIL PROTECTED]; 
 ganglia-general@lists.sourceforge.net 
 ganglia-general@lists.sourceforge.net
 Sent: Tuesday, September 9, 2008 9:42:34 PM
 Subject: Re: [Ganglia-general] Anyone experience petabyte peaks in network 
 metric in ganglia 3.x.y ?

 I am testing ganglia in a cluster of linux but we are getting this
 confusing peaks in the bytes/s and in the packets/s (image attached)

 I have been able to minimize this significantly by using code from svn trunk 
 and 
 building with

 make CPPFLAGS=-DREMOVE_BOGUS_SPIKES

 IMHO, that should be the default.

Hi Tim,

 the problem is that with NICs faster than 1000 Mbit, the naturally occurring 
wrap-arounds will come too frequently (especially for the byte counters) and 
will trigger the remove mechanism and really mess up the data. The better 
solution would be to bring the networking counters in the Linux kernel to 
64-bit (they are 32-bit right now). Then we would not have to care about 
natural wrap-around for a few years. I once proposed this change, but it was 
not greeted with much enthusiasm :-(

 Therefore I #ifdef-ed my check. Especially as the effect seems to be really a 
very NIC specific bug.
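
 To make the wrap-around problem concrete, here is a small Python sketch of
 a wrap-aware counter difference. It is an illustration only, not the
 libmetrics code; the function name and the 32-bit modulus are assumptions.

```python
WRAP_32 = 2 ** 32  # assumed counter width of 32 bits


def counter_diff(old, new, modulus=WRAP_32):
    """Difference between two readings of a counter that wraps at modulus."""
    return (new - old) % modulus


print(counter_diff(1000, 1500))        # no wrap between the two readings
print(counter_diff(4294967000, 200))   # counter wrapped once in between
```

 This only recovers the true delta if the counter wrapped at most once
 between two reads. On a fast NIC with a long sampling interval that is no
 longer guaranteed, and a buggy jump in the counter is indistinguishable
 from a legitimate wrap, which is why a simple check like this cannot be
 the whole answer.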

 Escobio - what NICs are in the systems in question (all the same?). As I 
understand, you are using some 2.6.9 kernel?

Cheers
Martin


Re: [Ganglia-general] Anyone experience petabyte peaks in network metric in ganglia 3.x.y ?

2008-09-10 Thread Martin Knoblauch



 --
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
 From: Escobio, Roger  [EMAIL PROTECTED]
 To: ganglia-general@lists.sourceforge.net
 Sent: Wednesday, September 10, 2008 2:40:27 PM
 Subject: Re: [Ganglia-general] Anyone experience petabyte peaks in network 
 metric in ganglia 3.x.y ?
 
 
  -Original Message-
  From: Martin Knoblauch [mailto:[EMAIL PROTECTED] 
  Sent: September 10, 2008 6:55 AM
  To: Witham, Timothy D; Escobio, Roger [CMB-IT]; 
  ganglia-general@lists.sourceforge.net
  Subject: Re: [Ganglia-general] Anyone experience petabyte 
  peaks in network metric in ganglia 3.x.y ?
  
  - Original Message 
  
   From: Witham, Timothy D 
   To: Escobio, Roger  ; 
  ganglia-general@lists.sourceforge.net 
  
   Sent: Tuesday, September 9, 2008 9:42:34 PM
   Subject: Re: [Ganglia-general] Anyone experience petabyte 
  peaks in network metric in ganglia 3.x.y ?
   
   I am testing ganglia in a cluster of linux but we are getting this
   confusing peaks in the bytes/s and in the packets/s (image 
  attached)
   
   I have been able to minimize this significantly by using 
  code from svn trunk and 
   building with
   
   make CPPFLAGS=-DREMOVE_BOGUS_SPIKES
   
   IMHO, that should be the default.
   
  Hi Tim,
  
   the problem is that with NICs faster than 1000 Mbit, the 
  naturally occurring wrap-arounds will come too frequently 
  (especially for the byte counters) and will trigger the 
  remove mechanism and really mess up the data. The better 
  solution would be to bring the networking counters in the 
  Linux kernel to 64-bit (they are 32-bit right now). Then we 
  would not have to care about natural wrap-around for a few 
  years. I once proposed this change, but it was not greeted 
  with much enthusiasm :-(
  
   Therefore I #ifdef-ed my check. Especially as the effect 
  seems to be really a very NIC specific bug.
  
   Escobio - what NICs are in the systems in question (all the 
  same?). As I understand, you are using some 2.6.9 kernel?
  
 You are right, we have been seeing these random peaks in HP servers with:
 
 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet
 Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet

Hi Escobio,

 I observed the problem on:

 2.6.9-42.ELsmp

 and BCM5708 Gigabit Ethernet (rev 11) NICs with the bnx2 drivers. The 
problem is some weird bug when DMAing the counters. Solved in the 2.6.17 
timeframe IIRC. The fix might even have been backported to RHEL4Ux, where x > 4.

 Running 2.6.9 (redhat kernel :-) )
 Kernel 2.4.9 does not seem to be affected, right?


 Not sure whether those NICs were supported in the stone age :-)
 
 How good would it be to have a max value for bytes/s in the definition of the
 metrics? If the counter diff gives more than that, just discard that
 read.
 
 I know that will not solve the packets/s peak, but it could be a
 safety check before adding the values to the stats.
 
 I created a patch against linux/metrics.c (3.1.1 version) to add the
 counterdiff function found in *bsd/metrics.c.
 Are you interested in it? Just let me know and I'll send it to the list.
 

 Yes please. I would definitely like to have a look at your patch.

Cheers
Martin


Re: [Ganglia-general] [Ganglia-developers] [ANNOUNCEMENT] Ganglia 3.1.0 tarball ready fortesting...

2008-07-30 Thread Martin Knoblauch

Hi Craig,

 basically it is summing up all network interfaces with the exception of loX 
and the bonding interfaces (at least for Linux). Per-Interface sampling is 
planned for some future release (not the upcoming 3.1.0).
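
 The summing described above can be sketched in Python as follows. This is
 an illustration, not the gmond implementation: the function name, the
 prefix-based filter and the sample text are assumptions, and a prefix
 match on "lo" would also skip any other interface whose name starts with
 those letters.

```python
def sum_interfaces(proc_net_dev_text, skip_prefixes=("lo", "bond")):
    """Sum rx/tx bytes and packets over all interfaces except skipped ones."""
    totals = {"bytes_in": 0, "pkts_in": 0, "bytes_out": 0, "pkts_out": 0}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        if name.strip().startswith(skip_prefixes):
            continue
        fields = rest.split()
        # Columns after the name: rx bytes, rx packets, ..., tx bytes (8), tx packets (9)
        totals["bytes_in"] += int(fields[0])
        totals["pkts_in"] += int(fields[1])
        totals["bytes_out"] += int(fields[8])
        totals["pkts_out"] += int(fields[9])
    return totals


SAMPLE = """Inter-|   Receive                            |  Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 5000 50 0 0 0 0 0 0 7000 70 0 0 0 0 0 0
  eth1: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0
 bond0: 6000 60 0 0 0 0 0 0 9000 90 0 0 0 0 0 0
"""
print(sum_interfaces(SAMPLE))
```

 Skipping the bonding interface avoids counting the traffic of its slave
 interfaces twice.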

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de



- Original Message 
From: Craig Simpson [EMAIL PROTECTED]
To: ganglia-general@lists.sourceforge.net
Sent: Tuesday, July 29, 2008 8:12:47 PM
Subject: Re: [Ganglia-general] [Ganglia-developers] [ANNOUNCEMENT] Ganglia 
3.1.0 tarball ready fortesting...



Pardon my uncertainty about the default checks in /etc/gmond.conf. For the 
network stuff, what interface is it binding to? How does it figure that out? On 
my cluster I have several interfaces and am doing NIC bonding on Linux. So 
really I would want to bind that to an alias. 

Thanks!
Craig






-- 
Get Creative!!! @ http://3rdstone.net
Use your BRAIN @ http://brainradar.com
Get Wisdom @ http://www.youtube.com/profile_videos?user=drturistarp=r


In the circle the beginning and the end are common
~ Heraclitis (540-480BC)

Re: [Ganglia-general] Is there any APIs or DB data I can use to getmetrics?

2008-05-07 Thread Martin Knoblauch

Hi Igor,

 unless you want to rewrite gmetad completely, this is the way to query the 
database. Basically port 8651 gives you everything, while 8652 allows you to 
run specific queries. Not sure where/whether the query mechanism is actually 
documented outside the gmetad sources. You can have a look at how the 
web-frontend uses port 8652.
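
 For anyone who prefers a script over telnet, here is a minimal Python
 sketch of the same approach. The function names and the sample XML are
 made up for the example; the element and attribute names (HOST, METRIC,
 NAME, VAL) follow the XML that gmetad emits, but check them against your
 own output. The query string sent to the interactive port is an
 assumption based on the path-like form mentioned in this thread.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmetad_xml(host, port=8651, query=None, timeout=10.0):
    """Read the complete XML answer from gmetad and return it as bytes.

    On port 8651 nothing is sent. For the interactive port pass a
    path-like query string (assumed form), which is sent with a newline.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        if query is not None:
            sock.sendall((query + "\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:
                break  # gmetad closes the connection after the dump
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_data):
    """Map host name -> {metric name: value string} from a gmetad dump."""
    root = ET.fromstring(xml_data)
    return {
        host.get("NAME"): {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        for host in root.iter("HOST")
    }


# Hand-written sample in the shape of a gmetad answer (values invented).
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0.6" SOURCE="gmetad">
<GRID NAME="MyGrid"><CLUSTER NAME="c1">
<HOST NAME="node1" IP="192.168.1.10">
<METRIC NAME="bytes_in" VAL="1234.5" TYPE="float" UNITS="bytes/sec"/>
</HOST></CLUSTER></GRID></GANGLIA_XML>"""
print(metrics_by_host(SAMPLE_XML))
```

 Against a live gmetad, something like
 metrics_by_host(fetch_gmetad_xml("192.168.1.2")) should give the same
 kind of dictionary for every host in the grid.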

Cheers
Martin
--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de


- Original Message 
 From: Igor Rosenberg [EMAIL PROTECTED]
 To: Hu, Wenzhong  [EMAIL PROTECTED]
 Cc: ganglia-general@lists.sourceforge.net
 Sent: Wednesday, May 7, 2008 9:54:57 AM
 Subject: Re: [Ganglia-general] Is there any APIs or DB data I can use to 
 getmetrics?
 
 Hi
 Well, I looked for a way to make sure ganglia was working. The doc suggests
 polling these interfaces with telnet. Then I understood this was only opening
 a socket. I decided to make my own in java when I couldn't find any existing
 example. But I'm not sure it's the best way. I am quite certain there must be
 a way to perform database queries directly.
 Best
 Igor
 
 -Original Message-
 From: Hu, Wenzhong [mailto:[EMAIL PROTECTED] 
 Sent: miércoles, 07 de mayo de 2008 5:01
 To: Igor Rosenberg
 Cc: ganglia-general@lists.sourceforge.net
 Subject: RE: [Ganglia-general] Is there any APIs or DB data I can use to 
 getmetrics?
 
 Thanks Igor,
 
 How did you find out this method? It's quite amazing.
 
 I will try it on other versions if I have time. And maybe somebody somewhere
 can try on other versions also, hopefully :)
 
 Regards,
 Stephen
 
 -Original Message-
 From: Igor Rosenberg [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, May 06, 2008 9:57 PM
 To: Hu, Wenzhong [CMB-IT]
 Cc: ganglia-general@lists.sourceforge.net
 Subject: RE: [Ganglia-general] Is there any APIs or DB data I can use to
 getmetrics?
 
 
 Hello,
 I've also come upon the same need, and have resolved (by lack of
 information) to polling directly the gmetad. My solution works for
 version 3.0.6, I've never tested any other. You can connect a socket to
 ports 8651 and 8652 of the machine running gmetad (I don't know what is
 the difference between both ports). You receive an XML file of the last
 status monitored. The schema of the result is provided within the
 answer. I've attached sample output to this mail (one Grid containing
 one cluster containing one machine). To test the gmetad output yourself,
 see it running 
telnet ip 8651 
 where ip is the IP of the machine running gmetad
 
 If you speak java, you may use ganglia in your programs modifying the
 following code snippet :
 
 /**
  * Get a reader on the Ganglia output, which you can then parse
  * with your preferred XML parser.
  * Needs imports from java.io (BufferedReader, InputStreamReader,
  * PrintWriter, IOException) and java.net (Socket, UnknownHostException).
  * @see
  * http://www.mail-archive.com/[EMAIL PROTECTED]/msg03642.html
  **/
 protected BufferedReader openGangliaSocket() throws
         UnknownHostException, IOException {
     String gangliaHost = "192.168.1.2";
     int gangliaPort = 8651;
     // another poll string can be something matching
     // /GRIDNAME/MACHINENAME/METRIC
     String socketCall = "";
     System.out.println("Polling socket " + gangliaHost + ":"
             + gangliaPort + ", cmd = " + socketCall);
     Socket gangliaSocket = new Socket(gangliaHost, gangliaPort);
     PrintWriter gangliaWriter = new
             PrintWriter(gangliaSocket.getOutputStream(), true);
     gangliaWriter.println(socketCall);
     BufferedReader gangliaReader;
     gangliaReader = new BufferedReader(new
             InputStreamReader(gangliaSocket.getInputStream()));
     return gangliaReader;
 }
 
 Hope that helps somebody somewhere :)
 
 Igor
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Hu,
 Wenzhong 
 Sent: lunes, 05 de mayo de 2008 15:04
 To: Carlo Marcelo Arenas Belon
 Cc: ganglia-general@lists.sourceforge.net
 Subject: Re: [Ganglia-general] Is there any APIs or DB data I can use to
 getmetrics?
 
 Hi Carlo,
 
 Your explanation is very clear. Now I know where I should start.
 
 Thanks very much indeed.
 Stephen
 
 -Original Message-
 From: Carlo Marcelo Arenas Belon [mailto:[EMAIL PROTECTED]
 Sent: Monday, May 05, 2008 7:30 PM
 To: Hu, Wenzhong [CMB-IT]
 Cc: Ron Wellnitz; ganglia-general@lists.sourceforge.net
 Subject: Re: [Ganglia-general] Is there any APIs or DB data I can use to
 get metrics?
 
 
 On Mon, May 05, 2008 at 06:11:51PM +0800, Hu, Wenzhong  wrote:
  
  What I need is the rrdtool schema or something for Ganglia :)
 
 rrdtool is a time series database, so there is technically no such thing as a
 schema (like you would expect on a relational database), as each metric is
 stored in an independent file (of fixed size and continuously doing
 summarizations), and the cluster is represented by a directory tree on disk.
 
 the definition of which and how many buckets (known as RRAs) to have for
 each metric

Re: [Ganglia-general] Need a script to remove spikes from network RRDs

2008-02-29 Thread Martin Knoblauch

- Original 
 From: Martin Knoblauch [EMAIL PROTECTED]
 To: john allspaw [EMAIL PROTECTED]; [EMAIL PROTECTED]
 Cc: ganglia general ganglia-general@lists.sourceforge.net
 Sent: Wednesday, February 27, 2008 8:55:26 AM
 Subject: Re: [Ganglia-general] Need a script to remove spikes from network 
 RRDs

  Original Message 
  From: john allspaw 
  To: Martin Knoblauch ; 
 [EMAIL PROTECTED]
  Cc: ganglia general 
  Sent: Tuesday, February 26, 2008 7:38:07 PM
  Subject: Re: [Ganglia-general] Need a script to remove spikes from 
  network 
 RRDs

  Here is what comes with rrdtool, I've used it with some success...

  http://oss.oetiker.ch/rrdtool/pub/contrib/removespikes.tar.gz

  -john

  cool. Almost what I need. It seems to be a bit too smart for my purpose, but 
 making things stupid is easy :-)

Hi John,

 after adding an option/mode to remove based on value instead of 
bin-distribution the tool did exactly what I needed. I have pushed back my 
changes to the rrd people. Thanks a lot.
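
 For readers without the script at hand, the value-based removal can be
 sketched in a few lines of Python. This is not removespikes.pl and not the
 change sent to the rrd people; it is a simplified illustration that works
 on the XML text produced by "rrdtool dump" and assumes the archive rows use
 <v>...</v> elements. The function name and threshold are made up for the
 example.

```python
import re

ROW_VALUE = re.compile(r"<v>\s*([^<\s]+)\s*</v>")


def remove_spikes(dump_xml, max_value):
    """Replace every row value above max_value with NaN in an RRD XML dump."""

    def fix(match):
        try:
            value = float(match.group(1))
        except ValueError:
            return match.group(0)  # leave anything unparsable untouched
        # NaN compares False here, so existing unknowns are left alone.
        return "<v>NaN</v>" if value > max_value else match.group(0)

    return ROW_VALUE.sub(fix, dump_xml)


row = "<row><v>1.2000000000e+03</v><v>2.0000000000e+16</v></row>"
print(remove_spikes(row, 1.25e8))
```

 The usual workflow would be to dump the file with "rrdtool dump", filter
 the XML, and rebuild the database with "rrdtool restore", keeping a backup
 of the original .rrd file.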

 For the meeting: Should we contact the author and ask whether we can put the 
script into the distribution under cool-stuff?

Cheers
Martin


Re: [Ganglia-general] Need a script to remove spikes from network RRDs

2008-02-27 Thread Martin Knoblauch

- Original Message 
 From: aurbain [EMAIL PROTECTED]
 To: Martin Knoblauch [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED]; ganglia general ganglia-general@lists.sourceforge.net
 Sent: Wednesday, February 27, 2008 5:11:48 PM
 Subject: Re: [Ganglia-general] Need a script to remove spikes from network 
 RRDs

 Thanks for the info Martin.  So its not a rollover issue after all.
 By the way, this issue also lives in rhel4u4 32 bit with bnx2 version
 1.4.43f

 Interesting. From my reading only the 64-bit version was affected. Anyway, I 
have a fix which just throws away any sample where an overflow, correct or 
bogus, occurs. That is definitely fine in 64-bit land. Even at full speed, the 
64-bit byte counter of a 1 Gbit NIC would overflow only after about 5000 
years. Nothing that I worry about much :-) Even about 5 years for a future 
1 Tbit NIC is not that bad... But in 32-bit, a 1 Gbit NIC could overflow 
roughly every 40 seconds. And that is very short.
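
 The overflow times can be checked with a quick calculation. The sketch
 below assumes a byte counter and a link running at full line rate; the
 function name is made up for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_to_wrap(counter_bits, link_bits_per_second):
    """Time until a byte counter of the given width wraps at full line rate."""
    bytes_per_second = link_bits_per_second / 8
    return 2 ** counter_bits / bytes_per_second


print(seconds_to_wrap(32, 1e9))                      # 32-bit counter, 1 Gbit/s, in seconds
print(seconds_to_wrap(64, 1e9) / SECONDS_PER_YEAR)   # 64-bit counter, 1 Gbit/s, in years
print(seconds_to_wrap(64, 1e12) / SECONDS_PER_YEAR)  # 64-bit counter, 1 Tbit/s, in years
```

 This gives roughly 34 seconds for the 32-bit case, about 4,700 years for a
 64-bit counter at 1 Gbit/s, and a little under 5 years at 1 Tbit/s, which
 matches the rounded figures above.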

Cheers
Martin

 Martin Knoblauch wrote:
  - Original Message 
  From: aurbain 
  To: Martin Knoblauch 
  Cc: [EMAIL PROTECTED]; ganglia general 

  Sent: Tuesday, February 26, 2008 8:25:13 PM
  Subject: Re: [Ganglia-general] Need a script to remove spikes from 
  network 
 RRDs

  Happens only on 64-bit systems. Now, my fix kills the generation of the 
 spikes, but my RRD database is now tainted for another 12 months.


Re: [Ganglia-general] [Ganglia-developers] Moving all built-in metrics to metric modules...

2007-12-20 Thread Martin Knoblauch

Hi Brad,

 that seems to be a pretty useful move. Seems it is time that I really start 
looking closely at 3.1.x

Cheers
Martin

Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

- Original Message 
 From: Brad Nicholes [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]; ganglia-general@lists.sourceforge.net
 Sent: Tuesday, December 18, 2007 11:44:45 PM
 Subject: [Ganglia-developers] Moving all built-in metrics to metric modules...
 
 I just committed a rather substantial patch to Ganglia 3.1.0 trunk
 which will affect the way that gmond 3.1.x is deployed. I am posting
 this to both the developer list and the general list so that all will be
 aware of the changes and why they are important. The primary purpose
 for the patch was to remove all of the built in metrics out of the gmond
 binary and allow them to be built as loadable modules. The following
 is a more detailed list of what has changed. Hopefully from a user
 perspective, gmond will continue to work as it has in the past. But going
 forward, it will be much more flexible with regards to the core set of
 metrics.
 
 * All built-in metrics have been removed from the gmond binary
   - A new set of core metric modules have been created that represent
     the same set metrics that gmond has always gathered. These new core
     modules are mod_cpu.so, mod_disk.so, mod_load.so, mod_mem.so,
     mod_net.so, mod_proc.so and mod_sys.so. Each of these modules is
     basically a wrapper around the metric functions that exist in
     libmetrics. Being wrappers, they still make the same metric function
     calls as have always been made. And since libmetrics contains all of
     the platform specific metric code, the metric function calls made by
     the core modules will continue to do the right thing for all of the
     platforms that have been previously supported.
   - There is also an extra module called core_metrics which contains the
     heartbeat, location and gexec metrics. Even though this module could
     be dynamically loaded in the same manner as the others, it is always
     statically linked simply because gmond would not be able to function
     properly without these metrics so there is no real reason to allow
     these metrics to be dynamically loaded.
   - Some additional configuration has been added to the gmond.conf
     file. Because the core metrics are now implemented as modules, this
     requires a module configuration block that instructs gmond to load
     each module. A set of module blocks has been added to the default
     gmond.conf file.
 
 * All metric specific metadata definitions have been removed from
   protocol.x
   - With the refactoring of the XDR data and removal of the builtin
     metrics, there is no longer any need for XDR to have intimate
     knowledge of the core metrics. Therefore the metric structure array
     and enum have been removed and are now part of the core metric
     modules themselves.
 
 * --enable-static-build statically links the core metric modules
   - Building gmond statically will statically link not only APR, expat
     and libconfuse, it will also statically link all of the core metric
     modules into the gmond binary. The result should be a gmond binary
     that looks and feels just like the old 3.0.x statically linked gmond
     binary. The one exception is that a module statement is still
     required in the gmond.conf file. The difference between the module
     configuration block for dynamically loaded modules and the module
     blocks for statically linked modules is whether or not a path to the
     .so is included. The configure script and makefiles have been
     modified to detect --enable-static-build and build the default
     gmond.conf file appropriately.
 
 * --enable-static-build + --enable-python statically links the python
   module
   - One of the downsides of building gmond 3.1.x statically was that
     doing so would disable all of the dynamically loadable module
     capability. The reason for this is the need for both gmond and the
     pluggable modules to dynamically link with libapr1. However, if both
     --enable-static-build and --enable-python are specified during
     configure, a gmond binary will be built with mod_python statically
     linked. This provides gmond with the ability to continue to load and
     run python metric modules in the same manner as the non-static
     build. In other words, even though statically linking gmond will
     disable pluggable C interface modules, python pluggable modules will
     still continue to work.
 
 * All metrics carry a group designation
   - Now that all metrics have been implemented as loadable modules, the
     metrics have also been assigned to groups. The XML that is produced
     by gmond and gmetad will carry an  tag that defines which group each
     metric belongs to. This will allow the web front

Re: [Ganglia-general] Overriding hostname

2007-09-20 Thread Martin Knoblauch


--- Andy Brody [EMAIL PROTECTED] wrote:

 I'd also really like this functionality. A slightly different but 
 related problem: it's been tremendously annoying that gmond on the
 head 
 node doesn't know that data coming from different interfaces of a 
 multihomed machine is really just one machine. Having each gmond pass
 
 some unique per-host identifier other than ip address would be great.
 
 -Andy Brody
 
 Richard Mohr wrote:
  On Thu, 2007-09-20 at 05:44 +0100, richard grevis wrote:
  
  There have been discussions earlier about getting each gmond to
 send a hostname
  rather that using the source address and reverse DNSing it on the
 headnode.
  
  I would definitely like this functionality.
 

 Me too.

Martin 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de


Re: [Ganglia-general] Bad Network data

2007-04-24 Thread Martin Knoblauch

Ian,

 long day :-(

Thanks
Martin


--- Ian Cunningham [EMAIL PROTECTED] wrote:

 Martin,
 
 I think bnx2 is the kernel module for the NIC. B.N.X. meaning
 Broadcom 
 NetXtreme.
 
 Cheers,
 Ian
 
 Martin Knoblauch wrote:
  Hi Jeff,
 
   could you provide me with the output from:
 
  ifconfig -a
  netstat -i
  cat /proc/net/dev
 
   And what is bnx?
 
  Thanks
  Martin


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Bad Network data

2007-04-23 Thread Martin Knoblauch

Hi Jeff,

 what kind of nodes and networking? We have known problems with AIX and
Gigabit due to overruns in the byte_in/out code.

Cheers
Martin
--- Jeff Blasius [EMAIL PROTECTED] wrote:

 Hello!
 On one of our clusters, ganglia seems to be reporting erroneous
 network information.
 See:
 http://research.yale.edu/hpc/net.jpeg
 Notice the Pb range spikes? Unfortunately this happens randomly, at
 least once an hour, on single nodes, which makes any real network
 information from the cluster Network plot disappear.
 
 This is gmond/gmetad version 3.0.3-1, which is running just fine on
 most of the clusters in our grid. Any ideas? The only unique network
 setup here is that eth0 and eth1 are both up, but only eth1 has a
 connection to the switch.
 
 Thank You,
  jeff
 
 -- 
 Jeff Blasius / [EMAIL PROTECTED]
 Phone: (203)432-9940  51 Prospect Rm. 011
 High Performance Computing (HPC)
 UNIX Systems Administrator, WorkStation Support (WSS)
 Yale University Information Technology Services (ITS)
 



--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Bad Network data

2007-04-23 Thread Martin Knoblauch

Hi Jeff,

 could you provide me with the output from:

ifconfig -a
netstat -i
cat /proc/net/dev

 And what is bnx?

Thanks
Martin
--- Jeff Blasius [EMAIL PROTECTED] wrote:

 Hello Martin,
 Here is some more information regarding the setup. Thank You!
 -jeff
 
 06:00.0 Ethernet controller: Broadcom Corporation NetXtreme II
 BCM5708S Gigabit Ethernet (rev 11)
 0d:00.0 PCI bridge: Intel Corporation 6702PXH PCI Express-to-PCI
 Bridge A (rev 09)
 
 Linux c001 2.6.9-42.ELsmp #1 SMP Tue Aug 15 10:35:26 BST 2006 x86_64
 x86_64 x86_64 GNU/Linux
 
 Red Hat Enterprise Linux WS release 4 (Nahant Update 4)
 
 [EMAIL PROTECTED] ~]# dmesg |grep eth0
 divert: allocating divert_blk for eth0
 eth0: Broadcom NetXtreme II BCM5708 1000Base-SX (B1) PCI-X 64-bit
 133MHz found at mem f400, IRQ 11, node addr 0015c5f7cc3e
 bnx2: eth0: using MSI
 eth0: no IPv6 routers present
 [EMAIL PROTECTED] ~]# dmesg |grep eth1
 divert: allocating divert_blk for eth1
 eth1: Broadcom NetXtreme II BCM5708 1000Base-SX (B1) PCI-X 64-bit
 133MHz found at mem f800, IRQ 11, node addr 0015c5f7cc3c
 bnx2: eth1: using MSI
 bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex
 eth1: no IPv6 routers present
 
 
 
 -- 
 Jeff Blasius / [EMAIL PROTECTED]
 Phone: (203)432-9940  51 Prospect Rm. 011
 High Performance Computing (HPC)
 UNIX Systems Administrator, WorkStation Support (WSS)
 Yale University Information Technology Services (ITS)
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Help! I have a petabyte/s network

2007-03-29 Thread Martin Knoblauch

David,

 good catch. I will have to look at it for a bit.

Cheers
Martin
--- David Wong [EMAIL PROTECTED] wrote:

 I don't write much code nowadays, so I'm going to need a lot of help
 with this.
 
 I dug through the ganglia code and I found this interesting tidbit in
 libmetrics/aix/metrics.c which may be indicative of the problem.
 
 There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes, but
 the types of the two variables are different.
 
 net_stat::ibytes is a double: 
 
 struct net_stat{
   double ipackets;
   double opackets;
   double ibytes;
   double obytes;
 } cur_net_stat;
 
 and we have *ninfo declared here:
 
 perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;
 
 libperfstat.h has perfstat_netinterface_total_t::ibytes as u_longlong_t.
 
 Does this code try to do what I think it is doing, i.e. assign an
 unsigned 64 bit integer to a signed 64bit integer?
 
 I'm willing to test the code if someone who's more adept at coding and
 building will take on the challenge.
 
 It looks to me that the type mismatch will have to be fixed in a few
 places, such as CALC_NETSTAT, and we'll have to add an unsigned long
 long to g_val_t too.  Those are the ones I can see so far.
 
 David Wong
 Senior Systems Engineer
 Management Dynamics, Inc.
 Phone: 201-804-6127
 [EMAIL PROTECTED]
 


--
Martin Knoblauch
email: k n o b

Re: [Ganglia-general] gmetad patch to contact random data_source hosts

2007-03-29 Thread Martin Knoblauch

Tim,

 your diff command looks a bit surprising to me. The revision number
looks like CVS, and we have been on SVN for quite some time.

 Which version of Ganglia have you checked out?

Cheers
Martin
--- Witham, Timothy D [EMAIL PROTECTED] wrote:

 Hi,
 
 I just had a situation where the first host in a gmetad data_source
 accepts the connection but offers no data, like this:
 
   poll() timeout for [clustername] data source after 0 bytes read
 
 Gmetad always tries the sources in order and so it just keeps getting
 stuck on this first one, and losing the data for the entire cluster.
 
 Here is a quick patch that tries random hosts from the list instead,
 and solved my problem.  It is not careful to make sure it tried them
 all, but if it fails it will just try again next time.  If someone
 wants to fix it to try all the sources in a random order, that would
 be fine.  Perhaps this could be included in the next release unless
 someone knows a good reason to always try the sources in order.
 
 Thanks!
 
 -8-
 diff -c -r1.1.1.1 data_thread.c
 *** data_thread.c 19 Mar 2007 18:52:32 -  1.1.1.1
 --- data_thread.c 28 Mar 2007 18:12:08 -
 ***************
 *** 18,24 ****
   void *
   data_thread ( void *arg )
   {
 !    int i, sleep_time, bytes_read, rval;
      data_source_list_t *d = (data_source_list_t *)arg;
      g_inet_addr *addr;
      g_tcp_socket *sock=0;
 --- 18,24 ----
   void *
   data_thread ( void *arg )
   {
 !    int i, j, sleep_time, bytes_read, rval;
      data_source_list_t *d = (data_source_list_t *)arg;
      g_inet_addr *addr;
      g_tcp_socket *sock=0;
 ***************
 *** 60,75 ****
         if(d->last_good_index >= 0)
           sock = g_tcp_socket_new ( d->sources[d->last_good_index] );
   
 !       /* If there was no good connection last time or the above connect failed then try each host in the list. */
         if(!sock)
           {
 !           for(i=0; i < d->num_sources; i++)
               {
 !               /* Find first viable source in list. */
 !               sock = g_tcp_socket_new ( d->sources[i] );
                 if( sock )
                   {
 !                   d->last_good_index = i;
                     break;
                   }
               }
 --- 60,80 ----
         if(d->last_good_index >= 0)
           sock = g_tcp_socket_new ( d->sources[d->last_good_index] );
   
 !       /* If there was no good connection last time or the above
 !          connect failed then try random hosts in the list.  We try
 !          random ones in case someone is accepting the connection
 !          but refusing to provide any data; we don't want to get
 !          stuck with a non-working host. */
         if(!sock)
           {
 !           for(i=0; i < d->num_sources * 2; i++)
               {
 !               /* Find random viable source in list. */
 !               j = d->num_sources * (rand() / (RAND_MAX - 1.0));
 !               sock = g_tcp_socket_new ( d->sources[j] );
                 if( sock )
                   {
 !                   d->last_good_index = j;
                     break;
                   }
               }
 -8--
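
The patch above invites a follow-up ("try all the sources in a random order"). A minimal sketch of that idea is given here, assuming a Fisher-Yates shuffle over the source indices. Note that the quoted expression rand() / (RAND_MAX - 1.0) can reach 1.0 or slightly more, so j can equal num_sources; dividing by RAND_MAX + 1.0 keeps the index in range. The names shuffled_indices, pick_source and demo_connect are hypothetical and not part of gmetad; the callback merely stands in for g_tcp_socket_new().

```c
#include <assert.h>
#include <stdlib.h>

/* Fill order[0..n-1] with a random permutation of 0..n-1 (Fisher-Yates).
 * Hypothetical helper, not part of gmetad. */
void shuffled_indices(int *order, int n)
{
    int i;
    assert(order != NULL && n >= 0);
    for (i = 0; i < n; i++)
        order[i] = i;
    for (i = n - 1; i > 0; i--) {
        /* rand() / (RAND_MAX + 1.0) lies in [0, 1), so j is in [0, i]. */
        int j = (int)((i + 1) * (rand() / (RAND_MAX + 1.0)));
        int tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
}

/* Stand-in for a connection attempt: pretend only source 2 answers. */
int demo_connect(int index)
{
    return index == 2;
}

/* Try every source exactly once, in random order.
 * Returns the index that connected, or -1 if none did. */
int pick_source(int n, int (*try_connect)(int index))
{
    int *order;
    int k, found = -1;
    if (n <= 0)
        return -1;
    order = malloc(n * sizeof *order);
    if (order == NULL)
        return -1;
    shuffled_indices(order, n);
    for (k = 0; k < n; k++) {
        if (try_connect(order[k])) {
            found = order[k];
            break;
        }
    }
    free(order);
    return found;
}
```

Compared with drawing 2 * num_sources random indices, the shuffle guarantees that every source is attempted once per pass, at the cost of a small temporary array.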
 
 -- 
 [EMAIL PROTECTED]; I don't speak for Intel or anyone.
 



--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Gmetad and web frontend on different machines.

2007-03-29 Thread Martin Knoblauch

Richard,

 depending on the cluster size, writing the RRDs via NFS might turn out
to be a huge bottleneck.

Cheers
Martin
--- [EMAIL PROTECTED] wrote:

 Saundry,
  
 It sort of looks like you can, but actually you can't.
 gmetad writes to the RRD databases as local files,
 and the web frontend (PHP) reads the RRD databases as local files
 (actually it invokes rrdtool itself).
  
 I imagine you could separate the two using NFS filesystems,
 but I have not tried this.
 
 kind regards,
 
 Richard Grevis 
 Production Architecture 
 Barclays Capital, Canary Wharf, London, E14 4BB 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of
 saundrya mishra
 Sent: 29 March 2007 14:30
 To: ganglia-general@lists.sourceforge.net
 Subject: [Ganglia-general] Gmetad and web frontend on different
 machines.
 
 
 
   Hi There,
   
   I am new to Ganglia. Can we have gmetad and the web frontend for a
   cluster running on two different machines? If yes, then how is it
   possible, since I read in the configuration file of the web frontend
   that the RRDTool databases need to be local to be read?
   
   Greetings,
   Saundrya.
   
 
 




--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia custom Round-Robin archives RRA

2007-03-29 Thread Martin Knoblauch

Hi,

 the RRA definition in gmetad.conf is only used when a new RRD file is
created; existing files keep the layout they were created with. There
are two options:

- throw your data away and let gmetad recreate the files
- modify the old data. If you look at bugzilla #33 you will find an
attached script that should do what you want. It is not in the sources
because I am lazy and the licensing is not clear yet.
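
As a worked check of the row count discussed in this thread: an RRA definition has the form RRA:CF:xff:steps:rows, where each row consolidates "steps" primary data points. The sketch below computes how many rows are needed to cover a time span. The function name rra_rows is hypothetical, and the 15 second step mentioned afterwards is an assumption about the gmetad default that should be confirmed with "rrdtool info" on an existing file.

```c
#include <assert.h>

/* Rows an RRA needs to cover span_seconds when each row consolidates
 * steps_per_row primary data points of step_seconds each.
 * Rounds up so that the whole span is covered. */
long rra_rows(long span_seconds, long step_seconds, long steps_per_row)
{
    long row_seconds;
    assert(span_seconds >= 0 && step_seconds > 0 && steps_per_row > 0);
    row_seconds = step_seconds * steps_per_row;
    return (span_seconds + row_seconds - 1) / row_seconds;
}
```

For a 366 day year, rra_rows(366L * 86400, 300, 1) gives 105408, which matches the figure in the question. If the RRD step really is 15 seconds, an RRA with steps = 1 and 105408 rows would cover only about 18 days (105408 * 15 s); five-minute rows would then need steps = 20.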

Cheers
Martin
--- CASTRO Paulo Edgar [EMAIL PROTECTED] wrote:

 Hi all.
 
 We have been testing ganglia here implemented in about 250 machines.
 By the way, good job on the tool guys.
 
 We've been peeking at the conf files namely gmetad.conf and we found
 this commented option about Custom Round-Robin archives.
 The thing is, we wanted to have an RRA of our own that could
 aggregate all the 5 minute PDPs for a whole year. See what I mean ;)
 That way we wouldn't lose granularity while reading directly from the
 rrd files.
 
 We tried adding this to the gmetad.conf 
 RRAs "RRA:AVERAGE:0.5:1:105408", 105408 being the number of 5-minute
 intervals in a year.
 
 But we still haven't noticed any change, nor have the rrd files grown
 enough to accommodate the new RRA.
 
 How can we manage to do this?
 Do we need to start the whole collection process again, erasing the
 previous data and files?
 Will it work with this new option?
 Is this syntax for the conf file correct?
 
 Tkx in advance,
 
 
   PECastro
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Help! I have a petabyte/s network

2007-03-29 Thread Martin Knoblauch

David,

 after some looking at CALC_NETSTAT I see no *type* problems here:

#define CALC_NETSTAT(type) (double) \
  ((cur_ninfo->type<last_ninfo->type)? \
 -1:(cur_ninfo->type - last_ninfo->type)/timediff)

 cur_ninfo->type and last_ninfo->type are of the same type and the
macro will just return a double of either -1 or a positive rate.

 It would be interesting to see the values of cur_ninfo->type,
last_ninfo->type and timediff when you observe the petabyte
performance. Can you add some debug statements around lines 873-876?
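
A minimal sketch of what such debug statements could capture, assuming the counters are unsigned 64-bit values as in libperfstat. The functions calc_rate and format_rate_debug are hypothetical stand-ins that mirror the macro for a single counter; they are not taken from libmetrics/aix/metrics.c.

```c
#include <assert.h>
#include <stdio.h>

/* Mirrors the CALC_NETSTAT logic for one counter: -1 if the counter
 * went backwards, otherwise the per-second rate. */
double calc_rate(unsigned long long cur, unsigned long long last,
                 double timediff)
{
    assert(timediff > 0.0);
    if (cur < last)
        return -1.0;
    return (double)(cur - last) / timediff;
}

/* Format the inputs and the result so that a spike can be traced back
 * to its cause. Returns what snprintf returns. */
int format_rate_debug(char *buf, size_t len, const char *name,
                      unsigned long long cur, unsigned long long last,
                      double timediff)
{
    return snprintf(buf, len,
                    "%s: cur=%llu last=%llu timediff=%f rate=%e",
                    name, cur, last, timediff,
                    calc_rate(cur, last, timediff));
}
```

Logging all three inputs matters because a huge rate can come from either side of the division: an implausibly large counter difference, or a very small timediff.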

Cheers
Martin

--- David Wong [EMAIL PROTECTED] wrote:

 I don't write much code nowadays, so I'm going to need a lot of help
 with this.
 
 I dug through the ganglia code and I found this interesting tidbit in
 libmetrics/aix/metrics.c which may be indicative of the problem.
 
 There's an assignment from cur_ninfo.ibytes to cur_net_stat.ibytes, but
 the types of the two variables are different.
 
 net_stat::ibytes is a double: 
 
 struct net_stat{
   double ipackets;
   double opackets;
   double ibytes;
   double obytes;
 } cur_net_stat;
 
 and we have *ninfo declared here:
 
 perfstat_netinterface_total_t ninfo[2],*last_ninfo, *cur_ninfo ;
 
 libperfstat.h has perfstat_netinterface_total_t::ibytes as u_longlong_t.
 
 Does this code try to do what I think it is doing, i.e. assign an
 unsigned 64 bit integer to a signed 64bit integer?
 
 I'm willing to test the code if someone who's more adept at coding and
 building will take on the challenge.
 
 It looks to me that the type mismatch will have to be fixed in a few
 places, such as CALC_NETSTAT, and we'll have to add an unsigned long
 long to g_val_t too.  Those are the ones I can see so far.
 
 David Wong
 Senior Systems Engineer
 Management Dynamics, Inc.
 Phone: 201-804-6127
 [EMAIL PROTECTED]
 

Re: [Ganglia-general] Help! I have a petabyte/s network

2007-03-28 Thread Martin Knoblauch

David,

 as far as I remember, the AIX metrics code had an overflow/wrap-around
problem prior to 3.0.4. Maybe the fixes are not thorough enough.

 The packets/sec are of course less affected.

Cheers
Martin

--- David Wong [EMAIL PROTECTED] wrote:

 Ganglia is reporting that I'm pushing up to 200 Petabytes/s through my
 network.  Nobody tell the network admin!
 
 I'm running Ganglia 3.0.4 with the Power5 add-ons on AIX5.3
 
 Bytes in and out statistics generally appear to have the right values.
 However at random times, I get spikes in the petabytes/s range.
 
 Here's a dump of the bytes_in database.  At first, I suspected perhaps
 these coincide with some counters getting reset, but they don't occur
 at regular intervals.
 
 <!-- 2007-03-27 20:42:00 GMT / 1175028120 -->
 <row><v> 1.9268390706e+05 </v></row>
 <!-- 2007-03-27 20:48:00 GMT / 1175028480 -->
 <row><v> 1.5833184975e+05 </v></row>
 <!-- 2007-03-27 20:54:00 GMT / 1175028840 -->
 <row><v> 1.6838302753e+05 </v></row>
 <!-- 2007-03-27 21:00:00 GMT / 1175029200 -->
 <row><v> 1.3766069592e+05 </v></row>
 <!-- 2007-03-27 21:06:00 GMT / 1175029560 -->
 <row><v> 2.1711888414e+05 </v></row>
 <!-- 2007-03-27 21:12:00 GMT / 1175029920 -->
 <row><v> 4.9959709273e+16 </v></row>
 <!-- 2007-03-27 21:18:00 GMT / 1175030280 -->
 <row><v> 1.7401339783e+05 </v></row>
 <!-- 2007-03-27 21:24:00 GMT / 1175030640 -->
 <row><v> 2.0955720861e+05 </v></row>
 <!-- 2007-03-27 21:30:00 GMT / 1175031000 -->
 <row><v> 1.9032255300e+05 </v></row>
 <!-- 2007-03-27 21:36:00 GMT / 1175031360 -->
 <row><v> 1.9162727036e+05 </v></row>
 <!-- 2007-03-27 21:42:00 GMT / 1175031720 -->
 <row><v> 1.2703790825e+05 </v></row>
 
 Can anyone shed light on what might be happening?  Any pointers for
 debugging?
 
 David Wong
 Senior Systems Engineer
 Management Dynamics, Inc.
 Phone: 201-804-6127
 [EMAIL PROTECTED]
 
 
 



--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] mcast_ttl in 3.0 gmond.conf

2007-03-14 Thread Martin Knoblauch


--- Ian Cunningham [EMAIL PROTECTED] wrote:

 Gil,
 
 Gilad Raphaelli wrote:
  Hello,
 
I'm having a problem increasing gmond's multicast packet ttl. 
 I've tried putting mcast_ttl on a line of its own and inside the
 global { } and udp_send_channel {} directives and always get
 gmond.conf parsing errors when trying to start gmond-3.0.4.  Any
 pointers on where mcast_ttl can be set?
 
  The error message is:
 
  gmond.conf:200: no such option 'mcast_ttl'
 
  Finally, mcast_ttl doesn't appear in gmond -t - has this
 functionality been removed altogether?
 
  Thanks,
 
  Gil
 I no longer use multicast so I am not sure it works, but from looking
 at the source code, it looks like it was changed to 'ttl' under
 'udp_send_channel'.
 

 which is even correctly documented in the shipping tarball. We should
update the stuff on the web page though ...
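
For illustration, a gmond.conf fragment with the renamed option might look like the following. The multicast address and port are the usual Ganglia defaults and the ttl value is only an example, so treat all three as assumptions to adapt to the local setup:

```
udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 2
}
```

A ttl of 1 keeps multicast packets on the local subnet; larger values let them cross that many router hops, provided the routers forward multicast at all.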

Cheers
Martin


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] PBS Queue visualisation

2007-01-16 Thread Martin Knoblauch

Adam,

 look at the report/compound graphs in web/graph.php. They should
basically do what you want.

Cheers
Martin
--- Adam Gray [EMAIL PROTECTED] wrote:

 I'm running ganglia on a cluster managed with OpenPBS. I have made a
 few extra metrics for monitoring CPU temp and batch system jobs on each
 node. I was wondering how I could go about making a sort of cluster
 queue usage graph. Each queue would pile on top of each other the
 number of nodes it is using.
 
 E.g. if queue1 was using 24 of 124 available nodes, and queue2 was
 using 96, there would be a section at the bottom 20% and a different
 colored section on the next 75%, and the top 5% would be empty.
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] XML error: no element found at 1

2007-01-16 Thread Martin Knoblauch

Ashutosh,

 you need to send a query if you use port 8652 (the web interface
does). What happens if you do "telnet localhost 8651"? That should give
you the complete gmetad XML stream.

 Is the rrd_rootdir directory writable by the owner of the gmetad
process? It should belong to e.g. "nobody". This is a common mistake.
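
To make the "query" part concrete: on gmetad's interactive port (8652 by default) the client is expected to send a request line before any XML comes back, which is why a bare telnet session appears to hang. The sketch below only builds such a request line. It assumes a path-style syntax ("/" for everything, "/cluster" or "/cluster/host" for a subset), and the function name build_gmetad_query is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Build a request line for gmetad's interactive port.
 * cluster and host may be NULL; a NULL cluster asks for everything ("/").
 * Returns 0 on success, -1 if buf is too small. */
int build_gmetad_query(char *buf, size_t len,
                       const char *cluster, const char *host)
{
    int n;
    assert(buf != NULL && len > 0);
    if (cluster == NULL)
        n = snprintf(buf, len, "/\n");
    else if (host == NULL)
        n = snprintf(buf, len, "/%s\n", cluster);
    else
        n = snprintf(buf, len, "/%s/%s\n", cluster, host);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In a telnet session the equivalent is to type "/" followed by Enter after connecting to port 8652.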

cheers
Martin
--- Ashutosh Mahajan [EMAIL PROTECTED] wrote:

 hello everyone,
 We are having problems installing ganglia version 3.0.4 with
 rrdtool-1.2.15.
 We can successfully do make, make install. gstat -a also seems to work.
 telnet localhost 8649 seems to throw out the correct XML file. However,
 gmetad seems to be having some problems.
 
 telnet localhost 8652 seems to hang forever with the message:
 Trying 127.0.0.1...
 Connected to localhost.
 Escape character is '^]'.
 
 If I access ganglia through the web, I get this message after a long
 long time:
 There was an error collecting ganglia data (192.168.1.1:8652): XML
 error: no element found at 1
 
 rrd_rootdir also remains empty. What could be wrong? I can provide
 more details if necessary.
 
 thanks in advance.
 -- 
 Regards
 Ashutosh
 www.lehigh.edu/~asm4
 
 
 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Two similar linux hosts provides different metrics

2007-01-16 Thread Martin Knoblauch

Vitaly,

 in this case try to run gmond with a debug level higher than 2.
Maybe this sheds some light on it.

 Or, you could add debug statements to the proc_run_func and
proc_total_func code.

 But: first of all show us the output of "cat /proc/loadavg" on both
nodes.
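
For reference, the fourth field of /proc/loadavg has the form running/total, which is presumably where the two metrics come from on Linux (an assumption here, not checked against the gmond source). A minimal sketch of parsing that line follows; the function name parse_loadavg is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one /proc/loadavg line, e.g. "0.04 0.01 0.00 2/185 12345".
 * The fourth field is running/total kernel scheduling entities.
 * Returns 0 on success, -1 if the line does not match. */
int parse_loadavg(const char *line, unsigned *running, unsigned *total)
{
    double l1, l5, l15;
    assert(line != NULL && running != NULL && total != NULL);
    if (sscanf(line, "%lf %lf %lf %u/%u",
               &l1, &l5, &l15, running, total) != 5)
        return -1;
    return 0;
}
```

The total in that field counts scheduling entities, which include threads, so it can be noticeably larger than the number of lines printed by "ps -ef".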

cheers
Martin
--- Vitaly Karasik [EMAIL PROTECTED] wrote:

 It seems like we have different numbers in gmond:
 
 <HOST NAME="5.5.5.5" IP="5.5.5.5" REPORTED="1168934873" TN="2" TMAX="20"
 DMAX="0" LOCATION="unspecified" GMOND_STARTED="1166534354">
 ..
 <METRIC NAME="proc_total" VAL="185" TYPE="uint32" UNITS="" TN="229"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
 ..
 <METRIC NAME="proc_run" VAL="0" TYPE="uint32" UNITS="" TN="229"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
 
 
 <HOST NAME="5.5.5.6" IP="5.5.5.6" REPORTED="1168934871" TN="3" TMAX="20"
 DMAX="0" LOCATION="unspecified" GMOND_STARTED="1166534349">
 
 <METRIC NAME="proc_run" VAL="15" TYPE="uint32" UNITS="" TN="68"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
 
 <METRIC NAME="proc_total" VAL="439" TYPE="uint32" UNITS="" TN="68"
 TMAX="950" DMAX="0" SLOPE="both" SOURCE="gmond"/>
 
 Thanks,
 Vitaly
 
  -Original Message-
  From: Martin Knoblauch [mailto:[EMAIL PROTECTED] 
  Sent: Monday, January 15, 2007 12:30 PM
  To: Vitaly Karasik; ganglia-general@lists.sourceforge.net
  Subject: RE: [Ganglia-general] Two similar linux hosts 
  provides different metrics
  
  Hi Vitaly,
  
   where do you see the invalid numbers:
  
  a) in the gmond XML Stream (telnet/nc to the gmond XML port)
  b) in the XML Stream from gmetad (telnet/nc to the gmetad XML port)
  c) only in the web-frontend
  
  Cheers
  Martin
  --- Vitaly Karasik [EMAIL PROTECTED] wrote:
  
   NON-BUSY HOST:
   # ps axl|wc
61 8625865
   # uptime
08:54:55  up 204 days,  2:00,  1 user,  load average: 0.00,
 0.00, 
   0.00
   
   BUSY HOST
]# ps axl|wc
62 8775977
]# uptime
08:55:18  up 31 days, 16:30,  1 user,  load average: 0.04, 
  0.01, 0.00

   
-Original Message-
From: Martin Knoblauch [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 11, 2007 10:54 AM
To: Vitaly Karasik; ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Two similar linux hosts 
provides different metrics

Hi Vitaly,

 what does ps axl show on both hosts, as that is basically 
what gmond looks at? If it is already different there, the 
problem is not ganglia related. (OK, I see you already
 checked
   ...)

 What are the load averages according to uptime?

Cheers
Martin


--- Vitaly Karasik [EMAIL PROTECTED] wrote:

   Hi,
 
 I have a weird problem - two linux hosts with similar
   configuration 
 provide very different metrics about  number of running 
  processes
   - 
 one shows about 2, and second about 20-40 (I speak about 
concentrated 
 load
 graph at top right.)
 proc_total is different too - 171 vs. 350 (BTW,  ps -ef |wc 
== 61 on 
 both boxes)
 
 Both machines are RHEL3 kernel 2.4.21-37.ELsmp with
 ganglia-gmond-3.0.3-1 installed from RPM.
 
 Any ideas?
 Thanks,
 Vitaly
 
  
 

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

   
   
  
  
  --
  Martin Knoblauch
  email: k n o b i AT knobisoft DOT de
  www:   http://www.knobisoft.de
  
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Two similar linux hosts provides different metrics

2007-01-16 Thread Martin Knoblauch

Vitaly,

 gmond on Linux just interprets the fourth field of /proc/loadavg. The
number in front of the slash is the number of running processes, the
number following the slash is the total number of processes.
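
To illustrate the field layout described above, here is a minimal Python
sketch that splits a /proc/loadavg line the same way. It is not gmond's
actual code (gmond is written in C); the function name parse_loadavg and
the dictionary keys are assumptions, chosen to mirror the metric names
proc_run and proc_total. Note that the Linux kernel counts threads as
schedulable tasks in this field, which is consistent with the ps -efm
numbers in the quoted message.

```python
def parse_loadavg(text):
    """Split one /proc/loadavg line into its five fields.

    Hypothetical helper for illustration only; not taken from gmond.
    """
    fields = text.split()
    # Fourth field has the form "<running>/<total>".
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "proc_run": int(running),
        "proc_total": int(total),
        "last_pid": int(fields[4]),
    }


# Sample line taken from the quoted output of the .6 host.
sample = "1.03 1.01 1.00 2/441 19965"
print(parse_loadavg(sample))
```

On a live system the same function can be fed the contents of
/proc/loadavg directly, e.g. parse_loadavg(open("/proc/loadavg").read()).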

Cheers
Martin
 
--- Vitaly Karasik [EMAIL PROTECTED] wrote:

 .5:
  cat /proc/loadavg
 0.04 0.06 0.01 1/185 10512
 
 .6:  cat /proc/loadavg
 1.03 1.01 1.00 2/441 19965 
 
 Oops! I think I'm starting to  understand - number of processes on
 both
 machines are the same, but number the threads are different. probably
 gmond counts threads, not processes:
 
 .5: ps -ef|wc
  64
  ps -efm|wc
 187
 
 .6:
   ps -ef|wc
  62 
   ps -efm|wc
 441   
 
 
  -Original Message-
  From: Martin Knoblauch [mailto:[EMAIL PROTECTED] 
  Sent: Tuesday, January 16, 2007 11:59 AM
  To: Vitaly Karasik; [EMAIL PROTECTED]; 
  ganglia-general@lists.sourceforge.net
  Subject: RE: [Ganglia-general] Two similar linux hosts 
  provides different metrics
  
  Vitaly,
  
   in this case try to run gmond with a debug level higher than 2.
  Maybe this sheds some light on it.
  
   Or, you could add debug statements to the proc_run_func and 
  proc_total_func code.
  
   But: first of all show us the output of cat /proc/loadavg 
  on both nodes.
  
  cheers
  Martin
  --- Vitaly Karasik [EMAIL PROTECTED] wrote:
  
   It seems like we have different numbers in gmond:
   
   HOST NAME=5.5.5.5 IP=5.5.5.5 REPORTED=1168934873 TN=2
   TMAX=20
   DMAX=0 LOCATION=unspecified GMOND_STARTED=1166534354
 ..
   METRIC NAME=proc_total VAL=185 TYPE=uint32 UNITS=
 TN=229
   TMAX=950 DMAX=0 SLOPE=both SOURCE=gmond/ ..
   METRIC NAME=proc_run VAL=0 TYPE=uint32 UNITS= TN=229
   TMAX=950 DMAX=0 SLOPE=both SOURCE=gmond/
   
   
   HOST NAME=5.5.5.6 IP=5.5.5.6 REPORTED=1168934871 TN=3
   TMAX=20
   DMAX=0 LOCATION=unspecified GMOND_STARTED=1166534349 
   METRIC NAME=proc_run VAL=15 TYPE=uint32 UNITS= TN=68
   TMAX=950 DMAX=0 SLOPE=both SOURCE=gmond/ 
   METRIC NAME=proc_total VAL=439 TYPE=uint32 UNITS=
 TN=68
   TMAX=950 DMAX=0 SLOPE=both SOURCE=gmond/
   
   Thanks,
   Vitaly
   
-Original Message-
From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
Sent: Monday, January 15, 2007 12:30 PM
To: Vitaly Karasik; ganglia-general@lists.sourceforge.net
Subject: RE: [Ganglia-general] Two similar linux hosts provides
 
different metrics

Hi Vitaly,

 where do you see the invalid numbers:

a) in the gmond XML Stream (telnet/nc to the gmond XML port)
b) in the XML Stream from gmetad (telnet/nc to the gmetad 
  XML port)
c) only in the web-frontend

Cheers
Martin
--- Vitaly Karasik [EMAIL PROTECTED] wrote:

 NON-BUSY HOST:
 # ps axl|wc
  61 8625865
 # uptime
  08:54:55  up 204 days,  2:00,  1 user,  load average: 0.00,
   0.00,
 0.00
 
 BUSY HOST
  ]# ps axl|wc
  62 8775977
  ]# uptime
  08:55:18  up 31 days, 16:30,  1 user,  load average: 0.04,
0.01, 0.00
  
 
  -Original Message-
  From: Martin Knoblauch [mailto:[EMAIL PROTECTED]
  Sent: Thursday, January 11, 2007 10:54 AM
  To: Vitaly Karasik; ganglia-general@lists.sourceforge.net
  Subject: Re: [Ganglia-general] Two similar linux 
  hosts provides 
  different metrics
  
  Hi Vitaly,
  
   what does ps axl show on both hosts, as that is
 basically 
  what gmond looks at? If it is already different there, the 
  problem is not ganglia related. (OK, I see you already
   checked
 ...)
  
   What are the load averages according to uptime?
  
  Cheers
  Martin
  
  
  --- Vitaly Karasik [EMAIL PROTECTED] wrote:
  
 Hi,
   
   I have a weird problem - two linux hosts with similar
 configuration
   provide very different metrics about  number of running
processes
 -
   one shows about 2, and second about 20-40 (I speak about
  concentrated
   load
   graph at top right.)
   proc_total is different too - 171 vs. 350 (BTW,  ps -ef
 |wc
  == 61 on
   both boxes)
   
   Both machines are RHEL3 kernel 2.4.21-37.ELsmp with
   ganglia-gmond-3.0.3-1 installed from RPM.
   
   Any ideas?
   Thanks,
   Vitaly
   

   
  
 

Re: [Ganglia-general] XML error: no element found at 1

2007-01-16 Thread Martin Knoblauch

Hi Ashutosh,

 sorry for the wrong port. I meant of course 8651.

 You could try to run gmetad with a high debug level. This could help
to track down the problem.

 Also, could you please post the gmetad.conf file?

Cheers
Martin
--- Ashutosh Mahajan [EMAIL PROTECTED] wrote:

 Quoting Martin Knoblauch [EMAIL PROTECTED]:
 
  Ashutosh,
 
   you need to do a query if you use port 8562 (the web interface
  does). What happens if you do telnet localhost 8561. That should
 give
  you the complete gmetad XML stream.
 
 
 thanks for the prompt reply.
 you meant 8651, rather than 8561?
 [EMAIL PROTECTED] ~]$ telnet localhost 8651
 Trying 127.0.0.1...
 Connected to localhost.
 Escape character is '^]'.
 
 seems to hang forever there.
 
 
   Is the rrdroot directory writable to the owner of the gmetad
  process? It should belong to e.g. nobody. This is a common
 mistake.
 
 
 yeah. it is writable.
 
 
  cheers
  Martin
  --- Ashutosh Mahajan [EMAIL PROTECTED] wrote:
 
  hello everyone,
 We are having problems installing ganglia version 3.0.4 with
  rrdtool-1.2.15.
  we can successfully do make, make install. gstat -a also seems to
  work.
  telnet localhost 8649 seems to throw out correct XML file.
 However,
  gmetad
  seems to be having some problems.
 
  telnet localhost 8652 seems to hang forever with the message:
  Trying 127.0.0.1...
  Connected to localhost.
  Escape character is '^]'.
 
  if i access ganglia through the web, i get this message after a
 long
 
  long time:
  There was an error collecting ganglia data (192.168.1.1:8652): XML
  error: no
  element found at 1
 
  rrd_rootdir also remains empty. what could be wrong? i can provide
  more
  details if necessary.
 
  thanks in advance.
 
 
 
 
 
 

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Two similar linux hosts provides different metrics

2007-01-15 Thread Martin Knoblauch

Hi Vitaly,

 where do you see the invalid numbers:

a) in the gmond XML Stream (telnet/nc to the gmond XML port)
b) in the XML Stream from gmetad (telnet/nc to the gmetad XML port)
c) only in the web-frontend
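
For options a) and b), the raw XML can also be fetched with a few lines
of Python instead of telnet/nc. This is only a sketch: the function name
read_xml_stream is made up for illustration, the ports 8649 (gmond) and
8651 (gmetad) are the ones mentioned elsewhere on this list and may
differ in your configuration, and the ISO-8859-1 decoding is an
assumption based on the encoding declared in the XML header.

```python
import socket


def read_xml_stream(host, port, timeout=10.0):
    """Connect to a gmond/gmetad XML port and return everything it sends.

    Hypothetical helper: the daemon is expected to dump its XML and then
    close the connection, so we simply read until end of stream.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


# Example usage (assumed default ports, adjust to your setup):
#   gmond_xml = read_xml_stream("localhost", 8649)
#   gmetad_xml = read_xml_stream("localhost", 8651)
```

If the daemon accepts the connection but never sends anything, the call
raises a timeout error after the given number of seconds instead of
hanging forever.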

Cheers
Martin
--- Vitaly Karasik [EMAIL PROTECTED] wrote:

 NON-BUSY HOST:
 # ps axl|wc
  61 8625865
 # uptime
  08:54:55  up 204 days,  2:00,  1 user,  load average: 0.00, 0.00,
 0.00
 
 BUSY HOST 
  ]# ps axl|wc
  62 8775977
  ]# uptime
  08:55:18  up 31 days, 16:30,  1 user,  load average: 0.04, 0.01,
 0.00
  
 
  -Original Message-
  From: Martin Knoblauch [mailto:[EMAIL PROTECTED] 
  Sent: Thursday, January 11, 2007 10:54 AM
  To: Vitaly Karasik; ganglia-general@lists.sourceforge.net
  Subject: Re: [Ganglia-general] Two similar linux hosts 
  provides different metrics
  
  Hi Vitaly,
  
   what does ps axl show on both hosts, as that is basically 
  what gmond looks at? If it is already different there, the 
  problem is not ganglia related. (OK, I see you already checked
 ...)
  
   What are the load averages according to uptime?
  
  Cheers
  Martin
  
  
  --- Vitaly Karasik [EMAIL PROTECTED] wrote:
  
 Hi,
   
   I have a weird problem - two linux hosts with similar
 configuration 
   provide very different metrics about  number of running processes
 - 
   one shows about 2, and second about 20-40 (I speak about 
  concentrated 
   load
   graph at top right.)
   proc_total is different too - 171 vs. 350 (BTW,  ps -ef |wc 
  == 61 on 
   both boxes)
   
   Both machines are RHEL3 kernel 2.4.21-37.ELsmp with
   ganglia-gmond-3.0.3-1 installed from RPM.
   
   Any ideas?
   Thanks,
   Vitaly
   

   
  
   
   
  
  
  --
  Martin Knoblauch
  email: k n o b i AT knobisoft DOT de
  www:   http://www.knobisoft.de
  
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Two similar linux hosts provides different metrics

2007-01-11 Thread Martin Knoblauch

Hi Vitaly,

 what does ps axl show on both hosts, as that is basically what gmond
looks at? If it is already different there, the problem is not
ganglia related. (OK, I see you already checked ...)

 What are the load averages according to uptime?

Cheers
Martin


--- Vitaly Karasik [EMAIL PROTECTED] wrote:

   Hi,
 
 I have a weird problem - two linux hosts with similar configuration
 provide very different metrics about  number of running processes -
 one
 shows about 2, and second about 20-40 (I speak about concentrated
 load
 graph at top right.) 
 proc_total is different too - 171 vs. 350 (BTW,  ps -ef |wc == 61 on
 both boxes)
 
 Both machines are RHEL3 kernel 2.4.21-37.ELsmp with
 ganglia-gmond-3.0.3-1 installed from RPM.
 
 Any ideas?
 Thanks,
 Vitaly
 
  
 

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Windows port issues

2007-01-04 Thread Martin Knoblauch


--- Vladimir Vuksan [EMAIL PROTECTED] wrote:

 matt massie wrote:
  you need to install the cygwin sunrpc package which is not
 installed by
  default during the cygwin install...

 That was it.
 
 I still wasn't able to compile 3.0.4 (xdr_create? can't be find)  
 however 3.0.3 compiles with no problem.


 could you be more specific on the error message? Is it compile time,
or link time? There is no such thing as xdr_create. Maybe
xdrmem_create.
 
 Who is the person that packaged it initially since 3.0.3 corrects the
 
 Wait CPU issue ie. instead of showing 100% idle shows 100% Wait CPU.
 
 Also it may be nice to include gmetric.
 

 Hmm. What package are you referring to? There is no official Windows
(cygwin) binary distribution.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Compatibility mode for gmetad?

2007-01-04 Thread Martin Knoblauch


--- Jason Faulkner [EMAIL PROTECTED] wrote:

 Martin Knoblauch wrote:
  --- Jason Faulkner [EMAIL PROTECTED] wrote:
 

  I'm curious about how possible or difficult it would be to make
  gmetad 
  backwards compatible -- i.e. where I could leave my 2.5.x gmond 
  installations alone, and install 3.x gmetad on my main server (and
 be
 
  able to collect stats despite having a heterogeneous 2.5.x and 3.x
 
  environment). This would allow me to (hopefully) live-migrate my
  ganglia 
  install up to the new version.
 
  -- 
  Jason Faulkner
  Systems Manager
  Broadwick Corporation
  (919) 459-2509
 
  
  Hi Jason,
 
   although we bumped the major number in the 2.5.x - 3.0
 transition, we
  took care to not introduce incompatible changes to the core metrics
  framework. In short, I see no reason why a 3.0.4 gmetad should not
 be
  able to query 2.5.x gmond data.
 
   It should even be possible to have a 3.0.4 gmond listen to older
  gmonds. Of course, you are limited to multicast until you have
 replaced
  all gmonds.

 Jan  3 23:12:07 intranet1 ./gmetad[25006]: RRD_update 
 (/var/lib/ganglia/rrds/Dev Login 
 Servers/__SummaryInfo__/part_max_used.rrd): illegal attempt to update
 
 using time 1167883927 when last update time is 1167883927 (minimum
 one 
 second step)
 
 I've been receiving repeated errors like this attempting to use a
 3.0.x 
 gmetad with a 2.5.7 gmond. The times are synced perfectly to a local
 NTP 
 server, so I'm sure that's not the issue.
 

 Not an NTP issue, you are most likely right. The message says that
the timestamp for the metric in question did not change from the
previous invocation of the update call.
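
The rule behind that message can be sketched in a few lines of Python.
This is not rrdtool or gmetad code; the function name is_valid_update and
the min_step parameter are assumptions, used only to show why two updates
carrying the same timestamp are rejected.

```python
def is_valid_update(last_update, new_time, min_step=1):
    """Return True if an update at new_time would be accepted.

    Hypothetical check: the new timestamp must be at least min_step
    seconds later than the last recorded update.
    """
    return new_time - last_update >= min_step


# Timestamps taken from the quoted log line: both are identical,
# so the second update is refused.
print(is_valid_update(1167883927, 1167883927))
```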

 Does this only happen on part_max_used, or are other metrics showing
up as well? part_max_used is likely changing very slowly, which might be
an indicator. It is also interesting to note that in your example the
metric is not a host metric, but a summary metric.

 Does it prevent useful operation of the 3.0.x gmetad together with
2.5.7 gmonds? Or is it just annoying?

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Windows port issues

2007-01-04 Thread Martin Knoblauch


--- Vladimir [EMAIL PROTECTED] wrote:

 Martin Knoblauch wrote:
   could you be more specific on the error message? Is it compile
 time,
  or link time? There is no such thing as xdr_create. Maybe
  xdrmem_create.
 Sorry I should have been more precise. It is a linking error. Here is
 
 the log
 
 gmond.o: In function `Ganglia_collection_group_send':
 /ganglia-3.0.4/gmond/gmond.c:1633: undefined reference to
 `_xdrmem_create'
 gmond.o: In function `main':
 /ganglia-3.0.4/gmond/gmond.c:897: undefined reference to
 `_xdrmem_create'
 /ganglia-3.0.4/gmond/gmond.c:828: undefined reference to
 `_xdr_free'
 /ganglia-3.0.4/gmond/gmond.c:912: undefined reference to
 `_xdr_free'
 ../lib/.libs/libganglia.a(libgmond.o): In function
 `Ganglia_gmetric_send':
 /ganglia-3.0.4/lib/libgmond.c:695: undefined reference to
 `_xdrmem_create'
 ../lib/.libs/libganglia.a(libgmond.o): In function
 `Ganglia_gmetric_send_spoof':
 /ganglia-3.0.4/lib/libgmond.c:748: undefined reference to
 `_xdrmem_create'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function
 `xdr_Ganglia_value_types':
 /ganglia-3.0.4/lib/protocol_xdr.c:13: undefined reference to
 `_xdr_enum'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function
 `xdr_Ganglia_gmetric_message':
 /ganglia-3.0.4/lib/protocol_xdr.c:23: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:25: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:27: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:29: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:31: undefined reference to
 `_xdr_u_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:33: undefined reference to
 `_xdr_u_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:35: undefined reference to
 `_xdr_u_int'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function
 `xdr_Ganglia_spoof_header':
 /ganglia-3.0.4/lib/protocol_xdr.c:45: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:47: undefined reference to
 `_xdr_string'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function
 `xdr_Ganglia_message_formats':
 /ganglia-3.0.4/lib/protocol_xdr.c:69: undefined reference to
 `_xdr_enum'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function
 `xdr_Ganglia_message':
 /ganglia-3.0.4/lib/protocol_xdr.c:116: undefined reference to
 `_xdr_u_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:124: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:151: undefined reference to
 `_xdr_float'
 /ganglia-3.0.4/lib/protocol_xdr.c:156: undefined reference to
 `_xdr_double'
 /ganglia-3.0.4/lib/protocol_xdr.c:95: undefined reference to
 `_xdr_u_short'
 ../lib/.libs/libganglia.a(protocol_xdr.o): In function
 `xdr_Ganglia_25metric':
 /ganglia-3.0.4/lib/protocol_xdr.c:170: undefined reference to
 `_xdr_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:172: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:174: undefined reference to
 `_xdr_int'
 /ganglia-3.0.4/lib/protocol_xdr.c:178: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:180: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:182: undefined reference to
 `_xdr_string'
 /ganglia-3.0.4/lib/protocol_xdr.c:184: undefined reference to
 `_xdr_int'
 collect2: ld returned 1 exit status
 make[3]: *** [gmond.exe] Error 1
 make[2]: *** [all-recursive] Error 1
 make[1]: *** [all-recursive] Error 1
 make: *** [all] Error 2
 

 OK, it seems ld is unable to find any of the xdr functions. Maybe
someone removed a library from the library list. Under Linux those
functions are in libc, though.

 
   Hmm. What package are you refering to? There is no official
 windows
  (cygwin) binary distribution.

 Perhaps it is unofficial but it is on SourceForge e.g.
 

http://downloads.sourceforge.net/ganglia/ganglia-3.0.0-setup.exe?modtime=1107790662big_mirror=0
 

 Ah. I forgot about this one. And I do not recall who donated the work.
I am adding the developers list. Apparently, the installer was never
updated after the initial release.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Compatibility mode for gmetad?

2007-01-03 Thread Martin Knoblauch


--- Jason Faulkner [EMAIL PROTECTED] wrote:

 I'm curious about how possible or difficult it would be to make
 gmetad 
 backwards compatible -- i.e. where I could leave my 2.5.x gmond 
 installations alone, and install 3.x gmetad on my main server (and be
 
 able to collect stats despite having a heterogeneous 2.5.x and 3.x 
 environment). This would allow me to (hopefully) live-migrate my
 ganglia 
 install up to the new version.
 
 -- 
 Jason Faulkner
 Systems Manager
 Broadwick Corporation
 (919) 459-2509
 
Hi Jason,

 although we bumped the major number in the 2.5.x - 3.0 transition, we
took care to not introduce incompatible changes to the core metrics
framework. In short, I see no reason why a 3.0.4 gmetad should not be
able to query 2.5.x gmond data.

 It should even be possible to have a 3.0.4 gmond listen to older
gmonds. Of course, you are limited to multicast until you have replaced
all gmonds.

 Just try it out.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia+OpenBSD?

2006-12-27 Thread Martin Knoblauch


--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:

 On Tue, Dec 26, 2006 at 02:38:01PM -0500, Jason Faulkner wrote:
  Ooops -- sent first email directly to Martin instead of list.
  
  Martin Knoblauch wrote:
   Jason,
  
apparently configure fails to realize that you are on OpenBSD,
 which
   is not supported currently. The unknown part is telling.
  
 
  I thought that might be the case.
  
   In order to support OpenBSD one needs to fix the recognition
 process
   in configure and add OpenBSD-specific metrics code to
 libmetrics.
 
  I'm confused though, according to this page: 
  http://sourceforge.net/projects/ganglia/ ganglia runs on all
 openbsd 
  platforms. I was going on the, apparently false, presumption that
 this 
  meant the libmetrics code already existed for openbsd.
 
 not in 3.0.4, but I have a rough version that will be hopefully
 merged for
 3.0.5 and that so far compiles and works (not all metrics though) in
 the hosts
 i have to test:
 
   OpenBSD 3.7 (i386)
   OpenBSD 4.0 (i386 and amd64)
 
  IANAP, but if there's anything I can do to help get this working on
 
  OpenBSD, let me know.
 
 what versions/arch are you interested on?, would you be able to
 deploy test
 snapshots of ganglia on them?
 
 Carlo
 
Carlo,

 I see no problem with adding OpenBSD support in 3.0.5. Just go ahead and
check it in once you are satisfied with your stuff.

 Just out of curiosity: how similar are the BSD flavours? We already
have NetBSD and FreeBSD support in.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] gmond problem on SLES 10 x64 with floats

2006-12-27 Thread Martin Knoblauch

Hi Ludovic,

 do you happen to have some strange/unusual setting of your locale
(LANG variable and friends) when you start the gmond executable?

 The output definitely looks broken. Could you please file a bug on
bugzilla?

Cheers
Martin
--- Ludovic Drolez [EMAIL PROTECTED] wrote:

 Hi !
 
 I installed the official Ganglia RPM on a SLES 10 x64. My graphs are
 really 
 strange, and the percentage values show random characters. I've just
 found 
 that the problem is in gmond, which sends random strings in the XML
 dialog. 
 I've tried to recompile gmond, but I have still the same problem.
 
 Here's some of the strace output:
 
 =
 accept(6, {sa_family=AF_INET, sin_port=htons(43998), 
 sin_addr=inet_addr(127.0.0.1)}, [17179869200]) = 9
 write(9, ?xml version=\1.0\ encoding=\ISO-8859-1\ 
 standalone=\yes\?\n!DOCTYPE GANGLIA_XML [\n   !ELEMENT G...,
 2328) = 2328
 write(9, GANGLIA_XML VERSION=\3.0.3\ SOURCE=\gmond\\n, 45) =
 45
 write(9, CLUSTER NAME=\cluster\ LOCALTIME=\1166087533\ 
 OWNER=\unspecified\ LATLONG=\unspecified\ URL=\unspe..., 108) =
 108
 write(9, HOST NAME=\master.localdomain\ IP=\192.168.0.106\ 
 REPORTED=\1166087527\ TN=\5\ TMAX=\20\ DMAX=\0\ ..., 150) =
 150
 write(9, METRIC NAME=\disk_total\ VAL=\1A.\332\326\260\
 TYPE=\double\ 
 UNITS=\GB\ TN=\1500\ TMAX=\1200\ DMAX=\0\ SLOP..., 125) =
 125
 write(9, METRIC NAME=\cpu_speed\ VAL=\2993\ TYPE=\uint32\ 
 UNITS=\MHz\ TN=\300\ TMAX=\1200\ DMAX=\0\ SLOPE=\..., 122)
 = 122
 write(9, METRIC NAME=\part_max_used\ VAL=\7y.\n\ TYPE=\float\
 
 UNITS=\\ TN=\60\ TMAX=\180\ DMAX=\0\ SLOPE=\bo..., 120) =
 120
 write(9, METRIC NAME=\swap_total\ VAL=\4194296\ TYPE=\uint32\
 
 UNITS=\KB\ TN=\300\ TMAX=\1200\ DMAX=\0\ SLOP..., 125) = 125
 write(9, METRIC NAME=\os_name\ VAL=\Linux\ TYPE=\string\
 UNITS=\\ 
 TN=\300\ TMAX=\1200\ DMAX=\0\ SLOPE=\zero..., 118) = 118
 write(9, METRIC NAME=\cpu_user\ VAL=\2.F\ TYPE=\float\
 UNITS=\%\ 
 TN=\20\ TMAX=\90\ DMAX=\0\ SLOPE=\both\ SO..., 114) = 114
 write(9, METRIC NAME=\cpu_system\ VAL=\3.0\ TYPE=\float\
 UNITS=\%\ 
 TN=\20\ TMAX=\90\ DMAX=\0\ SLOPE=\both\ ..., 116) = 116
 =
 
 As you can see, there's garbage for disk_total, part_max_used,
 cpu_user...
 So all values of type float or double, are not properly converted.
 The SLES runs under Qemu.
 
 I've also added some printfs in the host_metric_value and here's what
 I get:
 On the left the float converted by apr_* and on the right the
 printf(%f) !!!
 
 VALUE =2.G= =2.343750=
 VALUE =2.G= =2.343750=
 VALUE =9.Ö= =93.487236=
 VALUE =0.6o= =0.64=
 VALUE =0.1;= =0.119600=
 VALUE =0.00= =0.000311=
 VALUE =0.0= =0.00=
 VALUE =0.0= =0.00=
 VALUE =9.ê= =95.312500=
 VALUE =0.9= =0.94=
 VALUE =0.4Y= =0.42=
 VALUE =0.1;= =0.113054=
 VALUE =0.00= =0.000536=
 
 
 Any ideas ?
 
 Cheers,
 
 -- 
 Ludovic DROLEZ  Linbox / FreeALter Soft
 www.linbox.com www.linbox.org
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] [Ganglia-developers] Correct counting of CPUs, Cores, Siblings (bz #84)

2006-12-27 Thread Martin Knoblauch

Hi Jarod,

 thanks. Your and Ben's input was really useful for detecting patterns
in 2.6-based configurations.

 What I need now is the output from 2.4-based configs, but only from
multi-core and/or HT-enabled systems.
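
For reference, the kind of pattern that can be read from 2.6-style
/proc/cpuinfo output is sketched below in Python. This is not the fix for
bz#84 and not libmetrics code; the function name count_cpus is an
assumption. It relies on the "physical id" and "core id" fields being
present, which is exactly what may not hold on 2.4 kernels, so there the
socket and core counts would simply come out as 0.

```python
def count_cpus(cpuinfo_text):
    """Count logical CPUs, sockets and cores from /proc/cpuinfo text.

    Hypothetical helper: a core is identified by the pair
    (physical id, core id), a socket by its physical id.
    """
    logical = 0
    sockets = set()
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            physical_id = None  # new stanza, forget the previous socket
        elif key == "physical id":
            physical_id = value
            sockets.add(value)
        elif key == "core id":
            cores.add((physical_id, value))
    return {"logical": logical, "sockets": len(sockets), "cores": len(cores)}


# Two hyperthreads on one core of one socket (made-up sample data).
sample = """processor : 0
physical id : 0
core id : 0
processor : 1
physical id : 0
core id : 0
"""
print(count_cpus(sample))
```

Fed with the eight stanzas from the quoted Fedora Core 6 output, this
approach should arrive at the two sockets, four cores and eight logical
CPUs described there.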

Thanks and have a Good New Year 2007
Martin
--- Jarod Wilson [EMAIL PROTECTED] wrote:

 On Friday 22 December 2006 11:05, Martin Knoblauch wrote:
  Hi Folks,
 
   in order to fix bz#84 for Linux, I would like to collect some data
  from different system configurations. Could you please create the
 file
  cpu.grep and execute the cat/grep chain below.
 
   Please report the results together with uname -a output which
 distro
  you are running.
 
  # more cpu.grep
  processor
  vendor
  model name
  physical id
  siblings
  core id
  cpu cores
  # cat /proc/cpuinfo  | grep -f cpu.grep
 
 Here's the data from my Fedora Core 6 workstation in the office,
 since its 
 fairly interesting for this specific topic. Its a dual-socket,
 dual-core Xeon 
 system with hyperthreading turned on, so two sockets, four cores,
 eight 
 logical cpus...
 
 Linux xavier.boston.redhat.com 2.6.18-1.2849.fc6 #1 SMP Fri Nov 10
 12:34:46 
 EST 2006 x86_64 x86_64 x86_64 GNU/Linux
 
 processor   : 0
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 0
 siblings: 4
 core id : 0
 cpu cores   : 2
 processor   : 1
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 1
 siblings: 4
 core id : 0
 cpu cores   : 2
 processor   : 2
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 0
 siblings: 4
 core id : 1
 cpu cores   : 2
 processor   : 3
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 1
 siblings: 4
 core id : 1
 cpu cores   : 2
 processor   : 4
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 0
 siblings: 4
 core id : 0
 cpu cores   : 2
 processor   : 5
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 1
 siblings: 4
 core id : 0
 cpu cores   : 2
 processor   : 6
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 0
 siblings: 4
 core id : 1
 cpu cores   : 2
 processor   : 7
 vendor_id   : GenuineIntel
 model name  :   Intel(R) Xeon(TM) CPU 3.00GHz
 physical id : 1
 siblings: 4
 core id : 1
 cpu cores   : 2
 
 
 -- 
 Jarod Wilson
 [EMAIL PROTECTED]
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia+OpenBSD?

2006-12-27 Thread Martin Knoblauch


--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:

 On Wed, Dec 27, 2006 at 12:38:00AM -0800, Martin Knoblauch wrote:
 
   I see no problem to add OpenBSD support in 3.0.5. Just go on and
 check
  it in once you are satisfied with your stuff.
 
 checked it in already in revision 697.


 saw it.
 
 
   Just out of curiosity: how similar are the BSD flavours? We
 already
  have NetBSD and FreeBSD support in.
 
 I used NetBSD as a base from my port (as it is the closest), sadly
 they are not that similar as to just work with the other source
 as you can see by the diff.


 Understood. Btw., you should check the use of the strings NetBSD /
FreeBSD in your patch :-)

 DragonflyBSD will be most likely closer to FreeBSD and the same for
 MacOS X (AKA Darwin), but I have no interest on adding those yet
 (DragonFlyBSD could be an interesting option for clusters, but
 I'd heard of no one using it in a cluster yet).
 

 You do realize that we already have a Darwin port? I do not know the
quality/completeness of its metrics code, though.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] [Ganglia-developers] Ganglia 3.0.4 released

2006-12-26 Thread Martin Knoblauch


--- Carlo Marcelo Arenas Belon [EMAIL PROTECTED] wrote:

 On Mon, Dec 25, 2006 at 02:32:30AM -0800, Martin Knoblauch wrote:
  Ho ho ho,
  
  Santa just released version 3.0.4 of Ganglia. This is mainly a
 bugfix
  release. See the ChangeLog in the tarball for a complete list of
  changes.
 
 thanks Santa, and I got to be the first kid that went to the
 sourceforge
 tree for the nicely wrapped package :) which was far nicer than that
 Wii that
 Matt is probably still waiting to get a hold of.
 
 since I was running tests on the last SVN anyway, I got some more
 platforms
 where gmond/gmetric (and therefore libmetrics) were tested (*):
 
 * Gentoo Linux 2006.1 (amd64), Fedora Core 6 (i386)
 * Solaris 9 (sparc), Solaris 10 (i386, amd64 and sparc)
 * NetBSD 2.0.2 (i386), NetBSD 3.0 (i386), NetBSD 3.1 (i386, amd64)
 * FreeBSD 6.1 (amd64)
 
Hi Carlo,

 thanks for the feedback. Could you just tell us which toolchains were
used on the non-Linux platforms? Especially which compiler?

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia+OpenBSD?

2006-12-26 Thread Martin Knoblauch

Jason,

 apparently configure fails to realize that you are on OpenBSD, which
is not supported currently. The "unknown" part is telling.

 In order to support OpenBSD one needs to fix the recognition process
in configure and add OpenBSD-specific metrics code to libmetrics.

 So I am afraid that it is not as easy as you believe.

 Btw. what is the output of config/config.guess?

Cheers
Martin

--- Jason Faulkner [EMAIL PROTECTED] wrote:

 Anybody have even a direction to point me in? I'm at my wits end.
 
 Jason Faulkner wrote:
  I've been trying all morning (about 5 hours now, heh) to get
 Ganglia 
  3.0.3 to compile on OpenBSD to no avail.
 
  Here's the error it spits at me:
 
  ./configure --prefix=/opt ran without a hitch, but when I said
 make...
 
  /bin/sh ../libtool --tag=CC --mode=link /usr/bin/gcc -I.. -I. 
  -I../srclib/expat/lib/ -I../srclib/libmetrics/
 -I../srclib/apr/include/ 
  -I../srclib/apr/include/arch/unix/ -I../srclib/confuse/src -g -O2 
  -Wall-o libganglia.la -rpath /opt/lib -version-info 0:0:0 
 -release 
  3.0.3  -export-dynamic become_a_nobody.lo debug_msg.lo 
 daemon_init.lo 
  file.lo dotconf.lo error.lo ganglia.lo hash.lo  inetaddr.lo
 llist.lo 
  my_inet_ntop.lo rdwr.lo readdir.lo tcp.lo  protocol_xdr.lo
 apr_net.lo 
  libgmond.lo  -lkvm -lresolv -lpthread
 
  *** Warning: linker path does not have real file for library
 -lresolv.
  *** I have the capability to make that library automatically link
 in when
  *** you link to this library.  But I can only do this if you have a
  *** shared version of the library, which you do not appear to have
  *** because I did check the linker path looking for a file starting
  *** with libresolv and none of the candidates passed a file format
 test
  *** using a regex pattern. Last file checked: /usr/lib//libresolv.a
  *** The inter-library dependencies that have been dropped here will
 be
  *** automatically added whenever a program is linked with this
 library
  *** or is declared to -dlopen it.
  /usr/bin/gcc -shared  -fPIC -DPIC -o .libs/libganglia-3.0.3.so.0.0 
 
  .libs/become_a_nobody.o .libs/debug_msg.o .libs/daemon_init.o 
  .libs/file.o .libs/dotconf.o .libs/error.o .libs/ganglia.o
 .libs/hash.o 
  .libs/inetaddr.o .libs/llist.o .libs/my_inet_ntop.o .libs/rdwr.o 
  .libs/readdir.o .libs/tcp.o .libs/protocol_xdr.o .libs/apr_net.o 
  .libs/libgmond.o  -lkvm -lpthread
  (cd .libs && rm -f libganglia.so.0.0 && ln -s libganglia-3.0.3.so.0.0 libganglia.so.0.0)
  ar cru .libs/libganglia.a  become_a_nobody.o debug_msg.o
 daemon_init.o 
  file.o dotconf.o error.o ganglia.o hash.o inetaddr.o llist.o 
  my_inet_ntop.o rdwr.o readdir.o tcp.o protocol_xdr.o apr_net.o
 libgmond.o
  ranlib .libs/libganglia.a
  creating libganglia.la
  (cd .libs && rm -f libganglia.la && ln -s ../libganglia.la libganglia.la)
  Making all in srclib
  Making all in libmetrics
  make  all-recursive
  Making all in unknown
  /bin/sh: cd: /usr/src/ganglia-3.0.3/srclib/libmetrics/unknown - No
 such 
  file or directory
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3/srclib/libmetrics (line 342 of
 Makefile).
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3/srclib/libmetrics (line 204 of
 Makefile).
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3/srclib (line 243 of Makefile).
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3 (line 332 of Makefile).
  *** Error code 1
 
  Stop in /usr/src/ganglia-3.0.3 (line 214 of Makefile).
 
 
 
 
  This is on OpenBSD 3.8.
 

 
 
 -- 
 Jason Faulkner
 Systems Manager
 Broadwick Corporation
 (919) 459-2509
 
 

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia+OpenBSD?

2006-12-26 Thread Martin Knoblauch


--- Jason Faulkner [EMAIL PROTECTED] wrote:

 
  http://j.oldos.org/configguess.txt

 I feel less than smart.
 
 You wanted this, didn't you:


:-)
 
 [EMAIL PROTECTED]:/usr/src/ganglia-3.0.3/config$ ./config.guess
 i386-unknown-openbsd3.8
 

 I guess this explains the "unknown". But from the other follow-ups there
seems to be hope for you.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

[Ganglia-general] New/Last Snapshot for 3.0.4

2006-09-24 Thread Martin Knoblauch

Hi,

 please have a look at the 2nd 3.0.4 snapshot located at:

http://www.knobisoft.de/ganglia/ganglia-3.0.4.200609241751.tar.gz

 This snapshot contains the following changes compared to the last one:

- fixup of the corrupted JPG images
- move libmetrics to top-level in order to prepare removal of
external sources in 3.1
- fix a stray debug message going to STDOUT instead of STDERR
- fix two stupid HP-UX syntax errors reported ages ago

 The full list of Changes is in the ChangeLog. There has not been a lot
of feedback since the first snapshot. If nothing serious comes out
during the next week, I will push out 3.0.4.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Problem with metrics

2006-09-20 Thread Martin Knoblauch



--- Ben Hartshorne [EMAIL PROTECTED] wrote:

 On Tue, Sep 19, 2006 at 03:11:26PM +0200, Rafal Masztalerz wrote:
  Hi
  I added some new metrics for my ganglia software using the
 gmetric
  command.  When I run the webpage without parameters :
  http://computer/ganglia/ everything seems to be ok and I can choose
 my
  new metrics.
  
  But when I try to do other things on this page, for example, when I
  choose some metric (bytes_out), my new metrics are missing from
  the new/refreshed page.
 
  http://computer/ganglia/?m=bytes_out&r=hour&s=descending&c=comp&h=&sh=1&hc=4
  
 
 Rafael,
 
 Be careful that your metric only sends numbers.  In some versions of
 ganglia, if your script that reports the gmetric accidentally sends
 letters instead, Bad Things(tm) happen.  I wrote a script to parse the
 output of 'who' to count the number of logged in users, but I did it
 wrong.  Occasionally it got a word instead of a number.  This caused
 unexplained metric loss throughout my ganglia installation.
 
 A newer version of gmetric fixed this problem, but it is a good place
 to start looking.
 
 I'm sorry, but I don't remember what versions are affected.
 
 -ben
 
 -- 
 Ben Hartshorne
 email: [EMAIL PROTECTED]
 http://ben.hartshorne.net
 

 The fix for the gmetric bug went in on 25-Jan-2006. So, it should be
in 3.0.3.
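
 On older versions, a cheap guard is to validate the value in the
reporting script or wrapper before it is handed to gmetric. A minimal
sketch in C, assuming a hypothetical helper is_numeric_value() (it is not
part of Ganglia) that accepts a string only if strtod() consumes all of
it and the result is finite:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Ganglia: returns 1 if s holds exactly
 * one finite number (optionally followed by whitespace), else 0. */
int is_numeric_value(const char *s)
{
    char *end;
    double v;

    assert(s != NULL);
    errno = 0;
    v = strtod(s, &end);
    if (end == s)            /* nothing numeric at the start, e.g. "root" */
        return 0;
    if (errno == ERANGE)     /* overflow or underflow */
        return 0;
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;               /* allow trailing whitespace from command output */
    if (*end != '\0')        /* trailing junk, e.g. "12 users" */
        return 0;
    return (v - v) == 0.0;   /* false for NaN and infinity, which strtod accepts */
}
```

 A wrapper would call this on the script output and skip the gmetric
call when it returns 0, so that a stray word never reaches gmond.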

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] First Snapshot for 3.0.4

2006-08-28 Thread Martin Knoblauch

--- Bernard Li [EMAIL PROTECTED] wrote:

   It is the first release after moving from CVS to SVN. 
  Changes compared
  to 3.0.3 are:
  
  - Fix bz #110 by allowing higher sampling rates for 
  cpu/net/load/mem in
  Linux/Cygwin. Likely needs similar changes in other platforms.
  - Add Yemis Host-Spoofing patch (bz #99)
  - Fix bz #77 (Diskless NFS Root not treated correctly)
  - Compile fixes for IRIX (bz #73/79)
  - Fix locking problems in gmetad (bz #56)
  - Fix incorrect writing of RRDs (bz #105)
  - Increases the number of rows in newly created RRAs (bz #33)
  - Better handling of bonding interfaces in Linux (bz #102/104)
  - Fix for network metrics overrun by Andreas Schoenfeld in AIX
  - SVN related cleanups in distribution targets
  - Take some of the proposed AIX changes from Micheal Perzl. The
 real
  stuff will come in 3.1.x
 
 I would also add:
 
 - Better RPM support for SUSE Linux 10.0/10.1 x86 and x86_64
 
 Cheers,
 
 Bernard
 

 Oops. Sorry. Yes, the list is not necessarily complete. I should also
have mentioned the generated ChangeLog, which gives some more info.

Martin


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] monitoring

2006-08-25 Thread Martin Knoblauch


 Nagios?

Cheers
Martin

--- Dirk Roessler [EMAIL PROTECTED] wrote:

 Does someone know an easy to install and easy to use solution for 
 monitoring and sending email notifications of down nodes and health 
 state on a Linux HPC cluster?
 
 Dirk
  begin:vcard
 fn;quoted-printable:Dirk R=C3=B6=C3=9Fler
 n;quoted-printable:R=C3=B6=C3=9Fler;Dirk
 org:_University of Potsdam;Department of Geosciences
 adr:;;K.-Liebknecht-Str. 24/25;Golm/Potsdam;;14476;Germany
 email;internet:[EMAIL PROTECTED]
 title:Geophysicist
 tel;work:+49 331 977 5795
 tel;fax:+49 331 977 5700
 x-mozilla-html:FALSE
 url:http://www.geo.uni-potsdam.de/mitarbeiter/Roessler/roessler.html
 version:2.1
 end:vcard
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia scaling testing?

2006-08-25 Thread Martin Knoblauch

 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Obtaining Immediate Interval Data From Ganglia

2006-08-11 Thread Martin Knoblauch

Correct. The code below limits the sampling rate for the cpu*, load*,
mem* and net* graphs. Setting the thresholds to 0 will give you 1 second
accuracy. Or "nice furry graphs" as Richard said (actually the furriness
is what the original authors wanted to prevent :-). Personally I doubt
that sampling load* and mem* at that rate makes sense. cpu* and net* may
make sense.

 Richard, yes please file a report. Unfortunately I spoke too soon when
I mentioned that we should get rid of the intervals altogether. Reason is
that we need to compute differences for the cpu* and net* metrics (they
are rates after all). If we want to have sub-second sampling rates, we
need to use gettimeofday instead of time.
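
 To illustrate the point about rates: a rate is a counter difference
divided by the elapsed time, so the timestamps need the same resolution
as the sampling interval. A minimal sketch, assuming two hypothetical
helpers (elapsed_seconds() and counter_rate() are illustrative names, not
libmetrics functions) that work on struct timeval values as returned by
gettimeofday():

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: seconds between two gettimeofday() samples. */
double elapsed_seconds(const struct timeval *then, const struct timeval *now)
{
    return (double)(now->tv_sec - then->tv_sec)
         + (double)(now->tv_usec - then->tv_usec) / 1e6;
}

/* Hypothetical helper: per-second rate of an increasing counter
 * (e.g. bytes received) between two samples. */
double counter_rate(uint64_t prev, uint64_t curr,
                    const struct timeval *then, const struct timeval *now)
{
    double dt = elapsed_seconds(then, now);

    assert(curr >= prev);    /* counter wrap-around is not handled here */
    if (dt <= 0.0)           /* same instant or clock stepped back: no rate */
        return 0.0;
    return (double)(curr - prev) / dt;
}
```

 With time() the smallest non-zero interval is one second, so two samples
taken 500 ms apart would either divide by zero or be attributed to a full
second. With gettimeofday() the same pair gives the correct rate.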

--- [EMAIL PROTECTED] wrote:

 If you do want to do fast polling on the Linux or cygwin gmond, I
 found
 some hardwired code in there which effectively limits the polling
 rate
 for
 some metrics no matter what you put in the config files. (Sorry
 martin,
 have not raised a bug report yet). Anyway:
  the code below is in the cygwin and linux metric.c files.
  
  
  typedef struct {
    uint32_t last_read;     /* time() of the last successful read */
    uint32_t thresh;        /* minimum age in seconds before re-reading */
    char *name;             /* file to read */
    char buffer[BUFFSIZE];  /* cached file contents */
  } timely_file;
  
  timely_file proc_stat    = { 0, 15, "/proc/stat" };
  timely_file proc_loadavg = { 0, 15, "/proc/loadavg" };
  timely_file proc_meminfo = { 0, 30, "/proc/meminfo" };
  timely_file proc_net_dev = { 0, 30, "/proc/net/dev" };
  
  char *update_file(timely_file *tf)
  {
    int now,rval;
    now = time(0);
    /* re-read only if the cached copy is older than thresh seconds */
    if(now - tf->last_read > tf->thresh) {
      rval = slurpfile(tf->name, tf->buffer, BUFFSIZE);
      if(rval == SYNAPSE_FAILURE) {
        err_msg("update_file() got an error from slurpfile() reading %s",
                tf->name);
        return (char *)SYNAPSE_FAILURE;
      }
      else tf->last_read = now;
    }
    return tf->buffer;
  }
  
 
 I have set those timeout values zero, which works well and gives
 me nice spiky furry graphs.
 
 - richard
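
 The effect of the threshold can be checked with a small stand-alone
model of the comparison in update_file(). This is only a sketch:
needs_refresh() and count_refreshes() are hypothetical names, and the
clock is simulated rather than read with time(0).

```c
#include <assert.h>
#include <stdint.h>

/* Same comparison as in update_file(): re-read only when the cached copy
 * is more than thresh seconds old. */
int needs_refresh(uint32_t now, uint32_t last_read, uint32_t thresh)
{
    return (uint32_t)(now - last_read) > thresh;
}

/* Simulate polling polls_per_second times per second for a number of
 * seconds and count how often the file would really be read. */
unsigned count_refreshes(uint32_t start, uint32_t seconds,
                         unsigned polls_per_second, uint32_t thresh)
{
    uint32_t last_read = 0;  /* initial value used by the timely_file tables */
    unsigned reads = 0;

    assert(polls_per_second > 0);
    for (uint32_t s = 0; s < seconds; s++) {
        for (unsigned p = 0; p < polls_per_second; p++) {
            uint32_t now = start + s;  /* time(0) only has 1 s resolution */
            if (needs_refresh(now, last_read, thresh)) {
                last_read = now;
                reads++;
            }
        }
    }
    return reads;
}
```

 Polling four times a second for a minute, a threshold of 15 yields 4
real reads and a threshold of 0 yields 60, i.e. one per second no matter
how often the metric is polled.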


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Troubles linking: Linux (SUSE 9.3) on Itanium (ia64, Altix)

2006-08-09 Thread Martin Knoblauch

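On a RedHat-ish distro you would need to check that the RPMs for libpng
*and* libpng-devel are installed. The linker resolves -lpng via the
unversioned libpng.so (or libpng.a), which comes with the devel package,
so having /usr/lib/libpng.so.3 alone is not enough. Not sure about the
package names on SuSE though.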

Martin

--- Ryurick Marius Hristev [EMAIL PROTECTED] wrote:

 Hello,
 
 I was trying to compile the ganglia package (rpm version) on the
 following system:
 
 SuSE 9.3 (Linux) running on Itaniums (ia64, SGI Altix )
 
  and I am getting this error:
 
 gcc -O0 -I../lib -I../gmond -I../srclib/expat/lib/ -g -O2 -Wall
 -D_REENTRANT -o gmetad gmetad.o cmdline.o data_thread.o server.o
 process_xml.o rrd_helpers.o conf.o type_hash.o xml_hash.o cleanup.o 
 ../lib/.libs/libganglia.a /usr/lib/librrd.a -lpng -lz -lm
 ../srclib/expat/lib/.libs/libexpat.a -ldl -lresolv -lnsl -lpthread
 

/usr/lib/gcc-lib/ia64-suse-linux/3.3.3/../../../../ia64-suse-linux/bin/ld:
 cannot find -lpng
 
 but I do have a /usr/lib/libpng.so.3
 
 Are there any known quirks with respect to my OS/Distro and
 CPU/Machine ? (I am new to this one, apologies if I missed something
 obvious).
 
 TIA
 
 Cheers,
 -- 
 Ryurick M. Hristev -- Systems Administrator (Unix)
 University of Queensland -- ITS Dept.
 mailto: [EMAIL PROTECTED]
 the greatest hacking experience: hack your own mind -- me
 
 

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] changed ip

2006-07-24 Thread Martin Knoblauch

Hi Toney,

 my first guess would be that:

a) you are using multicast,
b) your default gateway goes via eth0, and
c) your compute nodes are on the 192.168.180.x network.

 After the change the MC packets are still expected via eth0, but come
in from eth1.

 Try adding this from the documentation:

mcast_if=eth1 in your head node's gmond.conf, and

route add -host 239.2.11.71 dev eth1

Hope this helps
Martin

--- toney samuel [EMAIL PROTECTED] wrote:

 I have a 4 node cluster. my head node has got two gigabit card and
 infiniband card my cluster ip is
 
  eth0  192.168.180.17/255.255.252.0
 ipoib0 192.168.0.1/255.255.255.0
 
 I have installed ganglia with this configuration. ganglia was working
 properly.
 
 later i changed my network configuration to this
 
 eth0  192.168.1.1/255.255.255.0
 eth1  192.168.180.17/255.255.252.0
 ipoib0 192.168.0.1/255.255.255.0
 
 
 Now i can't see any information in my web page
 
 Pls guide how to resolve this issue.
 
 Regards.


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] not showing all hosts

2006-07-13 Thread Martin Knoblauch



--- Ian Cunningham [EMAIL PROTECTED] wrote:

 
 Solution B:
 increase the Time To Live or ttl on the gmond multicast packets.
 This assumes that multicast packets can get from one vlan to the
 other.

 The configuration option used to be available in the 2.x codebase,
 but I don't see it in 3.0.x code. I think it would be mcast_ttl
 but I can't say if that will work or not.
 

 It is "ttl" in the udp_send_channel section. It is used if
mcast_join is set.

Cheers
Martin

--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia History

2006-06-09 Thread Martin Knoblauch

Adam,

 do you still have those error messages? And: which version of the
web-frontend are you using? We fixed quite a few of the php messages in
3.0.3.

Martin

--- Adam Brust [EMAIL PROTECTED] wrote:

 At the beginning of the month, ganglia/php were producing massive 
 amounts of httpd errors which filled up my / partition causing the 
 machine to crash... since then, I believe my ganglia history had been
 
 effected... I tried to restore from the three tar files located in 
 /var/lib/ganglia/archives/  and each one only had about a weeks worth
 of 
 history... I was able to restore from an earlier backup, which has my
 
 previous history, although now I am missing roughly these last three 
 weeks.  Also, I'm not certain if the problem is corrected now... I
 don't 
 know if I'll lose this history again upon a reboot.
 
 -adam
 
 Martin Knoblauch wrote:
 
 Adam,
 
  that sounds OK. Do you see any messages in either /var/log/messages
 or
 in your webservers log files?
 
 Martin
 
 --- Adam Brust [EMAIL PROTECTED] wrote:
 
   
 
 Ian,  Thanks for your reply.
 
 My rrd files appear to in the default /var/lib/ganglia directory, I
 
 could not find any other instances of them.  gmetad is running as 
 nobody and the rrds are owned by nobody... do you know if
 that's
 the 
 correct user/permissions?
 
 thanks,
 
 adam
 
 Ian Cunningham wrote:
 
 
 
 Look at where gmetad is storing the rrd files now. You can find it
   
 
 in 
 
 
 your gmetad.conf under rrd_rootdir. Maybe you didn't specify it
 for
   
 
 
 
 --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www:   http://www.knobisoft.de
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] New issue with hosts reporting

2006-06-06 Thread Martin Knoblauch

Hi Mark,

 have you configured a tcp_accept_channel for each of your two clusters'
master gmonds?

 Then you may need to define an acl for your gmetad server. Something
like:

tcp_accept_channel {
  port = 8649
  acl {
    default = "deny"
    access {
      ip = ip-of-the-gmetad-server
      mask = 32
      action = "allow"
    }
  }
}

Cheers
Martin

--- Mark Haney [EMAIL PROTECTED] wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 David Zaltron wrote:
  You probably have a gmond configuration on each node that multicasts
  the cluster status to every node.
 
  For example, if you have a configuration like this in the nodes:
 
  -
  cluster {
 name = dummy_cluster
  }
 
  udp_send_channel {
 mcast_join = 239.2.11.71
 port = 8649
  }
 
  udp_recv_channel {
 mcast_join = 239.2.11.71
 port = 8649
 bind = 239.2.11.71
  }
  
 
  This means that every node knows it belongs to the dummy_cluster,
  and every gmond can return the status of the entire cluster when
  asked on the default TCP port 8649, because it knows about every
  other node (they all talk on the same multicast channel).
 
  A solution is to unicast the traffic from each node to itself:
 
  
  udp_send_channel {
 host = hostname of 127.0.0.1
 port = 8649
  }
 
  udp_recv_channel {
 port = 8649
  }
  ---
 
  In this way you can simulate a cluster of a single node,
 monitoring in
  reality the single node.
 
 Okay, I did that and that /sort of/ fixed it, except for now I do not
 see the nodes in my web interface.  Keep in mind the web interface is
 running on a completely separate box that's not either newton or
 winterstar.  So, how do I get the node showing up in the web
 interface now?
 
 (And David, I apologize for sending to you and not the list, my
 fingers
 got ahead of me today.)
 
 
 
 
 - --
 Fere libenter homines id quod volunt credunt.
 
 Mark Haney
 Sr. Systems Administrator
 ERC Broadband
 (828) 350-2415
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.2.2 (GNU/Linux)
 Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
 
 iD8DBQFEhDXZYQhnfRtc0AIRAj07AJwNaTsNHM02oJaznXnO0qECZEPZUwCfa6JR
 0rLX5KWkRW9MjL/5/J/Igj0=
 =iIJp
 -END PGP SIGNATURE-
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia History

2006-06-06 Thread Martin Knoblauch

Adam,

 that is unexpected. The RRDs are supposed to keep one year (the
default) of history.

Martin

--- Adam Brust [EMAIL PROTECTED] wrote:

 I recently had to reboot the Front End of my cluster... upon the reboot,
 my Ganglia history is gone... Ganglia is only keeping data from the time
 of the reboot... it was nearly a year's worth of history... can anyone
 offer any suggestions?
 
 thanks,
 
 -adam
 
 
 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de

Re: [Ganglia-general] Ganglia 3.0.3 compilation on AIX 5.2

2006-05-23 Thread Martin Knoblauch

Hi Knut,

 there is supposed to be a README.AIX file in the 3.0.3 distribution.
This explains a few things.

 Basically, building with xlc is not supported. There are a few hints
on how to do it under 2).

 And you absolutely need to build non-shared. That is most likely where
your core dump comes from. Explained under 1).

Cheers
Martin

--- Knut Hellebø [EMAIL PROTECTED] wrote:

 Regards,
 
 I'm trying to compile Ganglia 3.0.3 on an AIX 5.2 box using the
 native
 IBM compiler and have encountered two problems compiling and one
 fatal
 when running gmond.
 
 Compilation problems:
 
 1. The compilation breaks on the file ./srclib/confuse/src/lexer.c at
 line 786 which stems from the lex file lexer.l line 82:
 
 #line 82 "lexer.l"
 cfg->line++; /* keep track of line number */
  YY_BREAK
 
 saying "undeclared identifier cfg". I put in a "cfg_t *cfg;" declaration
 in line 696 and then the compilation proceeds.
 
 2. Also, I need to use the -qcpluscmt switch allowing C++ comment
 style or else the compilation bombs in gmond.c
 
 3. Running gmond always crashes with a SIGSEGV. The trace shows that the
 crash occurs when opening the /etc/gmond.conf file. A dbx session on the
 core file shows the crash seems to be related to the parser file fix I
 did in section 1. above. Here's the backtrace:
 
 (dbx) where
 cfg_yylex() at 0x1000af28
 cfg_parse_internal() at 0x1000821c
 cfg_parse_fp() at 0x1000a5a0
 cfg_parse() at 0x1000a684
 Ganglia_gmond_config_create() at 0x10006d58
 process_configuration_file() at 0x100036dc
 main() at 0x14b4
 
 What's up here ?
 -- 
 
   
 Knut Hellebø | Hydro IS Partner ESI (Unix) Team
 E-mail: [EMAIL PROTECTED]
 "DAMN GOOD COFFEE!! (and hot too)" -- Dale Cooper, FBI
 
  
 

 
 


--
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de
