[Ganglia-general] Monitoring Linux services

2016-12-16 Thread Peter Phaal
Hi All,

For anyone interested in monitoring Linux services, the latest Host sFlow
release can automatically track and monitor services running under systemd:
http://blog.sflow.com/2016/12/monitoring-linux-services.html

Ganglia already includes support for the sFlow metrics:
http://blog.sflow.com/2016/12/using-ganglia-to-monitor-linux-services.html

Peter
--
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general


Re: [Ganglia-general] ganglia web to monitor apache servers?

2015-12-31 Thread Peter Phaal
You could use a combination of Host sFlow and mod-sflow on your Apache
web servers:
http://www.sflow.net/
https://github.com/sflow/mod-sflow

The following article describes how to configure the head-end gmond:
http://blog.sflow.com/2011/12/using-ganglia-to-monitor-web-farms.html

mod-sflow also exports Apache worker pool stats to Ganglia:
http://blog.sflow.com/2012/10/thread-pools.html

mod-sflow also exports URL, referrer, user-agent, response time and
status code information that you can use to derive metrics for each
web service. You could use sFlow-RT to calculate the derived metrics
and proxy them to gmetad:
http://blog.sflow.com/2015/12/using-proxy-to-feed-metrics-into-ganglia.html

On Thu, Dec 31, 2015 at 10:40 AM, Aaron  wrote:
> Hi, I would like to monitor linux apache servers where the apache servers
> would have gmond running, and the stats would be reported back to the
> ganglia server running gmetad and ganglia web to be displayed in a graph.
> Is there a php or python script to do this?  Any recommendations?
>
> Thanks, Aaron


Re: [Ganglia-general] ganglia web to monitor apache servers?

2015-12-31 Thread Peter Phaal
Vladimir's blog has a solution that involves tailing the Apache log files:
http://vuksan.com/linux/ganglia/#Apache_Traffic_Stats

The sFlow protocol packs a large number of metrics in each UDP
datagram, so you should see a reduction in UDP datagrams per second
associated with monitoring. The C based mod-sflow / host-sflow agents
have a small memory and CPU footprint.

On Thu, Dec 31, 2015 at 3:14 PM, Aaron <hawaiiaa...@gmail.com> wrote:
> Thanks Peter.  Is there a way to use a more pure Ganglia solution?  Will
> sFlow generate more UDP traffic and/or use more CPU cycles?



Re: [Ganglia-general] Unable to collect Sflow data

2015-11-19 Thread Peter Phaal
sFlow reports on two types of data:
1. periodic export of counters
2. asynchronous export of randomly sampled packets and packet forwarding info

Ganglia's data model is well suited to handling counters exported by
the Host sFlow agent (http://sflow.net/), but does not provide support
for analyzing the packet data.

sFlowTrend and sFlow-RT (http://sflow-rt.com) are specialized tools
that can decode packet headers and calculate flow metrics. If you want
to convert sFlow packet data into a form that can be fed into time
series tools like Ganglia, then you might want to take a look at
sFlow-RT.

Peter


On Thu, Nov 19, 2015 at 2:42 AM, Wenshui Chen  wrote:
> Hi There,
>
> Ganglia 3.7.2 has been installed successfully on a CentOS 6.7 64-bit box.
> The host's CPU, load, memory, disk, etc. usage statistics can be viewed
> via the ganglia-web interface. The problem is that sFlow data from a
> router cannot be collected and displayed by Ganglia. I have tried
> building both with and without the --sflow-enable flag during
> compilation and installation. Neither build shows sFlow statistics
> exported from a router or a switch. My gmond.conf file is listed below.
>
> IPTables logging has been enabled to record accepted udp/6343 packets
> from the MLXe4 router, which is exporting sFlow packets via port
> UDP/6343. The iptables log proves that the sFlow packets have passed
> the firewall of the Ganglia box. sFlowTrend has also been installed on
> the same box, and it shows sFlow statistics exported from the same
> Brocade MLXe4 router without problems. Thus, I'm sure that UDP/6343
> packets are not blocked by the firewall. However, no sFlow statistics
> have been recorded by Ganglia so far. What else can be tried?
> Thanks a lot for your kind help.
>
> Best Regards,
>
> Wenshui Chen
>
> /* This configuration is as close to 2.5.x default behavior as possible
> The values closely match ./gmond/metric.h definitions in 2.5.x */
> globals {
>   daemonize = yes
>   setuid = yes
>   user = nobody
>   debug_level = 0
>   max_udp_msg_len = 1472
>   mute = no
>   deaf = no
>   allow_extra_data = yes
>   host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
>   host_tmax = 20 /*secs */
>   cleanup_threshold = 300 /*secs */
>   gexec = no
>   # By default gmond will use reverse DNS resolution when displaying your hostname
>   # Uncommenting following value will override that value.
>   # When uncommented "Incorrect format for spoof argument. exitin" shown.
>   # override_hostname = lab02.twgrid.org
>   # If you are not using multicast this value should be set to something other than 0.
>   # Otherwise if you restart aggregator gmond you will get empty graphs. 60 seconds is reasonable
>   send_metadata_interval = 0 /*secs */
> }
>
> /*
>  * The cluster attributes specified will be used as part of the <CLUSTER>
>  * tag that will wrap all hosts collected by this instance.
>  */
> cluster {
>   name = "lab02-sflow"
>   owner = "ASGCNet"
>   # latlong = "unspecified"
>   # url = "unspecified"
> }
>
> /* The host section describes attributes of the host, like the location */
> host {
>   location = "unspecified"
> }
>
> /* Feel free to specify as many udp_send_channels as you like. Gmond
>    used to only support having a single channel */
> udp_send_channel {
>   #bind_hostname = yes # Highly recommended, soon to be default.
>                        # This option tells gmond to use a source address
>                        # that resolves to the machine's hostname. Without
>                        # this, the metrics may appear to come from any
>                        # interface and the DNS names associated with
>                        # those IPs will be used to create the RRDs.
>   mcast_join = 239.2.11.71
>   host = lab02.twgrid.org
>   port = 8649
>   ttl = 1
> }
>
> /* You can specify as many udp_recv_channels as you like as well. */
> udp_recv_channel {
>   mcast_join = 239.2.11.71
>   port = 8649
>   bind = 239.2.11.71
>   retry_bind = true
>   # Size of the UDP buffer. If you are handling lots of metrics you really
>   # should bump it up to e.g. 10MB or even higher.
>   # following setting is 100MB. It was 10485760(10M)
>   # buffer = 10485760
> }
>
> /* You can specify as many tcp_accept_channels as you like to share
>    an xml description of the state of the cluster */
> tcp_accept_channel {
>   port = 8649
>   # If you want to gzip XML output
>   gzip_output = no
> }
>
> /* Channel to receive sFlow datagrams */
> udp_recv_channel {
>   port = 6342
> }
>
> /* Optional sFlow settings */
> sflow {
>   udp_port = 6342
>   accept_vm_metrics = yes
>   accept_jvm_metrics = yes
>   multiple_jvm_instances = no
>   accept_http_metrics = yes
>   multiple_http_instances = no
>   accept_memcache_metrics = yes
>   multiple_memcache_instances = yes
> }
>
> /* Each metrics module that is 

Re: [Ganglia-general] GMOND + SFLOWD functionality

2015-05-30 Thread Peter Phaal
Sergey,

gmond does not retransmit the sFlow metrics it receives. A single
gmond instance is used as a central collector for a cluster of machines
running Host sFlow agents. gmetad uses a TCP connection to retrieve
the cluster stats from the single gmond instance and update the RRDs.
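
As a rough illustration of the TCP retrieval step (a sketch, not part of
gmetad itself), the XML dump can be read and summarized with a few lines
of Python. The default host and port below are placeholders, and the
HOST / METRIC element names with NAME and VAL attributes are assumed to
match your gmond's XML output; gzip_output is assumed to be off.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=10.0):
    """Read the XML dump that gmond writes on its tcp_accept_channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the sender closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes):
    """Return {host_name: {metric_name: value_string}} from a gmond XML dump."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return result
```

Calling metrics_by_host(fetch_gmond_xml("head-node", 8649)), where
"head-node" is a hypothetical name for the collector, lists every host
and metric the single gmond instance currently holds, which is a quick
way to check whether sFlow metrics have arrived.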

Peter

On Fri, May 29, 2015 at 10:02 AM, Sergey svin...@apple.com wrote:
 Hi Vladimir,

 This is a very serious question - is GMOND supposed to retransmit metrics
 received from the local HSFLOWD agent, or does it just save them locally for
 later retrieval via a TCP connection?
 What is the initial project for this?

 Thanks!
 Sergey Vinnik


Re: [Ganglia-general] HTTPD metrics not sent

2015-05-28 Thread Peter Phaal
Have you enabled http in the sFlow section in the gmond config?

http://blog.sflow.com/2011/12/using-ganglia-to-monitor-web-farms.html

You should try running sflowtool on the head end gmond system to
verify that the data is arriving:

http://blog.sflow.com/2011/12/sflowtool.html

On Thu, May 28, 2015 at 10:06 AM, Sergey svin...@apple.com wrote:
 Hi Everybody!

 I use the HSFLOWD agent to collect HTTPD metrics from an Apache server via
 the mod_sflow.so module.
 I see that GMOND gets the HTTPD metrics from HSFLOWD and saves them in metadata,
 but for some reason it doesn’t forward the HTTPD metrics by UDP to another GMOND
 agent.
 All other metrics are successfully transferred.
 Do you know how to fix it?

 Thanks!
 Sergey






Re: [Ganglia-general] Servers with multiple drives

2013-09-20 Thread Peter Phaal
Another alternative would be to develop a simple, portable, minimal
dependency, C command line version of gmetric that could be compiled on
Windows.  Deployment would then involve simply copying the binary
executable to your different servers and then building custom metric export
scripts in PowerShell etc.

If you look at the source code to gmetric.py, there isn't much to it. The
code is derived from the older embeddedgmetric project which has a C
library that could be updated to work with the latest version of Ganglia
and build a command line tool.

https://code.google.com/p/embeddedgmetric/wiki/GmetricClib

Perhaps someone has already done this?


On Fri, Sep 20, 2013 at 5:09 AM, Burton, Steven sbur...@shepherdbe.com wrote:

 Peter,

 Alas, this seems something of a deal breaker.  Installing python on all of
 our servers isn’t really an option. A shame because I like the design and
 philosophy of ganglia and sflow.  I will continue with nagios and
 NSClient++.

 Steve.

 From: Peter Phaal [mailto:peter.ph...@gmail.com]
 Sent: 04 September 2013 23:38
 To: Burton, Steven
 Cc: ganglia-general@lists.sourceforge.net
 Subject: Re: [Ganglia-general] Servers with multiple drives

 Steve,

 The Host sFlow statistics are described on sFlow.org:

 http://sflow.org/sflow_host.txt

 Most of the physical host statistics are based on Ganglia's libmetrics
 library and are a superset of the metrics that you would get from a default
 gmond installation. Libmetrics defines aggregate statistics for each node.
 For example, Host sFlow's disk statistics represent total reads, writes
 etc. across all storage devices on the node. The part_max_used metric is
 the utilization of the most utilized partition.

 If you need per device statistics, or any other non-sFlow metrics, you
 could supplement the Host sFlow base set by using gmetric.py to send
 additional metrics to the Ganglia gmond collector:

 https://github.com/vvuksan/ganglia-misc/blob/master/gmetric-python/gmetric.py

 If you have any questions that are specific to installing and configuring
 Host sFlow agents, posting questions on the Host sFlow mailing list will
 reach the developers:

 https://lists.sourceforge.net/lists/listinfo/host-sflow-discuss

 Peter

 On Wed, Sep 4, 2013 at 12:18 AM, Burton, Steven sbur...@shepherdbe.com
 wrote:

 Hi,

 I'm investigating Ganglia as a replacement for our nagios-based server
 stats collection system. As most of the servers I'll be monitoring run
 Windows, I've been concentrating on using the host-sflow agent (not
 Ganglia, I know but I'm guessing there's a lot of experience in this list).

 I've just installed it on a Windows server 2003 machine with multiple
 drives (2) but I'm only seeing one set of disk stats. Is this correct or
 have I messed something up?

 Steve.

 Steve Burton
 Network Manager

 Shepherd Group Built Environment
 Frederick House, Fulford Road, York, Y010 4EA
 (T) 01904 660 391 (F) 01904 610 256  (M) 07801 214 009
 (W): www.shepherd-group.com


Re: [Ganglia-general] Can't use sFlow and Ganglia

2013-03-12 Thread Peter Phaal
Does Virtualbox support libvirt? If so, you can compile the Host sFlow
agent to link to the libvirt library to obtain VM statistics.

Otherwise, if there is a Virtualbox specific performance library that can
be used to retrieve metrics (Host sFlow uses libxenstat for Xen and WMI for
Hyper-V) then it shouldn't be too hard to write an adapter.

The best place for questions on the Host sFlow agent is the mailing list,
https://lists.sourceforge.net/lists/listinfo/host-sflow-discuss

On Tue, Mar 12, 2013 at 3:34 AM, Mayap Christine 
christine.mayapka...@enseeiht.fr wrote:

 Hello

 Thanks for this orientation!

 When using Virtualbox, is there a special configuration to be able to
 get the VM metrics?

 Le 06/02/2013 12:33, Nicholas Satterly a écrit :
  Hi,
 
  This is very odd. I don't understand how you could be running version
  3.5.0 of gmond without sFlow support enabled. Did you build this gmond
  yourself and run configure with the --disable-sflow option? sFlow
  support is enabled by default.
 
  I suggest you either rebuild gmond and ensure that you compile with
  sFlow support enabled or download a packaged version of ganglia that
  has sFlow support.
 
  This version for Ubuntu Raring should work...
  http://packages.ubuntu.com/raring/ganglia-monitor
 
  ... or this version for Debian Wheezy...
 
 http://packages.debian.org/search?suite=wheezy&searchon=names&keywords=ganglia
 
  Regards,
  Nick
 
  On Tue, Feb 5, 2013 at 6:59 PM, Duverne, Cyrille
  cyrille.duve...@euranova.eu wrote:
  Hello Nicholas,
 
 
  Thanks a lot for your help.
 
  Please find below the outputs of the commands :
 
  gmond --version
  gmond 3.5.0
 
 
  strings /usr/sbin/gmond | grep -i sflow
  no output
 
  hsflowd -v
  -bash: /usr/sbin/hsflowd: Permission denied
 
  sudo hsflowd -v
  hsflowd version 1.22.2
 
  Thanks.
 
  CyD
 
  Imagination is more important than Knowledge
  Albert Einstein
 
 
 
  Mardi 05/02/2013 à 17:05 Nicholas Satterly a écrit:
 
  Hi Cyrille,
 
  Can you run the following commands and copy-paste the output into a
 reply
  email?
 
  gmond --version
  strings /usr/sbin/gmond | grep -i sflow
  hsflowd -v
 
  Thanks,
  Nick
 
  On Tue, Feb 5, 2013 at 3:47 PM, Duverne, Cyrille
  cyrille.duve...@euranova.eu wrote:
  Hello,
 
  Indeed this part was missing, but when I add it and restart ganglia, I
 get
  an error saying that module sFlow doesn't exist...
 
  I think I'm not running a recent enough version of ganglia, I'm using
  3.5.0
 
  Thanks in advance for your help.
  CyD
 
 
 
  Mardi 05/02/2013 à 13:21 Nicholas Satterly a écrit:
 
  Hi,
 
  Not sure if you ever solved your problem but I think you are missing
  the following config stanza from gmond.conf for the gmond that is
  receiving the sFlow packets.
 
  sflow {
 accept_vm_metrics = yes
  }
 
  I'm just trying this out myself for the first time and see the VM
  metrics appear when gmond is run in debug mode but there is no trace
  of them in the XML output.
 
  $ gmond -d 2
  ...
  saving metadata for metric: infsrcprv10.vdisk_capacity host: smc02
  ***Allocating value packet for host--(null)-- and metric
  --infsrcprv10.vdisk_capacity-- 
  ...
 
  I would guess that something is going wrong when decoding the sFlow
  packets because a host of (null) can't possibly work.
 
  Has anyone else got this working?
 
  Regards,
  Nick
 
  PS. I'm running ganglia agent version 3.5.0 and host sFlow agent
 version
  1.22.2.
 
 
  On Wed, Jan 2, 2013 at 5:45 PM, Duverne, Cyrille
  cyrille.duve...@euranova.eu wrote:
  Hello,
 
  I have a cluster of 4 machines, running Ubuntu 12.04 x86_64, sFlow and
  Ganglia, here below the config I've set up :
 
  Master instance : /etc/gmond.conf :
 
  /* Feel free to specify as many udp_send_channels as you like.  Gmond
  used to only support having a single channel */
  udp_send_channel {
 mcast_join = inferno.local
  /*mcast_join = 239.2.11.71 DEFAULT VALUE*/
 port = 8649
 ttl = 1
  }
 
  /* You can specify as many udp_recv_channels as you like as well. */
  udp_recv_channel {
/* mcast_join = 239.2.11.71 DEFAULT VALUE*/
 port = 8649
/* bind = 239.2.11.71 DEFAULT VALUE*/
 family = inet4
  }
 
  /* channel to receive sFlow */
  /* 6343 is the default sFlow port, an explicit sFlow*/
  /* configuration section is needed to override default  */
  udp_recv_channel {
 port = 6343
  }
 
  /* You can specify as many tcp_accept_channels as you like to share
  an xml description of the state of the cluster */
  tcp_accept_channel {
 port = 8649
  }
 
  Cluster machines :
 
  /etc/hsflowd.conf
 
  sflow {
DNSSD = off
polling = 20
sampling = 512
collector {
   ip = 192.168.0.100
   udpport = 6343
 }
  }
 
  /etc/gmond.conf :
 
  /* Feel free to specify as many udp_send_channels as you like.  Gmond
  used to only support having a single channel */
  udp_send_channel {
 mcast_join = inferno.local
 port = 8649
 ttl = 1
  }
 
  /* You can specify 

[Ganglia-general] InfiniBand monitoring

2013-02-27 Thread Peter Phaal
I wanted to bring attention to the following proposal from Mellanox to
define the set of InfiniBand metrics to be exported via sFlow. If you use
InfiniBand, this is an opportunity to help identify the important metrics
that can ultimately make their way into Ganglia, e.g. GPU metrics:
http://blog.sflow.com/2012/10/using-ganglia-to-monitor-gpu-performance.html

Comments on the proposal are welcome on the sFlow mailing list:

http://groups.google.com/group/sflow

 InfiniBand is a protocol used in data centers, high-speed trading and
 supercomputers. It is characterized by high throughput, low latency,
 connection QoS and high availability.

 The following draft specification defines sFlow InfiniBand traffic
 counter structures for reporting information from InfiniBand ports.

 http://sflow.org/draft_sflow_infiniband.txt

 Please comment on the draft so we can move to finalize the specification.

 I would like to thank Peter Phaal for helping me with this contribution.



 Thanks,

 Ariel Almog

 Mellanox Technologies




Re: [Ganglia-general] Ganglia, mod_sflow and Apache response report

2013-02-05 Thread Peter Phaal
Michael,

Ganglia doesn't understand the sampled HTTP transactions reported by
mod_sflow and there is no response report built into Ganglia.

To incorporate response time metrics based on the sFlow data, you
would need to piece together a script using the elements described in
the Ganglia book.

1. It makes most sense to calculate the response time metric on the
gmond (head) node. You will need to install sflowtool on the server to
convert the binary sFlow to text so that you can analyze the data
using a script:

http://blog.sflow.com/2011/12/sflowtool.html

2. Your analysis script needs to have access to the sFlow feed. The
easiest way is to use the tcpdump command described on page 161
(there appears to be a typo in the book: the -r - argument to
sflowtool is missing):
tcpdump -p -s 0 -w - udp port 6343 | sflowtool -r -

http://blog.sflow.com/2012/01/forwarding-using-sflowtool.html

3. Since you are looking at HTTP data, you might want to use the -H
option to get sflowtool to convert the sFlow data into combined
logfile format. That way you could use existing log analysis
libraries/tools to filter on URL's, mime-types, status codes etc. when
computing your metrics.

4. The Perl script on page 167 describes how to calculate average
response time from the samples, you would need to modify the sflowtool
invocation to include the tcpdump command. Also, the script as written
will compute the average response time across the cluster of web
servers - you would need to modify the script if you want per-Host
statistics.

5. Finally, you would need to use gmetric to send the calculated
metrics to gmond (using spoofing to ensure that the calculated metrics
correspond to the other metrics being directly received from the Host
sFlow agents) - see Custom Metrics on page 160.
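
The averaging in step 4 could be sketched in Python roughly as follows.
This is not the Perl script from the book. It assumes sflowtool's
default text output prints one "key value" pair per line, that the
reporting agent appears under the key "agent", and that the transaction
duration appears under "http_duration_uS"; both key names are
assumptions, so check them against your own sflowtool output and pass
different names if needed.

```python
from collections import defaultdict


def average_duration_us(lines, duration_key="http_duration_uS", agent_key="agent"):
    """Average the sampled HTTP durations (microseconds) per reporting agent."""
    totals = defaultdict(lambda: [0, 0])  # agent -> [sum of durations, sample count]
    agent = None
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip lines that are not "key value" pairs
        key, value = parts[0], parts[1].strip()
        if key == agent_key:
            agent = value  # later samples belong to this agent
        elif key == duration_key and agent is not None:
            try:
                duration = int(value)
            except ValueError:
                continue  # ignore malformed values
            totals[agent][0] += duration
            totals[agent][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}
```

Feeding it sys.stdin from the tcpdump | sflowtool pipeline in step 2
gives one average per agent, which addresses the per-host case in step
4; the result would then be sent on with gmetric as in step 5.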

If you don't want to develop a solution from scratch, an alternative
would be to use an sFlow analyzer to compute the response time metrics
and then feed them into Ganglia, something along these lines:

http://blog.sflow.com/2013/02/cluster-performance-metrics.html

Peter

On Mon, Feb 4, 2013 at 7:52 AM, Michael Durket
dur...@highwire.stanford.edu wrote:
 I'm running ganglia 3.4.0-1 and ganglia web 3.5.4-1. On a web server I'm 
 running the latest version of mod_sflow. I can see the Apache report on gweb 
 just fine, but I'm not sure the Apache response report is working. Is there 
 any documentation (besides the general documentation in the Ganglia book on 
 ganglia and sflow) which might tell me how to get the Apache response
 report working with mod_sflow in gweb?




Re: [Ganglia-general] Sflow: custom metric are invisible

2012-12-19 Thread Peter Phaal
On the receiving end, have you configured gmond to listen for gmetric messages?

udp_recv_channel {
  port = 8649
}

On the sending end (host-sflow), your gmetric settings must be
consistent with the hsflowd settings. The following message on the
host-sflow mailing list describes how to read the hsflowd settings and
pass them to gmetric.py:

http://sourceforge.net/mailarchive/message.php?msg_id=29438950
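
A minimal sketch of the idea, which is not the script from the linked
message: it assumes hsflowd publishes its active settings as key=value
lines (for example in /etc/hsflowd.auto) and that the keys include
agentIP and hostname. The file path and key names are assumptions to
verify on your own system.

```python
def parse_hsflowd_auto(text):
    """Parse key=value lines; repeated keys (e.g. collector) are kept as lists."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and anything that is not key=value
        key, value = line.split("=", 1)
        settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def spoof_argument(settings):
    """Build the colon-separated 'ip:hostname' string used for gmetric spoofing."""
    return "{}:{}".format(settings["agentIP"][0], settings["hostname"][0])
```

Passing the resulting spoof string and the collector address from the
same file to gmetric keeps the custom metric attached to the same host
entry as the metrics sent by hsflowd.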

On Wed, Dec 19, 2012 at 1:50 AM, MAYAP KAMGA Christine larissa
christine.mayapka...@enseeiht.fr wrote:
   Hello

 I'm facing some problems while using sFlow.
 I'm currently using sFlow (1.22) on my monitored server and gmond (3.5)
 on another one.
 I'm able to get all VM_* metrics and the basic Ganglia metrics with gmond
 without problems.

 To learn more about custom metrics, I have created a script to
 extract Current_users with gmetric.py.
 I'm able to execute the script, and I also receive a notification
 about the size of the sent data.
 However, I'm unable to see the Current_user metric on the gmond
 server among the others. Did I miss something?

 Can somebody please help and guide me on what to do to solve this issue?
   Thanks in advance!



Re: [Ganglia-general] Question about scaling

2012-10-24 Thread Peter Phaal
Hi Mark,

If you want to significantly reduce the amount of UDP traffic going to
your head end gmond (cnode340), then you might want to consider using
Host sFlow agents to monitor machines in the cluster - sFlow encodes
all the core Ganglia metrics (along with additional disk IO, swap,
interrupt activity metrics) in a single UDP packet, so you can cut the
UDP packets per second (and the load on the head end gmond) by a
factor of 30 or more.
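
To make the arithmetic concrete, here is a back-of-the-envelope
calculation. Only the cluster size comes from this thread; the 20 second
interval and the figure of 30 datagrams per host per interval for native
gmond are hypothetical round numbers chosen to illustrate the ratio, not
measurements.

```python
def packets_per_second(hosts, packets_per_host, interval_s):
    """Datagrams per second arriving at the head-end collector."""
    return hosts * packets_per_host / interval_s


HOSTS = 340         # cluster size mentioned in this thread
INTERVAL_S = 20     # hypothetical reporting interval in seconds
GMOND_PACKETS = 30  # hypothetical: one datagram per metric, 30 metrics per interval
SFLOW_PACKETS = 1   # sFlow packs the core metrics into a single datagram

gmond_rate = packets_per_second(HOSTS, GMOND_PACKETS, INTERVAL_S)
sflow_rate = packets_per_second(HOSTS, SFLOW_PACKETS, INTERVAL_S)
reduction = gmond_rate / sflow_rate

print(gmond_rate, sflow_rate, reduction)
```

Under these assumptions the head end receives 510 datagrams per second
from native gmond agents versus 17 from Host sFlow agents, which is the
factor of 30 referred to above.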

If you make extensive use of gmond plugins for custom metrics then you
would want to stick with gmond on all your nodes. However, if you have
a limited number of custom metrics, you can supplement the core
metrics exported by sFlow using gmetric.

http://blog.sflow.com/2011/07/ganglia-32-released.html

As Nick suggested, you should be using the latest version of gmond for
the head node. Multi-threading significantly improves scalability and
the newer versions of gmond also include native sFlow support.

Regards,
Peter

On Tue, Oct 23, 2012 at 4:34 PM, Nicholas Satterly nfsatte...@gmail.com wrote:
 I assume cnode340 is the head node that all ~340 other gmond's send their
 data to. If so, you could reduce the amount of redundant metadata flying
 around by increasing send_metadata_interval to 120 seconds or higher.

 Also, I suspect that if you telnet to port 8649 on your head node it will
 take a while to respond because it's busy processing incoming UDP metrics.
 If it takes more than 10 seconds to respond on a regular basis then gmetad
 will timeout [1].

 Try deploying a recently patched version of gmond [2] to the head node which
 is now multi-threaded and see if that fixes the problem. It starts a
 separate thread for responding to XML metric requests and should respond
 immediately while the main thread is still processing metrics.

 Let us know how you get on.

 Regards,
 Nick

 [1]
 https://github.com/ganglia/monitor-core/blob/master/gmetad/data_thread.c#L103
 [2] https://github.com/ganglia/monitor-core/pull/53


 On Tue, Oct 23, 2012 at 7:36 PM, Potter,Mark L mlpot...@mdanderson.org
 wrote:



 data_source MDACC 60 cnode340:8649

 Everything else is default at this point. http://pastebin.com/UAQYxcX3 is
 a full copy.

 
 From: Nicholas Satterly [nfsatte...@gmail.com]
 Sent: Tuesday, October 23, 2012 13:33
 To: Potter,Mark L
 Cc: ganglia-general@lists.sourceforge.net
 Subject: Re: [Ganglia-general] Question about scaling

 Please send thru your gmetad.conf file so we can see how things are
 configured on the server side. *

 --Nick.

 * Be sure to anonymise any sensitive info.

 On 23 Oct 2012, at 19:21, Potter,Mark L mlpot...@mdanderson.org wrote:

  I am using what I think to be a fairly standard gmond.conf:
 
  globals {
   daemonize = yes
   setuid = yes
   user = nobody
   debug_level = 0
   max_udp_msg_len = 1472
   mute = no
   deaf = no
   allow_extra_data = yes
   host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in
  1 day */
   host_tmax = 30 /*secs */
   cleanup_threshold = 300 /*secs */
   gexec = no
   send_metadata_interval = 30 /*secs */
  }
 
  cluster {
   name = MDACC
   owner = MD Anderson Cancer Center
   latlong = unspecified
   url = unspecified
  }
 
  host {
   location = 8,3,1
  }
 
  udp_send_channel {
host = cnode340
port = 8649
  }
 
  udp_recv_channel {
 port = 8649
   retry_bind = true
  }
 
  tcp_accept_channel {
   port = 8649
  }
 
  gmetad is set to check every 60 seconds:
 
  data_source MDACC 60 cnode340:8649
 
 
  Everything works well until around 200 hosts where it appears gmetad
  starts having issues. I have ~340 hosts to go in to this cluster. Should I
  be running multiple gmetads for this amount of hosts? With all of them
  active the web interface reports all of them down and collects no stats at
  all. I am looking for advice on getting this up and running properly. The
  ganglia host isn't underpowered at all IMO and has plenty of HDD space:
 
  Mem:  32955788 (from free)
  16 Cores (AMD Opteron(tm) Processor 6128)
 
  Thanks for any assistance.
 
 
  Respectfully,
 
  Mark L. Potter
  Research IS  Technology Services
  UNIX Systems Administrator
  O: 713-745-2032
  C:  713-965-4133
 




Re: [Ganglia-general] sflow metrics not visible

2012-10-20 Thread Peter Phaal
On Fri, Oct 19, 2012 at 1:59 PM, Иван Евдокимов palmal.moz...@gmail.com wrote:
 I'm trying to use sFlow(jmx-agent 0.6.1)-Ganglia(3.5.0, source build) pair
 for jvm monitoring.
 gmond.conf
 udp_recv_channel {
 port = 6343
 }

 sflow {
 accept_vm_metrics = yes
 }

 When I run tcpdump on port 6343, I see sFlow v5 packets arriving at the
 Ganglia host (Ubuntu x64, VirtualBox), but there are no logs, no errors
 and no metrics.

 First of all, is there any way to see the logs other than running in -d
 mode?

 gmond -m displays no vm_*-specific metrics.

 gmetric doesn't seem to work except for the help display: gmetric -g sflow
 produces "Incorrect option value", and the same goes for every other option.

 Any clues and tips?

The accept_vm_metrics setting applies to virtual machines (Xen, KVM
etc.). For Java, you need to use accept_jvm_metrics instead:

http://blog.sflow.com/2011/12/using-ganglia-to-monitor-java-virtual.html

You didn't mention whether you installed Host sFlow agents on your
servers. The Host sFlow agent is required; the following article
describes how the Host sFlow sub-agents share configuration:

http://blog.sflow.com/2012/01/host-sflow-distributed-agent.html

Peter



[Ganglia-general] GPU performance/health monitoring

2012-10-19 Thread Peter Phaal
Hi All,

If you are running a GPU based compute cluster you might be interested
in the recently added support for GPU performance/health metrics.

http://blog.sflow.com/2012/10/using-ganglia-to-monitor-gpu-performance.html

Please try out the new extensions and let us know if there are any
issues (you will need to build gmond from the latest sources on
github).

There are other reasons to use the latest gmond; the addition of
multi-threading improves scalability and reduces the chance of losing
metrics.

Peter



Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-19 Thread Peter Phaal
Nick,

I think you probably need two mutexes if you want to avoid blocking
the UDP thread unnecessarily.

1. a mutex on the hash table that must be grabbed by the TCP thread
when it walks the hash table; the UDP thread would grab it any time it
adds or removes an entry from the hash table.
2. a mutex used to control access to individual entries in the
hash table. The TCP thread would grab and release this mutex for each
entry as it walks the hash table. The UDP thread would grab this mutex
each time it updates an entry.

The only situation in which this locking scheme would block the UDP
thread for any significant time is when a new host starts sending
metrics and a new entry needs to be added to the hash table. This is a
rare event and not much of a concern. The TCP thread should never have
to wait long to acquire either of the mutexes.
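
A minimal sketch of the two-mutex scheme, written in Python purely for illustration (gmond itself is C, and the names HostTable, table_lock and entry_lock are hypothetical, not taken from the gmond source). It assumes a single UDP thread is the only writer, which is why that thread can look up an existing entry without taking the table lock.

```python
import threading


class HostTable:
    """Hypothetical stand-in for gmond's hosts hash table."""

    def __init__(self):
        self.table_lock = threading.Lock()  # mutex 1: table structure (add/remove)
        self.entry_lock = threading.Lock()  # mutex 2: contents of individual entries
        self.hosts = {}

    def update(self, host, metric, value):
        """Called by the UDP thread, assumed to be the only writer."""
        entry = self.hosts.get(host)
        if entry is None:
            # Rare path: a new host changes the table structure,
            # so the table mutex is needed.
            with self.table_lock:
                entry = self.hosts.setdefault(host, {})
        # Common path: only the entry mutex is held, and only briefly.
        with self.entry_lock:
            entry[metric] = value

    def snapshot(self):
        """Called by the TCP thread to walk the whole table."""
        result = {}
        with self.table_lock:
            for host, entry in self.hosts.items():
                # Grab and release the entry mutex once per entry.
                with self.entry_lock:
                    result[host] = dict(entry)
        return result


table = HostTable()
table.update("web1", "load_one", 0.42)
print(table.snapshot())
```

With this layout the UDP thread only contends for the table mutex when a host appears or disappears; routine metric updates wait at most for one entry copy in the TCP thread.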

Peter

On Wed, Sep 19, 2012 at 8:45 AM, Nicholas Satterly nfsatte...@gmail.com wrote:
 Hi Peter,

 Thanks for the feedback.

 I've added a thread mutex to the hosts hash table as you suggested and will
 send a pull request in the next day or so.

 Regards,
 Nick


 On Mon, Sep 17, 2012 at 8:25 PM, Peter Phaal peter.ph...@gmail.com wrote:

 Nicholas,

 It makes sense to multi-thread gmond, but looking at your patch, I
 don't see any locking associated with the hosts hashtable. Isn't there
 a possible race if new hosts/metrics are added to the hashtable by the
 UDP thread at the same time the hashtable is being walked by the TCP
 thread?

 Peter

 On Mon, Sep 17, 2012 at 6:03 AM, Nicholas Satterly nfsatte...@gmail.com
 wrote:
  Hi Chris,
 
  I've discovered there are two contributing factors to problems like this.
 
  1. the number of metrics being sent (possibly in short bursts) can overflow
  the UDP receive buffer.
  2. the time it takes to process metrics in the UDP receive buffer causes TCP
  connections from the gmetad's to timeout (currently hard-coded to 10
  seconds)
 
  In your case, you are probably dropping UDP packets because gmond can't keep
  up. Gmond was enhanced to allow you to increase the UDP buffer size back in
  April. I suggest you upgrade to the latest version and set this to a sensible
  value for your environment.
 
  udp_recv_channel {
    port = 1234
    buffer = 1024000
  }
 
  To determine what is sensible is a bit of trial and error. Run netstat -su
  and keep increasing the value until you no longer see the number of packet
  receive errors going up.
 
  $ netstat -su
  Udp:
  7941393 packets received
  23 packets to unknown port received.
  0 packet receive errors
  10079118 packets sent
 
  The other possibility is that it takes so long for a gmetad to pull back all
  the metrics you are collecting for a cluster that you are preventing the
  gmond from processing metric data received via UDP. Again this can cause the
  UDP receive buffer to overflow.
 
  The problem we had at my work is related to all of the above but manifested
  itself in a slightly different way. We were seeing gaps in all our graphs
  because at times none of the servers in a cluster would respond to gmetad
  poll within 10 seconds. I used to think that the gmond was completely hung
  but realised that they would respond normally most of the time but every
  minute or so it would take about 20-25 seconds. This happened to coincide
  with the UDP receive queue growing (Recv-Q column below) and I realised
  that it took this long for the gmond to process the metric data it had
  received via UDP from all the other servers in the cluster.
 
  $ netstat -ua
  Active Internet connections (servers and established)
  Proto Recv-Q Send-Q Local Address
  udp   1920032  0 *:8649  *:*
 
  The solution was to modify gmond and move the TCP request handler into a
  separate thread so that gmond could take as long as it needed to process
  incoming metric data (from UDP receive buffer that is large enough not to
  overflow) without blocking on the TCP requests for the XML data.
 
  The patched gmond is running without a problem in our environment so I have
  submitted a pull request[1] for it to be included in trunk.
 
  I can't be 100% sure that this patch will fix your problem but it would be
  worth a try.
 
  Regards,
  Nick
 
  [1] https://github.com/ganglia/monitor-core/pull/50
 
 
  On Sat, Sep 15, 2012 at 12:16 AM, Chris Burroughs
  chris.burrou...@gmail.com wrote:
 
  We use ganglia to monitor  500 hosts in multiple datacenters with about
  90k unique host:metric pairs per DC.  We use this data for all of the
  cool graphs in the web UI and for passive alerting.
 
  One of our checks is to measure TN of load_one on every box (we want to
  make sure gmond is working and correctly updating metrics otherwise we
  could be blind and not know it).  We consider it a failure if TN > 600.
  This is an arbitrary number but 10 minutes seemed plenty long.
 
  Unfortunately we are seeing this check fail far

Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-17 Thread Peter Phaal
Nicholas,

It makes sense to multi-thread gmond, but looking at your patch, I
don't see any locking associated with the hosts hashtable. Isn't there
a possible race if new hosts/metrics are added to the hashtable by the
UDP thread at the same time the hashtable is being walked by the TCP
thread?

Peter

On Mon, Sep 17, 2012 at 6:03 AM, Nicholas Satterly nfsatte...@gmail.com wrote:
 Hi Chris,

 I've discovered there are two contributing factors to problems like this.

 1. the number of metrics being sent (possibly in short bursts) can overflow
 the UDP receive buffer.
 2. the time it takes to process metrics in the UDP receive buffer causes TCP
 connections from the gmetad's to timeout (currently hard-coded to 10
 seconds)

 In your case, you are probably dropping UDP packets because gmond can't keep
 up. Gmond was enhanced to allow you to increase the UDP buffer size back in
 April. I suggest you upgrade to the latest version and set this to a sensible
 value for your environment.

 udp_recv_channel {
   port = 1234
   buffer = 1024000
 }
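
A small sketch of what a larger receive buffer means at the socket level, assuming the buffer option corresponds to the socket receive buffer (SO_RCVBUF); that mapping is my assumption, not something stated in this thread. The helper name open_udp_receiver is hypothetical. On Linux the kernel may grant a different size than requested, since requests are limited by net.core.rmem_max, so it is worth reading the value back.

```python
import socket


def open_udp_receiver(port, requested_bytes):
    """Open a UDP socket and ask the kernel for a larger receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
    sock.bind(("127.0.0.1", port))
    # Read back what the kernel actually granted; it can differ from the request.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted


# Port 0 lets the OS pick a free port, so the sketch does not clash with gmond.
sock, granted = open_udp_receiver(0, 1024000)
print("requested 1024000 bytes, kernel granted", granted)
sock.close()
```

If the granted size is much smaller than the request, raising the system-wide limit is likely needed before a larger buffer value has any effect.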

 To determine what is sensible is a bit of trial and error. Run netstat -su
 and keep increasing the value until you no longer see the number of packet
 receive errors going up.

 $ netstat -su
 Udp:
 7941393 packets received
 23 packets to unknown port received.
 0 packet receive errors
 10079118 packets sent
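
The trial-and-error loop above is easier if the error counter is extracted automatically. The following sketch parses text in the format shown above; the function name udp_receive_errors is hypothetical, and the exact wording of netstat output can vary between versions, so treat the pattern as an assumption.

```python
import re


def udp_receive_errors(netstat_su_output):
    """Return the 'packet receive errors' count from the Udp: section, or None."""
    in_udp = False
    for line in netstat_su_output.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # Section headers look like "Udp:" or "UdpLite:".
            in_udp = (stripped == "Udp:")
            continue
        if in_udp:
            match = re.match(r"(\d+) packet receive errors", stripped)
            if match:
                return int(match.group(1))
    return None


SAMPLE = """Udp:
    7941393 packets received
    23 packets to unknown port received.
    0 packet receive errors
    10079118 packets sent
"""
print(udp_receive_errors(SAMPLE))
```

Calling this on two readings taken a minute apart and comparing the values shows whether errors are still increasing at the current buffer size.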

 The other possibility is that it takes so long for a gmetad to pull back all
 the metrics you are collecting for a cluster that you are preventing the
 gmond from processing metric data received via UDP. Again this can cause the
 UDP receive buffer to overflow.

 The problem we had at my work is related to all of the above but manifested
 itself in a slightly different way. We were seeing gaps in all our graphs
 because at times none of the servers in a cluster would respond to gmetad
 poll within 10 seconds. I used to think that the gmond was completely hung
 but realised that they would respond normally most of the time but every
 minute or so it would take about 20-25 seconds. This happened to coincide
 with the UDP receive queue growing (Recv-Q column below) and I realised
 that it took this long for the gmond to process the metric data it had
 received via UDP from all the other servers in the cluster.

 $ netstat -ua
 Active Internet connections (servers and established)
 Proto Recv-Q Send-Q Local Address
 udp   1920032  0 *:8649  *:*
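
To watch the receive queue over time, the Recv-Q column can be pulled out of output like the above. This is a sketch that assumes the column layout shown here (Proto, Recv-Q, Send-Q, Local Address); the function name udp_recv_queue is hypothetical.

```python
def udp_recv_queue(netstat_ua_output, port):
    """Return the Recv-Q byte count for the UDP socket bound to `port`, or None."""
    suffix = ":" + str(port)
    for line in netstat_ua_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address ...
        if (len(fields) >= 4 and fields[0].startswith("udp")
                and fields[3].endswith(suffix)):
            return int(fields[1])
    return None


SAMPLE_UA = """Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address
udp   1920032  0 *:8649  *:*
"""
print(udp_recv_queue(SAMPLE_UA, 8649))
```

A Recv-Q value that keeps climbing between polls suggests gmond is not draining the socket fast enough.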

 The solution was to modify gmond and move the TCP request handler into a
 separate thread so that gmond could take as long as it needed to process
 incoming metric data (from UDP receive buffer that is large enough not to
 overflow) without blocking on the TCP requests for the XML data.

 The patched gmond is running without a problem in our environment so I have
 submitted a pull request[1] for it to be included in trunk.

 I can't be 100% sure that this patch will fix your problem but it would be
 worth a try.

 Regards,
 Nick

 [1] https://github.com/ganglia/monitor-core/pull/50


 On Sat, Sep 15, 2012 at 12:16 AM, Chris Burroughs
 chris.burrou...@gmail.com wrote:

 We use ganglia to monitor  500 hosts in multiple datacenters with about
 90k unique host:metric pairs per DC.  We use this data for all of the
 cool graphs in the web UI and for passive alerting.

 One of our checks is to measure TN of load_one on every box (we want to
 make sure gmond is working and correctly updating metrics otherwise we
 could be blind and not know it).  We consider it a failure if TN > 600.
 This is an arbitrary number but 10 minutes seemed plenty long.
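
A check of this kind could be sketched as follows. The sketch assumes the gmond XML dump contains HOST elements with METRIC children carrying NAME and TN attributes; the sample document is invented for illustration and omits most attributes a real dump would have. The function name stale_hosts is hypothetical.

```python
import xml.etree.ElementTree as ET


def stale_hosts(gmond_xml, metric="load_one", max_tn=600):
    """Return hosts whose `metric` is missing or has TN above `max_tn` seconds."""
    root = ET.fromstring(gmond_xml)
    stale = []
    for host in root.iter("HOST"):
        tn = None
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                tn = int(m.get("TN"))
                break
        # A host without the metric at all is treated as a failure too.
        if tn is None or tn > max_tn:
            stale.append(host.get("NAME"))
    return stale


SAMPLE_XML = """<GANGLIA_XML VERSION="3.4.0" SOURCE="gmond">
 <CLUSTER NAME="example">
  <HOST NAME="web1"><METRIC NAME="load_one" VAL="0.42" TN="15"/></HOST>
  <HOST NAME="web2"><METRIC NAME="load_one" VAL="0.10" TN="905"/></HOST>
 </CLUSTER>
</GANGLIA_XML>"""
print(stale_hosts(SAMPLE_XML))
```

Running the same function against two gmetad or gmond sources and comparing the results is one way to separate the class (A) and class (B) failures described below.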

 Unfortunately we are seeing this check fail far too often.  We set up
 two parallel gmetad instances (monitoring identical gmonds) per DC and
 have broken our problem into two classes:
  * (A) only one of the gmetad stops updating for an entire cluster, and
 must be restarted to recover.  Since the gmetad's disagree we know the
 problem is there. [1]
  * (B) Both gmetad's say an individual host has not reported (gmond
 aggregation or sending must be at fault).  This issue is usually
 transient (that is it recovers after some period of time greater than 10
 minutes).

 While attempting to reproduce (A) we ran several additional gmetad
 instances (again polling the same gmonds) around 2012-12-07.  Failures
 per day are below [2].  The act of testing seems to have significantly
 increased the number of failures.

 This led us to consider whether the act of polling a gmond aggregator could
 impact its ability to concurrently collect metrics.  We looked at
 the code but are not experienced with concurrent programming in C.
 Could someone with more familiarity with the gmond code comment as to
 whether this is likely to be a worthwhile avenue of investigation?  We are
 also looking for suggestions for an empirical test to rule this out.

 (Of course, other comments on the root "TN goes up, metrics stop
 updating" sporadic problem are also welcome!)

 

Re: [Ganglia-general] Java/JMX plugin for Ganglia 3.1.x

2012-09-14 Thread Peter Phaal
Martin,

If you can upgrade to the latest Ganglia release you could use sFlow
to monitor your Tomcat servers. The jmx-sflow-agent exports standard
JVM metrics, or the tomcat-sflow-valve can export the JVM metrics as
well as HTTP counters and transactions.

http://host-sflow.sourceforge.net/relatedlinks.php

Cheers,
Peter

On Thu, Sep 13, 2012 at 5:43 AM, Martin Knoblauch kn...@knobisoft.de wrote:
 Hi,

  as part of a larger tomcat deployment I need to monitor several tomcat
 instances and want to add the measured data to a Ganglia setup. I already
 found JMXtrans which seems a cool solution, but it uses host spoofing and
 I am not sure it is what I really want. Needs some real investigating.

  What I would love to have is a Gmond plugin that can simply add
 the measured metrics to the system metrics. Has anybody already done such a
 plugin or is working on it? I could provide testing, feedback and maybe
 help.

 Cheers
 Martin
 --
 Martin Knoblauch
 email: k n o b i AT knobisoft DOT de
 www: http://www.knobisoft.de





Re: [Ganglia-general] Easy Question: Ganglia + sFlow/NetFlow

2012-07-25 Thread Peter Phaal
Douglas,

The sFlow standard includes a mechanism for periodically exporting
counters. It is these periodic counter exports that Ganglia is
processing - there is no equivalent mechanism in NetFlow. In addition,
sFlow standardizes export of counters from servers and applications -
it is these counters that Ganglia currently supports. The following
articles give examples:

http://blog.sflow.com/search/label/Ganglia

Ganglia doesn't understand flow data (neither sFlow's
packet/transaction samples nor NetFlow records). Ganglia's strength is
in monitoring clusters of servers - for network traffic analysis you
would be better off using tools like ntop, pmacct etc. and possibly
importing traffic summaries (such as total web traffic) into Ganglia
using gmetric or through a module.
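
As a rough sketch of the gmetric route, a traffic summary computed by another tool could be pushed into Ganglia like this. The metric name web_traffic_bytes and the helper names are hypothetical; the sketch only builds and runs a gmetric command line and assumes gmetric is installed and can find a usable gmond.conf.

```python
import shutil
import subprocess


def gmetric_command(name, value, units, metric_type="double"):
    """Build a gmetric command line for one metric value."""
    return ["gmetric", "--name", name, "--value", str(value),
            "--type", metric_type, "--units", units]


def publish(name, value, units):
    """Run gmetric if it is installed; return False when it is not."""
    if shutil.which("gmetric") is None:
        return False
    subprocess.run(gmetric_command(name, value, units), check=True)
    return True


# Print the command instead of running it, so the sketch is safe to try anywhere.
print(" ".join(gmetric_command("web_traffic_bytes", 1234.5, "bytes/sec")))
```

Calling publish() from a cron job or from the traffic analysis tool's export hook would keep the summary metric updating alongside the host metrics.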

The Host sFlow web site is the place to look for server and
application sFlow agents:

http://host-sflow.sourceforge.net/

The Host sFlow agent exports core server metrics and related projects
(listed on the Host sFlow web site) instrument Apache, Java etc.

http://blog.sflow.com/2012/01/host-sflow-distributed-agent.html

-Peter

On Wed, Jul 25, 2012 at 9:09 AM, Douglas Wagner dougla...@gmail.com wrote:
 Excuse the idiocy behind this post as we're just starting to look into a lot
 of this.

 I understand Ganglia is now capable of following sFlow packets being sent
 around a network, it's also my understanding that there is a difference
 between sFlow and NetFlow (netflow being potentially a Cisco thing?).

 So, a couple of hopefully easy questions.

 Is there a significant difference, from a Ganglia perspective, between
 NetFlow and sFlow packets?

 Does Ganglia support NetFlow as well as sFlow (they could be technically the
 same or different as night and day for all I know).

 On the Ganglia web page it's talking about sFlow packets being accepted from
 sources such as Apache and JMX, is there any documentation anyone out there
 can point me at to allowing apps such as these (these two specifically) to
 report statistics via sFlow?

 Thanks in advance for any help you might be able to give.

 --Douglas Wagner





Re: [Ganglia-general] gmond 3.4.0 and dell switches

2012-07-20 Thread Peter Phaal
The sFlow standard defines a wide range of metrics from switches,
servers and applications. Each device only exports the metrics that
are relevant to its normal operation, so switches will report network
metrics, servers will report cpu, memory, disk statistics and
applications will report response times, URLs etc.

http://blog.sflow.com/2010/08/sflow-host-structures.html

The Dell switch is exporting sFlow metrics relating to its operation
as a switch. Since it isn't a server, it won't export the server
metrics that gmond is looking for. Ganglia is designed to monitor
clusters of servers and it expects to receive a core set of server
metrics from each member of the cluster and will ignore sFlow metrics
that don't relate to that function.
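
The distinction can be sketched in a few lines. The sketch assumes, as the question below describes, that gmond only accepts a counter sample when it contains the host description block (SFLOW_COUNTERBLOCK_HOST_HID), and that this block uses format 2000 while switch interface counters use formats 1 and 2; these numbers reflect my reading of the sFlow specifications rather than anything stated in this thread. The function names are hypothetical.

```python
HOST_DESCR_FORMAT = 2000       # host description block (assumed value)
IF_COUNTERS_FORMAT = 1         # generic interface counters (assumed value)
ETHERNET_COUNTERS_FORMAT = 2   # Ethernet interface counters (assumed value)


def split_tag(tag):
    """Split a 32-bit sFlow data format tag into (enterprise, format)."""
    return tag >> 12, tag & 0xFFF


def looks_like_host_sample(record_tags):
    """True if a counter sample carries the standard host description block."""
    return any(split_tag(t) == (0, HOST_DESCR_FORMAT) for t in record_tags)


print(looks_like_host_sample([2000, 2003, 2004]))   # server-style sample
print(looks_like_host_sample([IF_COUNTERS_FORMAT, ETHERNET_COUNTERS_FORMAT]))  # switch-style sample
```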

There are a number of other sFlow analysis tools listed on sFlow.org
that are focused on sFlow switch metrics:

http://sflow.org/products/collectors.php

The following article describes some things to consider when
evaluating sFlow analyzers for monitoring switches:

http://blog.sflow.com/2009/05/choosing-sflow-analyzer.html

Peter

On Fri, Jul 20, 2012 at 7:46 AM, Andreas Pflug
pgad...@pse-consulting.de wrote:
 I've configured some Dell switches (e.g. 6224, with recent 3.3.3.3
 firmware) to emit SFLOW packets, and I see them happily arriving at my
 gmond machine, but the switches aren't recognized.

 Digging into the sources, I found that the switch under investigation
 never sends blocks tagged as SFLOW_COUNTERBLOCK_HOST_HID, only type 0
 and 1. Consequently, all packets are dropped.

 Is this a Dell problem of incompletely implemented SFLOW, or is it a
 gmond problem?

 Regards
 Andreas




Re: [Ganglia-general] gmond 3.4.0 and dell switches

2012-07-20 Thread Peter Phaal
I agree, the performance of the network fabric is a critical component
of cluster performance and it would be great to figure out how to best
include the data in Ganglia.

A possible starting point would be to define SWITCH elements in the
XML structure exported by gmond. A switch would contain multiple
INTERFACE objects each of which contain standard SNMP MIB-II metrics
(ifInOctets, ifOutOctets, ifInErrors, ifOutErrors, ifInDiscards,
ifOutDiscards etc). The problem is that this wouldn't be backward
compatible with tools accessing the XML interface. Another option
would be to have the network data appear as a separate XML document,
accessed on a different TCP port.

The next challenge would be to figure out how to include this type of
information in the Ganglia UI - rolled up errors and discards for the
fabric would be a natural fit for the top level view, but to drill
down, Ganglia would need to deal with the concept of multiple resource
pools in the cluster (networking and computation). Extending the
notion further, a storage resource pool might also be interesting. For
virtual server pools, pooling the VMs and the hypervisors would also
be useful.

Peter

On Fri, Jul 20, 2012 at 9:31 AM, Andreas Pflug
pgad...@pse-consulting.de wrote:
 Well,

 for examining the overall health of a cluster, the network fabric appears
 equally important to me...
 Is there really no OS software that covers both combined?

 Regards
 Andreas


 Am 20.07.12 17:50, schrieb Peter Phaal:
 The sFlow standard defines a wide range of metrics from switches,
 servers and applications. Each device only exports the metrics that
 are relevant to its normal operation, so switches will report network
 metrics, servers will report cpu, memory, disk statistics and
 applications will report response times, URLs etc.

 http://blog.sflow.com/2010/08/sflow-host-structures.html

 The Dell switch is exporting sFlow metrics relating to its operation
 as a switch. Since it isn't a server, it won't export the server
 metrics that gmond is looking for. Ganglia is designed to monitor
 clusters of servers and it expects to receive a core set of server
 metrics from each member of the cluster and will ignore sFlow metrics
 that don't relate to that function.

 There are a number of other sFlow analysis tools listed on sFlow.org
 that are focused on sFlow switch metrics:

 http://sflow.org/products/collectors.php

 The following article describes some things to consider when
 evaluating sFlow analyzers for monitoring switches:

 http://blog.sflow.com/2009/05/choosing-sflow-analyzer.html

 Peter

 On Fri, Jul 20, 2012 at 7:46 AM, Andreas Pflug
 pgad...@pse-consulting.de wrote:
 I've configured some Dell switches (e.g. 6224, with recent 3.3.3.3
 firmware) to emit SFLOW packets, and I see them happily arriving at my
 gmond machine, but the switches aren't recognized.

 Digging into the sources, I found that the switch under investigation
 never sends blocks tagged as SFLOW_COUNTERBLOCK_HOST_HID, only type 0
 and 1. Consequently, all packets are dropped.

 Is this a Dell problem of incompletely implemented SFLOW, or is it a
 gmond problem?

 Regards
 Andreas





[Ganglia-general] Fwd: gmond 3.4.0 and dell switches

2012-07-20 Thread Peter Phaal
Ganglia is nicely extensible if you add extra metrics to the core set,
but doesn't work well when you are missing most of the core metrics.
The Ganglia UI expects the core set of host metrics to be available,
if they aren't then you end up with lots of broken links and missing
charts.

Unfortunately, there isn't a whole lot of overlap between typical
switch metrics and the Ganglia host metrics and so while it would be
easy enough to treat each switch as a host in gmond and add the port
statistics as you suggest, it breaks much of the downstream code that
depends on the missing metrics.

On Fri, Jul 20, 2012 at 3:57 PM, Vladimir Vuksan vli...@veus.hr wrote:
 I am not in favor of changing the gmond XML. I would recommend simply making
 switches hosts and emitting interface data as metrics grouped by metric
 groups e.g.

 port-1-inoctets
 port-1-outoctets

 etc. Beyond that I would like to get stuff like switch CPU utilization. Is
 this doable ?

 Vladimir


 On Fri, 20 Jul 2012, Peter Phaal wrote:

 I agree, the performance of the network fabric is a critical component
 of cluster performance and it would be great to figure out how to best
 include the data in Ganglia.

 A possible starting point would be to define SWITCH elements in the
 XML structure exported by gmond. A switch would contain multiple
 INTERFACE objects each of which contain standard SNMP MIB-II metrics
 (ifInOctets, ifOutOctets, ifInErrors, ifOutErrors, ifInDiscards,
 ifOutDiscards etc). The problem is that this wouldn't be backward
 compatible with tools accessing the XML interface. Another option
 would be to have the network data appear as a separate XML document,
 accessed on a different TCP port.

 The next challenge would be to figure out how to include this type of
 information in the Ganglia UI - rolled up errors and discards for the
 fabric would be a natural fit for the top level view, but to drill
 down, Ganglia would need to deal with the concept of multiple resource
 pools in the cluster (networking and computation). Extending the
 notion further, a storage resource pool might also be interesting. For
 virtual server pools, pooling the VMs and the hypervisors would also
 be useful.

 Peter

 On Fri, Jul 20, 2012 at 9:31 AM, Andreas Pflug
 pgad...@pse-consulting.de wrote:

 Well,

 for examining the overall health of a cluster, the network fabric appears
 equally important to me...
 Is there really no OS software that covers both combined?

 Regards
 Andreas


 Am 20.07.12 17:50, schrieb Peter Phaal:

 The sFlow standard defines a wide range of metrics from switches,
 servers and applications. Each device only exports the metrics that
 are relevant to its normal operation, so switches will report network
 metrics, servers will report cpu, memory, disk statistics and
 applications will report response times, URLs etc.

 http://blog.sflow.com/2010/08/sflow-host-structures.html

 The Dell switch is exporting sFlow metrics relating to its operation
 as a switch. Since it isn't a server, it won't export the server
 metrics that gmond is looking for. Ganglia is designed to monitor
 clusters of servers and it expects to receive a core set of server
 metrics from each member of the cluster and will ignore sFlow metrics
 that don't relate to that function.

 There are a number of other sFlow analysis tools listed on sFlow.org
 that are focused on sFlow switch metrics:

 http://sflow.org/products/collectors.php

 The following article describes some things to consider when
 evaluating sFlow analyzers for monitoring switches:

 http://blog.sflow.com/2009/05/choosing-sflow-analyzer.html

 Peter

 On Fri, Jul 20, 2012 at 7:46 AM, Andreas Pflug
 pgad...@pse-consulting.de wrote:

 I've configured some Dell switches (e.g. 6224, with recent 3.3.3.3
 firmware) to emit SFLOW packets, and I see them happily arriving at my
 gmond machine, but the switches aren't recognized.

 Digging into the sources, I found that the switch under investigation
 never sends blocks tagged as SFLOW_COUNTERBLOCK_HOST_HID, only type 0
 and 1. Consequently, all packets are dropped.

 Is this a Dell problem of incompletely implemented SFLOW, or is it a
 gmond problem?

 Regards
 Andreas







Re: [Ganglia-general] Gmond Compilation on Cygwin

2012-07-12 Thread Peter Phaal
Hi Robert,

sFlow is a very simple protocol - an sFlow agent periodically sends
XDR encoded structures over UDP. Each structure has a tag and a
length, making the protocol extensible.

In the short term, it would make sense to define an sFlow structure
to carry the current NVML metrics and tag it using NVIDIA's IANA
assigned vendor number (5703). Something along these lines:

/* NVML statistics */
/* opaque = counter_data; enterprise = 5703, format=1 */
struct nvml_gpu_counters {
  unsigned int device_count;
  unsigned int mem_total;
  unsigned int mem_util;
 ...
}

Additional examples are in the sFlow Host Structures specification
(http://www.sflow.org/sflow_host.txt), these are the structures
currently being exported by the Host sFlow agent.
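
To make the tag-and-length idea concrete, here is a sketch of how such a record might be serialized and decoded. It assumes the sFlow v5 convention of packing the enterprise number into the upper 20 bits of the tag and the format into the lower 12 bits, and it treats every field as a 32-bit unsigned integer; the helper names and the sample values are hypothetical.

```python
import struct

NVIDIA_ENTERPRISE = 5703  # IANA enterprise number mentioned above
NVML_FORMAT = 1           # format number from the proposed structure


def encode_record(enterprise, fmt, values):
    """Pack unsigned 32-bit values as a tagged, length-prefixed XDR record."""
    payload = struct.pack("!%dI" % len(values), *values)  # big-endian, 4 bytes each
    tag = (enterprise << 12) | fmt
    return struct.pack("!II", tag, len(payload)) + payload


def decode_record(data):
    """Return (enterprise, format, values) from a record built by encode_record."""
    tag, length = struct.unpack_from("!II", data, 0)
    values = struct.unpack_from("!%dI" % (length // 4), data, 8)
    return tag >> 12, tag & 0xFFF, list(values)


# device_count, mem_total, mem_util with made-up values
record = encode_record(NVIDIA_ENTERPRISE, NVML_FORMAT, [2, 4096, 37])
print(decode_record(record))
```

Because each record carries its own length, a receiver that does not recognise the tag can skip the payload and continue with the next record, which is what makes the protocol extensible.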

Extending the Windows Host sFlow agent to export these metrics would
involve adding a routine to populate and serialize this structure -
pretty straightforward - if you look at the Host sFlow agent source
code you will see examples of how the existing structures are handled.
For Ganglia to support the new counters, we would need to add a
decoder to gmond for the new structure - also straightforward.

Are per device metrics important, or can we roll up the metrics across
all the GPUs on a server? With sFlow we generally roll up metrics for
each node where possible - the goal is to provide enough detail so
that the operations team can tell whether a node is healthy or not,
but not so much as to overwhelm the monitoring system and limit
scalability. Once a problem is detected, detailed troubleshooting
and diagnostics can be performed using point tools on
the host.

The metrics currently exposed by the NVML API could be improved -
everything appears to be a 1 second gauge. A more robust model for
metrics is to maintain monotonic counters so that they can be polled
at different frequencies and still produce meaningful results.
Counters are also more robust when sending metrics over an unreliable
transport like UDP. The receiver calculates the deltas and can easily
compensate for lost packets.
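
To make the counter argument concrete, here is a small Python sketch
of the receiver side. It assumes samples arrive as (timestamp, counter)
pairs and that a counter wraps at most once between samples; the
function names are illustrative only.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two samples of a monotonic counter, allowing one wrap."""
    return (current - previous) % (1 << bits)


def rate(prev_sample, curr_sample, bits=32):
    """Average rate between two (timestamp_seconds, counter) samples."""
    (t0, c0), (t1, c1) = prev_sample, curr_sample
    if t1 <= t0:
        raise ValueError("samples must be in time order")
    return counter_delta(c0, c1, bits) / (t1 - t0)
```

If the datagram for t=40 is lost, rate((20, 3000), (60, 7000)) still
gives the correct average over the 40 second gap. With a gauge, the
value for the missing interval is simply gone.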

Longer term it would be useful to have a discussion to see what
metrics best characterize operational performance and are feasible to
implement. Counters such as number of threads started, number of busy
ticks, number of idle ticks etc. are the type of measurement you need
to calculate utilization. Some kind of load average based on the
thread run queue would also be interesting.
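
For example, with busy and idle tick counters the receiver can derive
utilization over whatever interval it polls at. A minimal sketch,
assuming each sample is a (busy_ticks, idle_ticks) pair of monotonic
counters that have not wrapped between samples:

```python
def utilization(prev, curr):
    """Fraction of time busy between two (busy_ticks, idle_ticks) samples."""
    busy = curr[0] - prev[0]
    idle = curr[1] - prev[1]
    total = busy + idle
    return busy / total if total else 0.0
```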

My calendar is pretty open next week - I am based in San Francisco, so
8am-5pm PST works best.

Peter

On Thu, Jul 12, 2012 at 11:58 AM, Robert Alexander
ralexan...@nvidia.com wrote:
 Hey,

 A meeting may be a good idea.  My schedule is mostly open next week.  When 
 are others free?  I will brush up on sflow by then.

 NVML and the Python metric module are tested at NVIDIA on Windows and Linux, 
 but not within Cygwin.  The process will be easier/faster on the NVML side if 
 we keep Cygwin out of the loop.

 -Robert

 -Original Message-
 From: Bernard Li [mailto:bern...@vanhpc.org]
 Sent: Thursday, July 12, 2012 10:49 AM
 To: Nigel LEACH
 Cc: lozgachev.i...@gmail.com; ganglia-general@lists.sourceforge.net; Peter 
 Phaal; Robert Alexander
 Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin

 Hi Nigel:

 Technically you only need 3.1 gmond to have support for the Python metric 
 module.  But I'm not sure whether we have ever tested this under Windows.

 Peter and Robert: How quickly can we get hsflowd to support GPU metrics 
 collection internally?  Should we setup a meeting to discuss this?

 Thanks,

 Bernard

 On Thu, Jul 12, 2012 at 4:05 AM, Nigel LEACH nigel.le...@uk.bnpparibas.com 
 wrote:
 Thanks Ivan, but we have 3.0 and 3.1 gmond running under Cygwin (and using 
 APR), the problem is with the 3.4 spin.

 -Original Message-
 From: lozgachev.i...@gmail.com [mailto:lozgachev.i...@gmail.com]
 Sent: 12 July 2012 11:54
 To: Nigel LEACH
 Cc: peter.ph...@gmail.com; ganglia-general@lists.sourceforge.net
 Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin

 Hi all,

 Maybe it will be interesting. Some time ago I successfully compiled gmond 
 3.0.7 and 3.1.2 under Cygwin. If you need it I can upload somewhere gmond 
 and 3rd party sources + compilation script.
 Also, I have gmetad 3.0.7 compiled for Windows. In addition, I developed
 (just for fun) my own implementation of gmetad 3.1.2 using .NET and C#.

 P. S. I do not know whether it is possible to use these gmond versions to
 collect statistics from GPUs.

 --
 Best regards,
 Ivan.

 2012/7/12 Nigel LEACH nigel.le...@uk.bnpparibas.com:
 Thanks for the updates Peter and Bernard.

 I have been unable to get gmond 3.4 working under Cygwin, my latest errors 
 are parsing gm_protocol_xdr.c. I don't know whether we should follow this 
 up, it would be nice to have a Windows gmond, but my only reason for 
 upgrading are the GPU metrics.

 I take your point about re-using the existing GPU module and gmetric

Re: [Ganglia-general] Gmond Compilation on Cygwin

2012-07-10 Thread Peter Phaal
Nigel,

A simple option would be to use Host sFlow agents to export the core
metrics from your Windows servers and use gmetric to add the GPU
metrics.

You could combine code from the python GPU module and gmetric
implementations to produce a self-contained script for exporting GPU
metrics:

https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia
https://github.com/ganglia/ganglia_contrib
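
A minimal sketch of such a script, under some assumptions: it shells
out to the gmetric command line tool rather than reusing the gmetric
code from ganglia_contrib, it reads NVML through the pynvml Python
bindings, and the metric names (gpu0_util and so on) are illustrative
rather than the exact names used by the existing module.

```python
import subprocess


def gmetric_command(name, value, metric_type, units=""):
    """Build the gmetric command line for one metric."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


def read_gpu_metrics():
    """Read a few per-GPU values through the pynvml bindings."""
    import pynvml  # imported lazily so the rest of the script loads without NVML

    pynvml.nvmlInit()
    try:
        metrics = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
            metrics.append((f"gpu{i}_util", util.gpu, "uint32", "%"))
            metrics.append((f"gpu{i}_mem_util", util.memory, "uint32", "%"))
            metrics.append((f"gpu{i}_temp", temp, "uint32", "C"))
        return metrics
    finally:
        pynvml.nvmlShutdown()


def publish(metrics):
    """Send each (name, value, type, units) tuple with gmetric."""
    for name, value, metric_type, units in metrics:
        subprocess.run(gmetric_command(name, value, metric_type, units), check=True)
```

Calling publish(read_gpu_metrics()) from a scheduled task would send
one set of values per run.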

Longer term, it would make sense to extend Host sFlow to use the
C-based NVML API to extract and export metrics. This would be
straightforward - the Host sFlow agent uses native C APIs on the
platforms it supports to extract metrics.

What would take some thought is developing a standard set of summary
metrics to characterize GPU performance. Once the set of metrics is
agreed on, then adding them to the sFlow agent is pretty trivial.

Currently the Ganglia python module exports the following metrics -
are they the right set? Anything missing? It would be great to get
involvement from the broader Ganglia community to capture best
practice from anyone running large GPU clusters, as well as getting
input from NVIDIA about the key metrics.

* gpu_num
* gpu_driver
* gpu_type
* gpu_uuid
* gpu_pci_id
* gpu_mem_total
* gpu_graphics_speed
* gpu_sm_speed
* gpu_mem_speed
* gpu_max_graphics_speed
* gpu_max_sm_speed
* gpu_max_mem_speed
* gpu_temp
* gpu_util
* gpu_mem_util
* gpu_mem_used
* gpu_fan
* gpu_power_usage
* gpu_perf_state
* gpu_ecc_mode

As far as scalability is concerned, you should find that moving to
sFlow as the measurement transport reduces network traffic since all
the metrics for a node are transported in a single UDP datagram
(rather than a datagram per metric when using gmond as the agent). The
other consideration is that sFlow is unicast, so if you are using a
multicast Ganglia setup then this involves restructuring your
configuration.

You still need to have at least one gmond instance, but it acts as an
sFlow aggregator and is mute:
http://blog.sflow.com/2011/07/ganglia-32-released.html

Peter

On Tue, Jul 10, 2012 at 8:36 AM, Nigel LEACH
nigel.le...@uk.bnpparibas.com wrote:
 Hello Bernard, I was coming to that conclusion, I’ve been trying to compile
 on various combinations of Cygwin, Windows, Hardware this afternoon, but
 without success yet. I’ve still got a few more tests to do though.



 The GPU plugin is my only reason for upgrading from our current 3.1.7, and
 there is nothing else esoteric we use. We do have Linux Blades, but all of
 our Tesla’s are hosted on Windows.  The entire estate is quite large, so we
 would need to ensure sFlow scales, no reason to think it won’t, but I have
 little experience with it..



 Regards

 Nigel



 From: bern...@vanhpc.org [mailto:bern...@vanhpc.org]
 Sent: 10 July 2012 16:19
 To: Nigel LEACH
 Cc: neil.mckee...@gmail.com; ganglia-general@lists.sourceforge.net


 Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin



 Hi Nigel:



 Perhaps other developers could chime in but I'm not sure if the latest
 version could be compiled under Windows, at least I was not aware of any
 testing done.



 Going forward I would like to encourage users to use hsflowd under Windows.
 I'm talking to the developers to see if we can add support for GPU
 monitoring.  Do you have any other requirements besides that?



 Thanks,



 Bernard

 On Tuesday, July 10, 2012, Nigel LEACH wrote:

 Hi Neil, Many thanks for the swift reply.



 I want to take a look at sFlow, but it isn’t a prerequisite.



 Anyway, I disabled sFlow, and (separately) included the patch you sent. Both
 fixes appeared successful. For now I am going with your patch, and sFlow
 enabled.



 I say “appeared successful”, as make was error free, and a gmond.exe was
 created. However, it doesn’t appear to work out of the box. I created a
 default gmond.conf



 ./gmond --default_config  /usr/local/etc/gmond.conf



 and then simply ran gmond. It started a process, but no port (8649) was
 created. Running in debug mode I get this



 $ ./gmond -d 10
 loaded module: core_metrics
 loaded module: cpu_module
 loaded module: disk_module
 loaded module: load_module
 loaded module: mem_module
 loaded module: net_module
 loaded module: proc_module
 loaded module: sys_module





 and nothing further.



 I have done little investigation yet, so unless there is anything obvious I
 am missing, I’ll continue to troubleshoot.



 Regards

 Nigel





 From: neil.mckee...@gmail.com [mailto:neil.mckee...@gmail.com]
 Sent: 09 July 2012 18:15
 To: Nigel LEACH
 Cc: ganglia-general@lists.sourceforge.net
 Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin



 You could try adding --disable-sflow as another configure option.   (Or
 were you planning to use sFlow agents such as hsflowd?).



 Neil





 On Jul 9, 2012, at 3:50 AM, Nigel LEACH wrote:



 Ganglia 3.4.0

 Windows 2008 R2 Enterprise

 Cygwin 1.5.25

 IBM iDataPlex dx360 with Tesla M2070

 Confuse 2.7



 I’m trying to use the Ganglia 

[Ganglia-general] Using Ganglia/sFlow to monitor Hadoop

2012-04-20 Thread Peter Phaal
Hi All,

I have been experimenting with setting up Ganglia with sFlow agents to
monitor Hadoop. The configuration is described in the following
article:

http://blog.sflow.com/2012/04/hadoop.html

The Ganglia 3.3 release is required to report on the sFlow java metrics.

Peter



Re: [Ganglia-general] Ganglia for Windows

2012-04-11 Thread Peter Phaal
The sFlow data from the Windows server looks fine.
Are you using gmond to monitor the BSD systems?
Is the Windows server the only one you are monitoring with sFlow?
Are you sure that you are running a new version of gmond (version 3.2
or greater) on the collector machine?

Any older versions of gmond will discard the sFlow counters.

On Wed, Apr 11, 2012 at 1:42 AM, Burton, Steven
sbur...@shepherd-construction.co.uk wrote:
 Hi

 I had to specify the interface to tcpdump.

 pc28040664# tcpdump -i fxp0 -p -s 0 -w - udp port 6343 | sflowtool
 tcpdump: listening on fxp0, link-type EN10MB (Ethernet), capture size 65535 
 bytes
 startDatagram =
 datagramSourceIP 172.17.6.45
 datagramSize 412
 unixSecondsUTC 1334132034
 datagramVersion 5
 agentSubId 0
 agent 172.17.6.45
 packetSequenceNo 16394
 sysUpTime 492115000
 samplesInPacket 1
 startSample --
 sampleType_tag 0:2
 sampleType COUNTERSSAMPLE
 sampleSequenceNo 16394
 sourceId 2:1
 counterBlock_tag 0:2001
 adaptor_0_ifIndex 2
 adaptor_0_MACs 1
 adaptor_0_MAC_0 6eb07a70a528
 counterBlock_tag 0:2005
 disk_total 77301145600
 disk_free 70244171776
 disk_partition_max_used 912
 disk_reads 144893
 disk_bytes_read 3117016064
 disk_read_time 121649704
 disk_writes 1341632
 disk_bytes_written 18116973056
 disk_write_time 883169784
 counterBlock_tag 0:2004
 mem_total 2142728192
 mem_free 1703526400
 mem_shared 18446744073709551615
 mem_buffers 18446744073709551615
 mem_cached 77516800
 swap_total 4139274240
 swap_free 3825762304
 page_in 4294967295
 page_out 4294967295
 swap_in 833643
 swap_out 1040520
 counterBlock_tag 0:2003
 cpu_load_one 4.490
 cpu_load_five 3.988
 cpu_load_fifteen 3.963
 cpu_proc_run 1
 cpu_proc_total 483
 cpu_num 1
 cpu_speed 2533
 cpu_uptime 492170
 cpu_user 1058041428
 cpu_nice 4294967295
 cpu_system 3166627908
 cpu_idle 4120087790
 cpu_wio 4294967295
 cpu_intr 927812500
 cpu_sintr 4294967295
 cpu_interrupts 64740155
 cpu_contexts 197872436
 counterBlock_tag 0:2006
 nio_bytes_in 485523964
 nio_pkts_in 6939345
 nio_errs_in 0
 nio_drops_in 0
 nio_bytes_out 33214290
 nio_pkts_out 151485
 nio_errs_out 0
 nio_drops_out 0
 counterBlock_tag 0:2000
 hostname SCL-RSA3
 UUID f5ead14482ace308030c78da7ace816d
 machine_type 2
 os_name 3
 os_release 5.2.3790 Service Pack 2
 endSample   --
 endDatagram   =
 ^C1 packets captured
 2477 packets received by filter
 0 packets dropped by kernel

 pc28040664#

 That's the host that returns the 'No matching metrics detected' legend in the 
 rrdtool graphs.

 Currently I'm showing 3 hosts up (which is correct) localhost and another 
 FreeBSD server are showing metrics but not the Windows server.

 Is the problem with my listeners? I ask as I've not configured multicast 
 before and have only theoretical (and non-recent) knowledge of it.

 /* Feel free to specify as many udp_send_channels as you like.  Gmond
   used to only support having a single channel */
 udp_send_channel {
  mcast_join = 239.2.11.71
  port = 8649
  ttl = 1
 }

 /* You can specify as many udp_recv_channels as you like as well. */
 udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
 }

 udp_recv_channel {
  port = 8649
 }

 udp_recv_channel {
  port = 6343
 }

 /* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
 tcp_accept_channel {
  port = 8649
 }

 Steve.

 S Burton BSc(Hons) MIEE MBCS MIEEE
 Network Manager
 Shepherd Construction Ltd
 Head Office
 Frederick House, Fulford Road, York, YO10 4EA
 Tel:  01904 660391 Fax: 01904 660577
 Web: www.shepherd-construction.co.uk
 Registered in England and Wales Company Number:  201860 Registered address: 
 Huntington House, Jockey Lane, Huntington, York YO32 9XW

 The views or opinions presented in this e-mail are solely those of the author 
 and do not necessarily represent those of the company.
 This email and any files transmitted with it are confidential and are 
 intended solely for the individual or entity to which they are addressed. If 
 you have received this e-mail in error, please notify 
 sclc...@shepherd-construction.co.uk quoting the name of the sender.
 Whilst every care has been taken to check this outgoing e-mail for viruses it 
 is seen as your responsibility to check and sweep it, and any attachments, 
 for viruses on receipt.

 -Original Message-
 From: Peter Phaal [mailto:peter.ph...@gmail.com]
 Sent: 05 April 2012 17:42
 To: Burton, Steven
 Cc: Bernard Li; Ganglia
 Subject: Re: [Ganglia-general] Ganglia for Windows

 Can you verify that you are receiving performance metrics using the following 
 command on your gmond server?
 tcpdump -p -s 0 -w - udp port 6343 | sflowtool

 The firewall on your windows server, every firewall in the path to the bsd 
 collector, and the firewall on the bsd collector itself must be configured to 
 allow UDP port 6343 traffic to pass. The above

Re: [Ganglia-general] Ganglia for Windows

2012-04-05 Thread Peter Phaal
Can you verify that you are receiving performance metrics using the
following command on your gmond server?
tcpdump -p -s 0 -w - udp port 6343 | sflowtool

The firewall on your Windows server, every firewall in the path to the
BSD collector, and the firewall on the BSD collector itself must be
configured to allow UDP port 6343 traffic to pass. The above command
will let you verify that the data is at least making it to your
server. Remember that tcpdump catches packets before any local firewall
rules are applied, so you still need to check your local configuration
even if the command shows that sFlow metrics are being received.
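
One way to check the local firewall as well is to read from an
ordinary UDP socket, since a socket only sees what the firewall lets
through. The Python sketch below is an illustration, not part of any
sFlow tool: it assumes the sFlow version 5 datagram header layout
(version, agent address type and address, sub-agent id, sequence
number, uptime, sample count), and the dictionary keys are borrowed
from sflowtool's field names.

```python
import ipaddress
import socket
import struct


def parse_sflow_header(datagram):
    """Decode the fixed fields at the start of an sFlow version 5 datagram."""
    version, addr_type = struct.unpack_from(">II", datagram, 0)
    if version != 5:
        raise ValueError(f"unexpected sFlow version {version}")
    addr_len = {1: 4, 2: 16}[addr_type]  # 1 = IPv4, 2 = IPv6
    agent = ipaddress.ip_address(datagram[8:8 + addr_len])
    sub_id, seq, uptime_ms, samples = struct.unpack_from(">IIII", datagram, 8 + addr_len)
    return {"agent": str(agent), "agentSubId": sub_id, "packetSequenceNo": seq,
            "sysUpTime": uptime_ms, "samplesInPacket": samples}


def listen(port=6343, count=5):
    """Print the header of the next few datagrams delivered to this socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        for _ in range(count):
            data, peer = sock.recvfrom(65535)
            print(peer[0], parse_sflow_header(data))
```

Running listen() will probably require stopping gmond first, since
both would be trying to bind UDP port 6343. If tcpdump shows
datagrams but listen() prints nothing, a local firewall rule is the
likely cause.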

You can download sflowtool from the following URL; you need a recent
version to be able to decode all the host performance metrics:

http://www.inmon.com/technology/sflowTools.php
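
sflowtool's default output is one "name value" pair per line, so the
check can be scripted. The sketch below is a hypothetical helper, not
part of sflowtool. It groups lines between startSample and endSample
into a dictionary and lists metrics whose value is all ones (2^32-1
or 2^64-1). Treating all-ones as "not reported by this agent" is an
assumption based on the Windows output quoted earlier in this
archive, where cpu_nice, cpu_wio and mem_shared carry those values.

```python
UNKNOWN_32 = 2**32 - 1
UNKNOWN_64 = 2**64 - 1


def parse_counter_samples(lines):
    """Yield one dict per sample from sflowtool's 'name value' text output."""
    sample = None
    for line in lines:
        name, _, value = line.strip().partition(" ")
        if name == "startSample":
            sample = {}
        elif name == "endSample":
            if sample is not None:
                yield sample
            sample = None
        elif sample is not None and name:
            # Repeated names (e.g. counterBlock_tag) keep the last value seen.
            sample[name] = value.strip()


def missing_metrics(sample):
    """Names whose value is the all-ones 32-bit or 64-bit pattern."""
    missing = []
    for name, value in sample.items():
        if value.isdigit() and int(value) in (UNKNOWN_32, UNKNOWN_64):
            missing.append(name)
    return missing
```

Feeding it sys.stdin at the end of the tcpdump | sflowtool pipeline
gives a quick view of which host metrics an agent actually populates.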

On Thu, Apr 5, 2012 at 12:32 AM, Burton, Steven
sbur...@shepherd-construction.co.uk wrote:
 Bernard,

 Yes. I have:

 udp_recv_channel {
  port = 6343
 }

 I seem to have values in the rrd's but No matching metrics detected in the 
 graphs.

 Steve.


 -Original Message-
 From: Bernard Li [mailto:bern...@vanhpc.org]
 Sent: 03 April 2012 01:05
 To: Burton, Steven
 Cc: Ganglia
 Subject: Re: [Ganglia-general] Ganglia for Windows

 Hi Steve:

 Have you enabled sFlow on the Linux/FreeBSD gmond.conf?  They are not on by 
 default.

 Cheers,

 Bernard

 On Mon, Apr 2, 2012 at 7:01 AM, Burton, Steven 
 sbur...@shepherd-construction.co.uk wrote:

 I found that xms was loading but I also needed php5-simplexml.

 I now have graphs for the server I'm running Ganglia on but only empty
 graphs for the Windows server I'm trialling softflow on.

 Every graph has the legend No matching metrics detected.

 The number of entries in the rrds for this server seems to be
 increasing as measured by:

 rrdtool dump pkts_in.rrd | grep 'v' | grep -v NaN | wc -l

 Steve.


 -Original Message-
 From: Burton, Steven [mailto:sbur...@shepherd-construction.co.uk]
 Sent: 02 April 2012 09:10
 To: Alex Dean; Ganglia
 Subject: Re: [Ganglia-general] Ganglia for Windows

 Hi,

 I've installed php5-xml which lead to another set of errors which
 suggested php5-session was needed, so I installed that. I have a web
 front end now but empty graphs. I'm pretty sure I have data in the
 rrds as I dumped a random selection to xml and there were a
 significant number of values which were NOT NaN.

 It may be that this isn't the way forward for me as I can get more
 metrics with nagios + plugins + nagiosgraph though I only have a 5
 minute granularity with nagios, at best.

 Conversely, I might have to switch to some Linux distribution though
 FreeBSD has served me well since 1996 and I'm more comfortable
 administering it.

 Steve.


Re: [Ganglia-general] Problem runnning gstat gmetric gmond

2012-02-26 Thread Peter Phaal
On Mon, Feb 20, 2012 at 9:06 AM, Mohit Dhingra mohitdhing...@gmail.com wrote:
 Hi Vladimir / All,

 Everything is working fine now (gmond and gmetad), I have installed ganglia
 on Dom0 OpenSUSE, with Xen as hypervisor. Now, I want to monitor VMs with
 the help of sflow, as you told earlier.

 I have checked your links.
 http://blog.sflow.com/2012/01/using-ganglia-to-monitor-virtual.html
 http://blog.sflow.com/2011/09/xenserver-60-supplemental-pack.html

 I have some doubts regarding installation. I have installed Xen as
 hypervisor on OpenSUSE( dom0 ). I am not sure it is this XenServer that you
 talk about?

 cadlab:~/Downloads/hsflowd-1.19 # uname -a
 Linux cadlab 2.6.37.6-0.11-xen #1 SMP 2011-12-19 23:39:38 +0100 x86_64
 x86_64 x86_64 GNU/Linux

 Is sflow available for this? I downloaded the source code package. It says,
 INSTALL.Linux and INSTALL.XenServer, where it talks about DDK, but there is
 no DDK for my Xen. Should I install as what is mentioned in INSTALL.Linux?
 Will it monitor VMs?

 Can somebody please help me out with this.

If you have development tools installed on your OpenSUSE Dom0, you can
build and install from sources:

http://blog.sflow.com/2010/10/installing-host-sflow-on-linux-server.html

hsflowd uses libxenstat to monitor the performance of each of the
virtual machines from Dom0.

To build software for XenServer (and Xen Cloud Platform) you can
download a special virtual machine (the DDK) which exactly matches
the kernel in Dom0, but includes all the development tools needed to
compile software. You build the RPMs in the DDK and install them in
Dom0. This process keeps Dom0 as small as possible.



Re: [Ganglia-general] Where does sFlow fit into the ganglia / java ecosystem?

2012-02-03 Thread Peter Phaal
Bryan,

Since you want each of the nodes in the cluster to have access to the state of
its peers, implementing a full gmond equivalent peer sounds like the right call.
However, I think that you might want to consider adding sFlow export
functionality as well. It's helpful to have a clear understanding of the goals
and architectural choices in sFlow.

The sFlow architecture is asymmetric with agents sending but never receiving 
data. Once you have made that choice, you can further simplify the agent by 
making it stateless - for example, you will see that sFlow exports raw counters 
and leaves it up to the receiver to compute deltas. With gmond the deltas are 
computed at the sender, requiring it to maintain state (which gmond is doing 
anyway when it receives metrics, so it isn't an unreasonable choice). Removing 
all state from the agent means that its memory requirements are minimal and it 
doesn't need to allocate memory - both properties are very useful when you want 
to embed the measurements in hardware devices like network switches.

As you point out, another difference is that sFlow exports standard sets of
metrics rather than ad-hoc measurements. The benefit is that you can focus on
optimizing the collection of the standard metrics (even implementing some
in hardware), tightly pack the data in a single datagram, eliminate the
overhead of exchanging metadata and simplify multivendor monitoring since the
same measurements will be sent by every device. Standardizing the metrics also
helps reduce operational complexity - eliminating the configuration options
that are needed for a more flexible solution.

A goal with sFlow is to instrument every switch port, server, virtual machine 
and service to provide a comprehensive view of performance across the data 
center. I think there would be great value in having bigdata export metrics so 
that they can be combined with data from network, load-balancer, web, memcache 
and application server tiers. 

It's also worth mentioning that sFlow doesn't just export counters. As an
example, the sFlow Memcache metrics are probably most similar to the kinds of
data you might want to export for bigdata. In addition to exporting a standard
set of counters, the sFlow agent also randomly samples Memcache operations,
exporting the command (GET,SET..), status (OK,ERROR,NOT_FOUND...), value size,
and duration of the sampled operation. Random sampling has very low overhead
(about the cost of maintaining one counter), making it suitable for continuous
monitoring of high transaction rate environments like a large Memcached
cluster. The counters and the transaction samples complement one another. For
example, you might be using Ganglia to track the cache hit rate using the sFlow
counters and notice an increase in cache misses. Looking at the transaction
samples you can identify the cluster-wide top missed keys - the information you
need to actually fix the problem. In one case I am aware of, the misses were
caused by a typo in a client side script and easily fixed - it's hard to see
how you would easily spot this problem any other way.
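
To illustrate how the samples are used, here is a small Python sketch
of the receiver side. It assumes each sampled operation arrives as a
(command, status, key) tuple and that every sample stands for roughly
sampling_rate real operations; the tuple layout and function name are
made up for the example and are not the sFlow wire format.

```python
from collections import Counter


def top_missed_keys(samples, sampling_rate, n=3):
    """Estimate cluster-wide miss counts per key from sampled operations."""
    misses = Counter(key for command, status, key in samples
                     if command == "GET" and status == "NOT_FOUND")
    # Scale each sampled count by the sampling rate to estimate the real total.
    return [(key, count * sampling_rate) for key, count in misses.most_common(n)]
```

With 1-in-400 sampling, two sampled misses for a key suggest on the
order of 800 real misses. The estimate is statistical, so it is most
reliable for the keys at the top of the list.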

In the web tier, sFlow agents sample HTTP operations and you might notice an 
increase in response time for a particular URL and trace it back to the missed 
key in the cache for example.

Getting back to bigdata - you could usefully export the JVM metrics using sFlow -
take a look at the jmx-sflow agent, or tomcat-sflow-valve for examples:
http://jmx-sflow-agent.googlecode.com/
http://tomcat-sflow-valve.googlecode.com/

There isn't much to the code, so you could easily incorporate it as an option 
in your java library.

There is currently an effort underway to generalize sFlow's application layer 
monitoring:

https://groups.google.com/forum/?fromgroups#!topic/sflow/e2sLb_3hyDI

I would be very interested in any comments you might have about the 
applicability to instrumenting bigdata transactions.

Cheers,
Peter

On Feb 3, 2012, at 10:19 AM, Bryan Thompson wrote:

 Peter,
 
 I put together a ganglia listener / sending library in Java [1] which builds 
 up soft state in a concurrent hash map to support a ganglia integration for 
 bigdata [2].  The library makes it easy to turn a Java application into a 
 ganglia peer.  I also plan to migrate some of our existing per-host, 
 per-process, and JVM specific counters that we have into this library where 
 they might be useful to a broader audience.
 
 Some of the benefits of this library for us are that we can:
 - leverage the existing ganglia ecosystem;
 - obtain fast load balanced reports from the soft state inside of the JVM; and
 - extend the metric collection and reporting trivially to application 
 specific counters.
 
 I understand that sFlow is available for a variety of environments and that 
 it provides a tighter, though fixed, data gram encoding for metric messages.  
 Can you expand on whether sFlow might have been an alternative for the 
 integration that we 

Re: [Ganglia-general] Ganglia 3.3.0 released

2012-02-02 Thread Peter Phaal
The following articles describe the sFlow metrics included in the Ganglia 3.3.0 
and 3.2.0 releases:

http://blog.sflow.com/2012/02/ganglia-33-released.html
http://blog.sflow.com/2011/07/ganglia-32-released.html

The Host sFlow agent efficiently exports standard Ganglia host metrics from 
Windows, Linux and FreeBSD servers as well as per-VM metrics from Hyper-V, 
XenServer, XCP and Xen hypervisors. Additional sFlow agents are available for 
Java, Apache, Tomcat, NGINX, node.js and Memcached.

Peter


On Feb 1, 2012, at 2:38 PM, Vladimir Vuksan wrote:

 This was gonna be the 4.0.0 release however we received feedback that
 making a major version bump may cause issues with various Linux
 distribution packaging policies e.g. Fedora. Therefore it's been rebranded
 as 3.3.0. Announcement is here
 
 http://ganglia.info/?p=489
 
 Enjoy,
 
 Vladimir




[Ganglia-general] Ganglia 3.2 and sFlow

2011-11-01 Thread Peter Phaal
Anyone curious about the sFlow functionality in Ganglia 3.2 should
take a look at Dave Mangot's blog - he describes why Tagged.com is
using Ganglia with sFlow.

http://tech.mangot.com/roller/dave/entry/host_based_sflow_a_drop

Peter



Re: [Ganglia-general] Ganglia 3.2.0 is out

2011-07-13 Thread Peter Phaal
On Wed, Jul 13, 2011 at 6:43 PM, Vladimir Vuksan vl...@vuksan.com wrote:
 Great. Would it be possible to get a comprehensive guide on all the
 configuration options for sFlow stuff :-).

There are very few configuration settings, just the udp_port and the
accept_vm_metrics settings; both are shown in the default
configuration:
[root@ganglia ~]# gmond --default_config
...
/* Channel to receive sFlow datagrams */
#udp_recv_channel {
#  port = 6343
#}

/* optional sFlow settings */
#sflow {
# udp_port = 6343
# accept_vm_metrics = no
#}


 Actually we are working on adding ability to add TAGS to hosts ie. a comma
 separated list of arbitrary tags that identify a host e.g. database,memcache
 etc. We then just need to build a UI that would allow you to just see things
 tagged with memcache etc.

TAGS sound like a flexible way to handle groups of hosts.

 The current gmond/sFlow implementation includes a parent attribute for
 each virtual machine that identifies the physical server hosting the
 virtual machine.


 Where in the XML is that actually displayed ?

The sFlow protocol assigns a data source index (dsi) to each measurement point:
<METRIC NAME="dsi" VAL="10.0.0.162:1" TYPE="string" UNITS="" TN="7"
TMAX="60" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="TITLE" VAL="Datasource ID"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Datasource ID"/>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
</EXTRA_DATA>
</METRIC>

A virtual machine has its own dsi as well as a parent_dsi, indicating
the hypervisor hosting it:

<METRIC NAME="dsi" VAL="10.0.0.163:2" TYPE="string" UNITS="" TN="18"
TMAX="60" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="TITLE" VAL="Datasource ID"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Datasource ID"/>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
</EXTRA_DATA>
</METRIC>

<METRIC NAME="parent_dsi" VAL="10.0.0.162:1" TYPE="string" UNITS=""
TN="18" TMAX="60" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="TITLE" VAL="Parent Datasource ID"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Parent Datasource ID"/>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
</EXTRA_DATA>
</METRIC>
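
A UI could use these two metrics to attach each virtual machine to
its hypervisor. The Python sketch below is only an illustration: it
assumes the XML served by gmond nests METRIC elements inside HOST
elements with a NAME attribute, and the function name is made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def vms_by_hypervisor(xml_text):
    """Map each hypervisor host name to the names of the VMs it hosts."""
    root = ET.fromstring(xml_text)
    dsi_to_host = {}  # dsi value -> host name
    parent_of = {}    # VM host name -> parent_dsi value
    for host in root.iter("HOST"):
        name = host.get("NAME")
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "dsi":
                dsi_to_host[metric.get("VAL")] = name
            elif metric.get("NAME") == "parent_dsi":
                parent_of[name] = metric.get("VAL")
    groups = defaultdict(list)
    for vm, parent in parent_of.items():
        # Fall back to the raw dsi if the parent host is not in this XML.
        groups[dsi_to_host.get(parent, parent)].append(vm)
    return dict(groups)
```

Hosts without a parent_dsi metric are the physical servers, so the
same pass also separates the two kinds of host.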

hsflowd reports core Ganglia metrics for the hypervisor and libvirt
metrics for each virtual machine:
http://www.sflow.org/sflow_host.txt

 Question is who sets the host-id. UUIDs are meaningless without context and
 I am not sure that it should be Ganglia or HSflowd that set them. This is
 likely a job of configuration management system.

hsflowd obtains the UUID from BIOS where possible (e.g. using
/usr/sbin/dmidecode on Linux), falling back on the UUID of the first
physical disk on older systems without BIOS UUID.

Hypervisors assign a UUID to each virtual machine as it is created.
hsflowd uses libvirt/libxenstore to retrieve the virtual machine
UUIDs.

The UUIDs provide a unique and persistent identifier for each physical
and virtual machine. sFlow also reports adapter MAC addresses for each
physical and virtual machine. IP addresses and hostnames can change,
but the UUIDs tend to stay the same.

There is zero configuration involved in assigning UUIDs, they are
assigned to CPU motherboards, or automatically by the operating system
as disks are formatted or virtual machines created.

Peter



Re: [Ganglia-general] Ganglia 3.2.0 is out

2011-07-12 Thread Peter Phaal
On Tue, Jul 12, 2011 at 6:43 AM, Vladimir Vuksan vli...@veus.hr wrote:
 That's relatively easy to fix. In Gweb 2.1.0+ any metrics that don't exist
 show up as empty graphs with a legend that says No matching metrics found.
 We can certainly fix any other ones. We shouldn't let UI get in the way of
 collecting useful metrics :-).

That will be a definite improvement. There is a bigger issue when

 I missed your February post :-(. I think all the metrics you are currently
 dropping are useful metrics and I think those should be included. Is this
 something that needs to change in the gmond code or is this part of hsflowd
 ?

The vm statistics are always sent by hsflowd (when running on a
hypervisor). They are dropped by default in gmond, but can be enabled
using the accept_vm_metrics = yes option:
http://blog.sflow.com/2011/07/ganglia-and-cloud-performance.html

Broken charts are only part of the problem. Ganglia works best when
all the items being displayed are part of a cluster (i.e. the members
are similar, sharing common attributes). When you look at the
statistics from a virtual server pool, there are really two logical
clusters: the cluster of virtual machines, with one set of attributes,
and the cluster of physical servers (running Xen, KVM, etc.) that host
the virtual machines. Mixing virtual and physical machines leads to a
confusing presentation because you are no longer comparing like with
like. If you throw in network statistics, you logically have a third
cluster (of network interfaces).

One way to address the problem would be to have a HOST attribute in
the gmond metadata that allowed different logical clusters to be
identified. For example a physical server might have an attribute
CATEGORY=SERVER in its HOST section. A virtual machine could be
identified as CATEGORY=VM and a network interface CATEGORY=NETWORK.
This would allow the UI to switch between logical slices being
reported by a single gmond instance.
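
To make the idea concrete, here is a rough sketch of how a UI might slice one gmond XML dump by such an attribute. The CATEGORY attribute is only the proposal above and does not exist in gmond today; the sample XML, the host names, and the hosts_by_category helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical gmond output carrying the proposed CATEGORY attribute.
SAMPLE_XML = """
<GANGLIA_XML VERSION="3.2.0" SOURCE="gmond">
  <CLUSTER NAME="pool">
    <HOST NAME="xen1" IP="10.0.0.1" CATEGORY="SERVER"/>
    <HOST NAME="vm-a" IP="10.0.0.11" CATEGORY="VM"/>
    <HOST NAME="vm-b" IP="10.0.0.12" CATEGORY="VM"/>
    <HOST NAME="eth0.xen1" IP="10.0.0.1" CATEGORY="NETWORK"/>
  </CLUSTER>
</GANGLIA_XML>
"""


def hosts_by_category(xml_text, default="SERVER"):
    """Group HOST names by CATEGORY; hosts without the attribute get `default`."""
    groups = defaultdict(list)
    root = ET.fromstring(xml_text.strip())
    for host in root.iter("HOST"):
        groups[host.get("CATEGORY", default)].append(host.get("NAME"))
    return dict(groups)
```

Each key of the returned dictionary would then correspond to one logical slice that the UI could present as if it were a separate cluster.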

An alternative would be to allow multiple tcp_accept_channel sections,
each of which would present a different logical cluster. For example,
using a proposed hosttype setting:

tcp_accept_channel {
  port = 8649
  hosttype = server
}

tcp_accept_channel {
  port = 8650
  hosttype = vm
}

This second option fits well with the current architecture: the
following gmetad.conf settings would create the two clusters.

data_source "server cluster" localhost
data_source "vm cluster" localhost:8650
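
For anyone who wants to see what gmetad would receive from each channel, the polling side can be approximated in a few lines. This is a minimal sketch, assuming that gmond writes its XML dump to every TCP client and then closes the connection; the second channel on port 8650 is the proposal above, not an existing feature, and the function and variable names are made up.

```python
import socket


def fetch_gmond_xml(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Connect to a gmond tcp_accept_channel and read the XML dump until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


# Hypothetical mapping that mirrors the two data_source lines above.
CHANNELS = {
    "server cluster": ("localhost", 8649),
    "vm cluster": ("localhost", 8650),
}


def poll_all(channels):
    """Fetch one XML dump per logical cluster."""
    return {name: fetch_gmond_xml(host, port)
            for name, (host, port) in channels.items()}
```

Calling poll_all(CHANNELS) would return one XML document per logical cluster, which is the separation the two data_source entries are meant to give gmetad.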


 Regarding some of the other points in your February e-mail

 1. Standardizing TITLE and DESC metric values
   - That sounds like a good idea.

 2. Should TITLE and DESC metadata be excluded from the statistics export
   - That also sounds like a good idea but it may not have as much value to
 get it done at this time. I'd defer that to later. Let me know if you
 disagree.

This was a general comment about cleaning up the scheme for the
future. Not a high priority.

 3. Express Containment of a virtual host
   - I think we could work around it by adding an additional attribute
 to e.g. HOST that says something like PARENT. That should be easy to add.
 Alternatively we can add e.g. a string metric that says Parent. That may be
 the easier way to go. The remaining portion is then just the UI component.

The current gmond/sFlow implementation includes a parent attribute for
each virtual machine that identifies the physical server hosting the
virtual machine.

 4.  Unique server/VM UUID
   - I believe this is now solved by using the override_hostname and
 override_ip settings. Let me know if you disagree.

The current gmond/sFlow implementation does override the hostname and
ip attributes, but you end up with odd values in the UI:
IP Address  b0b22c02-6947-fc8a-5a87-f1d014f4ae69

It would be better if there were an explicit, opaque host-id attribute
that was used as the key in the gmond hash table, to identify hosts,
as the directory path for charts, etc. Hostnames and IP addresses
would no longer be required (or required to be unique) and could be
omitted if unknown.

hsflowd reports UUIDs for physical and virtual servers. UUIDs are
persistent and unique, making them good candidates for host-ids.
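
As a sketch of what keying on an opaque host-id might look like (this is not how gmond is implemented; HostRecord, HostTable and the rrds path are names invented for the example):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HostRecord:
    host_id: str                     # opaque key, e.g. a UUID string
    hostname: Optional[str] = None   # optional, may be unknown or non-unique
    ip: Optional[str] = None         # optional, may change over time
    metrics: Dict[str, float] = field(default_factory=dict)


class HostTable:
    """Hosts keyed by host-id rather than by hostname or IP address."""

    def __init__(self) -> None:
        self._hosts: Dict[str, HostRecord] = {}

    def update(self, host_id, hostname=None, ip=None, **metrics) -> HostRecord:
        rec = self._hosts.setdefault(host_id, HostRecord(host_id))
        if hostname is not None:
            rec.hostname = hostname
        if ip is not None:
            rec.ip = ip
        rec.metrics.update(metrics)
        return rec

    def get(self, host_id) -> HostRecord:
        return self._hosts[host_id]

    def chart_dir(self, host_id, root="rrds") -> str:
        # The chart directory follows the host-id, so it survives renames.
        return f"{root}/{self._hosts[host_id].host_id}"

    def __len__(self) -> int:
        return len(self._hosts)
```

With this layout a host that changes its IP address keeps the same record and chart directory, and two hosts that happen to share a hostname stay distinct.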

 5. Expanding number of Ganglia metrics / Is there interest ?
   - Yes and yes :-). This is something people constantly ask on IRC. I think
 we'll start with adding selected Python modules that are now in our Github
 repo to the distribution. I'd definitely like to see more metrics coming out
 of hsflowd.

There are also currently efforts to standardize http and memcache
metrics export in sFlow. Once they are finalized, we plan to add them
to gmond:
http://blog.sflow.com/2011/01/http.html
http://blog.sflow.com/2010/09/memcached.html


Re: [Ganglia-general] Ganglia 3.2.0 is out

2011-07-11 Thread Peter Phaal
Good suggestion. The bind directive in the udp_recv_channel block
looks like it does the trick.

I updated the instructions to cover this option:
http://blog.sflow.com/2011/07/ganglia-32-released.html

On Mon, Jul 11, 2011 at 9:43 AM, Robert Jordan rjor...@notampering.com wrote:
 Hi Peter,
 Regarding the article linked below;  Is it also possible to use the standard
 port number but different bind addresses for multiple gmond processes when
 monitoring multiple clusters?  Using this approach would have the advantage
 of allowing the configuration to be changed by simply updating DNS entries
 rather than potentially needing to update many host-sflow agent machines.
 Thanks,
 Robert

 On Thu, Jul 7, 2011 at 11:26 PM, Peter Phaal peter.ph...@gmail.com wrote:

 Great news! For additional information on the sFlow feature and
 updated configuration instructions, see:
 http://blog.sflow.com/2011/07/ganglia-32-released.html

 On Thu, Jul 7, 2011 at 7:20 PM, Vladimir Vuksan vli...@veus.hr wrote:
  -- Forwarded message --
  We are happy to announce the release of Ganglia 3.2.0. The announcement
  can be read here:
 
  http://ganglia.info/?p=430
 
  Notable changes are
 
    -  sFlow support
    -  hostname/ip override - useful in dynamic/cloud environments
    -  FreeBSD patches
    -  Python module improvements
    -  Bugfixes and improvements over 3.1.7
 
  Now that 3.2.0 is out we have a number of other improvements we are
  working on and hope to release shortly. Stay tuned.
 
  Vladimir
 
 


Re: [Ganglia-general] Ganglia 3.2.0 is out

2011-07-11 Thread Peter Phaal
On Mon, Jul 11, 2011 at 1:09 PM, Vladimir Vuksan vli...@veus.hr wrote:
 Peter,

 It is also my understanding that currently only metrics from physical hosts
 are supported. Is it possible to add network devices that support sFlow ?

 Thanks,
 Vladimir

Currently the Ganglia UI is host oriented, expecting a core set of
metrics to be present for each server. The current Host sFlow
implementation includes virtual machine statistics (equivalent to
libvirt performance metrics), but they are disabled by default because
the UI has problems with virtual machines, which report only a limited
set of metrics:
http://blog.sflow.com/2011/07/ganglia-and-cloud-performance.html

There are additional enhancements to the Ganglia UI and data model
that would be helpful:
http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg06319.html

Enabling sFlow metrics from network devices would have similar
problems since the metrics relate to network links rather than
servers.



Re: [Ganglia-general] Ganglia 3.2.0 is out

2011-07-08 Thread Peter Phaal
Great news! For additional information on the sFlow feature and
updated configuration instructions, see:
http://blog.sflow.com/2011/07/ganglia-32-released.html

On Thu, Jul 7, 2011 at 7:20 PM, Vladimir Vuksan vli...@veus.hr wrote:
 -- Forwarded message --
 We are happy to announce the release of Ganglia 3.2.0. The announcement
 can be read here:

 http://ganglia.info/?p=430

 Notable changes are

   -  sFlow support
   -  hostname/ip override - useful in dynamic/cloud environments
   -  FreeBSD patches
   -  Python module improvements
   -  Bugfixes and improvements over 3.1.7

 Now that 3.2.0 is out we have a number of other improvements we are
 working on and hope to release shortly. Stay tuned.

 Vladimir



[Ganglia-general] Using Ganglia to monitor Rackspace cloudservers

2011-01-31 Thread Peter Phaal
Hi All,

I have been experimenting with Ganglia for monitoring performance in
the Rackspace cloud and it works very well:

http://blog.sflow.com/2011/01/rackspace-cloudservers.html

A big advantage of the gmond/sFlow data push model is that Ganglia
automatically discovers cloud servers as they are created. The polling
model that most network management tools use is poorly suited to
monitoring dynamic cloud server pools.
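
The difference can be shown with a toy model of the push side: the collector never needs a list of servers, it simply learns about a host when the first report arrives and forgets it after a period of silence. This is only a sketch; the DiscoveredHosts class and the 60 second expiry are assumptions for the example, not gmond behaviour.

```python
class DiscoveredHosts:
    """Track hosts purely from the reports they push to the collector."""

    def __init__(self, max_age: float = 60.0) -> None:
        self.max_age = max_age      # hypothetical silence limit, in seconds
        self._last_seen = {}        # agent address -> time of last report

    def heard_from(self, agent: str, now: float) -> bool:
        """Record a report; return True if this agent was not known before."""
        is_new = agent not in self._last_seen
        self._last_seen[agent] = now
        return is_new

    def expire(self, now: float) -> list:
        """Drop and return agents that have been silent longer than max_age."""
        gone = [a for a, t in self._last_seen.items() if now - t > self.max_age]
        for agent in gone:
            del self._last_seen[agent]
        return sorted(gone)

    def active(self) -> list:
        return sorted(self._last_seen)
```

A polling tool, by contrast, has to be told about each new cloud server before it can collect anything from it.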

For anyone interested in taking a look, Ganglia is running on a Fedora
14 cloud server, http://rs-ganglia.inmon.com/

Peter



Re: [Ganglia-general] Ganglia and sFlow

2010-12-15 Thread Peter Phaal
The patch is obsolete; the sFlow code has been checked into the development
branch (trunk).

To build Ganglia with sFlow support you need to download the latest sources
from Sourceforge:

svn co https://ganglia.svn.sourceforge.net/svnroot/ganglia ganglia

Peter

On Wed, Dec 15, 2010 at 3:46 AM, Giovanni De Rosa giode...@hotmail.it wrote:

  Hi,
 I think there is something wrong because I only see the host on which
 gmond is installed in the Ganglia web page.
 I checked whether sFlow sends the packets to the host with gmond, and it
 does (I used tcpdump udp port 6343).
 To patch gmond for sFlow I used this:
 http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=276
 What can I do?

 thanks a lot

 Giovanni



  *From:* Peter Phaal peter.ph...@gmail.com
 *Sent:* Tuesday, December 07, 2010 6:34 PM
 *To:* Giovanni De Rosa giode...@hotmail.it
 *Cc:* ganglia-general@lists.sourceforge.net
 *Subject:* Re: [Ganglia-general] Ganglia and sFlow

 The following article provides additional information on configuring the
 Ganglia development branch (trunk) to collect sFlow:
 http://blog.sflow.com/2010/10/ganglia.html

 Installing and configuring Host sFlow agents to send sFlow from Linux and
 Windows platforms is described in the articles:
 http://blog.sflow.com/2010/10/installing-host-sflow-on-linux-server.html
 http://blog.sflow.com/2010/10/installing-host-sflow-on-windows-server.html

 sFlow is sent from the Host sFlow agents to the Ganglia gmond collector as
 unicast UDP messages to port 6343. You need to make sure that each Host
 sFlow agent is configured to send to the IP address of the server that gmond
 is installed on (the configuration details are in the articles above). If
 you are still having problems then check that there are no firewalls
 blocking the traffic. The iptables filters on the collector and agents,
 as well as any intermediate firewalls, must be configured to allow UDP port
 6343 traffic to pass.

 You can confirm that sFlow is being received at the gmond server by running
 the following tcpdump command:
 tcpdump udp port 6343

 Please let me know if you have any difficulties.

 Peter

 On Tue, Dec 7, 2010 at 1:19 AM, Giovanni De Rosa giode...@hotmail.it wrote:

  Hi,
 I'm trying to use Ganglia with sFlow. I have installed gmond patched for
 sFlow on one host and installed the sFlow agent on a different host.
 The problem is that nothing seems to change (the XML from gmond
 looks the same). How can I tell whether everything is working? The sFlow
 agent runs well and sends packets to the host on which gmond is
 installed.

 thanks a lot
 Giovanni




Re: [Ganglia-general] Ganglia and sFlow

2010-12-15 Thread peter . phaal

You are correct.

If gmond still isn't reporting on sFlow, the other thing to check is your  
firewall. tcpdump sees packets before the firewall so seeing the sFlow  
packets in tcpdump confirms that the packets are arriving at the server,  
but it doesn't necessarily mean that gmond is able to receive them. You may  
need to add a rule to iptables accepting incoming packets to UDP port 6343.


To test if the problem is firewall related, you can temporarily disable the  
firewall with the command:

/sbin/service iptables stop
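
A less disruptive check than stopping the firewall is to run a small listener in place of gmond and see whether a datagram actually reaches a user-space process. The following is a minimal sketch; wait_for_datagram is a made-up helper, and gmond would presumably have to be stopped first, since it normally holds the port.

```python
import socket


def wait_for_datagram(port: int = 6343, bind_addr: str = "0.0.0.0",
                      timeout: float = 30.0):
    """Block until one UDP datagram arrives.

    Returns (sender_ip, length_in_bytes), or None if nothing arrived
    before the timeout expired.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.bind((bind_addr, port))
        try:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
        except socket.timeout:
            return None
        return sender_ip, len(data)
```

If tcpdump shows the packets but wait_for_datagram() returns None, the packets are most likely being dropped between the network interface and the application, which points at the local firewall rules.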

On Dec 15, 2010 9:28am, giovanni de rosa giode...@hotmail.it wrote:

Thanks a lot. I'm not an expert, so sorry if I say nonsense. If I
understand correctly, I have to download everything in
https://ganglia.svn.sourceforge.net/svnroot/ganglia/trunk/monitor-core/
then add the receive channel for sFlow to the configuration file and
rebuild everything. Is this right?

Thanks

Date: Wed, 15 Dec 2010 09:01:56 -0800
Subject: Re: [Ganglia-general] Ganglia and sFlow
From: peter.ph...@gmail.com
To: giode...@hotmail.it
CC: ganglia-general@lists.sourceforge.net

The patch is obsolete; the sFlow code has been checked into the
development branch (trunk).

To build Ganglia with sFlow support you need to download the latest
sources from Sourceforge:

svn co https://ganglia.svn.sourceforge.net/svnroot/ganglia ganglia

Peter

On Wed, Dec 15, 2010 at 3:46 AM, Giovanni De Rosa giode...@hotmail.it wrote:

Hi,
I think there is something wrong because I only see the host on which
gmond is installed in the Ganglia web page.
I checked whether sFlow sends the packets to the host with gmond, and it
does (I used tcpdump udp port 6343).
To patch gmond for sFlow I used this:
http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=276

What can I do?

Thanks a lot

Giovanni

From: Peter Phaal
Sent: Tuesday, December 07, 2010 6:34 PM
To: Giovanni De Rosa
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Ganglia and sFlow

The following article provides additional information on configuring the
Ganglia development branch (trunk) to collect sFlow:
http://blog.sflow.com/2010/10/ganglia.html

Installing and configuring Host sFlow agents to send sFlow from Linux and
Windows platforms is described in the articles:
http://blog.sflow.com/2010/10/installing-host-sflow-on-linux-server.html
http://blog.sflow.com/2010/10/installing-host-sflow-on-windows-server.html

sFlow is sent from the Host sFlow agents to the Ganglia gmond collector
as unicast UDP messages to port 6343. You need to make sure that each
Host sFlow agent is configured to send to the IP address of the server
that gmond is installed on (the configuration details are in the articles
above). If you are still having problems then check that there are no
firewalls blocking the traffic. The iptables filters on the collector
and agents, as well as any intermediate firewalls, must be configured
to allow UDP port 6343 traffic to pass.

You can confirm that sFlow is being received at the gmond server by
running the following tcpdump command:

tcpdump udp port 6343

Please let me know if you have any difficulties.

Peter

On Tue, Dec 7, 2010 at 1:19 AM, Giovanni De Rosa giode...@hotmail.it wrote:

Hi,
I'm trying to use Ganglia with sFlow. I have installed gmond patched for
sFlow on one host and installed the sFlow agent on a different host.
The problem is that nothing seems to change (the XML from gmond looks
the same). How can I tell whether everything is working? The sFlow
agent runs well and sends packets to the host on which gmond is
installed.

Thanks a lot

Giovanni


Re: [Ganglia-general] Ganglia and sFlow

2010-12-07 Thread Peter Phaal
The following article provides additional information on configuring the
Ganglia development branch (trunk) to collect sFlow:
http://blog.sflow.com/2010/10/ganglia.html

Installing and configuring Host sFlow agents to send sFlow from Linux and
Windows platforms is described in the articles:
http://blog.sflow.com/2010/10/installing-host-sflow-on-linux-server.html
http://blog.sflow.com/2010/10/installing-host-sflow-on-windows-server.html

sFlow is sent from the Host sFlow agents to the Ganglia gmond collector as
unicast UDP messages to port 6343. You need to make sure that each Host
sFlow agent is configured to send to the IP address of the server that gmond
is installed on (the configuration details are in the articles above). If
you are still having problems then check that there are no firewalls
blocking the traffic. The iptables filters on the collector and agents,
as well as any intermediate firewalls, must be configured to allow UDP port
6343 traffic to pass.

You can confirm that sFlow is being received at the gmond server by running
the following tcpdump command:
tcpdump udp port 6343
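
If you capture one of those datagrams, the first few fields are enough to tell which agent sent it. Below is a small sketch of a header decoder, based on my reading of the sFlow version 5 datagram layout (version, agent address type and address, sub-agent id, sequence number, uptime, sample count, all big-endian). The function name and the returned keys are made up for the example, and the samples themselves are not decoded.

```python
import ipaddress
import struct


def parse_sflow_header(datagram: bytes) -> dict:
    """Decode the fixed header at the start of an sFlow version 5 datagram."""
    version, addr_type = struct.unpack_from("!II", datagram, 0)
    if version != 5:
        raise ValueError(f"unsupported sFlow version: {version}")
    if addr_type == 1:        # IPv4 agent address
        addr_len = 4
    elif addr_type == 2:      # IPv6 agent address
        addr_len = 16
    else:
        raise ValueError(f"unknown agent address type: {addr_type}")
    offset = 8
    agent = ipaddress.ip_address(datagram[offset:offset + addr_len])
    offset += addr_len
    sub_agent_id, sequence, uptime_ms, num_samples = struct.unpack_from(
        "!IIII", datagram, offset)
    return {
        "agent": str(agent),
        "sub_agent_id": sub_agent_id,
        "sequence": sequence,
        "uptime_ms": uptime_ms,
        "num_samples": num_samples,
    }
```

A sequence number that keeps increasing for a given agent is a reasonable sign that the agent is exporting steadily.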

Please let me know if you have any difficulties.

Peter

On Tue, Dec 7, 2010 at 1:19 AM, Giovanni De Rosa giode...@hotmail.it wrote:

  Hi,
 I'm trying to use Ganglia with sFlow. I have installed gmond patched for
 sFlow on one host and installed the sFlow agent on a different host.
 The problem is that nothing seems to change (the XML from gmond
 looks the same). How can I tell whether everything is working? The sFlow
 agent runs well and sends packets to the host on which gmond is
 installed.

 thanks a lot
 Giovanni




[Ganglia-general] sFlow support in gmond

2010-10-25 Thread peter . phaal

Hello All,

Here is some background on the sFlow support that has been added to gmond  
in the development branch:

http://blog.sflow.com/2010/10/ganglia.html

An sFlow agent is extremely lightweight, since sFlow monitoring is  
typically used in embedded environments where resources are constrained:  
switches, routers, firewalls, hypervisors etc. The addition of sFlow  
support to gmond allows metrics to be collected from these environments  
where the installation of a gmond agent is often not possible.


This initial implementation of gmond/sFlow decodes and populates the core  
set of Ganglia metrics, but future versions could decode additional sFlow  
structures. For example, sFlow reports on virtual machine statistics (based  
on libvirt), however the challenge is deciding how to incorporate the  
additional metrics in the Ganglia data model and in the UI:

http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg06009.html

There are currently sFlow agents for Windows, Linux, Xen, XCP, XenServer  
and KVM/libvirt.


Please reply to the list with any comments and suggestions.

Cheers,
Peter