[Ganglia-general] Monitoring Linux services
Hi All,

For anyone interested in monitoring Linux services, the latest Host sFlow release can automatically track and monitor services running under systemd:
http://blog.sflow.com/2016/12/monitoring-linux-services.html

Ganglia already includes support for the sFlow metrics:
http://blog.sflow.com/2016/12/using-ganglia-to-monitor-linux-services.html

Peter

___
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general
Re: [Ganglia-general] ganglia web to monitor apache servers?
You could use a combination of Host sFlow and mod-sflow on your Apache web servers:
http://www.sflow.net/
https://github.com/sflow/mod-sflow

The following article describes how to configure the head-end gmond:
http://blog.sflow.com/2011/12/using-ganglia-to-monitor-web-farms.html

mod-sflow also exports Apache worker pool stats to Ganglia:
http://blog.sflow.com/2012/10/thread-pools.html

mod-sflow also exports URL, referrer, user-agent, response time and status code information that you can use to derive metrics for each web service. You could use sFlow-RT to calculate the derived metrics and proxy them to gmetad:
http://blog.sflow.com/2015/12/using-proxy-to-feed-metrics-into-ganglia.html

On Thu, Dec 31, 2015 at 10:40 AM, Aaron wrote:
> Hi, I would like to monitor linux apache servers where the apache servers
> would have gmond running, and the stats would be reported back to the
> ganglia server running gmetad and ganglia web to be displayed in a graph.
> Is there a php or python script to do this? Any recommendations?
>
> Thanks, Aaron
Re: [Ganglia-general] ganglia web to monitor apache servers?
Vladimir's blog has a solution that involves tailing the Apache log files:
http://vuksan.com/linux/ganglia/#Apache_Traffic_Stats

The sFlow protocol packs a large number of metrics into each UDP datagram, so you should see a reduction in the UDP datagrams per second associated with monitoring. The C-based mod-sflow / host-sflow agents have a small memory and CPU footprint.

On Thu, Dec 31, 2015 at 3:14 PM, Aaron <hawaiiaa...@gmail.com> wrote:
> Thanks Peter. Is there a way to use a more pure ganglia solution? Will
> sflow generate more udp traffic and/or cpu cycles?
Re: [Ganglia-general] Unable to collect Sflow data
sFlow reports on two types of data:

1. periodic export of counters
2. asynchronous export of randomly sampled packets and packet forwarding info

Ganglia's data model is well suited to handling counters exported by the Host sFlow agent (http://sflow.net/), but does not provide support for analyzing the packet data. sFlowTrend and sFlow-RT (http://sflow-rt.com) are specialized tools that can decode packet headers and calculate flow metrics. If you want to convert sFlow packet data into a form that can be fed into time series tools like Ganglia, then you might want to take a look at sFlow-RT.

Peter

On Thu, Nov 19, 2015 at 2:42 AM, Wenshui Chen wrote:
> Hi There,
>
> Ganglia 3.7.2 has been installed on a CentOS 6.7_64bit box
> successfully. The host's cpu, load, memory, disk, etc., usage statistics
> can be viewed via the ganglia-web interface. The problem is that sflow
> data from a router cannot be collected and displayed by ganglia.
> Both with and without the --sflow-enable flag have been tried during
> the compilation and installation process. Neither one can show sflow
> statistics exported from a router or a switch. My gmond.conf file is
> listed below.
>
> The IPTables logging function has been enabled to record accepted packets
> of udp/6343 from the MLXe4 router, which is exporting sflow packets via
> the UDP/6343 port. The iptables logging does prove that the sflow
> packets have passed the firewall of the ganglia box. sFlowTrend has also
> been installed on the same box. sFlowTrend shows sflow statistics
> exported from the same Brocade MLXe4 router without problem. Thus, I'm
> sure that packets of UDP/6343 are not blocked by the firewall. However, no
> sflow statistics have been recorded by ganglia so far. What else can be
> tried? Thanks a lot for your kind help.
> > Best Regards, > > Wenshui Chen > > /* This configuration is as close to 2.5.x default behavior as possible > The values closely match ./gmond/metric.h definitions in 2.5.x */ > globals { >daemonize = yes >setuid = yes >user = nobody >debug_level = 0 >max_udp_msg_len = 1472 >mute = no >deaf = no >allow_extra_data = yes >host_dmax = 86400 /*secs. Expires (removes from web interface) hosts > in 1 day */ >host_tmax = 20 /*secs */ >cleanup_threshold = 300 /*secs */ >gexec = no ># By default gmond will use reverse DNS resolution when displaying > your hostname ># Uncommeting following value will override that value. ># When uncommented "Incorrect format for spoof argument. exitin" shown. ># override_hostname = lab02.twgrid.org ># If you are not using multicast this value should be set to > something other than 0. ># Otherwise if you restart aggregator gmond you will get empty > graphs. 60 seconds is reasonable >send_metadata_interval = 0 /*secs */ > > } > > /* > * The cluster attributes specified will be used as part of the > * tag that will wrap all hosts collected by this instance. > */ > cluster { >name = "lab02-sflow" >owner = "ASGCNet" > # latlong = "unspecified" > # url = "unspecified" > } > > /* The host section describes attributes of the host, like the location */ > host { > location = "unspecified" > } > > /* Feel free to specify as many udp_send_channels as you like. Gmond > used to only support having a single channel */ > udp_send_channel { > #bind_hostname = yes # Highly recommended, soon to be default. > # This option tells gmond to use a source address > # that resolves to the machine's hostname. Without > # this, the metrics may appear to come from any > # interface and the DNS names associated with > # those IPs will be used to create the RRDs. >mcast_join = 239.2.11.71 >host = lab02.twgrid.org >port = 8649 >ttl = 1 > } > > /* You can specify as many udp_recv_channels as you like as well. 
*/ > udp_recv_channel { >mcast_join = 239.2.11.71 >port = 8649 >bind = 239.2.11.71 >retry_bind = true ># Size of the UDP buffer. If you are handling lots of metrics you really ># should bump it up to e.g. 10MB or even higher. ># following setting is 100MB. It was 10485760(10M) ># buffer = 10485760 > } > > /* You can specify as many tcp_accept_channels as you like to share > an xml description of the state of the cluster */ > tcp_accept_channel { >port = 8649 ># If you want to gzip XML output >gzip_output = no > } > > /* Channel to receive sFlow datagrams */ > udp_recv_channel { >port = 6342 > } > > /* Optional sFlow settings */ > sflow { > udp_port = 6342 > accept_vm_metrics = yes > accept_jvm_metrics = yes > multiple_jvm_instances = no > accept_http_metrics = yes > multiple_http_instances = no > accept_memcache_metrics = yes > multiple_memcache_instances = yes > } > > /* Each metrics module that is
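
To make the distinction in the reply above concrete (counter samples versus packet samples), here is a minimal Python sketch that counts both kinds of record in a single sFlow version 5 datagram. It is not taken from the thread: the function name `count_sample_kinds` is hypothetical, and the layout it assumes is the standard sFlow v5 one (version, agent address, sub-agent id, sequence number, uptime, sample count, then records tagged with a format number where 1 and 3 are flow samples and 2 and 4 are counter samples). A switch or router that exports mostly flow samples would, per the reply, give gmond very little it can graph.

```python
import struct

# sFlow version 5 sample formats for the standard enterprise (0):
# 1 = flow sample, 2 = counter sample, 3/4 = their "expanded" variants.
SAMPLE_KINDS = {1: "flow", 2: "counter", 3: "flow", 4: "counter"}


def count_sample_kinds(datagram: bytes) -> dict:
    """Count counter samples and packet (flow) samples in one sFlow v5 datagram."""
    version, addr_type = struct.unpack_from(">II", datagram, 0)
    if version != 5:
        raise ValueError("not an sFlow v5 datagram")
    # The agent address is 4 bytes for IPv4 (type 1), 16 bytes for IPv6 (type 2).
    offset = 8 + (4 if addr_type == 1 else 16)
    # Next come sub-agent id, sequence number, uptime and the number of samples.
    _sub_agent, _seq, _uptime, num_samples = struct.unpack_from(">IIII", datagram, offset)
    offset += 16
    counts = {"flow": 0, "counter": 0, "other": 0}
    for _ in range(num_samples):
        data_format, length = struct.unpack_from(">II", datagram, offset)
        enterprise, fmt = data_format >> 12, data_format & 0xFFF
        kind = SAMPLE_KINDS.get(fmt, "other") if enterprise == 0 else "other"
        counts[kind] += 1
        offset += 8 + length  # skip over the sample body
    return counts
```

Feeding it datagrams captured on UDP port 6343 would show which of the two data types a given device is actually sending.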
Re: [Ganglia-general] GMOND + SFLOWD functionality
Sergey,

gmond does not retransmit the sFlow metrics it receives. A single gmond instance is used as a central collector for a cluster of machines running Host sFlow agents. gmetad uses a TCP connection to retrieve the cluster stats from the single gmond instance and update the RRDs.

Peter

On Fri, May 29, 2015 at 10:02 AM, Sergey svin...@apple.com wrote:

Hi Vladimir,

This is a very serious question: is GMOND supposed to retransmit metrics received from the local HSFLOWD agent, or does it just save them locally for later retrieval via a TCP connection? What was the initial design for this?

Thanks!
Sergey Vinnik
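
The retrieval path described in the reply (a TCP connection to the collector gmond that returns the cluster state as XML) can be sketched in Python as below. This is only an illustration, not gmetad's implementation: the function names are hypothetical, port 8649 is the tcp_accept_channel port used in the configurations quoted elsewhere in this archive, and the element and attribute names (HOST, METRIC, NAME, VAL) are assumed from gmond's XML output.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host: str, port: int = 8649, timeout: float = 10.0) -> bytes:
    """Connect to gmond's tcp_accept_channel and read the XML dump until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after sending the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_and_metrics(xml_bytes: bytes) -> dict:
    """Map each host name to a {metric name: value string} dictionary."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return result
```

Calling `hosts_and_metrics(read_gmond_xml("head-node"))` against the collector (the host name here is a placeholder) would list every host it has heard from, including hosts that report only through Host sFlow.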
Re: [Ganglia-general] HTTPD metrics not sent
Have you enabled http in the sFlow section of the gmond config?
http://blog.sflow.com/2011/12/using-ganglia-to-monitor-web-farms.html

You should try running sflowtool on the head-end gmond system to verify that the data is arriving:
http://blog.sflow.com/2011/12/sflowtool.html

On Thu, May 28, 2015 at 10:06 AM, Sergey svin...@apple.com wrote:

Hi Everybody!

I use the HSFLOWD agent to collect HTTPD metrics from an Apache server via the mod_sflow.so module. I see that GMOND gets the HTTPD metrics from HSFLOWD and saves them in metadata, but for some reason it doesn't forward the HTTPD metrics by UDP to another GMOND agent. All other metrics are transferred successfully. Do you know how to fix it?

Thanks!
Sergey
Re: [Ganglia-general] Servers with multiple drives
Another alternative would be to develop a simple, portable, minimal-dependency C command line version of gmetric that could be compiled on Windows. Deployment would then involve simply copying the binary executable to your different servers and then building custom metric export scripts in PowerShell etc.

If you look at the source code to gmetric.py, there isn't much to it. The code is derived from the older embeddedgmetric project, which has a C library that could be updated to work with the latest version of Ganglia and used to build a command line tool.
https://code.google.com/p/embeddedgmetric/wiki/GmetricClib

Perhaps someone has already done this?

On Fri, Sep 20, 2013 at 5:09 AM, Burton, Steven sbur...@shepherdbe.com wrote:

Peter,

Alas, this seems something of a deal breaker. Installing python on all of our servers isn't really an option. A shame because I like the design and philosophy of ganglia and sflow. I will continue with nagios and NSClient++.

Steve.

From: Peter Phaal [mailto:peter.ph...@gmail.com]
Sent: 04 September 2013 23:38
To: Burton, Steven
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Servers with multiple drives

Steve,

The Host sFlow statistics are described on sFlow.org:
http://sflow.org/sflow_host.txt

Most of the physical host statistics are based on Ganglia's libmetrics library and are a superset of the metrics that you would get from a default gmond installation. Libmetrics defines aggregate statistics for each node. For example, Host sFlow's disk statistics represent total reads, writes etc. across all storage devices on the node. The part_max_used metric is the utilization of the most utilized partition.

If you need per device statistics, or any other non-sFlow metrics, you could supplement the Host sFlow base set by using gmetric.py to send additional metrics to the Ganglia gmond collector:
https://github.com/vvuksan/ganglia-misc/blob/master/gmetric-python/gmetric.py

If you have any questions that are specific to installing and configuring Host sFlow agents, posting them on the Host sFlow mailing list will reach the developers:
https://lists.sourceforge.net/lists/listinfo/host-sflow-discuss

Peter

On Wed, Sep 4, 2013 at 12:18 AM, Burton, Steven sbur...@shepherdbe.com wrote:

Hi,

I'm investigating Ganglia as a replacement for our nagios-based server stats collection system. As most of the servers I'll be monitoring run Windows, I've been concentrating on using the host-sflow agent (not Ganglia, I know, but I'm guessing there's a lot of experience on this list). I've just installed it on a Windows Server 2003 machine with multiple drives (2) but I'm only seeing one set of disk stats. Is this correct or have I messed something up?

Steve.

Steve Burton
Network Manager
Shepherd Group Built Environment
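
As an illustration of the per-device supplement suggested in this thread, the following Python sketch builds one set of disk metrics per drive. It does not send anything: the tuples it returns would still have to be handed to gmetric.py or another gmetric implementation. The function name `drive_metrics` and the metric naming scheme (`disk_used_pct_C` and so on) are hypothetical choices made for this example; only `shutil.disk_usage` is a real standard-library call.

```python
import shutil


def drive_metrics(paths, usage=shutil.disk_usage):
    """Return (name, value, type, units) tuples, one set per drive or mount point."""
    metrics = []
    for path in paths:
        total, used, _free = usage(path)
        # Hypothetical naming scheme: keep only letters and digits of the path,
        # so "C:\\" becomes "C" and "/" falls back to "root".
        label = "".join(ch for ch in path if ch.isalnum()) or "root"
        pct = 100.0 * used / total if total else 0.0
        metrics.append((f"disk_used_pct_{label}", round(pct, 1), "float", "%"))
        metrics.append((f"disk_total_{label}", round(total / 1e9, 3), "double", "GB"))
    return metrics
```

On a two-drive Windows server this might be called as `drive_metrics(["C:\\", "D:\\"])`, giving separate values for each drive alongside the aggregate disk statistics that Host sFlow reports.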
Re: [Ganglia-general] Can't use sFlow and Ganglia
Does Virtualbox support libvirt? If so, you can compile the Host sFlow agent to link to the libvirt library to obtain VM statistics. Otherwise, if there is a Virtualbox-specific performance library that can be used to retrieve metrics (Host sFlow uses libxenstat for Xen and WMI for Hyper-V), then it shouldn't be too hard to write an adapter.

The best place for questions on the Host sFlow agent is the mailing list:
https://lists.sourceforge.net/lists/listinfo/host-sflow-discuss

On Tue, Mar 12, 2013 at 3:34 AM, Mayap Christine christine.mayapka...@enseeiht.fr wrote:

Hello

Thanks for this orientation! When using Virtualbox, is there a special configuration to be able to get the VM metrics?

On 06/02/2013 12:33, Nicholas Satterly wrote:

Hi,

This is very odd. I don't understand how you could be running version 3.5.0 of gmond without sFlow support enabled. Did you build this gmond yourself and run configure with the --disable-sflow option? sFlow support is enabled by default.

I suggest you either rebuild gmond and ensure that you compile with sFlow support enabled or download a packaged version of ganglia that has sFlow support. This version for Ubuntu Raring should work...
http://packages.ubuntu.com/raring/ganglia-monitor
... or this version for Debian Wheezy...
http://packages.debian.org/search?suite=wheezy&searchon=names&keywords=ganglia

Regards,
Nick

On Tue, Feb 5, 2013 at 6:59 PM, Duverne, Cyrille cyrille.duve...@euranova.eu wrote:

Hello Nicholas,

Thanks a lot for your help. Please find below the outputs of the commands:

    gmond --version
    gmond 3.5.0

    strings /usr/sbin/gmond | grep -i sflow
    (no output)

    hsflowd -v
    -bash: /usr/sbin/hsflowd: Permission denied

    sudo hsflowd -v
    hsflowd version 1.22.2

Thanks.
CyD

Tuesday 05/02/2013 at 17:05, Nicholas Satterly wrote:

Hi Cyrille,

Can you run the following commands and copy-paste the output into a reply email?

    gmond --version
    strings /usr/sbin/gmond | grep -i sflow
    hsflowd -v

Thanks,
Nick

On Tue, Feb 5, 2013 at 3:47 PM, Duverne, Cyrille cyrille.duve...@euranova.eu wrote:

Hello,

Indeed this part was missing, but when I add it and restart ganglia, I get an error saying that module sFlow doesn't exist... I think I'm not running a recent enough version of ganglia; I'm using 3.5.0.

Thanks in advance for your help.
CyD

Tuesday 05/02/2013 at 13:21, Nicholas Satterly wrote:

Hi,

Not sure if you ever solved your problem, but I think you are missing the following config stanza from gmond.conf for the gmond that is receiving the sFlow packets.

    sflow {
      accept_vm_metrics = yes
    }

I'm just trying this out myself for the first time and see the VM metrics appear when gmond is run in debug mode, but there is no trace of them in the XML output.

    $ gmond -d 2
    ...
    saving metadata for metric: infsrcprv10.vdisk_capacity host: smc02
    ***Allocating value packet for host--(null)-- and metric --infsrcprv10.vdisk_capacity--
    ...

I would guess that something is going wrong when decoding the sFlow packets because a host of (null) can't possibly work. Has anyone else got this working?

Regards,
Nick

PS. I'm running ganglia agent version 3.5.0 and host sFlow agent version 1.22.2.

On Wed, Jan 2, 2013 at 5:45 PM, Duverne, Cyrille cyrille.duve...@euranova.eu wrote:

Hello,

I have a cluster of 4 machines, running Ubuntu 12.04 x86_64, sFlow and Ganglia. Here below is the config I've set up:

Master instance:

/etc/gmond.conf:

    /* Feel free to specify as many udp_send_channels as you like. Gmond
       used to only support having a single channel */
    udp_send_channel {
      mcast_join = inferno.local
      /*mcast_join = 139.2.11.71 DEFAULT VALUE*/
      port = 8649
      ttl = 1
    }

    /* You can specify as many udp_recv_channels as you like as well.
*/ udp_recv_channel { /* mcast_join = 239.2.11.71 DEFAULT VALUE*/ port = 8649 /* bind = 239.2.11.71 DEFAULT VALUE*/ family = inet4 } /* channel to receive sFlow */ /* 6343 is the default sFlow port, an explicit sFlow*/ /* configuration section is needed to override default */ udp_recv_channel { port = 6343 } /* You can specify as many tcp_accept_channels as you like to share an xml description of the state of the cluster */ tcp_accept_channel { port = 8649 } Cluster machines : /etc/hsflowd.conf sflow { DNSSD = off polling = 20 sampling = 512 collector { ip = 192.168.0.100 udpport = 6343 } } /etc/gmond.conf : /* Feel free to specify as many udp_send_channels as you like. Gmond used to only support having a single channel */ udp_send_channel { mcast_join = inferno.local port = 8649 ttl = 1 } /* You can specify
[Ganglia-general] InfiniBand monitoring
I wanted to bring attention to the following proposal from Mellanox to define the set of InfiniBand metrics to be exported via sFlow. If you use InfiniBand, this is an opportunity to help identify the important metrics that can ultimately make their way into Ganglia, as happened with, for example, the GPU metrics:
http://blog.sflow.com/2012/10/using-ganglia-to-monitor-gpu-performance.html

Comments on the proposal are welcome on the sFlow mailing list:
http://groups.google.com/group/sflow

InfiniBand is a protocol used in data centers, high speed trading and supercomputers. It is characterized by high throughput, low latency, connection QoS and high availability. The following draft specification defines sFlow counter structures for reporting traffic information from InfiniBand ports.
http://sflow.org/draft_sflow_infiniband.txt

Please comment on the draft so we can move to finalize the specification. I would like to thank Peter Phaal for helping me with this contribution.

Thanks,
Ariel Almog
Mellanox Technologies
Re: [Ganglia-general] Ganglia, mod_sflow and Apache response report
Michael,

Ganglia doesn't understand the sampled HTTP transactions reported by mod_sflow and there is no response report built into Ganglia. To incorporate response time metrics based on the sFlow data, you would need to piece together a script using the elements described in the Ganglia book.

1. It makes most sense to calculate the response time metric on the gmond (head) node. You will need to install sflowtool on the server to convert the binary sFlow to text so that you can analyze the data using a script:
   http://blog.sflow.com/2011/12/sflowtool.html

2. Your analysis script needs to have access to the sFlow feed. The easiest way is to use the tcpdump command described on page 161 (actually it looks like there is a typo, the -r - argument to sflowtool is missing):
   tcpdump -p -s 0 -w - udp port 6343 | sflowtool -r -
   http://blog.sflow.com/2012/01/forwarding-using-sflowtool.html

3. Since you are looking at HTTP data, you might want to use the -H option to get sflowtool to convert the sFlow data into combined logfile format. That way you could use existing log analysis libraries/tools to filter on URLs, mime-types, status codes etc. when computing your metrics.

4. The Perl script on page 167 describes how to calculate average response time from the samples; you would need to modify the sflowtool invocation to include the tcpdump command. Also, the script as written will compute the average response time across the cluster of web servers. You would need to modify the script if you want per-host statistics.

5. Finally, you would need to use gmetric to send the calculated metrics to gmond (using spoofing to ensure that the calculated metrics correspond to the other metrics being directly received from the Host sFlow agents). See Custom Metrics on page 160.

If you don't want to develop a solution from scratch, an alternative would be to use an sFlow analyzer to compute the response time metrics and then feed them into Ganglia, something along the lines of:
http://blog.sflow.com/2013/02/cluster-performance-metrics.html

Peter

On Mon, Feb 4, 2013 at 7:52 AM, Michael Durket dur...@highwire.stanford.edu wrote:

I'm running ganglia 3.4.0-1 and ganglia web 3.5.4-1. On a web server I'm running the latest version of mod_sflow. I can see the Apache report on gweb just fine, but I'm not sure the Apache response report is working. Is there any documentation (besides the general documentation in the Ganglia book on ganglia and sflow) which might tell me how to get the Apache response report working with mod_sflow in gweb?
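
The per-host calculation from steps 1 to 4 above might look roughly like the Python sketch below. It is not the Perl script from the Ganglia book. It assumes sflowtool's default text output of one "key value" pair per line, and the key names `agent` and `http_duration_uS` are assumptions about that output, so they are passed as parameters and should be checked against what your sflowtool actually prints. The function name `mean_response_ms` is hypothetical.

```python
from collections import defaultdict


def mean_response_ms(lines, agent_key="agent", duration_key="http_duration_uS"):
    """Average HTTP response time in milliseconds, per reporting agent."""
    totals = defaultdict(lambda: [0, 0])  # agent -> [sum of microseconds, sample count]
    agent = None
    for line in lines:
        key, _, value = line.strip().partition(" ")
        if key == agent_key:
            agent = value  # remember which agent the following samples belong to
        elif key == duration_key and agent is not None:
            totals[agent][0] += int(value)
            totals[agent][1] += 1
    return {a: s / n / 1000.0 for a, (s, n) in totals.items()}
```

In practice `lines` would be the standard output of the tcpdump and sflowtool pipeline from step 2, read in batches (for example once a minute), with each resulting average then sent to gmond through gmetric as described in step 5.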
Re: [Ganglia-general] Sflow: custom metric are invisible
On the receiving end, have you configured gmond to listen for gmetric messages?

    udp_recv_channel {
      port = 8649
    }

On the sending end (host-sflow), your gmetric settings must be consistent with the hsflowd settings. The following message on the host-sflow mailing list describes how to read the hsflowd settings and pass them to gmetric.py:
http://sourceforge.net/mailarchive/message.php?msg_id=29438950

On Wed, Dec 19, 2012 at 1:50 AM, MAYAP KAMGA Christine larissa christine.mayapka...@enseeiht.fr wrote:

Hello

I'm facing some problems while using sflow. I'm currently using sflow (1.22) on my monitored server and gmond (3.5) on another one. I'm able to get all the VM_* metrics and the basic ganglia metrics with gmond without problem. To learn more about custom metrics, I have created a script to extract Current_users with gmetric.py. I'm able to execute the script. I also receive a notification about the size of the sent data. However, I'm unable to see the Current_user metric on the gmond server among the others. Did I miss something? Please, can somebody help and guide me on what to do to solve this issue?

Thanks in advance!
Re: [Ganglia-general] Question about scaling
Hi Mark,

If you want to significantly reduce the amount of UDP traffic going to your head-end gmond (cnode340), then you might want to consider using Host sFlow agents to monitor machines in the cluster. sFlow encodes all the core Ganglia metrics (along with additional disk IO, swap and interrupt activity metrics) in a single UDP packet, so you can cut the UDP packets per second (and the load on the head-end gmond) by a factor of 30 or more.

If you make extensive use of gmond plugins for custom metrics then you would want to stick with gmond on all your nodes. However, if you have a limited number of custom metrics, you can supplement the core metrics exported by sFlow using gmetric.
http://blog.sflow.com/2011/07/ganglia-32-released.html

As Nick suggested, you should be using the latest version of gmond for the head node. Multi-threading significantly improves scalability, and the newer versions of gmond also include native sFlow support.

Regards,
Peter

On Tue, Oct 23, 2012 at 4:34 PM, Nicholas Satterly nfsatte...@gmail.com wrote:

I assume cnode340 is the head node that all ~340 other gmonds send their data to. If so, you could reduce the amount of redundant metadata flying around by increasing send_metadata_interval to 120 seconds or higher.

Also, I suspect that if you telnet to port 8649 on your head node it will take a while to respond because it's busy processing incoming UDP metrics. If it takes more than 10 seconds to respond on a regular basis then gmetad will time out [1].

Try deploying a recently patched version of gmond [2] to the head node, which is now multi-threaded, and see if that fixes the problem. It starts a separate thread for responding to XML metric requests and should respond immediately while the main thread is still processing metrics.

Let us know how you get on.
Regards, Nick [1] https://github.com/ganglia/monitor-core/blob/master/gmetad/data_thread.c#L103 [2] https://github.com/ganglia/monitor-core/pull/53 On Tue, Oct 23, 2012 at 7:36 PM, Potter,Mark L mlpot...@mdanderson.org wrote: data_source MDACC 60 cnode340:8649 Everything else is default at this point. http://pastebin.com/UAQYxcX3 is a full copy. From: Nicholas Satterly [nfsatte...@gmail.com] Sent: Tuesday, October 23, 2012 13:33 To: Potter,Mark L Cc: ganglia-general@lists.sourceforge.net Subject: Re: [Ganglia-general] Question about scaling Please send thru your gmetad.conf file so we can see how things are configured on the server side. * --Nick. * Be sure to anonymise any sensitive info. On 23 Oct 2012, at 19:21, Potter,Mark L mlpot...@mdanderson.org wrote: I am using what I think to be a fairly standard gmond.conf: globals { daemonize = yes setuid = yes user = nobody debug_level = 0 max_udp_msg_len = 1472 mute = no deaf = no allow_extra_data = yes host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */ host_tmax = 30 /*secs */ cleanup_threshold = 300 /*secs */ gexec = no send_metadata_interval = 30 /*secs */ } cluster { name = MDACC owner = MD Anderson Caner Center latlong = unspecified url = unspecified } host { location = 8,3,1 } udp_send_channel { host = cnode340 port = 8649 } udp_recv_channel { port = 8649 retry_bind = true } tcp_accept_channel { port = 8649 } gmetad is set to check every 60 seconds: data_source MDACC 60 cnode340:8649 Everything works well until around 200 hosts where it appears gmetad starts having issues. I have ~340 hosts to go in to this cluster. Should I be running multiple gmetads for this amount of hosts? With all of them active the web interface reports all of them down and collects no stats at all. I am looking for advice on getting this up and running properly. 
The ganglia host isn't underpowered at all IMO and has plenty of HDD space:

Mem: 32955788 (from free)
16 Cores (AMD Opteron(tm) Processor 6128)

Thanks for any assistance.

Respectfully,
Mark L. Potter
Research IS Technology Services
UNIX Systems Administrator
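
A back-of-the-envelope version of the packet-rate argument made at the top of this thread, written as a small Python calculation. The inputs are assumptions chosen only for illustration: about 30 packets per host per reporting interval for a gmond-on-every-node setup (one per core metric, which overstates what gmond sends when values are stable), one packet per host for Host sFlow, and a 20 second interval. Only the host count (~340) and the "factor of 30" come from the thread.

```python
def packets_per_second(hosts, packets_per_host_per_interval, interval_s):
    """Rough steady-state packet rate arriving at the head-end gmond."""
    return hosts * packets_per_host_per_interval / interval_s


# Assumed: ~30 metric packets per host every 20 seconds with gmond on each node.
gmond_rate = packets_per_second(340, 30, 20)
# Assumed: one sFlow datagram per host every 20 seconds with Host sFlow agents.
sflow_rate = packets_per_second(340, 1, 20)

print(f"gmond agents: {gmond_rate:.0f} packets/s")
print(f"Host sFlow:   {sflow_rate:.0f} packets/s")
print(f"reduction:    {gmond_rate / sflow_rate:.0f}x")
```

Under these assumptions the head node sees roughly 510 packets per second in the first case and 17 in the second; the real ratio depends on how many metrics and plugins each node runs.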
Re: [Ganglia-general] sflow metrics not visible
On Fri, Oct 19, 2012 at 1:59 PM, Иван Евдокимов palmal.moz...@gmail.com wrote:

I'm trying to use the sFlow (jmx-agent 0.6.1) and Ganglia (3.5.0, source build) pair for jvm monitoring.

gmond.conf:

    udp_recv_channel {
      port = 6343
    }
    sflow {
      accept_vm_metrics = yes
    }

When tcpdump port 6343 is fired, I see sFlow v5 packets arriving at ganglia (Ubuntu x64, VirtualBox), but there are no logs, no errors and no metrics. First of all, is there any chance to see the logs, other than in -d mode? gmond -m displays no vm_* specific metrics. gmetric didn't seem to work except for the help display: gmetric -g sflow produces "Incorrect option value", and the same happens for every option. Any clues or tips?

The accept_vm_metrics setting applies to Xen/KVM etc. virtual machines. For Java, you need to use accept_jvm_metrics:
http://blog.sflow.com/2011/12/using-ganglia-to-monitor-java-virtual.html

You didn't mention if you installed Host sFlow agents on your servers. The Host sFlow agent is required; the following article describes how the Host sFlow sub-agents share configuration:
http://blog.sflow.com/2012/01/host-sflow-distributed-agent.html

Peter
[Ganglia-general] GPU performance/health monitoring
Hi All, If you are running a GPU based compute cluster you might be interested in the recently added support for GPU performance/health metrics. http://blog.sflow.com/2012/10/using-ganglia-to-monitor-gpu-performance.html Please try out the new extensions and let us know if there are any issues (you will need to build gmond from the latest sources on github). There are other reasons to use the latest gmond; the addition of multi-threading improves scalability and reduces the chance of losing metrics. Peter
Re: [Ganglia-general] Impact of gmond polling on data collection
Nick, I think you probably need two mutexes if you want to avoid blocking the UDP thread unnecessarily.
1. A mutex on the hash table that must be grabbed by the TCP thread when it walks the hash table; the UDP thread would grab it any time it adds or removes an entry from the hash table.
2. A mutex used to control access to individual entries in the hash table. The TCP thread would grab and release this mutex for each entry as it walks the hash table. The UDP thread would grab this mutex each time it updates an entry.
The only situation in which this locking scheme would block the UDP thread for any significant time is when a new host starts sending metrics and a new entry needs to be added to the hash table. This is a rare event and not much of a concern. The TCP thread should never have to wait long to acquire either of the mutexes. Peter
On Wed, Sep 19, 2012 at 8:45 AM, Nicholas Satterly nfsatte...@gmail.com wrote: Hi Peter, Thanks for the feedback. I've added a thread mutex to the hosts hash table as you suggested and will send a pull request in the next day or so. Regards, Nick
On Mon, Sep 17, 2012 at 8:25 PM, Peter Phaal peter.ph...@gmail.com wrote: Nicholas, It makes sense to multi-thread gmond, but looking at your patch, I don't see any locking associated with the hosts hash table. Isn't there a possible race if new hosts/metrics are added to the hash table by the UDP thread at the same time the hash table is being walked by the TCP thread? Peter
On Mon, Sep 17, 2012 at 6:03 AM, Nicholas Satterly nfsatte...@gmail.com wrote: Hi Chris, I've discovered there are two contributing factors to problems like this. 1. The number of metrics being sent (possibly in short bursts) can overflow the UDP receive buffer. 2. The time it takes to process metrics in the UDP receive buffer causes TCP connections from the gmetads to time out (currently hard-coded to 10 seconds). In your case, you are probably dropping UDP packets because gmond can't keep up.
Gmond was enhanced to allow you to increase the UDP buffer size back in April. I suggest you upgrade to the latest version and set this to a sensible value for your environment.
udp_recv_channel {
  port = 1234
  buffer = 1024000
}
Determining what is sensible is a bit of trial and error. Run netstat -su and keep increasing the value until you no longer see the number of packet receive errors going up.
$ netstat -su
Udp:
  7941393 packets received
  23 packets to unknown port received.
  0 packet receive errors
  10079118 packets sent
The other possibility is that it takes so long for a gmetad to pull back all the metrics you are collecting for a cluster that you are preventing the gmond from processing metric data received via UDP. Again this can cause the UDP receive buffer to overflow. The problem we had at my work is related to all of the above but manifested itself in a slightly different way. We were seeing gaps in all our graphs because at times none of the servers in a cluster would respond to a gmetad poll within 10 seconds. I used to think that the gmonds were completely hung but realised that they would respond normally most of the time, but every minute or so a response would take about 20-25 seconds. This happened to coincide with the UDP receive queue growing (Recv-Q column below) and I realised that it took this long for the gmond to process the metric data it had received via UDP from all the other servers in the cluster.
$ netstat -ua
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address
udp 1920032 0 *:8649 *:*
The solution was to modify gmond and move the TCP request handler into a separate thread so that gmond could take as long as it needed to process incoming metric data (from a UDP receive buffer that is large enough not to overflow) without blocking the TCP requests for the XML data. The patched gmond is running without a problem in our environment so I have submitted a pull request[1] for it to be included in trunk.
I can't be 100% sure that this patch will fix your problem but it would be worth a try. Regards, Nick [1] https://github.com/ganglia/monitor-core/pull/50 On Sat, Sep 15, 2012 at 12:16 AM, Chris Burroughs chris.burrou...@gmail.com wrote: We use ganglia to monitor 500 hosts in multiple datacenters with about 90k unique host:metric pairs per DC. We use this data for all of the cool graphs in the web UI and for passive alerting. One of our checks is to measure TN of load_one on every box (we want to make sure gmond is working and correctly updating metrics otherwise we could be blind and not know it). We consider it a failure if TN is 600. This is an arbitrary number but 10 minutes seemed plenty long. Unfortunately we are seeing this check fail far
Re: [Ganglia-general] Impact of gmond polling on data collection
Nicholas, It makes sense to multi-thread gmond, but looking at your patch, I don't see any locking associated with the hosts hash table. Isn't there a possible race if new hosts/metrics are added to the hash table by the UDP thread at the same time the hash table is being walked by the TCP thread? Peter
On Mon, Sep 17, 2012 at 6:03 AM, Nicholas Satterly nfsatte...@gmail.com wrote: Hi Chris, I've discovered there are two contributing factors to problems like this. 1. The number of metrics being sent (possibly in short bursts) can overflow the UDP receive buffer. 2. The time it takes to process metrics in the UDP receive buffer causes TCP connections from the gmetads to time out (currently hard-coded to 10 seconds). In your case, you are probably dropping UDP packets because gmond can't keep up. Gmond was enhanced to allow you to increase the UDP buffer size back in April. I suggest you upgrade to the latest version and set this to a sensible value for your environment.
udp_recv_channel {
  port = 1234
  buffer = 1024000
}
Determining what is sensible is a bit of trial and error. Run netstat -su and keep increasing the value until you no longer see the number of packet receive errors going up.
$ netstat -su
Udp:
  7941393 packets received
  23 packets to unknown port received.
  0 packet receive errors
  10079118 packets sent
The other possibility is that it takes so long for a gmetad to pull back all the metrics you are collecting for a cluster that you are preventing the gmond from processing metric data received via UDP. Again this can cause the UDP receive buffer to overflow. The problem we had at my work is related to all of the above but manifested itself in a slightly different way. We were seeing gaps in all our graphs because at times none of the servers in a cluster would respond to a gmetad poll within 10 seconds.
I used to think that the gmonds were completely hung but realised that they would respond normally most of the time, but every minute or so a response would take about 20-25 seconds. This happened to coincide with the UDP receive queue growing (Recv-Q column below) and I realised that it took this long for the gmond to process the metric data it had received via UDP from all the other servers in the cluster.
$ netstat -ua
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address
udp 1920032 0 *:8649 *:*
The solution was to modify gmond and move the TCP request handler into a separate thread so that gmond could take as long as it needed to process incoming metric data (from a UDP receive buffer that is large enough not to overflow) without blocking the TCP requests for the XML data. The patched gmond is running without a problem in our environment so I have submitted a pull request[1] for it to be included in trunk. I can't be 100% sure that this patch will fix your problem but it would be worth a try. Regards, Nick [1] https://github.com/ganglia/monitor-core/pull/50
On Sat, Sep 15, 2012 at 12:16 AM, Chris Burroughs chris.burrou...@gmail.com wrote: We use ganglia to monitor 500 hosts in multiple datacenters with about 90k unique host:metric pairs per DC. We use this data for all of the cool graphs in the web UI and for passive alerting. One of our checks is to measure TN of load_one on every box (we want to make sure gmond is working and correctly updating metrics otherwise we could be blind and not know it). We consider it a failure if TN is 600. This is an arbitrary number but 10 minutes seemed plenty long. Unfortunately we are seeing this check fail far too often. We set up two parallel gmetad instances (monitoring identical gmonds) per DC and have broken our problem into two classes: * (A) Only one of the gmetads stops updating for an entire cluster, and must be restarted to recover. Since the gmetads disagree we know the problem is there.
[1] * (B) Both gmetads say an individual host has not reported (gmond aggregation or sending must be at fault). This issue is usually transient (that is, it recovers after some period of time greater than 10 minutes). While attempting to reproduce (A) we ran several additional gmetad instances (again polling the same gmonds) around 2012-12-07. Failures per day are below [2]. The act of testing seems to have significantly increased the number of failures. This led us to consider whether the act of polling a gmond aggregator could impact its ability to concurrently collect metrics. We looked at the code but are not experienced with concurrent programming in C. Could someone with more familiarity with the gmond code comment on whether this is likely to be a worthwhile avenue of investigation? We are also looking for suggestions for an empirical test to rule this out. (Of course, other comments on the root "TN goes up, metrics stop updating" sporadic problem are also welcome!)
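The two-mutex locking scheme Peter proposes in this thread can be sketched roughly as follows. This is only an illustration in Python, not gmond's actual C code: the class and method names (HostTable, update, snapshot) are hypothetical, and it assumes a single UDP thread is the only writer to the table.

```python
import threading


class HostTable:
    """Toy stand-in for gmond's hosts hash table (all names hypothetical)."""

    def __init__(self):
        self._table_lock = threading.Lock()  # mutex 1: guards table structure
        self._entry_lock = threading.Lock()  # mutex 2: guards entry contents
        self._hosts = {}

    def update(self, host, metric, value):
        """UDP thread: record one metric sample for a host."""
        # Assumption: only this (single) UDP thread adds or removes hosts,
        # so it can look up an existing entry without the table mutex.
        entry = self._hosts.get(host)
        if entry is None:
            # Rare path: a new host changes the table structure, so wait
            # for any walk in progress to finish.
            with self._table_lock:
                entry = self._hosts.setdefault(host, {})
        # Common path: only the short-lived entry mutex is taken.
        with self._entry_lock:
            entry[metric] = value

    def snapshot(self):
        """TCP thread: walk the table, locking each entry only briefly."""
        result = {}
        with self._table_lock:
            for host, entry in self._hosts.items():
                with self._entry_lock:
                    result[host] = dict(entry)
        return result


table = HostTable()
table.update("web1", "load_one", 0.5)
table.update("web1", "load_one", 0.7)
table.update("db1", "cpu_user", 12.0)
print(table.snapshot())
```

In this sketch the UDP path only waits on the table mutex when a previously unseen host appears, which matches the "rare event" case described in the thread; every other update contends only for the entry mutex, which the walker holds for one entry at a time.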
Re: [Ganglia-general] Java/JMX plugin for Ganglia 3.1.x
Martin, If you can upgrade to the latest Ganglia release you could use sFlow to monitor your Tomcat servers: the jmx-sflow-agent exports standard JVM metrics, and the tomcat-sflow-valve can export the JVM metrics as well as HTTP counters and transactions. http://host-sflow.sourceforge.net/relatedlinks.php Cheers, Peter On Thu, Sep 13, 2012 at 5:43 AM, Martin Knoblauch kn...@knobisoft.de wrote: Hi, as part of a larger tomcat deployment I need to monitor several tomcat instances and want to add the measured data to a Ganglia setup. I already found JMXtrans, which seems a cool solution, but it uses host spoofing and I am not sure it is what I really want. It needs some real investigating. What I would love to have would be a Gmond plugin that can just add the measured metrics to the system metrics. Has anybody already done such a plugin or is working on it? I could provide testing, feedback and maybe help. Cheers Martin -- Martin Knoblauch email: k n o b i AT knobisoft DOT de www: http://www.knobisoft.de -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Ganglia-general mailing list Ganglia-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-general -- Got visibility? Most devs has no idea what their production app looks like. Find out how fast your code is with AppDynamics Lite. http://ad.doubleclick.net/clk;262219671;13503038;y? http://info.appdynamics.com/FreeJavaPerformanceDownload.html
Re: [Ganglia-general] Easy Question: Ganglia + sFlow/NetFlow
Douglas, The sFlow standard includes a mechanism for periodically exporting counters. It is these periodic counter exports that Ganglia is processing - there is no equivalent mechanism in NetFlow. In addition, sFlow standardizes export of counters from servers and applications - it is these counters that Ganglia currently supports. The following articles give examples: http://blog.sflow.com/search/label/Ganglia Ganglia doesn't understand flow data (neither sFlow's packet/transaction samples nor NetFlow records). Ganglia's strength is in monitoring clusters of servers - for network traffic analysis you would be better off using tools like ntop, pmacct etc. and possibly importing traffic summaries (such as total web traffic) into Ganglia using gmetric or through a module. The Host sFlow web site is the place to look for server and application sFlow agents: http://host-sflow.sourceforge.net/ The Host sFlow agent exports core server metrics, and related projects (listed on the Host sFlow web site) instrument Apache, Java etc. http://blog.sflow.com/2012/01/host-sflow-distributed-agent.html -Peter On Wed, Jul 25, 2012 at 9:09 AM, Douglas Wagner dougla...@gmail.com wrote: Excuse the idiocy behind this post as we're just starting to look into a lot of this. I understand Ganglia is now capable of following sFlow packets being sent around a network; it's also my understanding that there is a difference between sFlow and NetFlow (NetFlow being potentially a Cisco thing?). So, a couple of hopefully easy questions. Is there a significant difference, from a Ganglia perspective, between NetFlow and sFlow packets? Does Ganglia support NetFlow as well as sFlow (they could be technically the same or different as night and day for all I know)?
On the Ganglia web page it's talking about sFlow packets being accepted from sources such as Apache and JMX, is there any documentation anyone out there can point me at to allowing apps such as these (these two specifically) to report statistics via sFlow? Thanks in advance for any help you might be able to give. --Douglas Wagner
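Peter's suggestion of importing a traffic summary into Ganglia with gmetric could look something like the sketch below. It only builds the command line rather than running it; the metric name, value and units are made-up placeholders, and the summary itself is assumed to come from a separate tool such as ntop or pmacct.

```python
import shlex


def gmetric_command(name, value, metric_type="double", units=""):
    """Build an argv list for gmetric using its long options."""
    argv = ["gmetric", "--name", name, "--value", str(value),
            "--type", metric_type]
    if units:
        argv += ["--units", units]
    return argv


# Hypothetical summary value produced elsewhere by a traffic analysis tool.
cmd = gmetric_command("web_traffic_bytes_per_sec", 1250000.0,
                      units="bytes/sec")
print(shlex.join(cmd))
# To actually submit the metric, the list could be passed to subprocess.run(cmd)
# on a host where gmetric and a gmond configuration are installed.
```

Keeping the command as a list avoids shell quoting problems if a metric name or unit string ever contains spaces.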
Re: [Ganglia-general] gmond 3.4.0 and dell switches
The sFlow standard defines a wide range of metrics from switches, servers and applications. Each device only exports the metrics that are relevant to its normal operation, so switches will report network metrics, servers will report cpu, memory, disk statistics and applications will report response times, URLs etc. http://blog.sflow.com/2010/08/sflow-host-structures.html The Dell switch is exporting sFlow metrics relating to its operation as a switch. Since it isn't a server, it won't export the server metrics that gmond is looking for. Ganglia is designed to monitor clusters of servers and it expects to receive a core set of server metrics from each member of the cluster and will ignore sFlow metrics that don't relate to that function. There are a number of other sFlow analysis tools listed on sFlow.org that are focused on sFlow switch metrics: http://sflow.org/products/collectors.php The following article describes some things to consider when evaluating sFlow analyzers for monitoring switches: http://blog.sflow.com/2009/05/choosing-sflow-analyzer.html Peter On Fri, Jul 20, 2012 at 7:46 AM, Andreas Pflug pgad...@pse-consulting.de wrote: I've configured some Dell switches (e.g. 6224, with recent 3.3.3.3 firmware) to emit SFLOW packets, and I see them happily arriving at my gmond machine, but the switches aren't recognized. Digging into the sources, I found that the switch under investigation never sends blocks tagged as SFLOW_COUNTERBLOCK_HOST,_HID only type 0 and 1. Consequently, all packets are dropped. Is this a Dell problem of incompletely implemented SFLOW, or is it a gmond problem? Regards Andreas -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Ganglia-general mailing list Ganglia-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-general -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Ganglia-general mailing list Ganglia-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-general
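To make the behaviour discussed in this thread concrete: every sFlow structure carries a 32-bit tag whose top 20 bits are the enterprise number and whose low 12 bits are the format number. The sketch below decodes such tags and checks whether a counter sample contains the host description structure (enterprise 0, format 2000 in the sFlow Host Structures specification). It is a simplified illustration of why a switch that sends only interface counters looks empty to a host-oriented collector, not a copy of gmond's decoder; the function names are hypothetical.

```python
HOST_DESCR = (0, 2000)          # host_descr, sFlow Host Structures spec
GENERIC_IF_COUNTERS = (0, 1)    # if_counters, sFlow version 5 spec
ETHERNET_COUNTERS = (0, 2)      # ethernet_counters, sFlow version 5 spec


def split_tag(tag):
    """Split a 32-bit sFlow data_format tag into (enterprise, format)."""
    return tag >> 12, tag & 0xFFF


def has_host_description(tags):
    """True if any structure in a counter sample is a host description."""
    return any(split_tag(tag) == HOST_DESCR for tag in tags)


server_sample = [2000, 2003, 2004]   # host description, CPU, memory
switch_sample = [1, 2]               # interface and Ethernet counters only

print(has_host_description(server_sample))
print(has_host_description(switch_sample))
```

The same tag layout is what makes vendor extensions possible: a structure defined by another organisation simply uses that organisation's enterprise number in the upper 20 bits.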
Re: [Ganglia-general] gmond 3.4.0 and dell switches
I agree, the performance of the network fabric is a critical component of cluster performance and it would be great to figure out how to best include the data in Ganglia. A possible starting point would be to define SWITCH elements in the XML structure exported by gmond. A switch would contain multiple INTERFACE objects each of which contain standard SNMP MIB-II metrics (ifInOctets, ifOutOctets, ifInErrors, ifOutErrors, ifInDiscards, ifOutDiscards etc). The problem is that this wouldn't be backward compatible with tools accessing the XML interface. Another option would be to have the network data appear as a separate XML document, accessed on a different TCP port. The next challenge would be to figure out how to include this type of information in the Ganglia UI - rolled up errors and discards for the fabric would be a natural fit for the top level view, but to drill down, Ganglia would need to deal with the concept of multiple resource pools in the cluster (networking and computation). Extending the notion further, a storage resource pool might also be interesting. For virtual server pools, pooling the VMs and the hypervisors would also be useful. Peter On Fri, Jul 20, 2012 at 9:31 AM, Andreas Pflug pgad...@pse-consulting.de wrote: Well, for examining the overall health of a cluster the network fabric appears equally important to me... There seems no OS software for this combined? Regards Andreas Am 20.07.12 17:50, schrieb Peter Phaal: The sFlow standard defines a wide range of metrics from switches, servers and applications. Each device only exports the metrics that are relevant to its normal operation, so switches will report network metrics, servers will report cpu, memory, disk statistics and applications will report response times, URLs etc. http://blog.sflow.com/2010/08/sflow-host-structures.html The Dell switch is exporting sFlow metrics relating to its operation as a switch. 
Since it isn't a server, it won't export the server metrics that gmond is looking for. Ganglia is designed to monitor clusters of servers and it expects to receive a core set of server metrics from each member of the cluster and will ignore sFlow metrics that don't relate to that function. There are a number of other sFlow analysis tools listed on sFlow.org that are focused on sFlow switch metrics: http://sflow.org/products/collectors.php The following article describes some things to consider when evaluating sFlow analyzers for monitoring switches: http://blog.sflow.com/2009/05/choosing-sflow-analyzer.html Peter On Fri, Jul 20, 2012 at 7:46 AM, Andreas Pflug pgad...@pse-consulting.de wrote: I've configured some Dell switches (e.g. 6224, with recent 3.3.3.3 firmware) to emit SFLOW packets, and I see them happily arriving at my gmond machine, but the switches aren't recognized. Digging into the sources, I found that the switch under investigation never sends blocks tagged as SFLOW_COUNTERBLOCK_HOST,_HID only type 0 and 1. Consequently, all packets are dropped. Is this a Dell problem of incompletely implemented SFLOW, or is it a gmond problem? Regards Andreas -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Ganglia-general mailing list Ganglia-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-general -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Ganglia-general mailing list Ganglia-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-general
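As a rough picture of the SWITCH/INTERFACE idea floated in this message, the sketch below builds such an XML fragment and rolls up errors and discards across a switch. The element and attribute names (SWITCH, INTERFACE, METRIC, NAME, VAL) are purely hypothetical; nothing in the thread says gmond ever adopted this layout. Only the MIB-II counter names come from the message.

```python
import xml.etree.ElementTree as ET

ERROR_COUNTERS = {"ifInErrors", "ifOutErrors", "ifInDiscards", "ifOutDiscards"}


def switch_element(name, interfaces):
    """Build a hypothetical SWITCH element from {interface: {counter: value}}."""
    switch = ET.Element("SWITCH", NAME=name)
    for if_name, counters in interfaces.items():
        iface = ET.SubElement(switch, "INTERFACE", NAME=if_name)
        for metric, value in counters.items():
            ET.SubElement(iface, "METRIC", NAME=metric, VAL=str(value))
    return switch


def fabric_rollup(switch):
    """Sum errors and discards over all interfaces, e.g. for a top-level view."""
    return sum(int(m.get("VAL")) for m in switch.iter("METRIC")
               if m.get("NAME") in ERROR_COUNTERS)


sw = switch_element("switch-1", {
    "port-1": {"ifInOctets": 1000, "ifInErrors": 3, "ifOutDiscards": 2},
    "port-2": {"ifInOctets": 500, "ifInErrors": 1},
})
print(ET.tostring(sw, encoding="unicode"))
print(fabric_rollup(sw))
```

A nested layout like this is also what would break existing consumers of the XML, which is the backward-compatibility concern raised in the message.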
[Ganglia-general] Fwd: gmond 3.4.0 and dell switches
Ganglia is nicely extensible if you add extra metrics to the core set, but doesn't work well when you are missing most of the core metrics. The Ganglia UI expects the core set of host metrics to be available, if they aren't then you end up with lots of broken links and missing charts. Unfortunately, there isn't a whole lot of overlap between typical switch metrics and the Ganglia host metrics and so while it would be easy enough to treat each switch as a host in gmond and add the port statistics as you suggest, it breaks much of the downstream code that depends on the missing metrics. On Fri, Jul 20, 2012 at 3:57 PM, Vladimir Vuksan vli...@veus.hr wrote: I am not in favor of changing the gmond XML. I would recommend simply making switches hosts and emitting interface data as metrics grouped by metric groups e.g. port-1-inoctets port-1-outoctets etc. Beyond that I would like to get stuff like switch CPU utilization. Is this doable ? Vladimir On Fri, 20 Jul 2012, Peter Phaal wrote: I agree, the performance of the network fabric is a critical component of cluster performance and it would be great to figure out how to best include the data in Ganglia. A possible starting point would be to define SWITCH elements in the XML structure exported by gmond. A switch would contain multiple INTERFACE objects each of which contain standard SNMP MIB-II metrics (ifInOctets, ifOutOctets, ifInErrors, ifOutErrors, ifInDiscards, ifOutDiscards etc). The problem is that this wouldn't be backward compatible with tools accessing the XML interface. Another option would be to have the network data appear as a separate XML document, accessed on a different TCP port. The next challenge would be to figure out how to include this type of information in the Ganglia UI - rolled up errors and discards for the fabric would be a natural fit for the top level view, but to drill down, Ganglia would need to deal with the concept of multiple resource pools in the cluster (networking and computation). 
Extending the notion further, a storage resource pool might also be interesting. For virtual server pools, pooling the VMs and the hypervisors would also be useful. Peter On Fri, Jul 20, 2012 at 9:31 AM, Andreas Pflug pgad...@pse-consulting.de wrote: Well, for examining the overall health of a cluster the network fabric appears equally important to me... There seems no OS software for this combined? Regards Andreas Am 20.07.12 17:50, schrieb Peter Phaal: The sFlow standard defines a wide range of metrics from switches, servers and applications. Each device only exports the metrics that are relevant to its normal operation, so switches will report network metrics, servers will report cpu, memory, disk statistics and applications will report response times, URLs etc. http://blog.sflow.com/2010/08/sflow-host-structures.html The Dell switch is exporting sFlow metrics relating to its operation as a switch. Since it isn't a server, it won't export the server metrics that gmond is looking for. Ganglia is designed to monitor clusters of servers and it expects to receive a core set of server metrics from each member of the cluster and will ignore sFlow metrics that don't relate to that function. There are a number of other sFlow analysis tools listed on sFlow.org that are focused on sFlow switch metrics: http://sflow.org/products/collectors.php The following article describes some things to consider when evaluating sFlow analyzers for monitoring switches: http://blog.sflow.com/2009/05/choosing-sflow-analyzer.html Peter On Fri, Jul 20, 2012 at 7:46 AM, Andreas Pflug pgad...@pse-consulting.de wrote: I've configured some Dell switches (e.g. 6224, with recent 3.3.3.3 firmware) to emit SFLOW packets, and I see them happily arriving at my gmond machine, but the switches aren't recognized. Digging into the sources, I found that the switch under investigation never sends blocks tagged as SFLOW_COUNTERBLOCK_HOST,_HID only type 0 and 1. Consequently, all packets are dropped. 
Is this a Dell problem of incompletely implemented SFLOW, or is it a gmond problem? Regards Andreas -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Ganglia-general mailing list Ganglia-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/ganglia-general -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how
Re: [Ganglia-general] Gmond Compilation on Cygwin
Hi Robert, sFlow is a very simple protocol - an sFlow agent periodically sends XDR encoded structures over UDP. Each structure has a tag and a length, making the protocol extensible. In the short term, it would make sense to define an sFlow structure to carry the current NVML metrics and tag it using NVIDIA's IANA assigned vendor number (5703). Something along the lines of:
/* NVML statistics */
/* opaque = counter_data; enterprise = 5703, format=1 */
struct nvml_gpu_counters {
  unsigned int device_count;
  unsigned int mem_total;
  unsigned int mem_util;
  ...
}
Additional examples are in the sFlow Host Structures specification (http://www.sflow.org/sflow_host.txt); these are the structures currently being exported by the Host sFlow agent. Extending the Windows Host sFlow agent to export these metrics would involve adding a routine to populate and serialize this structure - pretty straightforward - if you look at the Host sFlow agent source code you will see examples of how the existing structures are handled. For Ganglia to support the new counters, we would need to add a decoder to gmond for the new structure - also straightforward. Are per device metrics important, or can we roll up the metrics across all the GPUs on a server? With sFlow we generally roll up metrics for each node where possible - the goal is to provide enough detail so that the operations team can tell whether a node is healthy or not, but not so much as to overwhelm the monitoring system and limit scalability. Once a problem is detected, detailed troubleshooting and diagnostics can be performed using point tools on the host. The metrics currently exposed by the NVML API could be improved - everything appears to be a 1 second gauge. A more robust model for metrics is to maintain monotonic counters so that they can be polled at different frequencies and still produce meaningful results. Counters are also more robust when sending metrics over an unreliable transport like UDP.
The receiver calculates the deltas and can easily compensate for lost packets. Longer term it would be useful to have a discussion to see what metrics best characterize operational performance and are feasible to implement. Counters such as number of threads started, number of busy ticks, number of idle ticks etc. are the type of measurement you want to calculate utilizations. Some kind of load average based on the thread run queue would also be interesting. My calendar is pretty open next week - I am based in San Francisco, so 8am-5pm PST works best. Peter On Thu, Jul 12, 2012 at 11:58 AM, Robert Alexander ralexan...@nvidia.com wrote: Hey, A meeting may be a good idea. My schedule is mostly open next week. When are others free? I will brush up on sflow by then. NVML and the Python metric module are tested at NVIDIA on Windows and Linux, but not within Cygwin. The process will be easier/faster on the NVML side if we keep Cygwin out of the loop. -Robert -Original Message- From: Bernard Li [mailto:bern...@vanhpc.org] Sent: Thursday, July 12, 2012 10:49 AM To: Nigel LEACH Cc: lozgachev.i...@gmail.com; ganglia-general@lists.sourceforge.net; Peter Phaal; Robert Alexander Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin Hi Nigel: Technically you only need 3.1 gmond to have support for the Python metric module. But I'm not sure whether we have ever tested this under Windows. Peter and Robert: How quickly can we get hsflowd to support GPU metrics collection internally? Should we set up a meeting to discuss this? Thanks, Bernard On Thu, Jul 12, 2012 at 4:05 AM, Nigel LEACH nigel.le...@uk.bnpparibas.com wrote: Thanks Ivan, but we have 3.0 and 3.1 gmond running under Cygwin (and using APR), the problem is with the 3.4 spin.
-Original Message- From: lozgachev.i...@gmail.com [mailto:lozgachev.i...@gmail.com] Sent: 12 July 2012 11:54 To: Nigel LEACH Cc: peter.ph...@gmail.com; ganglia-general@lists.sourceforge.net Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin Hi all, Maybe this will be of interest. Some time ago I successfully compiled gmond 3.0.7 and 3.1.2 under Cygwin. If you need it I can upload gmond and the 3rd party sources plus a compilation script somewhere. Also, I have gmetad 3.0.7 compiled for Windows. In addition, I developed (just for fun) my own implementation of gmetad 3.1.2 using .NET and C#. P.S. I do not know whether it is possible to use these gmond versions to collect statistics from GPUs. -- Best regards, Ivan. 2012/7/12 Nigel LEACH nigel.le...@uk.bnpparibas.com: Thanks for the updates Peter and Bernard. I have been unable to get gmond 3.4 working under Cygwin; my latest errors are in parsing gm_protocol_xdr.c. I don't know whether we should follow this up. It would be nice to have a Windows gmond, but my only reason for upgrading is the GPU metrics. I take your point about re-using the existing GPU module and gmetric
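Peter's argument in this thread for monotonic counters over gauges can be illustrated with a small sketch. It assumes 32-bit counters that wrap at 2**32 and does not try to detect counter resets; the function name and sample values are made up for illustration.

```python
def counter_rate(prev, curr, wrap=2**32):
    """Average rate between two (timestamp, counter) samples.

    Assumes a monotonic counter that wraps at `wrap`; resets are not handled.
    """
    (t0, c0), (t1, c1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    delta = (c1 - c0) % wrap   # modulo arithmetic absorbs a single wrap
    return delta / (t1 - t0)


# Samples were sent at t=0, 20 and 40 seconds, but the t=20 datagram was lost.
# The two surviving samples still give the correct average over the interval.
print(counter_rate((0, 100), (40, 500)))

# A counter that wrapped between samples still yields a sensible rate.
print(counter_rate((0, 2**32 - 10), (10, 40)))
```

With a gauge, the lost sample would simply be missing information; with a counter, the next sample that does arrive covers the whole interval.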
Re: [Ganglia-general] Gmond Compilation on Cygwin
Nigel, A simple option would be to use Host sFlow agents to export the core metrics from your Windows servers and use gmetric to add the GPU metrics. You could combine code from the python GPU module and gmetric implementations to produce a self contained script for exporting GPU metrics: https://github.com/ganglia/gmond_python_modules/tree/master/gpu/nvidia https://github.com/ganglia/ganglia_contrib Longer term, it would make sense to extend Host sFlow to use the C-based NVML API to extract and export metrics. This would be straightforward - the Host sFlow agent uses native C APIs on the platforms it supports to extract metrics. What would take some thought is developing a standard set of summary metrics to characterize GPU performance. Once the set of metrics is agreed on, adding them to the sFlow agent is pretty trivial. Currently the Ganglia python module exports the following metrics - are they the right set? Anything missing? It would be great to get involvement from the broader Ganglia community to capture best practice from anyone running large GPU clusters, as well as getting input from NVIDIA about the key metrics.
* gpu_num
* gpu_driver
* gpu_type
* gpu_uuid
* gpu_pci_id
* gpu_mem_total
* gpu_graphics_speed
* gpu_sm_speed
* gpu_mem_speed
* gpu_max_graphics_speed
* gpu_max_sm_speed
* gpu_max_mem_speed
* gpu_temp
* gpu_util
* gpu_mem_util
* gpu_mem_used
* gpu_fan
* gpu_power_usage
* gpu_perf_state
* gpu_ecc_mode
As far as scalability is concerned, you should find that moving to sFlow as the measurement transport reduces network traffic since all the metrics for a node are transported in a single UDP datagram (rather than a datagram per metric when using gmond as the agent). The other consideration is that sFlow is unicast, so if you are using a multicast Ganglia setup then this involves restructuring your configuration.
You still need to have at least one gmond instance, but it acts as an sFlow aggregator and is mute: http://blog.sflow.com/2011/07/ganglia-32-released.html Peter On Tue, Jul 10, 2012 at 8:36 AM, Nigel LEACH nigel.le...@uk.bnpparibas.com wrote: Hello Bernard, I was coming to that conclusion, I’ve been trying to compile on various combinations of Cygwin, Windows, Hardware this afternoon, but without success yet. I’ve still got a few more tests to do though. The GPU plugin is my only reason for upgrading from our current 3.1.7, and there is nothing else esoteric we use. We do have Linux Blades, but all of our Tesla’s are hosted on Windows. The entire estate is quite large, so we would need to ensure sFlow scales, no reason to think it won’t, but I have little experience with it.. Regards Nigel From: bern...@vanhpc.org [mailto:bern...@vanhpc.org] Sent: 10 July 2012 16:19 To: Nigel LEACH Cc: neil.mckee...@gmail.com; ganglia-general@lists.sourceforge.net Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin Hi Nigel: Perhaps other developers could chime in but I'm not sure if the latest version could be compiled under Windows, at least I was not aware of any testing done. Going forward I would like to encourage users to use hsflowd under Windows. I'm talking to the developers to see if we can add support for GPU monitoring. Do you have any other requirements besides that? Thanks, Bernard On Tuesday, July 10, 2012, Nigel LEACH wrote: Hi Neil, Many thanks for the swift reply. I want to take a look at sFlow, but it isn’t a prerequisite. Anyway, I disabled sFlow, and (separately) included the patch you sent. Both fixes appeared successful. For now I am going with your patch, and sFlow enabled. I say “appeared successful”, as make was error free, and a gmond.exe was created. However, it doesn’t appear to work out of the box. I created a default gmond.conf ./gmond --default_config /usr/local/etc/gmond.conf and then simply ran gmond. 
It started a process, but no port (8649) was created. Running in debug mode I get this $ ./gmond -d 10 loaded module: core_metrics loaded module: cpu_module loaded module: disk_module loaded module: load_module loaded module: mem_module loaded module: net_module loaded module: proc_module loaded module: sys_module and nothing further. I have done little investigation yet, so unless there is anything obvious I am missing, I’ll continue to troubleshoot. Regards Nigel From: neil.mckee...@gmail.com [mailto:neil.mckee...@gmail.com] Sent: 09 July 2012 18:15 To: Nigel LEACH Cc: ganglia-general@lists.sourceforge.net Subject: Re: [Ganglia-general] Gmond Compilation on Cygwin You could try adding --disable-sflow as another configure option. (Or were you planning to use sFlow agents such as hsflowd?). Neil On Jul 9, 2012, at 3:50 AM, Nigel LEACH wrote: Ganglia 3.4.0 Windows 2008 R2 Enterprise Cygwin 1.5.25 IBM iDataPlex dx360 with Tesla M2070 Confuse 2.7 I’m trying to use the Ganglia
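Returning to the suggestion earlier in this thread of using gmetric for the GPU metrics: a minimal sketch of what such a script might look like is given here, assuming the readings have already been obtained from NVML. The sample values, types, and units are hypothetical placeholders, and the sketch only builds the gmetric command lines (using gmetric's --name, --value, --type, and --units options) rather than running them.

```python
import shlex

# Hypothetical sample readings (value, gmetric type, units). A real script
# would obtain these from NVML, for example by reusing code from the Ganglia
# python GPU module. The metric names are taken from the list in this thread.
SAMPLE_GPU_METRICS = {
    "gpu_temp": (61, "uint32", "C"),
    "gpu_util": (87, "uint32", "%"),
    "gpu_mem_used": (2048, "uint32", "MB"),
}


def gmetric_argv(name, value, mtype, units, gmetric="gmetric"):
    """Build the argument list for one gmetric invocation."""
    return [gmetric, "--name", name, "--value", str(value),
            "--type", mtype, "--units", units]


def build_commands(metrics):
    """Return one gmetric argument list per metric, sorted by metric name."""
    return [gmetric_argv(name, *metrics[name]) for name in sorted(metrics)]


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sketch is safe
    # to run on a machine without Ganglia installed.
    for argv in build_commands(SAMPLE_GPU_METRICS):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

To actually submit the values, each argument list could be passed to subprocess.run on a host where gmetric is installed and configured.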
[Ganglia-general] Using Ganglia/sFlow to monitor Hadoop
Hi All, I have been experimenting with setting up Ganglia with sFlow agents to monitor Hadoop. The configuration is described in the following article: http://blog.sflow.com/2012/04/hadoop.html The Ganglia 3.3 release is required to report on the sFlow java metrics. Peter
Re: [Ganglia-general] Ganglia for Windows
The sFlow data from the Windows server looks fine. Are you using gmond to monitor the BSD systems? Is the Windows server the only one you are monitoring with sFlow? Are you sure that you are running a new version of gmond (version 3.2 or greater) on the collector machine? Any older versions of gmond will discard the sFlow counters. On Wed, Apr 11, 2012 at 1:42 AM, Burton, Steven sbur...@shepherd-construction.co.uk wrote: Hi I had to specify the interface to tcpdump. pc28040664# tcpdump -i fxp0 -p -s 0 -w - udp port 6343 | sflowtool tcpdump: listening on fxp0, link-type EN10MB (Ethernet), capture size 65535 bytes startDatagram = datagramSourceIP 172.17.6.45 datagramSize 412 unixSecondsUTC 1334132034 datagramVersion 5 agentSubId 0 agent 172.17.6.45 packetSequenceNo 16394 sysUpTime 492115000 samplesInPacket 1 startSample -- sampleType_tag 0:2 sampleType COUNTERSSAMPLE sampleSequenceNo 16394 sourceId 2:1 counterBlock_tag 0:2001 adaptor_0_ifIndex 2 adaptor_0_MACs 1 adaptor_0_MAC_0 6eb07a70a528 counterBlock_tag 0:2005 disk_total 77301145600 disk_free 70244171776 disk_partition_max_used 912 disk_reads 144893 disk_bytes_read 3117016064 disk_read_time 121649704 disk_writes 1341632 disk_bytes_written 18116973056 disk_write_time 883169784 counterBlock_tag 0:2004 mem_total 2142728192 mem_free 1703526400 mem_shared 18446744073709551615 mem_buffers 18446744073709551615 mem_cached 77516800 swap_total 4139274240 swap_free 3825762304 page_in 4294967295 page_out 4294967295 swap_in 833643 swap_out 1040520 counterBlock_tag 0:2003 cpu_load_one 4.490 cpu_load_five 3.988 cpu_load_fifteen 3.963 cpu_proc_run 1 cpu_proc_total 483 cpu_num 1 cpu_speed 2533 cpu_uptime 492170 cpu_user 1058041428 cpu_nice 4294967295 cpu_system 3166627908 cpu_idle 4120087790 cpu_wio 4294967295 cpu_intr 927812500 cpu_sintr 4294967295 cpu_interrupts 64740155 cpu_contexts 197872436 counterBlock_tag 0:2006 nio_bytes_in 485523964 nio_pkts_in 6939345 nio_errs_in 0 nio_drops_in 0 nio_bytes_out 33214290 nio_pkts_out 
151485 nio_errs_out 0 nio_drops_out 0 counterBlock_tag 0:2000 hostname SCL-RSA3 UUID f5ead14482ace308030c78da7ace816d machine_type 2 os_name 3 os_release 5.2.3790 Service Pack 2 endSample -- endDatagram = ^C1 packets captured 2477 packets received by filter 0 packets dropped by kernel pc28040664# That's the host that returns the 'No matching metrics detected' legend in the rrdtool graphs. Currently I'm showing 3 hosts up (which is correct) localhost and another FreeBSD server are showing metrics but not the Windows server. Is the problem with my listeners? I ask as I've not configured multicast before and have only theoretical (and non-recent) knowledge of it. /* Feel free to specify as many udp_send_channels as you like. Gmond used to only support having a single channel */ udp_send_channel { mcast_join = 239.2.11.71 port = 8649 ttl = 1 } /* You can specify as many udp_recv_channels as you like as well. */ udp_recv_channel { mcast_join = 239.2.11.71 port = 8649 bind = 239.2.11.71 } udp_recv_channel { port = 8649 } udp_recv_channel { port = 6343 } /* You can specify as many tcp_accept_channels as you like to share an xml description of the state of the cluster */ tcp_accept_channel { port = 8649 } Steve. S Burton BSc(Hons) MIEE MBCS MIEEE Network Manager Shepherd Construction Ltd Head Office Frederick House, Fulford Road, York, YO10 4EA Tel: 01904 660391 Fax: 01904 660577 Web: www.shepherd-construction.co.uk Registered in England and Wales Company Number: 201860 Registered address: Huntington House, Jockey Lane, Huntington, York YO32 9XW The views or opinions presented in this e-mail are solely those of the author and do not necessarily represent those of the company. This email and any files transmitted with it are confidential and are intended solely for the individual or entity to which they are addressed. If you have received this e-mail in error, please notify sclc...@shepherd-construction.co.uk quoting the name of the sender. 
Whilst every care has been taken to check this outgoing e-mail for viruses it is seen as your responsibility to check and sweep it, and any attachments, for viruses on receipt. -Original Message- From: Peter Phaal [mailto:peter.ph...@gmail.com] Sent: 05 April 2012 17:42 To: Burton, Steven Cc: Bernard Li; Ganglia Subject: Re: [Ganglia-general] Ganglia for Windows Can you verify that you are receiving performance metrics using the following command on your gmond server? tcpdump -p -s 0 -w - udp port 6343 | sflowtool The firewall on your windows server, every firewall in the path to the bsd collector, and the firewall on the bsd collector itself must be configured to allow UDP port 6343 traffic to pass. The above
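When checking sflowtool output like the capture in this thread, it can help to filter it programmatically. The following is a rough sketch, assuming sflowtool prints one "key value" pair per line (the archive has flattened the line breaks). The sample text reuses a few values from the capture; treating all-ones values such as 4294967295 and 18446744073709551615 as "not reported by the agent" is an assumption of this sketch, and a genuine counter could in principle hold the same value.

```python
# All-ones 32-bit and 64-bit values, assumed here to mean "not reported".
UNKNOWN = {2**32 - 1, 2**64 - 1}

# A few fields copied from the capture in this thread, one pair per line.
SAMPLE = """\
startDatagram =================================
agent 172.17.6.45
cpu_load_one 4.490
cpu_nice 4294967295
mem_total 2142728192
mem_shared 18446744073709551615
hostname SCL-RSA3
endDatagram =================================
"""


def parse_sflowtool(text):
    """Turn 'key value' lines into a dict, skipping start/end marker lines."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if not key or key.startswith(("start", "end")):
            continue
        fields[key] = value
    return fields


def reported_counters(fields):
    """Keep integer-valued fields that are not the all-ones placeholder."""
    counters = {}
    for key, value in fields.items():
        if value.isdigit() and int(value) not in UNKNOWN:
            counters[key] = int(value)
    return counters


if __name__ == "__main__":
    print(reported_counters(parse_sflowtool(SAMPLE)))
```

In practice the text would come from piping tcpdump into sflowtool as shown in the thread, rather than from an embedded sample.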
Re: [Ganglia-general] Ganglia for Windows
Can you verify that you are receiving performance metrics using the following command on your gmond server?

tcpdump -p -s 0 -w - udp port 6343 | sflowtool

The firewall on your Windows server, every firewall in the path to the BSD collector, and the firewall on the BSD collector itself must be configured to allow UDP port 6343 traffic to pass. The above command will let you verify that the data is at least making it to your server. Remember that tcpdump catches packets before any local firewall rules are applied, so you still need to check your local configuration even if the command shows that sFlow metrics are being received.

You can download sflowtool from the following URL; you need a recent version to be able to decode all the host performance metrics: http://www.inmon.com/technology/sflowTools.php

On Thu, Apr 5, 2012 at 12:32 AM, Burton, Steven sbur...@shepherd-construction.co.uk wrote: Bernard, Yes. I have: udp_recv_channel { port = 6343 } I seem to have values in the rrd's but No matching metrics detected in the graphs. Steve.
-----Original Message----- From: Bernard Li [mailto:bern...@vanhpc.org] Sent: 03 April 2012 01:05 To: Burton, Steven Cc: Ganglia Subject: Re: [Ganglia-general] Ganglia for Windows

Hi Steve: Have you enabled sFlow on the Linux/FreeBSD gmond.conf? They are not on by default. Cheers, Bernard

On Mon, Apr 2, 2012 at 7:01 AM, Burton, Steven sbur...@shepherd-construction.co.uk wrote: I found that xms was loading but I also needed php5-simplexml. I now have graphs for the server I'm running Ganglia on but only empty graphs for the Windows server I'm trialling softflow on. Every graph has the legend No matching metrics detected. The number of entries in the rrds for this server seems to be increasing as measured by: rrdtool dump pkts_in.rrd | grep 'v' | grep -v NaN | wc -l Steve.
-----Original Message----- From: Burton, Steven [mailto:sbur...@shepherd-construction.co.uk] Sent: 02 April 2012 09:10 To: Alex Dean; Ganglia Subject: Re: [Ganglia-general] Ganglia for Windows

Hi, I've installed php5-xml, which led to another set of errors suggesting that php5-session was needed, so I installed that. I have a web front end now but empty graphs. I'm pretty sure I have data in the rrds, as I dumped a random selection to xml and there were a significant number of values which were NOT NaN. It may be that this isn't the way forward for me, as I can get more metrics with nagios + plugins + nagiosgraph, though I only have a 5 minute granularity with nagios, at best. Conversely, I might have to switch to some Linux distribution, though FreeBSD has served me well since 1996 and I'm more comfortable administering it. Steve.
Re: [Ganglia-general] Problem runnning gstat gmetric gmond
On Mon, Feb 20, 2012 at 9:06 AM, Mohit Dhingra mohitdhing...@gmail.com wrote:

Hi Vladimir / All, Everything is working fine now (gmond and gmetad). I have installed ganglia on Dom0 OpenSUSE, with Xen as hypervisor. Now, I want to monitor VMs with the help of sflow, as you told earlier. I have checked your links.

http://blog.sflow.com/2012/01/using-ganglia-to-monitor-virtual.html
http://blog.sflow.com/2011/09/xenserver-60-supplemental-pack.html

I have some doubts regarding installation. I have installed Xen as hypervisor on OpenSUSE (dom0). I am not sure whether this is the XenServer that you talk about?

cadlab:~/Downloads/hsflowd-1.19 # uname -a
Linux cadlab 2.6.37.6-0.11-xen #1 SMP 2011-12-19 23:39:38 +0100 x86_64 x86_64 x86_64 GNU/Linux

Is sflow available for this? I downloaded the source code package. It contains INSTALL.Linux and INSTALL.XenServer, where it talks about a DDK, but there is no DDK for my Xen. Should I install as described in INSTALL.Linux? Will it monitor VMs? Can somebody please help me out with this.

If you have development tools installed on your OpenSUSE Dom0, you can build and install from sources: http://blog.sflow.com/2010/10/installing-host-sflow-on-linux-server.html

hsflowd uses libxenstat to monitor the performance of each of the virtual machines from Dom0.

To build software for XenServer (and Xen Cloud Platform) you can download a special virtual machine (the DDK) which exactly matches the kernel in Dom0, but includes all the development tools needed to compile software. You build the RPMs in the DDK and install them in Dom0. This process keeps Dom0 as small as possible.
Re: [Ganglia-general] Where does sFlow fit into the ganglia / java ecosystem?
Bryan,

Since you want each of the nodes in the cluster to have access to the state of its peers, implementing a full gmond equivalent peer sounds like the right call. However, I think that you might want to consider adding sFlow export functionality as well.

It's helpful to have a clear understanding of the goals and architectural choices in sFlow. The sFlow architecture is asymmetric, with agents sending but never receiving data. Once you have made that choice, you can further simplify the agent by making it stateless - for example, you will see that sFlow exports raw counters and leaves it up to the receiver to compute deltas. With gmond the deltas are computed at the sender, requiring it to maintain state (which gmond is doing anyway when it receives metrics, so it isn't an unreasonable choice). Removing all state from the agent means that its memory requirements are minimal and it doesn't need to allocate memory - both properties are very useful when you want to embed the measurements in hardware devices like network switches.

As you point out, another difference is that sFlow exports standard sets of metrics rather than ad-hoc measurements. The benefit is that you can focus on optimizing the collection of the standard metrics (even implementing some in hardware), tightly pack the data in a single datagram, eliminate the overhead of exchanging metadata and simplify multivendor monitoring, since the same measurements will be sent by every device. Standardizing the metrics also helps reduce operational complexity - eliminating the configuration options that are needed for a more flexible solution.

A goal with sFlow is to instrument every switch port, server, virtual machine and service to provide a comprehensive view of performance across the data center. I think there would be great value in having bigdata export metrics so that they can be combined with data from network, load-balancer, web, memcache and application server tiers.
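To illustrate the point about raw counters: the receiver only needs the previous sample to turn a stateless agent's counters into rates. The sketch below is an illustration rather than gmond's actual code; the function name, the 32-bit default width, and the sample values are assumptions made for the example.

```python
def counter_rate(prev, curr, bits=32):
    """Per-second rate between two (timestamp_seconds, raw_counter) samples.

    The subtraction is done modulo 2**bits, so a single counter wrap between
    the two samples still yields the correct delta.
    """
    (t0, c0), (t1, c1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    delta = (c1 - c0) % (1 << bits)
    return delta / (t1 - t0)


if __name__ == "__main__":
    # Ordinary case: 600 more units counted over 20 seconds.
    print(counter_rate((0, 100), (20, 700)))
    # The counter wrapped past 2**32 between the two samples.
    print(counter_rate((0, 2**32 - 10), (10, 40)))
```

Because all of this state lives in the receiver, the agent can simply read and send its counters on each polling interval, which is what keeps its memory footprint small.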
It's also worth mentioning that sFlow doesn't just export counters. As an example, the sFlow Memcache metrics are probably most similar to the kinds of data you might want to export for bigdata. In addition to exporting a standard set of counters, the sFlow agent also randomly samples Memcache operations, exporting the command (GET, SET...), status (OK, ERROR, NOT_FOUND...), value size, and duration of the sampled operation. Random sampling has very low overhead (about the cost of maintaining one counter), making it suitable for continuous monitoring of high transaction rate environments like a large Memcached cluster.

The counters and the transaction samples complement one another. For example, you might be using Ganglia to track the cache hit rate using the sFlow counters and notice an increase in cache misses. Looking at the transaction samples you can identify the cluster-wide top missed keys - the information you need to actually fix the problem. In one case I am aware of, the misses were caused by a typo in a client side script and easily fixed - it's hard to see how you would easily spot this problem any other way. In the web tier, sFlow agents sample HTTP operations, and you might notice an increase in response time for a particular URL and trace it back to the missed key in the cache, for example.

Getting back to bigdata - you could usefully export the JVM metrics using sFlow - take a look at the jmx-sflow agent, or tomcat-sflow-valve for examples:

http://jmx-sflow-agent.googlecode.com/
http://tomcat-sflow-valve.googlecode.com/

There isn't much to the code, so you could easily incorporate it as an option in your java library.

There is currently an effort underway to generalize sFlow's application layer monitoring: https://groups.google.com/forum/?fromgroups#!topic/sflow/e2sLb_3hyDI

I would be very interested in any comments you might have about the applicability to instrumenting bigdata transactions.
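The "top missed keys" analysis described in this message can be sketched in a few lines. This is a hedged illustration, not sFlow or Ganglia code: the (key, status) tuple layout, the key names, and the 1-in-400 sampling rate are all hypothetical, and the estimate simply scales each sampled count by the sampling rate.

```python
from collections import Counter


def top_missed_keys(samples, sampling_rate, n=3):
    """Estimate the most frequently missed keys from sampled operations.

    samples: iterable of (key, status) tuples from randomly sampled operations.
    sampling_rate: N for 1-in-N sampling; each sample stands for about N
    real operations.
    """
    misses = Counter(key for key, status in samples if status == "NOT_FOUND")
    return [(key, count * sampling_rate) for key, count in misses.most_common(n)]


if __name__ == "__main__":
    # Hypothetical samples; "sesion:9" imitates a typo in a client script.
    sampled = [
        ("user:42", "OK"),
        ("sesion:9", "NOT_FOUND"),
        ("sesion:9", "NOT_FOUND"),
        ("user:7", "NOT_FOUND"),
        ("sesion:9", "NOT_FOUND"),
    ]
    print(top_missed_keys(sampled, sampling_rate=400))
```

The scaled counts are estimates whose accuracy improves with the number of samples, which is why the approach suits high transaction rate clusters.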
Cheers, Peter On Feb 3, 2012, at 10:19 AM, Bryan Thompson wrote: Peter, I put together a ganglia listener / sending library in Java [1] which builds up soft state in a concurrent hash map to support a ganglia integration for bigdata [2]. The library makes it easy to turn a Java application into a ganglia peer. I also plan to migrate some of our existing per-host, per-process, and JVM specific counters that we have into this library where they might be useful to a broader audience. Some of the benefits of this library for us are that we can: - leverage the existing ganglia ecosystem; - obtain fast load balanced reports from the soft state inside of the JVM; and - extend the metric collection and reporting trivially to application specific counters. I understand that sFlow is available for a variety of environments and that it provides a tighter, though fixed, data gram encoding for metric messages. Can you expand on whether sFlow might have been an alternative for the integration that we
Re: [Ganglia-general] Ganglia 3.3.0 released
The following articles describe the sFlow metrics included in the Ganglia 3.3.0 and 3.2.0 releases:

http://blog.sflow.com/2012/02/ganglia-33-released.html
http://blog.sflow.com/2011/07/ganglia-32-released.html

The Host sFlow agent efficiently exports standard Ganglia host metrics from Windows, Linux and FreeBSD servers as well as per-VM metrics from Hyper-V, XenServer, XCP and Xen hypervisors. Additional sFlow agents are available for Java, Apache, Tomcat, NGINX, node.js and Memcached.

Peter

On Feb 1, 2012, at 2:38 PM, Vladimir Vuksan wrote: This was going to be the 4.0.0 release; however, we received feedback that making a major version bump may cause issues with various Linux distribution packaging policies, e.g. Fedora. Therefore it's been rebranded as 3.3.0. Announcement is here http://ganglia.info/?p=489 Enjoy, Vladimir
[Ganglia-general] Ganglia 3.2 and sFlow
Anyone curious about the sFlow functionality in Ganglia 3.2 should take a look at Dave Mangot's blog - he describes why Tagged.com is using Ganglia with sFlow. http://tech.mangot.com/roller/dave/entry/host_based_sflow_a_drop Peter
Re: [Ganglia-general] Ganglia 3.2.0 is out
On Wed, Jul 13, 2011 at 6:43 PM, Vladimir Vuksan vl...@vuksan.com wrote:

Great. Would it be possible to get a comprehensive guide on all the configuration options for sFlow stuff :-).

There are very few configuration settings. Just the udp_port and the accept_vm_metrics settings, both of which are shown in the default configuration:

[root@ganglia ~]# gmond --default_config
...
/* Channel to receive sFlow datagrams */
#udp_recv_channel {
#  port = 6343
#}

/* optional sFlow settings */
#sflow {
#  udp_port = 6343
#  accept_vm_metrics = no
#}

Actually we are working on adding the ability to add TAGS to hosts, i.e. a comma separated list of arbitrary tags that identify a host, e.g. database,memcache etc. We then just need to build a UI that would allow you to just see things tagged with memcache etc.

TAGS sound like a flexible way to handle groups of hosts.

The current gmond/sFlow implementation includes a parent attribute for each virtual machine that identifies the physical server hosting the virtual machine.

Where in the XML is that actually displayed ?
The sFlow protocol assigns a data source index (dsi) to each measurement point:

<METRIC NAME="dsi" VAL="10.0.0.162:1" TYPE="string" UNITS="" TN="7" TMAX="60" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="TITLE" VAL="Datasource ID"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Datasource ID"/>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
</EXTRA_DATA>
</METRIC>

A virtual machine has its own dsi as well as a parent_dsi, indicating the hypervisor hosting it:

<METRIC NAME="dsi" VAL="10.0.0.163:2" TYPE="string" UNITS="" TN="18" TMAX="60" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="TITLE" VAL="Datasource ID"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Datasource ID"/>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
</EXTRA_DATA>
</METRIC>
<METRIC NAME="parent_dsi" VAL="10.0.0.162:1" TYPE="string" UNITS="" TN="18" TMAX="60" DMAX="0" SLOPE="zero">
<EXTRA_DATA>
<EXTRA_ELEMENT NAME="TITLE" VAL="Parent Datasource ID"/>
<EXTRA_ELEMENT NAME="DESC" VAL="Parent Datasource ID"/>
<EXTRA_ELEMENT NAME="GROUP" VAL="system"/>
</EXTRA_DATA>
</METRIC>

hsflowd reports core Ganglia metrics for the hypervisor and libvirt metrics for each virtual machine: http://www.sflow.org/sflow_host.txt

Question is who sets the host-id. UUIDs are meaningless without context and I am not sure that it should be Ganglia or hsflowd that sets them. This is likely a job for the configuration management system.

hsflowd obtains the UUID from the BIOS where possible (e.g. using /usr/sbin/dmidecode on Linux), falling back on the UUID of the first physical disk on older systems without a BIOS UUID. Hypervisors assign a UUID to each virtual machine as it is created. hsflowd uses libvirt/libxenstore to retrieve the virtual machine UUIDs. The UUIDs provide a unique and persistent identifier for each physical and virtual machine. sFlow also reports adapter MAC addresses for each physical and virtual machine. IP addresses and hostnames can change, but the UUIDs tend to stay the same.
There is zero configuration involved in assigning UUIDs, they are assigned to CPU motherboards, or automatically by the operating system as disks are formatted or virtual machines created. Peter
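As a rough illustration of how the dsi and parent_dsi metrics discussed in this message could be used, the sketch below maps each virtual machine to the host whose dsi matches its parent_dsi. The embedded XML is a trimmed, hypothetical stand-in for gmond's XML output: the dsi values come from this thread, while the cluster and host names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for gmond's XML; host names are invented.
SAMPLE_XML = """
<CLUSTER NAME="demo">
  <HOST NAME="hv1" IP="10.0.0.162">
    <METRIC NAME="dsi" VAL="10.0.0.162:1" TYPE="string"/>
  </HOST>
  <HOST NAME="vm1" IP="10.0.0.163">
    <METRIC NAME="dsi" VAL="10.0.0.163:2" TYPE="string"/>
    <METRIC NAME="parent_dsi" VAL="10.0.0.162:1" TYPE="string"/>
  </HOST>
</CLUSTER>
"""


def vm_to_hypervisor(xml_text):
    """Map each host that reports a parent_dsi to the host owning that dsi."""
    root = ET.fromstring(xml_text)
    host_by_dsi = {}
    parent_dsi_of = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.findall("METRIC")}
        if "dsi" in metrics:
            host_by_dsi[metrics["dsi"]] = host.get("NAME")
        if "parent_dsi" in metrics:
            parent_dsi_of[host.get("NAME")] = metrics["parent_dsi"]
    # A VM whose parent is not present in the XML maps to None.
    return {vm: host_by_dsi.get(dsi) for vm, dsi in parent_dsi_of.items()}


if __name__ == "__main__":
    print(vm_to_hypervisor(SAMPLE_XML))
```

A UI could use a mapping like this to group virtual machines under the physical server that hosts them.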
Re: [Ganglia-general] Ganglia 3.2.0 is out
On Tue, Jul 12, 2011 at 6:43 AM, Vladimir Vuksan vli...@veus.hr wrote: That's relatively easy to fix. In Gweb 2.1.0+ any metrics that don't exist show up as empty graphs with a legend that says No matching metrics found. We can certainly fix any other ones. We shouldn't let UI get in the way of collecting useful metrics :-). That will be definite improvement. There is a bigger issue when I missed your February post :-(. I think all the metrics you are currently dropping are useful metrics and I think those should be included. Is this something that needs to change in the gmond code or is this part of hsflowd ? The vm statistics are always sent by hsflowd (when running on a hypervisor). They are dropped by default in gmond, but can be enabled using the accept_vm_metrics = yes option: http://blog.sflow.com/2011/07/ganglia-and-cloud-performance.html Broken charts are only part of the problem. Ganglia works best when all the items being displayed are part of a cluster (i.e. the members are similar, sharing common attributes). When you look at the statistics from a virtual server pool, there are really two logical clusters. The cluster of virtual machines with one set of attributes and the cluster of physical servers running Xen, KVM etc. that host the virtual machines. Mixing virtual and physical machines leads to a confusing presentation because you are no longer comparing like with like. If you throw in network statistics, you logically have a third cluster (of network interfaces). One way to address the problem would be to have a HOST attribute in the gmond metadata that allowed different logical clusters to be identified. For example a physical server might have an attribute CATEGORY=SERVER in its HOST section. A virtual machine could be identified as CATEGORY=VM and a network interface CATEGORY=NETWORK. This would allow the UI to switch between logical slices being reported by a single gmond instance. 
An alternative would be to allow multiple tcp_accept_channel sections, each of which would present a different logical cluster. For example the following tcp_accept_channel { port = 8649 hosttype=server } tcp_accept_channel { port = 8650 hosttype=vm } This second option fits well with the current architecture, the following gmetad.conf settings would create the two clusters. data_source server cluster localhost data_source vm cluster localhost:8650 Regarding some of the other points in your February e-mail 1. Standardizing TITLE and DESC metric values - That sounds like a good idea. 2. Should TITLE and DESC metadata be excluded from the statistics export - That also sounds like a good idea but it may not have as much value to get it done at this time. I'd defer that to later. Let me know if you disagree. This was a general comment about cleaning up the scheme for the future. Not a high priority. 3. Express Containment of a virtual host - I think we could work around it by either adding an additional attribute to e.g. HOST that says something like PARENT. That should be easy to add. Alternatively we can add e.g. string metric that says Parent. That may be the easier way to go. Remaining portion is then just the UI component. The current gmond/sFlow implementation includes a parent attribute for each virtual machine that identifies the physical server hosting the virtual machine. 4. Unique server/VM UUID - I believe this is now solved by using the override_hostname and override_ip settings. Let me know if you disagree. The current gmond/sFlow implementation does override the hostname and ip attributes, but you end up with odd values in the UI: IP Address b0b22c02-6947-fc8a-5a87-f1d014f4ae69 It would be better if there were an explicit opaque host-id attribute that was used as the key in the gmond hash table and to key identify hosts in gmond, as directory paths for charts etc. 
Hostnames and IP addresses would no longer be required (or required to be unique) and could be omitted if unknown. hsflowd reports UUIDs for physical and virtual servers. UUIDs are persistent and unique, making them good candidates for host-ids.

5. Expanding number of Ganglia metrics / Is there interest ? - Yes and yes :-). This is something people constantly ask on IRC. I think we'll start with adding selected Python modules that are now in our Github repo to the distribution. I'd definitely like to see more metrics coming out of hsflowd.

There are also currently efforts to standardize http and memcache metrics export in sFlow. Once they are finalized, we plan to add them to gmond:

http://blog.sflow.com/2011/01/http.html
http://blog.sflow.com/2010/09/memcached.html
Re: [Ganglia-general] Ganglia 3.2.0 is out
Good suggestion. The bind directive in the udp_recv_channel block looks like it does the trick. I updated the instructions to cover this option:

http://blog.sflow.com/2011/07/ganglia-32-released.html

On Mon, Jul 11, 2011 at 9:43 AM, Robert Jordan rjor...@notampering.com wrote:
> Hi Peter,
>
> Regarding the article linked below: is it also possible to use the standard port number but different bind addresses for multiple gmond processes when monitoring multiple clusters? Using this approach would have the advantage of allowing the configuration to be changed by simply updating DNS entries rather than potentially needing to update many host-sflow agent machines.
>
> Thanks,
> Robert
>
> On Thu, Jul 7, 2011 at 11:26 PM, Peter Phaal peter.ph...@gmail.com wrote:
>> Great news! For additional information on the sFlow feature and updated configuration instructions, see:
>>
>> http://blog.sflow.com/2011/07/ganglia-32-released.html
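To make the bind approach concrete, here is a minimal sketch that prints one udp_recv_channel block per cluster, each listening on the standard sFlow port but bound to a different local address. The cluster names and addresses are hypothetical, and the helper function is only for illustration; the actual instructions are in the article linked above, so check the generated fragments against it and against your gmond version before using them.

```python
def udp_recv_channel(bind_address, port=6343):
    """Render a gmond.conf udp_recv_channel block bound to one local address."""
    return (
        "udp_recv_channel {\n"
        f"  bind = {bind_address}\n"
        f"  port = {port}\n"
        "}\n"
    )


# Hypothetical clusters: one gmond process per cluster, each with its own
# local address on the collector and the same standard sFlow port.
clusters = {"web": "10.0.0.10", "db": "10.0.0.11"}

for name, address in clusters.items():
    print(f"# gmond.conf fragment for cluster '{name}'")
    print(udp_recv_channel(address))
```

With a layout like this, each Host sFlow agent would be pointed at a DNS name that resolves to the address of its cluster's gmond, so moving a host to another cluster is a DNS change rather than an agent reconfiguration.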
Re: [Ganglia-general] Ganglia 3.2.0 is out
On Mon, Jul 11, 2011 at 1:09 PM, Vladimir Vuksan vli...@veus.hr wrote:
> Peter,
>
> It is also my understanding that currently only metrics from physical hosts are supported. Is it possible to add network devices that support sFlow?
>
> Thanks,
> Vladimir

Currently the Ganglia UI is host oriented, expecting a core set of metrics to be present for each server. The current Host sFlow implementation includes virtual machine statistics (equivalent to libvirt performance metrics), but they are disabled by default because virtual machines report only a limited set of metrics, which causes issues in the UI:

http://blog.sflow.com/2011/07/ganglia-and-cloud-performance.html

There are additional enhancements to the Ganglia UI and data model that would be helpful:

http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg06319.html

Enabling sFlow metrics from network devices would have similar problems since the metrics relate to network links rather than servers.
Re: [Ganglia-general] Ganglia 3.2.0 is out
Great news! For additional information on the sFlow feature and updated configuration instructions, see:

http://blog.sflow.com/2011/07/ganglia-32-released.html

On Thu, Jul 7, 2011 at 7:20 PM, Vladimir Vuksan vli...@veus.hr wrote:
> -- Forwarded message --
>
> We are happy to announce the release of Ganglia 3.2.0. The announcement can be read here:
>
> http://ganglia.info/?p=430
>
> Notable changes are:
>
> - sFlow support
> - hostname/ip override - useful in dynamic/cloud environments
> - FreeBSD patches
> - Python module improvements
> - Bugfixes and improvements over 3.1.7
>
> Now that 3.2.0 is out we have a number of other improvements we are working on and hope to release shortly. Stay tuned.
>
> Vladimir
[Ganglia-general] Using Ganglia to monitor Rackspace cloudservers
Hi All,

I have been experimenting with Ganglia for monitoring performance in the Rackspace cloud and it works very well:

http://blog.sflow.com/2011/01/rackspace-cloudservers.html

A big advantage of the gmond/sFlow data push model is that Ganglia automatically discovers cloud servers as they are created. The polling model that most network management tools use is poorly suited to monitoring dynamic cloud server pools.

For anyone interested in taking a look, Ganglia is running on a Fedora 14 cloud server:

http://rs-ganglia.inmon.com/

Peter
Re: [Ganglia-general] Ganglia and sFlow
The patch is obsolete; the sFlow code has been checked into the development branch (trunk). To build Ganglia with sFlow support you need to download the latest sources from SourceForge:

    svn co https://ganglia.svn.sourceforge.net/svnroot/ganglia ganglia

Peter

On Wed, Dec 15, 2010 at 3:46 AM, Giovanni De Rosa giode...@hotmail.it wrote:
> Hi,
>
> I think there is something wrong because I only see the host on which gmond is installed in the Ganglia web page. I checked whether sFlow sends packets to the host running gmond, and it does (I used tcpdump udp port 6343). To patch gmond for use with sFlow I used this:
>
> http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=276
>
> What can I do?
>
> Thanks a lot,
> Giovanni
Re: [Ganglia-general] Ganglia and sFlow
You are correct.

If gmond still isn't reporting on sFlow, the other thing to check is your firewall. tcpdump sees packets before the firewall, so seeing the sFlow packets in tcpdump confirms that the packets are arriving at the server, but it doesn't necessarily mean that gmond is able to receive them. You may need to add a rule to iptables accepting incoming packets to UDP port 6343.

To test whether the problem is firewall related, you can temporarily disable the firewall with the command:

    /sbin/service iptables stop

On Dec 15, 2010 9:28am, giovanni de rosa giode...@hotmail.it wrote:
> Thanks a lot. I'm not an expert, so sorry if I say nonsense.
>
> If I understand correctly, I have to download everything in:
>
> https://ganglia.svn.sourceforge.net/svnroot/ganglia/trunk/monitor-core/
>
> then add the receive channel for sFlow to the configuration file and then rebuild everything. Is this right?
>
> Thanks
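Because tcpdump sees packets before the firewall, a check made from an ordinary UDP socket is closer to what gmond itself experiences. The following is a minimal sketch, not part of Ganglia or Host sFlow; the function names are made up for illustration. It binds a UDP socket and reports the first datagram that reaches it. As written it only runs a loopback self-test on an ephemeral port.

```python
import socket


def open_listener(port=6343, bind_address="0.0.0.0"):
    """Bind a UDP socket the way a collector would and return it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_address, port))
    return sock


def wait_for_datagram(sock, timeout=30.0):
    """Return (sender_ip, size) for the first datagram, or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, (sender_ip, _sender_port) = sock.recvfrom(65535)
    except socket.timeout:
        return None
    return sender_ip, len(data)


# Loopback self-test on an ephemeral port, so it runs without touching a
# live gmond. Port 0 asks the operating system for any free port.
listener = open_listener(port=0, bind_address="127.0.0.1")
test_port = listener.getsockname()[1]
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
    sender.sendto(b"hello", ("127.0.0.1", test_port))
print(wait_for_datagram(listener, timeout=2.0))
listener.close()
```

For a real check, stop gmond first (otherwise the bind fails because the port is in use), call open_listener() with the defaults, and wait for longer than the agents' reporting interval. If tcpdump shows packets on UDP port 6343 but wait_for_datagram() returns None, a firewall rule on the collector is the likely cause.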
Re: [Ganglia-general] Ganglia and sFlow
The following article provides additional information on configuring the Ganglia development branch (trunk) to collect sFlow:

http://blog.sflow.com/2010/10/ganglia.html

Installing and configuring Host sFlow agents to send sFlow from Linux and Windows platforms is described in the articles:

http://blog.sflow.com/2010/10/installing-host-sflow-on-linux-server.html
http://blog.sflow.com/2010/10/installing-host-sflow-on-windows-server.html

sFlow is sent from the Host sFlow agents to the Ganglia gmond collector as unicast UDP messages to port 6343. You need to make sure that each Host sFlow agent is configured to send to the IP address of the server that gmond is installed on (the configuration details are in the articles above).

If you are still having problems then check that there are no firewalls blocking the traffic. The iptables filters on the collector and agents, as well as any intermediate firewalls, must be configured to allow UDP port 6343 traffic to pass. You can confirm that sFlow is being received at the gmond server by running the following tcpdump command:

    tcpdump udp port 6343

Please let me know if you have any difficulties.

Peter

On Tue, Dec 7, 2010 at 1:19 AM, Giovanni De Rosa giode...@hotmail.it wrote:
> Hi,
>
> I'm trying to use Ganglia with sFlow. I have installed gmond patched for sFlow on one host and installed the sFlow agent on a different host. The problem is that nothing ever seems to change (the XML from gmond looks the same). How can I tell whether everything is working correctly? The sFlow agent runs well and sends packets to the host on which gmond is installed.
>
> Thanks a lot,
> Giovanni
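When checking that everything is working, it can also help to see which agent sent each datagram rather than only that something arrived on port 6343. The sketch below decodes the fixed header at the start of an sFlow version 5 datagram (version, agent address, sub-agent id, sequence number, uptime, sample count). It is an illustrative helper, not code from gmond or hsflowd, and the sample datagram at the end is synthetic, built only to exercise the parser.

```python
import ipaddress
import struct


def parse_sflow_header(datagram):
    """Decode the fixed header of an sFlow version 5 datagram into a dict."""
    version, addr_type = struct.unpack_from("!II", datagram, 0)
    if version != 5:
        raise ValueError(f"unsupported sFlow version: {version}")
    if addr_type == 1:        # agent address is IPv4 (4 bytes)
        addr_len = 4
    elif addr_type == 2:      # agent address is IPv6 (16 bytes)
        addr_len = 16
    else:
        raise ValueError(f"unknown agent address type: {addr_type}")
    agent = ipaddress.ip_address(datagram[8:8 + addr_len])
    sub_agent, sequence, uptime_ms, samples = struct.unpack_from(
        "!IIII", datagram, 8 + addr_len
    )
    return {
        "agent": str(agent),
        "sub_agent": sub_agent,
        "sequence": sequence,
        "uptime_ms": uptime_ms,
        "samples": samples,
    }


# Synthetic header for a made-up agent at 10.0.0.1, used only as a demo.
sample = struct.pack("!II4sIIII", 5, 1, bytes([10, 0, 0, 1]), 0, 42, 123456, 2)
print(parse_sflow_header(sample))
```

Fed with datagrams read from a UDP socket, the agent field shows which Host sFlow agents are reaching the collector, and a steadily increasing sequence number for each agent suggests that its datagrams are arriving without loss.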
[Ganglia-general] sFlow support in gmond
Hello All,

Here is some background on the sFlow support that has been added to gmond in the development branch:

http://blog.sflow.com/2010/10/ganglia.html

An sFlow agent is extremely lightweight, since sFlow monitoring is typically used in embedded environments where resources are constrained: switches, routers, firewalls, hypervisors, etc. The addition of sFlow support to gmond allows metrics to be collected from these environments, where the installation of a gmond agent is often not possible.

This initial implementation of gmond/sFlow decodes and populates the core set of Ganglia metrics, but future versions could decode additional sFlow structures. For example, sFlow reports on virtual machine statistics (based on libvirt); however, the challenge is deciding how to incorporate the additional metrics in the Ganglia data model and in the UI:

http://www.mail-archive.com/ganglia-general@lists.sourceforge.net/msg06009.html

There are currently sFlow agents for Windows, Linux, Xen, XCP, XenServer and KVM/libvirt.

Please reply to the list with any comments and suggestions.

Cheers,
Peter