Re: [Ganglia-general] 3.7.2 RPMs

2016-04-15 Thread Chris Burroughs
For reasons that are not clear to me the EPEL build also uses a slightly different (no libganglia?) set of packages, which can make switching between EPEL and other builds difficult. On 03/28/2016 12:36 PM, Adrian Sevcenco wrote: > On 03/28/2016 06:45 PM, Damir Krstic wrote: >> Does anyone have

Re: [Ganglia-general] gmetad data thread is not closing connections properly (CLOSE_WAIT connections)

2016-03-23 Thread Chris Burroughs
Hi Javier, your issue sounds at least somewhat similar to: https://github.com/ganglia/monitor-core/issues/47 Which includes several cross referenced discussions, but no clear fixes. Are your clusters all on the same local network? On 02/15/2016 12:38 PM, Javier Villar Fernández wrote: > Hi

Re: [Ganglia-general] Ganglia documentation?

2015-03-09 Thread Chris Burroughs
While there are certainly a lot of changes since 3.1.x, the architecture description in 'Monitoring with Ganglia' is still quite relevant. On 03/04/2015 02:43 PM, Ralph Castain wrote: Hi folks I’m looking at Ganglia for one of my projects, and see that the architecture has evolved

Re: [Ganglia-general] How to use procstat metrics in gmond 3.7.0 ?

2015-02-26 Thread Chris Burroughs
I don't see anything obviously wrong with the snippet. You can run procstat.py from the command line (--help for options) and pass along the configuration over the command line. That might provide a faster feedback cycle for debugging. On 02/24/2015 03:17 PM, Grigory Shamov wrote: Hi All,

Re: [Ganglia-general] GSoC / Google Summer of Code 2015 (deadline Friday)

2015-02-18 Thread Chris Burroughs
I would be interested in mentoring this summer.

Re: [Ganglia-general] collect_every vs gmetad polling?

2015-02-13 Thread Chris Burroughs
gmond only keeps the latest value, so the short answer is 'nothing'. It would be a significant (but very useful) architectural change for the various components to keep a sliding window of values and take a lot of the luck out of getting everything to line up. On 02/12/2015 05:38 PM, Brad Hough

[Ganglia-general] 64bit metrics with modpython

2014-02-18 Thread Chris Burroughs
I'm trying to write a gmond python module that needs to measure values greater than 2^32 (bytes of memory/storage). I'm having trouble getting that to work as there either isn't a uint64 type or the python module is turning all ints into 32bit [1]. What's the right way to pass 64 bit numbers

Re: [Ganglia-general] UC 64bit metrics with modpython

2014-02-18 Thread Chris Burroughs
On 2014-02-18 09:39, Rushton Martin wrote: Could you simply divide the value by 2^20 (or 10^6) and send MiB (or MB)? This is actually similar to what the built in memory metrics do. They report KiB and then the fancy report converts that to something nice. The work around that I am currently
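
The scaling workaround suggested above can be sketched roughly as follows. This is only an illustration, assuming the metric has to fit in an unsigned 32-bit integer; the helper name `bytes_to_mib` is hypothetical and is not part of gmond's Python module interface.

```python
UINT32_MAX = 2**32 - 1  # largest value a 32-bit unsigned metric can hold


def bytes_to_mib(num_bytes):
    """Scale a byte count down to whole MiB so it fits in a 32-bit metric.

    Hypothetical helper for illustration; the fractional MiB is discarded.
    """
    mib = num_bytes // 2**20
    if mib > UINT32_MAX:
        # Even after scaling, the value would not fit in 32 bits.
        raise OverflowError("value still exceeds 32 bits after scaling")
    return mib
```

With this scaling, 1 TiB (2^40 bytes) is reported as 2^20 MiB, which fits comfortably in 32 bits; the trade-off is that anything below 1 MiB of resolution is lost.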

Re: [Ganglia-general] Problems during ganlia customization

2013-08-19 Thread Chris Burroughs
On 08/06/2013 11:42 AM, m...@termit.ln.ua wrote: 1. Hostnames. Looks like ganglia takes hostnames from PTR records. It looks not good for me to amend PTR record if i want to change hostname. Is it possible to explicitly set hostnames in gmond.conf? If memory serves I think you want to look

[Ganglia-general] CLOSE_WAIT / blocking on hash_insert

2013-07-23 Thread Chris Burroughs
We have slowly been trying to squash stability issues with gmetad. Many of the symptoms seem to relate to sockets ending up in CLOSE_WAIT, although I'm unsure if that is a useful clue or a side effect. The user visible problem is metrics not getting updated and TN climbing for an

Re: [Ganglia-general] Trendline question

2013-05-21 Thread Chris Burroughs
You should see a little Trend button when clicking around in the UI next to CSV/JSON/Inspect/Timeshift etc. On 05/20/2013 09:14 AM, Емельянов Борис wrote: Good day! I want to use trendline for some of my metrics, but the only way i manage to switch it on is to send ?trend=1&trendhistory=1

Re: [Ganglia-general] Dynamic views in the web interface

2013-03-19 Thread Chris Burroughs
You are correct that there is no way to 'group-by' arbitrary attributes. Some process will have to pass in a set of hostnames (or regex). There are some short examples in the views document http://sourceforge.net/apps/trac/ganglia/wiki/ganglia-web-2#Views. On 03/19/2013 05:32 AM, Torstensen

Re: [Ganglia-general] Grouping Hosts from the same cluster, or spoofing Cluster name

2013-03-18 Thread Chris Burroughs
On 2013-03-14 09:04, Simon Boulet wrote: Now, I would like to group VMs separately from the Hosts in Ganglia Web. Currently both the VMs and the Hosts appears in the same cluster. I was thinking of having a separate cluster for the VMs. However, it doesn't seems possible in my current

Re: [Ganglia-general] IP Change on gmetad/apache server

2013-03-18 Thread Chris Burroughs
I gather from the context that you are identifying hosts by ip? In that case you could (in principle) 'mv old-name new-name' for all rrd files. On 2013-03-17 14:39, dan.fra...@pnc.com wrote: Due to issues beyond my control, the Linux VMWare server that runs our top-level gmetad and apache web

Re: [Ganglia-general] Problem with head node in a cluster

2012-11-01 Thread Chris Burroughs
Could you elaborate on what you mean by head node? Do you mean a gmond aggregator? How was the script failing? On 09/25/2012 04:50 AM, deep desai wrote: hi, I have written a python module for getting the stats for cassandra using the nodetool cfstats. The script is working fine for all

Re: [Ganglia-general] Question about scaling

2012-11-01 Thread Chris Burroughs
What makes 60 an unlucky number? On 10/25/2012 05:20 PM, Vladimir Vuksan wrote: 60 seconds is likely the problem. I would leave it at default ie 15. I can explain later. Potter,Mark L mlpot...@mdanderson.org wrote: Nicholas, I have it set to collect every 60 seconds at the moment as

Re: [Ganglia-general] Impact of gmond polling on data collection

2012-09-21 Thread Chris Burroughs
can't be 100% sure that this patch will fix your problem but it would be worth a try. Regards, Nick [1] https://github.com/ganglia/monitor-core/pull/50 On Sat, Sep 15, 2012 at 12:16 AM, Chris Burroughs chris.burrou...@gmail.com wrote: We use ganglia to monitor 500 hosts in multiple

[Ganglia-general] java library for parsing gmond.conf?

2012-09-15 Thread Chris Burroughs
Is there any known java library (or antlr file or other lexer/parser) for libconfuse based configuration files? I'd like my java applications to just read gmond.conf (like gmetric does) instead of having to configure the same thing twice. I suppose something that is custom to ganglia would

Re: [Ganglia-general] Java/JMX plugin for Ganglia 3.1.x

2012-09-14 Thread Chris Burroughs
We have all of our metrics of interest generated by Coda Hale's metrics package: http://metrics.codahale.com/ That includes a ganglia reporter that can be used to send metrics to gmond. (But not arbitrary pre-existing beans.) On 09/13/2012 08:43 AM, Martin Knoblauch wrote: Hi, as part of a

[Ganglia-general] Impact of gmond polling on data collection

2012-09-14 Thread Chris Burroughs
problem are also welcome!) Thank you, Chris Burroughs [1] https://github.com/ganglia/monitor-core/issues/47 [2] 120827 89 120828 6 120829 3 120830 4 120831 5 120901 1 120902 6 120903 2 120904 9 120905 4 120906 70 120907 523 120908 85 120909 4 120910 6 120911 2 120912 5 120913 5

[Ganglia-general] SurgeCon 2012

2012-09-05 Thread Chris Burroughs
Surge [1] is a scalability-focused conference in late September hosted in Baltimore. It's a pretty cool conference with a good mix of operationally minded people interested in scalability, distributed systems, systems level performance and good stuff like that. You should go! [2] This year there

Re: [Ganglia-general] Error 1 sending the modular data

2012-08-15 Thread Chris Burroughs
again I found http://bugzilla.ganglia.info/cgi-bin/bugzilla/show_bug.cgi?id=189 but that seems to refer to an un-escaped name, as opposed to a transient but not recovered error. On 08/13/2012 01:29 PM, Chris Burroughs wrote: So for background, my original problem is that load_one

Re: [Ganglia-general] gmetad xml generation time

2012-08-15 Thread Chris Burroughs
On 08/15/2012 09:13 AM, Vladimir Vuksan wrote: 2 seconds seems a bit excessive. Last time I tested downloading from gmetad on a setup with ~50k metrics it took between 250-300ms to download XML. I will recheck later today. Can you tell me what the size of the downloaded XML is Bytes ie.

Re: [Ganglia-general] gmetad xml generation time

2012-08-14 Thread Chris Burroughs
On 08/14/2012 02:00 PM, Douglas Wagner wrote: I don't mean to ask stupid questions, maybe I'm not reading you right... Are you saying that it's taking 2K ms (20s) between received data sets? (i.e. Every 2K ms you see XML data come into your application)? Or that it's taking 2K ms between the

[Ganglia-general] Error 1 sending the modular data

2012-08-13 Thread Chris Burroughs
So for background, my original problem is that load_one will not be updated by gmetad for a period of over 600 seconds (an arbitrary timeout signifying that gmond/the host is probably down). It occurs a few times/day across hundreds of hosts, and often occurs near midnight localtime. This

[Ganglia-general] gmetad xml generation time

2012-08-13 Thread Chris Burroughs
I have a process that periodically polls gmetad (builds models of some metrics, alerts if things don't look like). To reduce the number of variables I set up a dedicated gmetad on the same host as the poller and set write_rrds off. Unless I'm missing something the only thing it should be doing

[Ganglia-general] Programmatically get events

2012-07-23 Thread Chris Burroughs
I'm trying to find some api for getting (not setting) events. I've looked through events.php [1] and see a human readable output. But I'm looking for something for the machines. Am I missing something obvious? [1] https://github.com/ganglia/ganglia-web/blob/3.5.2/events.php

Re: [Ganglia-general] Overlay timeshifted data

2012-05-31 Thread Chris Burroughs
Really exciting. But I'm confused how this works with the round robin nature of RRD. Don't we by default only have (for example) daily data for the past 24-hour period, not 48 hours? On 05/16/2012 07:54 PM, Vladimir Vuksan wrote: There is a blog post about a new feature in Ganglia Web called

Re: [Ganglia-general] Ganglia gmond memory leak?

2012-02-27 Thread Chris Burroughs
I've also observed this and have been unable to find a solution. In my case at least there was no obvious correlation with the number of metrics or whether the gmond was aggregating or not (so several orders of magnitude in the number of metrics did not matter, it might happen on 2 out of 80

[Ganglia-general] tcpconn.py and netstat

2012-02-27 Thread Chris Burroughs
Currently tcpconn.py uses netstat to get its socket stats. This gives lots of detail but is far too slow for much production use (running netstat can take many minutes). /proc/net/sockstat gives less information but has no performance problems. There was a suggestion previously to use the ss
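
As a rough illustration of the /proc/net/sockstat approach mentioned above, the sketch below parses sockstat-style text into nested dictionaries. It is not the tcpconn.py implementation: the function name `parse_sockstat` is hypothetical, and the sample values are made up for the example. It assumes each line has the form `SECTION: name value name value ...`, as commonly seen on Linux.

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat-style text into {section: {field: int}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unexpected lines
        section, _, rest = line.partition(":")
        tokens = rest.split()
        # Fields come as alternating name/value pairs, e.g. "inuse 5 orphan 0".
        stats[section.strip()] = {
            name: int(value) for name, value in zip(tokens[0::2], tokens[1::2])
        }
    return stats


# Made-up sample in the assumed format; on a real host one would read
# the text from /proc/net/sockstat instead.
SAMPLE = """sockets: used 245
TCP: inuse 12 orphan 0 tw 3 alloc 15 mem 2
UDP: inuse 4 mem 1
"""
```

Calling `parse_sockstat(SAMPLE)["TCP"]["tw"]` gives the TIME_WAIT count from the sample. Note that this source reports only a few aggregate counters, not the per-state breakdown that netstat provides, which matches the trade-off described in the message.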

Re: [Ganglia-general] O'Reilly eBook on Ganglia

2011-12-13 Thread Chris Burroughs
On 12/09/2011 07:51 PM, Matt Massie wrote: What are the things you would be most interested in? Are there other topics you'd like to see covered? I would also like to see more details on how and why for different common variations. For example, some people set 'host' to something other than

Re: [Ganglia-general] gmond forwarding

2011-11-01 Thread Chris Burroughs
On 10/27/2011 08:36 PM, Rick Cobb wrote: In your case, your sender doesn't know about the other receivers and will need to be configured via its own configuration technique. It can be confusing that gmond is a daemon with three roles (and gmetad one with two), but that's the way it is.

[Ganglia-general] gmond forwarding

2011-10-27 Thread Chris Burroughs
I have a unicast setup where each cluster box is sending to 3 collector servers. For all sorts of normal metrics (cpu load, etc.) this works fine. Each cluster box also has several java applications [1] that are also sending metrics to the gmond on localhost. Running with debug=2, I see that those

[Ganglia-general] Recommended IO hardware (RRDtool and SSDs?)

2011-09-20 Thread Chris Burroughs
We are trying to determine the appropriate hardware for our ganglia web server holding the RRD files. From what I have been able to tell most of the tuning advice [1] [2] seems to be focused on flushing to disk less frequently (with rrdcached, tmpfs, page cache settings, etc.) and not on tuning

Re: [Ganglia-general] Recommended IO hardware (RRDtool and SSDs?)

2011-09-20 Thread Chris Burroughs
On 09/20/2011 09:47 AM, Jesse Becker wrote: I'd also suggest reading this paper: http://www.usenix.org/event/lisa07/tech/full_papers/plonka/plonka_html/ It's about scaling MRTG up to about 320,000 RRD files (without SSDs). Thanks Jesse I saw that paper and found it very helpful. But

[Ganglia-general] network push/pull model clarification

2011-08-16 Thread Chris Burroughs
So my understanding is that gmond on each node receives UDP messages from localhost and is polled by gmetad via tcp/xml. The wikipedia page [1] states that gmond also sends data via Unicasting or Multicasting host state in external data representation (XDR) format using UDP messages. But I'm having