Re: [Ganglia-developers] Gmetad bottlenecks

2014-01-15 Thread Nicholas Satterly
Hi Devon, I think now that we have the ability to define exactly which metrics should and should not be summarised, the issue of slow-downs due to metric summarisation can be managed. If we are to look at redoing the XML parsing next then the two contenders that come to mind are gzipped JSON and

Re: [Ganglia-developers] Gmetad bottlenecks

2014-01-15 Thread Jesse Becker
On Wed, Jan 15, 2014 at 8:41 AM, Nicholas Satterly nfsatte...@gmail.com wrote: If we are to look at redoing the XML parsing next then the two contenders that come to mind are gzipped JSON and Google Protocol Buffers. PB is meant to be very efficient and therefore faster, however it seems

Re: [Ganglia-developers] Gmetad bottlenecks

2014-01-15 Thread Dave Rawks
On 01/15/2014 06:42 AM, Jesse Becker wrote: On Wed, Jan 15, 2014 at 8:41 AM, Nicholas Satterly nfsatte...@gmail.com wrote: If we are to look at redoing the XML parsing next then the two contenders that come to mind are gzipped JSON and Google Protocol Buffers. PB is meant to be very

Re: [Ganglia-developers] Gmetad bottlenecks

2014-01-15 Thread Jeff Buchbinder
On Wed, Jan 15, 2014 at 12:11 PM, Dave Rawks d...@pandora.com wrote: On 01/15/2014 06:42 AM, Jesse Becker wrote: On Wed, Jan 15, 2014 at 8:41 AM, Nicholas Satterly nfsatte...@gmail.com wrote: If we are to look at redoing the XML parsing next then the two contenders that come to mind are

Re: [Ganglia-developers] Gmetad bottlenecks

2014-01-15 Thread Devon H. O'Dell
Perhaps configurable, then? XDR is a pain in the ass to debug. I was considering going with JSON or BSON initially. --dho 2014/1/15 Dave Rawks d...@pandora.com: On 01/15/2014 06:42 AM, Jesse Becker wrote: On Wed, Jan 15, 2014 at 8:41 AM, Nicholas Satterly nfsatte...@gmail.com wrote: If we

Re: [Ganglia-developers] Gmetad bottlenecks

2014-01-14 Thread Nicholas Satterly
Given the performance benefits gained by Devon's work I will revert the patch that attempted to speed up metric summaries because it's causing grid-of-grids to fail (unless there are any objections) ... https://github.com/ganglia/monitor-core/commit/0705a5defa284e289004daf61ea390338719d5fb

Re: [Ganglia-developers] Gmetad bottlenecks

2014-01-14 Thread Devon H. O'Dell
I don't personally have any objections, but if this remains a pain point, perhaps this is something we can address differently? I think where I left off, XML parsing was taking the most time; is that something that people are comfortable with changing (data format?) --dho 2014/1/14 Nicholas

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-10 Thread Chris Burroughs
On 12/08/2013 04:43 PM, Devon H. O'Dell wrote: This is a simple `perf top -p $PID` on one of our gmetad nodes Samples: 1M of event 'cycles', Event count (approx.): 64115959770 6.59% libexpat.so.1.5.2 [.] 0x00011b8d 4.77% libganglia-3.6.0.so.0.0.0 [.] hashval

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-08 Thread Chris Burroughs
On 12/07/2013 03:00 PM, Vladimir Vuksan wrote: Were these failures totally random or grouped in some way? (Same cluster, type, etc). We run multiple dozens of clusters and some of the larger clusters, i.e. clusters that had 2-3x the machines that other clusters had, would exhibit either gaps, slower

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-08 Thread Chris Burroughs
On 12/07/2013 03:22 PM, Devon H. O'Dell wrote: We were polling every 10 seconds, and it was taking over 2 minutes to finish parsing the XML (which includes writing the RRDs). With my changes, the 10 second poll is feasible. Parse, or fetch parse? We have a simple python script we use to

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-08 Thread Devon H. O'Dell
2013/12/8 Chris Burroughs chris.burrou...@gmail.com: On 12/07/2013 03:22 PM, Devon H. O'Dell wrote: We were polling every 10 seconds, and it was taking over 2 minutes to finish parsing the XML (which includes writing the RRDs). With my changes, the 10 second poll is feasible. Parse, or

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-07 Thread Devon H. O'Dell
2013/12/7 Adrian Sevcenco adrian.sevce...@cern.ch: On 12/06/2013 10:51 PM, Devon H. O'Dell wrote: 2013/12/6 Vladimir Vuksan vli...@veus.hr: Hello everyone, Hi! For a few weeks now we have had performance issues due to growth of our monitoring setup. One of my colleagues Devon O'Dell

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-07 Thread Chris Burroughs
Thank you Devon and Vladimir for starting this thread. We (AddThis) have been struggling with gmetad performance and stability for a while and I'm personally excited to see the focus here. I'll explain briefly how we are using ganglia for context and then have inline comments. We have two

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-07 Thread Vladimir Vuksan
On 12/07/2013 02:23 PM, Chris Burroughs wrote: On 12/06/2013 03:36 PM, Vladimir Vuksan wrote: The Ganglia core is comprised of two daemons, `gmond` and `gmetad`. `Gmond` is primarily responsible for sending and receiving metrics; `gmetad` carries the hefty task of summarizing / aggregating

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-07 Thread Nikhil
Thank you Vladimir and Devon. Much appreciated. +2 for the initiatives below: * changing the data serialization format from XML to one that is easier / faster to parse, * using a different data structure than a hash table for metrics hierarchies (probably a tree with metrics stored at each

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-07 Thread Devon H. O'Dell
2013/12/7 Chris Burroughs chris.burrou...@gmail.com: Thank you Devon and Vladimir for starting this thread. We (AddThis) have been struggling with gmetad performance and stability for a while and I'm personally excited to see the focus here. I'll explain briefly how we are using ganglia for

[Ganglia-developers] Gmetad bottlenecks

2013-12-06 Thread Vladimir Vuksan
Hello everyone, For a few weeks now we have had performance issues due to growth of our monitoring setup. One of my colleagues, Devon O'Dell, volunteered to help and below is an e-mail of his findings. We'll submit a pull request once we are comfortable with the

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-06 Thread Devon H. O'Dell
2013/12/6 Vladimir Vuksan vli...@veus.hr: Hello everyone, For a few weeks now we have had performance issues due to growth of our monitoring setup. One of my colleagues, Devon O'Dell, volunteered to help and below is an e-mail of his findings. Hi! I joined the ML, so I'm around to answer

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-06 Thread daniel . j . marrera
vli...@veus.hr To: ganglia-developers@lists.sourceforge.net Date: 12/06/2013 02:37 PM Subject: [Ganglia-developers] Gmetad bottlenecks Hello everyone, For a few weeks now we have had performance issues due to growth of our monitoring setup. One of my

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-06 Thread Nicholas Satterly
@lists.sourceforge.net ganglia-developers@lists.sourceforge.net Date: 12/06/2013 02:37 PM Subject: [Ganglia-developers] Gmetad bottlenecks -- Hello everyone, For a few weeks now we have had performance issues due to growth of our monitoring setup. One of my colleagues

Re: [Ganglia-developers] Gmetad bottlenecks

2013-12-06 Thread Adrian Sevcenco
On 12/06/2013 10:51 PM, Devon H. O'Dell wrote: 2013/12/6 Vladimir Vuksan vli...@veus.hr: Hello everyone, Hi! For a few weeks now we have had performance issues due to growth of our monitoring setup. One of my colleagues, Devon O'Dell, volunteered to help and below is an e-mail of his findings.