What makes 60 an unlucky number?

On 10/25/2012 05:20 PM, Vladimir Vuksan wrote:
> 60 seconds is likely the problem. I would leave it at the default, i.e. 15.
> I can explain later.
> 
> "Potter,Mark L" <mlpot...@mdanderson.org> wrote:
> 
>> Nicholas,
>>
>> I have it set to collect every 60 seconds at the moment as per the
>> gmetad I posted yesterday but even with that, running "netstat -ua" in
>> a 1 second watch loop, once Recv-Q pops it is still responding
>> immediately and the Recv-Q never stays lit, so to speak, for more than
>> two seconds. In fact even telnetting to the port only lights up Recv-Q
>> for 2 seconds flat.
>> ________________________________________
>> From: Nicholas Satterly [nfsatte...@gmail.com]
>> Sent: Thursday, October 25, 2012 15:19
>> To: Potter,Mark L
>> Cc: ganglia-general@lists.sourceforge.net
>> Subject: Re: [Ganglia-general] Question about scaling
>>
>> Hi Mark,
>>
>> I wouldn't be so quick to dismiss timeouts as the problem. The
>> "0.9751s" it took to download and parse ganglia's XML tree refers to
>> the time it took the PHP web frontend to query the gmetad XML whereas
>> the timeouts I was referring to occur when the gmetad polls the gmonds
>> during metric collection every 15 seconds.
>>
>> My suggestion would be to run "netstat -ua" in a loop on the head node
>> and look for a non-zero "Recv-Q" on UDP port 8649. As soon as you see
>> it go non-zero, telnet to port 8649 on the head node and make note of
>> how long it takes to respond. If it's any longer than 10 seconds you
>> will see random hosts down and broken graphs on the ganglia web.
>>
>> --Nick.
>>
>> On Thu, Oct 25, 2012 at 8:30 PM, Potter,Mark L
>> <mlpot...@mdanderson.org> wrote:
>> Well, things blew up at ~184 hosts. The web interface shows a random
>> number of hosts down on each refresh, although sometimes they are all
>> up. It reports just ~1 second to download and process the XML:
>> "Downloading and parsing ganglia's XML tree took 0.9751s". So I don't
>> think timeouts are the problem. A telnet to 8649 produces the XML
>> immediately. Could this be the point where I need to start using a
>> RAM-based partition, or could it be something else? Is sflow so much
>> better that I should consider using it? Would multiple gmonds, say one
>> per rack, all listed in gmetad be a solution? At this point I am not
>> sure of the next step, and I really appreciate the help the list has
>> given me so far.
>>
>>
>>
>>> Hi Mark,
>>>
>>> I assume cnode340 is the head node that all ~340 other gmonds send
>>> their data to. If so, you could reduce the amount of redundant
>>> metadata flying around by increasing "send_metadata_interval" to 120
>>> seconds or higher.
>>
>> That is correct, cnode340 is the head node for ganglia. I have
>> increased "send_metadata_interval" to 120 seconds and have 100
>> nodes reporting at this point and it seems pretty smooth. I am going to
>> add the others ~50 at a time.
>>
>>> Also, I suspect that if you telnet to port 8649 on your head node it
>>> will take a while to respond because it's busy processing incoming UDP
>>> metrics. If it takes more than 10 seconds to respond on a regular basis
>>> then gmetad will time out [1].
>>
>> So far, with the 100 I have, the response is an instant dump of the XML.
>>
>>> Try deploying a recently patched version of gmond [2] to the head node
>>> which is now multi-threaded and see if that fixes the problem. It
>>> starts a separate thread for responding to XML metric requests and
>>> should respond immediately while the main thread is still processing
>>> metrics.
>>
>> I am running:
>>
>> gmond 3.4.0
>> gmetad 3.4.0
>> Ganglia Web Frontend version 3.5.2
>>
>> Would I need to patch gmond at this version?
>>
>>
>> <SNIP>
>>
>> ------------------------------------------------------------------------------
>> Everyone hates slow websites. So do we.
>> Make your web apps faster with AppDynamics
>> Download AppDynamics Lite for free today:
>> http://p.sf.net/sfu/appdyn_sfd2d_oct
>> _______________________________________________
>> Ganglia-general mailing list
>> Ganglia-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>>
> 
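For anyone who wants to repeat Nick's Recv-Q check without eyeballing a
watch loop, here is a minimal Python sketch. It assumes the Linux
net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, ...) and
uses "netstat -uan" rather than "-ua" so the port stays numeric. The
function names are mine, not anything shipped with Ganglia.

```python
import subprocess


def udp_recv_queue(netstat_output, port=8649):
    """Return [(local_address, recv_q), ...] for UDP sockets bound to `port`.

    Assumes net-tools style rows:
    Proto Recv-Q Send-Q Local-Address Foreign-Address [State]
    """
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the banner, the column header, and anything that is not UDP.
        if len(fields) < 5 or not fields[0].startswith("udp"):
            continue
        local = fields[3]
        # The port is whatever follows the last colon (also works for ":::8649").
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        if not fields[1].isdigit():
            continue
        results.append((local, int(fields[1])))
    return results


def snapshot(port=8649):
    """Run netstat once and report the Recv-Q of sockets on `port`."""
    out = subprocess.run(
        ["netstat", "-uan"], capture_output=True, text=True, check=True
    ).stdout
    return udp_recv_queue(out, port)
```

Calling snapshot() once a second on the head node and logging any entry
with a non-zero second element gives the same information as the watch
loop, with a record of how long the queue stayed non-empty.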

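The telnet test can be timed the same way. The sketch below connects to
the gmond TCP port, reads until the peer closes, and returns the elapsed
time so it can be compared with the 10-second figure Nick mentions. It
assumes gmond closes the connection once the XML dump is complete; the
helper names and the constant are hypothetical, and the socket timeout
applies to each connect/read call, not to the whole transfer.

```python
import socket
import time

GMETAD_TIMEOUT_S = 10.0  # the 10-second figure quoted in the thread


def drain(sock):
    """Read from `sock` until the peer closes; return (seconds, bytes_read)."""
    start = time.monotonic()
    total = 0
    while True:
        chunk = sock.recv(65536)
        if not chunk:  # empty read means the peer closed the connection
            break
        total += len(chunk)
    return time.monotonic() - start, total


def time_xml_dump(host, port=8649, timeout=GMETAD_TIMEOUT_S):
    """Connect to gmond's TCP port and time the full XML dump.

    Raises socket.timeout if the connect or any single read stalls for
    longer than `timeout` seconds.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        _, nbytes = drain(sock)
    return time.monotonic() - start, nbytes
```

For example, time_xml_dump("cnode340") run from the gmetad host right
after Recv-Q goes non-zero would show whether the head node is anywhere
near the timeout.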
