Hi Nikhil,
Nothing I can think of for rrdcached specifically. However, you mention
multiple disks. A RAID 10 array would help performance (parity-based RAID
is probably not ideal for this write-heavy workload).
I have also successfully run gmetad writing to a tmpfs, rsyncing the tmpfs
to non-volatile storage every 5 minutes. If you cannot get fast enough
disks (SSD is ideal), then perhaps that will help.
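A rough sketch of what I mean (mount point, size and paths are only
examples, adjust for your layout):

    # mount a tmpfs over the rrd directory, then seed it from the last on-disk copy
    mount -t tmpfs -o size=2g tmpfs /var/lib/ganglia/rrds
    rsync -a /var/lib/ganglia/rrds.disk/ /var/lib/ganglia/rrds/

    # /etc/cron.d entry: every 5 minutes, copy the tmpfs back to disk
    */5 * * * * root rsync -a /var/lib/ganglia/rrds/ /var/lib/ganglia/rrds.disk/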
On 16 July 2013 03:50, Nikhil <[email protected]> wrote:
> Hi Michael,
>
> thanks for the reply.
>
> You are right, it's mostly been a disk I/O issue. I have reduced the load
> on the network side of gmond by running all of the gmonds in deaf mode,
> except for a couple of nodes that are designated as data source (non-deaf)
> nodes.
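> Roughly, the ordinary nodes run with:
>
>     globals {
>       deaf = yes   # do not aggregate cluster state, just send
>       mute = no
>     }
>
> and the couple of designated data source nodes run with deaf = no so that
> they keep the full cluster state for gmetad to poll.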
> I have looked at rrdcached and ran it for a while; initially it looked
> promising. After adding more metrics, though, even rrdcached does not
> really seem to help with the disk I/O problem.
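> For reference, I ran it roughly like this (paths and timers are from
> memory, not exact):
>
>     rrdcached -w 1800 -z 900 -f 3600 \
>       -j /var/lib/rrdcached/journal \
>       -B -b /var/lib/ganglia/rrds \
>       -l unix:/var/run/rrdcached.sock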
> I could have each data source use a different disk (pointing the rrd
> locations at different disks with symlinks) to alleviate the disk I/O
> problem a little, but rrdcached only seems to work against a single disk,
> not multiple locations reached through symlinks.
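> What I had in mind is something like this (paths are purely illustrative):
>
>     # per-cluster rrd directories pointed at different disks
>     mkdir -p /disk1/rrds/ClusterA /disk2/rrds/ClusterB
>     ln -s /disk1/rrds/ClusterA /var/lib/ganglia/rrds/ClusterA
>     ln -s /disk2/rrds/ClusterB /var/lib/ganglia/rrds/ClusterB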
> Are there any specific rrdcached options that you would suggest to help?
>
> Thanks,
> Nikhil
>
>
> On Mon, Jul 8, 2013 at 9:04 AM, Michael Shearer
> <[email protected]>wrote:
>
>> Hi Nikhil,
>>
>> Is the machine running gmetad exhibiting high wait on I/O? I have seen
>> periodic blanks in graphs on servers where the disk I/O was too high
>> writing the RRDs, and so not all of them got updated, leading to missing
>> data in the graphs. If this is the case, you can look at rrdcached to
>> decrease the disk I/O load.
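>> If I remember correctly, gmetad picks rrdcached up through the standard
>> rrdtool environment variable, e.g. (the socket path is just an example):
>>
>>     export RRDCACHED_ADDRESS=unix:/var/run/rrdcached.sock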
>>
>> If it is specifically multicast-related then I cannot help; I have only
>> ever used Ganglia in a unicast configuration.
>>
>> Cheers, Michael.
>>
>>
>> On 7 July 2013 02:10, Nikhil <[email protected]> wrote:
>>
>>> Resending this after getting added to the group.
>>>
>>> Hi,
>>>>
>>>> I have recently enabled Ganglia, using a multicast configuration, for a
>>>> couple of clusters. One of them is a large cluster, and a good portion of
>>>> its nodes report a lot of metrics, in the range of 600-800 per node.
>>>>
>>>> With only the default Ganglia metrics enabled, the graphs are good, but
>>>> once the custom system metrics start flowing through the gmond multicast
>>>> configuration, the graphs tend to be broken, in the sense that they are
>>>> intermittently blank.
>>>> Quite frequently, almost all of the hosts in the cluster show as down,
>>>> and after a while they come back up again.
>>>>
>>>> I am using the default poll interval of 15s in gmetad.conf for the
>>>> cluster data_source, and there is more than one data source host
>>>> configured for the cluster, although I see that the other data source
>>>> hosts are only used in case the first one is not reachable over the
>>>> gmond port.
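>>>> For reference, the data_source lines look roughly like this (the cluster
>>>> name and hosts are placeholders):
>>>>
>>>>     data_source "bigcluster" 15 node01:8649 node02:8649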
>>>> BTW, I have also increased the gmond buffer to 10M. I am not sure how to
>>>> calculate the exact buffer required for gmond given the number of metrics
>>>> in the cluster across a lot of nodes. If there is a relation, such as a
>>>> combination of the data types used by all the metrics on a node and their
>>>> collection intervals, then I will size the buffer accordingly. Please do
>>>> let me know in this regard.
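>>>> The buffer I changed is the one on the receive channel, roughly like
>>>> this (the exact channel block may differ on my nodes):
>>>>
>>>>     udp_recv_channel {
>>>>       mcast_join = 239.2.11.71
>>>>       port = 8649
>>>>       buffer = 10485760   # the ~10M value mentioned above
>>>>     }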
>>>>
>>>> Earlier there was a problem with the web interface: error 400 crept in.
>>>> After looking into the Apache error logs, it turned out to be the PHP
>>>> memory setting, so I raised the PHP memory limit for Ganglia in the
>>>> Apache configuration, and that seems to be okay now. But the graphs being
>>>> intermittently blank (quite frequently) and hosts showing as down in the
>>>> cluster view is a little irritating.
>>>>
>>>> I am wondering if there are any settings that should be considered for
>>>> optimal use of Ganglia in large clusters with a multicast configuration.
>>>>
>>>> Thanks,
>>>> Nikhil
>>>>
>>>
>>>
>>>
>>
>