Thanks!  That would definitely address this problem.  Meanwhile, we're
getting fantastic results with the Python script we wrote.
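
In case it's useful before we publish it, the core of the approach looks
roughly like this (the hostnames, ports and metric prefix are placeholders,
not our exact script):

#!/usr/bin/env python
# Rough sketch only: poll one gmond per cluster over its XML port,
# batch every metric into a single payload, and push it to carbon's
# plaintext port over a single connection.
import socket
import time
from xml.etree import ElementTree

GMOND_HOST, GMOND_PORT = 'gmond.cluster1.example.com', 8649  # gmond XML port
CARBON_HOST, CARBON_PORT = 'graphite.example.com', 2003      # carbon plaintext port
PREFIX = 'ganglia'                                           # same prefix as in gmetad.conf

def read_gmond_xml(host, port):
    # gmond dumps the whole cluster state as XML as soon as you connect.
    sock = socket.create_connection((host, port), timeout=10)
    chunks = []
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)
    sock.close()
    return b''.join(chunks)

def carbon_lines(xml_dump, now):
    # Flatten CLUSTER/HOST/METRIC into "path value timestamp" lines.
    lines = []
    for cluster in ElementTree.fromstring(xml_dump).findall('CLUSTER'):
        for host in cluster.findall('HOST'):
            for metric in host.findall('METRIC'):
                if metric.get('TYPE') == 'string':
                    continue  # carbon only takes numeric values
                path = '%s.%s.%s.%s' % (PREFIX, cluster.get('NAME'),
                                        host.get('NAME'), metric.get('NAME'))
                lines.append('%s %s %d' % (path.replace(' ', '_'),
                                           metric.get('VAL'), now))
    return lines

if __name__ == '__main__':
    lines = carbon_lines(read_gmond_xml(GMOND_HOST, GMOND_PORT),
                         int(time.time()))
    payload = ('\n'.join(lines) + '\n').encode('ascii')
    carbon = socket.create_connection((CARBON_HOST, CARBON_PORT), timeout=10)
    carbon.sendall(payload)  # one connection, one write, all metrics
    carbon.close()

The real thing has error handling and runs once per cluster on a 60-second
loop, but that's the gist of it.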

Maziyar


On Wed, Apr 24, 2013 at 6:38 AM, Vladimir Vuksan <[email protected]> wrote:

> I imagine that opening and closing TCP connections for each metric doesn't
> scale. A few days ago we merged a pull request that uses UDP to send
> metrics to Carbon
>
> https://github.com/ganglia/monitor-core/pull/101
>
> that should be far more scalable.
>
> Vladimir
>
> On Tue, 23 Apr 2013, Maziyar Mirabedini wrote:
>
>> I did more testing and research on this.  We're obviously hitting a
>> bottleneck here.  Monitoring the logs, I found that the problem is that
>> gmetad opens and closes a connection for every metric it wants to send
>> to Graphite.  Also, you can only specify one Carbon server and port, so
>> we were stuck.
>>
>> We were able to write a Python script that goes directly to the Ganglia
>> port on a server in each cluster, gathers the metrics, packages all the
>> metrics for each server into one message, and sends them to Graphite.
>> On the Graphite side, we have one carbon-relay and six carbon-cache
>> instances set up, with relay rules pointing each cluster to a
>> carbon-cache.  We have 5 scripts running at the same time, and they are
>> able to gather, parse and send everything to Carbon, and Carbon writes
>> it to disk, within 10 seconds.  This is a huge improvement.
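>>
>> Roughly speaking, the relay rules look something like this (cluster
>> names, ports and patterns here are just illustrative, not our exact
>> config):
>>
>> [cluster1]
>> pattern = ^ganglia\.cluster1\.
>> destinations = 127.0.0.1:2104:a
>>
>> [cluster2]
>> pattern = ^ganglia\.cluster2\.
>> destinations = 127.0.0.1:2204:b
>>
>> [default]
>> default = true
>> destinations = 127.0.0.1:2104:a
>>
>> with matching [cache:a] .. [cache:f] sections and a DESTINATIONS list
>> in carbon.conf.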
>>
>> We'll make the script available sometime soon.
>>
>>
>>
>> On Mon, Apr 22, 2013 at 10:33 AM, Maziyar Mirabedini
>> <[email protected]> wrote:
>>
>> Hi there,
>>
>> I recently set up a server that hosts Ganglia 3.5, Graphite 0.9.10, and
>> RRDtool 1.4.7 with rrdcached enabled and configured.  Then I set up the
>> integration between Ganglia and Graphite by setting the carbon_server,
>> carbon_port and prefix options in the gmetad conf file.
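>>
>> For reference, that part of gmetad.conf looks more or less like this
>> (host and prefix are examples, and the exact option names may differ
>> slightly between Ganglia versions):
>>
>> carbon_server "127.0.0.1"
>> carbon_port 2003
>> graphite_prefix "ganglia"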
>>
>> I configured the heartbeat for each cluster to happen every 60 seconds.
>> The first RRA is configured such that it keeps one data point every 60
>> seconds for a week.
>>
>> Since this server is only for monitoring and we have tons of metrics
>> for each server, I modified the Carbon conf file to have the following:
>>
>> MAX_UPDATES_PER_SECOND = inf
>> MAX_CREATES_PER_MINUTE = 1000000
>> MAX_QUEUE_SIZE = 100000
>>
>> Whisper retention is configured such that it matches Ganglia.
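>>
>> That is, something along these lines in storage-schemas.conf (the
>> pattern is just an example matching our prefix), to keep 60-second
>> points for a week:
>>
>> [ganglia]
>> pattern = ^ganglia\.
>> retentions = 60s:7d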
>>
>> Once the services were started, I found that:
>>
>> 1) All RRD files got created for the clusters and servers.  At this
>> point the server is monitoring 5 clusters, 94 servers in total, and
>> roughly 500 metrics per server.
>>
>> 2) Fetching the data from the RRDs shows that gmetad is able to update
>> every single RRD on time, and the data points are there every 1 min.
>>
>> 3) All metrics are appropriately created in Graphite.
>>
>> 4) I noticed that the Graphite metrics are not updated as often as the
>> RRDs.  The updates to metrics seem to happen sporadically: sometimes a
>> metric is updated every 2 minutes, other times it won't get updated for
>> another 6 minutes.  I haven't seen metrics consistently updated every
>> 1 min as the retention would suggest.
>>
>> I confirmed this by doing a fetch on both the Whisper files and the
>> RRDs.  Doing a tail on Graphite's update log, I can see tons of updates
>> going through, but maybe it's just not fast enough?
>> I don't see any errors in /var/log/messages.
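>>
>> For the record, the comparison was along these lines (paths are just
>> examples from our layout):
>>
>> whisper-fetch.py --pretty \
>>     /opt/graphite/storage/whisper/ganglia/cluster1/host1/cpu_idle.wsp
>> rrdtool fetch /var/lib/ganglia/rrds/cluster1/host1/cpu_idle.rrd \
>>     AVERAGE --start -10min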
>>
>> Any help would be really appreciated!
>>
>> Thanks!
>>
>>
>>
>>
