I did more testing and research on this, and we are clearly hitting a
bottleneck in gmetad's Graphite integration.
Monitoring the logs showed that gmetad opens and closes a connection for
every metric it sends to Graphite. Also, you can only specify one Carbon
server and port, so we were stuck.
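To put a rough number on that, here is a back-of-the-envelope calculation using the figures from the original message below (94 servers, roughly 500 metrics each, 60-second interval). It assumes one TCP connection per metric per interval, so treat it as an estimate only:

```python
# Figures quoted in the original message; the per-metric connection
# behaviour is what the logs suggested, so this is only an estimate.
servers = 94
metrics_per_server = 500
interval_s = 60

metrics_per_interval = servers * metrics_per_server          # 47000
connections_per_second = metrics_per_interval / interval_s   # about 783

print(metrics_per_interval, round(connections_per_second))
```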
We wrote a Python script that connects directly to the Ganglia port on
a server in each cluster, gathers the metrics, packages all metrics for
each server into one message, and sends that message to Graphite. On the
Graphite side, we have one carbon-relay and 6 carbon-cache instances, with
rules that point each cluster to a carbon-cache. We have 5 scripts running
at the same time, and they are able to gather, parse, and send everything
to Carbon, and Carbon writes it to disk, all within 10 seconds. This is a
huge improvement.
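The relay setup described above boils down to first-match routing on the metric path. The sketch below only illustrates that idea in Python; it is not carbon-relay's code, and the cluster names, ports, and instance letters are made up:

```python
import re

# Hypothetical rules: (regex on the metric path, carbon-cache destination).
# Cluster names, ports, and instance letters are illustrative only.
RULES = [
    (re.compile(r"^ganglia\.web\."), "127.0.0.1:2004:a"),
    (re.compile(r"^ganglia\.db\."), "127.0.0.1:2104:b"),
]
DEFAULT_DESTINATION = "127.0.0.1:2204:c"


def route(metric_path):
    """Return the destination of the first rule whose pattern matches."""
    for pattern, destination in RULES:
        if pattern.search(metric_path):
            return destination
    return DEFAULT_DESTINATION
```

With one rule per cluster, each cluster's writes land on a single carbon-cache, which is what spreads the disk work across the instances.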
We'll make the script available soon.
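In the meantime, here is a minimal sketch of the approach, not the actual script. It assumes gmond is listening on its default TCP port (8649) and dumps its XML on connect, and that Carbon accepts the plaintext "path value timestamp" protocol on port 2003. The function names, the "ganglia" prefix, and the name-cleaning rule are all illustrative choices:

```python
import re
import socket
import time
import xml.etree.ElementTree as ET


def read_gmond_xml(host, port=8649, timeout=10):
    """Read the XML dump gmond writes on connect (port is an assumption)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def clean(name):
    """Replace characters (including dots) that would split a Graphite path."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", name)


def to_carbon_batches(xml_bytes, prefix="ganglia", now=None):
    """Yield one multi-line plaintext message per host."""
    now = int(time.time()) if now is None else now
    root = ET.fromstring(xml_bytes)
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            lines = []
            for metric in host.iter("METRIC"):
                try:
                    value = float(metric.get("VAL"))
                except (TypeError, ValueError):
                    continue  # skip string-valued metrics such as os_name
                path = ".".join([prefix, clean(cluster.get("NAME")),
                                 clean(host.get("NAME")),
                                 clean(metric.get("NAME"))])
                lines.append("%s %s %d\n" % (path, value, now))
            if lines:
                yield "".join(lines)


def send_batches(batches, carbon_host, carbon_port=2003):
    """Send every batch over a single connection instead of one per metric."""
    with socket.create_connection((carbon_host, carbon_port)) as sock:
        for batch in batches:
            sock.sendall(batch.encode("ascii"))
```

One run per cluster would then look like `send_batches(to_carbon_batches(read_gmond_xml("some-gmond-host")), "carbon-relay-host")`, with both host names being placeholders.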
On Mon, Apr 22, 2013 at 10:33 AM, Maziyar Mirabedini
<[email protected]> wrote:
> Hi there,
>
> I recently set up a server that hosts Ganglia 3.5, Graphite 0.9.10, and
> RRDTool 1.4.7 with RRDCACHED enabled and configured. Then I set up the
> integration between Ganglia and Graphite by setting carbon_server,
> carbon_port and the prefix in the gmetad conf file.
>
> I configured the heartbeat for each cluster to happen every 60 seconds. The
> first RRA is configured so that it keeps data every 60 seconds for a week.
>
> Since this server is only for monitoring and we have tons of metrics for
> each server, I modified the Carbon conf file to have the following:
>
> MAX_UPDATES_PER_SECOND = inf
>
> MAX_CREATES_PER_MINUTE = 1000000
>
> MAX_QUEUE_SIZE = 100000
>
> Whisper retention is configured such that it matches Ganglia.
>
> Once the services were started I found that:
>
> 1) All RRD files got created for the clusters and servers. At this point
> the server is monitoring 5 clusters and in total 94 servers and roughly 500
> metrics per server.
>
> 2) Fetching the data from the RRDs shows that gmetad is able to update every
> single RRD on time and the data points are there every 1 min.
>
> 3) All metrics are appropriately created in Graphite.
>
> 4) Graphite metrics are not updated as often as the RRDs. The updates
> seem to happen sporadically: sometimes a metric is updated after 2 minutes,
> other times it is not updated for another 6 minutes. I haven't seen a
> metric get updated consistently every 1 minute, as the RRD retention
> would suggest.
>
> I confirmed this by doing a fetch on both the Whisper files and the RRDs.
> Tailing Graphite's update log, I can see tons of updates going through,
> but maybe it's just not fast enough?
> I don't see any errors in /var/log/messages.
>
> Any help would be really appreciated!
>
> Thanks!
>
>
_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers