I imagine that opening and closing a TCP connection for each metric doesn't scale. A few days ago we merged a pull request that uses UDP to send metrics to Carbon

https://github.com/ganglia/monitor-core/pull/101

that should be far more scalable.
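For illustration only, here is a minimal Python sketch of what sending a metric to Carbon over UDP looks like. It is not the code from the pull request (that change is in gmetad itself). The function names, the host, and port 2003 are assumptions made for the example, and Carbon's UDP listener would need to be enabled (ENABLE_UDP_LISTENER in carbon.conf) for the datagrams to be received.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Carbon's plaintext protocol: 'path value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_udp(lines, host="127.0.0.1", port=2003):
    """Send the metric lines as one UDP datagram.

    There is no connection to open or close, which is the point of
    the comparison with one TCP connection per metric.
    host and port are placeholders, not values from this thread.
    """
    payload = "".join(lines).encode("ascii")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
    return len(payload)
```

UDP delivery is not guaranteed, so a sender like this trades reliability for lower per-metric overhead.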

Vladimir

On Tue, 23 Apr 2013, Maziyar Mirabedini wrote:

I did more testing and research on this. We're clearly hitting a bottleneck.
I monitored the logs and found that gmetad opens and closes a connection for
every metric it sends to Graphite. Also, you can only specify one Carbon
server and port, so we were stuck.

We wrote a Python script that connects directly to the Ganglia port on a
server for each cluster, gathers the metrics, packages all metrics for each
server into one message, and sends them to Graphite. On the Graphite side, we
have one carbon-relay and 6 carbon-cache instances, with rules that point each
cluster to a carbon-cache. We have 5 scripts running at the same time, and
they are able to gather, parse, and send everything to Carbon, and Carbon
writes it to disk, all within 10 seconds. This is a huge improvement.
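Until the script is posted, the approach described above could be sketched roughly as follows. This is not the actual script. The function names, the "ganglia" prefix, port 8649 for the Ganglia XML dump, port 2003 for Carbon, and the rule of replacing dots with underscores in names are all assumptions made for the example.

```python
import socket
import xml.etree.ElementTree as ET


def read_ganglia_xml(host, port=8649, timeout=10.0):
    """Read the XML dump a Ganglia daemon writes when a client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def sanitize(name):
    """Keep dots and spaces in names from adding extra path levels."""
    return name.replace(".", "_").replace(" ", "_")


def xml_to_carbon_lines(xml_bytes, prefix="ganglia"):
    """Turn every numeric METRIC into one Carbon plaintext line."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            timestamp = int(host.get("REPORTED", "0"))
            for metric in host.iter("METRIC"):
                value = metric.get("VAL")
                try:
                    float(value)
                except (TypeError, ValueError):
                    continue  # skip string metrics such as os_name
                path = ".".join([
                    prefix,
                    sanitize(cluster.get("NAME")),
                    sanitize(host.get("NAME")),
                    sanitize(metric.get("NAME")),
                ])
                lines.append("%s %s %d\n" % (path, value, timestamp))
    return lines


def send_batch(lines, host="127.0.0.1", port=2003, timeout=10.0):
    """Send all lines over a single TCP connection instead of one each."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
    return len(payload)
```

A run for one cluster would be something like send_batch(xml_to_carbon_lines(read_ganglia_xml("some-host"))), with one such process per cluster pointed at the carbon-relay.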

We'll make the script available sometime soon.



On Mon, Apr 22, 2013 at 10:33 AM, Maziyar Mirabedini <[email protected]> 
wrote:
      Hi there,

I recently set up a server that hosts Ganglia 3.5, Graphite 0.9.10, and
RRDTool 1.4.7 with rrdcached enabled and configured. I then set up the
integration between Ganglia and Graphite by setting the carbon_server,
carbon_port and prefix in the gmetad conf file.

I configured the heartbeat for each cluster to happen every 60 seconds. The
first RRA is configured to keep one data point every 60 seconds for a week.

Since this server is only for monitoring and we have tons of metrics for each
server, I modified the Carbon conf file to have the following:

MAX_UPDATES_PER_SECOND = inf

MAX_CREATES_PER_MINUTE = 1000000

MAX_QUEUE_SIZE = 100000

Whisper retention is configured such that it matches Ganglia.

Once the services were started I found that:

1) All RRD files were created for the clusters and servers. At this point the
server is monitoring 5 clusters with 94 servers in total and roughly 500
metrics per server.

2) Fetching the data from the RRDs shows that gmetad is able to update every
single RRD on time, and the data points are there every minute.

3) All metrics are appropriately created in Graphite.

4) Graphite metrics are not updated as often as the RRDs. The updates seem to
happen sporadically. Sometimes a metric is updated after 2 minutes, other
times it isn't updated for another 6 minutes. I haven't seen a metric updated
consistently every minute, as the RRD retention would require.
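As a rough back-of-the-envelope check, the figures in item 1 imply the update rate below. The 500 metrics per server is only an approximation, so the result is an estimate, and the variable names are made up for the example.

```python
servers = 94
metrics_per_server = 500      # "roughly 500" per the list above
interval_seconds = 60         # one update per metric per minute

datapoints_per_interval = servers * metrics_per_server
datapoints_per_second = datapoints_per_interval / interval_seconds

print(datapoints_per_interval)        # 47000
print(round(datapoints_per_second))   # 783
```

So to keep one-minute resolution, Graphite has to absorb on the order of 47,000 data points per minute from this one gmetad.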

I confirmed this by doing a fetch on both Whisper and the RRDs. Tailing
Graphite's update log, I can see tons of updates going through, but maybe
it's just not fast enough?
I don't see any errors in /var/log/messages.

Any help would be really appreciated!

Thanks!



_______________________________________________
Ganglia-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-developers
