Arjun,

I am glad that you brought this up since I am thinking about similar issues. I haven't had time to do any real work yet, but I did have some ideas. I thought perhaps I could pass them along in case they would be helpful to you. (And then maybe you can weed out my bad ideas for me ;-)

Like Ramon and Martin, I had previously done some calculations to see how many metrics would be reported per minute. It is quite a few. So my first thought was that I would decrease the frequency to no more than once per minute for any metric. That would be sufficient for my needs for the "important" metrics like load and free memory. For things like cpu_num, etc. (which by default are collected every 20 min), I would increase the interval to something like once a day. In that scenario, I would only get about 25 metrics/min/node (assuming the default metric set is used). That works out to roughly 10 million values per day total for a 300 node cluster (25 x 300 x 1440 minutes = 10.8 million), and that number could be decreased even more if I identified the "required" metrics and eliminated the others.
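Just to make the arithmetic explicit, here it is as a trivial Python sketch (the 25 metrics/min/node figure is my assumption from the reduced rates above):

    metrics_per_min_per_node = 25  # assumed: default set, capped at 1/min
    nodes = 300
    values_per_day = metrics_per_min_per_node * nodes * 60 * 24
    print("values per day: %d" % values_per_day)  # prints 10800000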

As far as getting those values into a database, I had four ideas:

1) Hack gmetad - Rewrite the function that writes values into the RRDs and make it write them into MySQL instead (this was mentioned previously on this mailing list).
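However the data gets collected (ideas 1-3 all need this part), the MySQL side would presumably look about the same. Here is a minimal sketch, assuming a single flat table and the Python MySQLdb module; the table and column names are made up for illustration:

    import MySQLdb  # assumes the MySQLdb module is available

    conn = MySQLdb.connect(host="dbhost", user="ganglia",
                           passwd="secret", db="ganglia")
    cur = conn.cursor()

    # One flat table: one row per (host, metric, timestamp) sample.
    cur.execute("""CREATE TABLE IF NOT EXISTS metrics (
                       host   VARCHAR(64)  NOT NULL,
                       metric VARCHAR(64)  NOT NULL,
                       ts     INT UNSIGNED NOT NULL,
                       value  DOUBLE,
                       INDEX (host, metric, ts)
                   )""")

    def store(host, metric, ts, value):
        cur.execute("INSERT INTO metrics (host, metric, ts, value) "
                    "VALUES (%s, %s, %s, %s)", (host, metric, ts, value))
        conn.commit()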


2) Write a gmetad-like program - Write a new program that periodically queries gmond (just like gmetad does) and writes the values to MySQL. (This too was mentioned on this mailing list.)
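Since gmond dumps its whole state as one XML document to anything that connects to its TCP port (8649 by default), the polling half of such a program could be quite small. A rough sketch, reusing the hypothetical store() helper from the sketch above:

    import socket
    import time
    import xml.etree.ElementTree as ET  # stdlib in Python 2.5+

    def poll_gmond(host="localhost", port=8649):
        # gmond writes its current state as XML and closes the connection.
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        chunks = []
        while True:
            data = s.recv(8192)
            if not data:
                break
            chunks.append(data)
        s.close()
        return ET.fromstring("".join(chunks))

    while True:
        root = poll_gmond()
        now = int(time.time())
        for h in root.findall(".//HOST"):
            for m in h.findall("METRIC"):
                if m.get("TYPE") != "string":  # only store numeric metrics
                    store(h.get("NAME"), m.get("NAME"), now,
                          float(m.get("VAL")))
        time.sleep(60)  # once per minute, per the reduced rate above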


3) Hack gmond - Since gmond is written to collect the metrics anyway, why not use it as a starting point? It would act like a normal mute gmond that just sits there and listens for metrics, and then you could write an extra bit that periodically pushes the data out to MySQL. (I have no idea how easy/hard this would be.)

One good thing about this approach is that you wouldn't have to record the same data twice. For example, an idle node might not have many metrics that surpass the configured threshold, so maybe it only retransmits a value every 10 minutes instead of every 1 minute. The hacked gmond could write that value once instead of writing the previous value 10 times. Or the "normal" gmond clients could be configured to always send a value every 1 minute (ignoring thresholds), and the hacked version could enforce thresholds to decide when data should be stored in MySQL, as in the sketch below.
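A store-side threshold check might look something like this (the delta and max-age numbers are made up, and store() is again the hypothetical insert helper from above):

    # last_seen maps (host, metric) -> (timestamp, value) of the last
    # sample that was actually written to the database.
    last_seen = {}

    def maybe_store(host, metric, ts, value,
                    value_delta=0.05, max_age=600):
        key = (host, metric)
        if key in last_seen:
            old_ts, old_value = last_seen[key]
            # Skip the write unless the value moved enough or the last
            # stored sample is getting old.
            if abs(value - old_value) <= value_delta and ts - old_ts <= max_age:
                return
        store(host, metric, ts, value)
        last_seen[key] = (ts, value)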


4) Use the RRDs - Keep it simple. Use the ability in gmetad.conf to modify the RRD archive "format". Make it keep one value every minute for a 24 hour period, and then configure gmetad to collect data every minute (instead of the default 15 secs). The new RRDs would keep 1440 values for each metric (as opposed to the 966 values by default). Then periodically copy the RRDs over to another machine and write a script that just digs through them, pulls out new values, and puts them into the database.
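For the record, the archive format is controlled by the RRAs line in gmetad.conf; something like

    RRAs "RRA:AVERAGE:0.5:1:1440"

should keep one averaged value per minute for 24 hours (check that your gmetad version supports overriding the RRAs). The dig-through-the-RRDs script could be as simple as shelling out to rrdtool fetch. A rough sketch, assuming gmetad's default rrd_rootdir layout of <root>/<cluster>/<host>/<metric>.rrd:

    import glob
    import os
    import time

    def new_values(rrd_path, since):
        # "rrdtool fetch" prints lines like "1140700000: 1.2340000000e+00".
        cmd = 'rrdtool fetch "%s" AVERAGE --start %d' % (rrd_path, since)
        for line in os.popen(cmd):
            if ":" not in line:
                continue  # skip the DS-name header line
            ts, val = line.split(":", 1)
            ts, val = int(ts), val.strip()
            if ts > since and "nan" not in val:
                yield ts, float(val)

    # Example: pull everything from the last 24 hours.
    since = int(time.time()) - 86400
    for rrd in glob.glob("/var/lib/ganglia/rrds/*/*/*.rrd"):
        for ts, val in new_values(rrd, since):
            print("%s %d %g" % (rrd, ts, val))  # or store(...) into MySQL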


The last idea is probably the simplest to implement, but it does have several drawbacks. For one, you won't have the most current data in MySQL. It would still be available through the web interface, but that won't help you if you need to do a SQL query on it. I don't know how much of an issue that is for you.


So the scoreboard is:

Ideas = 4
Implementations = 0

:-)

Hopefully some of this is useful to you. I look forward to seeing how you accomplish your task.

-- Rick

--------------------------
Rick Mohr
Systems Developer
Ohio Supercomputer Center

On Thu, 23 Feb 2006, Martin Knoblauch wrote:

Arjun, Ramon,

my numbers look a bit different, but equally disturbing:

Let's assume 300 hosts with 36 metrics. I would not look at the RRD format, but just store samples as they come from gmond.

That means we have 300x36 = 10800 values per sample interval.

Now let's assume the same sample rate as the one-hour resolution in the RRDs (one sample every 15 seconds, i.e. 240 samples per hour). That gives 300x36x240, about 2.6 million values per hour.

About 62.2 million values per day.
About 22.7e9 values per year.

That is a lot of storage capacity and a lot of needed database performance.

Of course, a lot of the metrics in Ganglia are not that interesting to most people, or do not need the 15 sec resolution.

Cheers
Martin

--- Ramon Bastiaans <[EMAIL PROTECTED]> wrote:

Arjun,

I think performance of this database will be a HUGE issue, depending on how many metrics/hosts/clusters and what timespan you wish to store.

Now please correct me if I'm wrong, but let's make an estimated calculation of what is going to be stored in the database. Let's make a few assumptions: you are only storing the default Ganglia metrics and no custom/extra metrics, and you have a cluster of 300 hosts. Also, you don't archive any values, and you use the same graph resolution/value scheme as used by the RRDs from gmetad (the same static number of rotating values per resolution: hours, days, weeks, months, years).

Metrics per host: 36
Hosts per cluster: 300

Now let's make a quick estimate of how many values you are going to store in MySQL. Ganglia uses about 240 rows per resolution (hour, day, week, month) and 370 rows for the year summary; that is 4 x 240 + 370 = 1330 rows per metric, per host.

1330 * 36 * 300 = 14,364,000 values.

This comes down to almost 15 million values in your database when using the same style of value storage as currently done by Ganglia in the RRDs.

Now if you add:
- extra hosts: 1330 * 36 = 47880 values/host
- extra metrics: 1330 * 300 = 399000 values/metric

I don't know your particular setup, but here at SARA we monitor about 1800 machines in total, with more than 50 metrics per host. A quick estimate comes to about 120 million values in the database (1330 * 50 * 1800 = 119,700,000).

Now imagine querying/selecting from such a database... The performance would be hell, it seems to me, making it totally unusable from the web environment where you want the values.

Also take into account that normally, on the web frontend, graphs are generated by RRDtool itself. But now you are using a database, so if you want RRDtool to draw the graphs for you, you need to convert your database values back to some sort of RRD format. This means a lot of querying and converting of those values each time a (host/metric/whatever) graph is requested, as sketched below. It would also require additional hacks (or changes) to the web frontend code.
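To make that concrete, the conversion would amount to something like this for every requested graph (just a sketch; the DS name "sum" matches what Ganglia's own RRDs use, everything else is made up):

    import os

    def rows_to_rrd(rows, rrd_path, step=60):
        # rows: (timestamp, value) pairs queried from MySQL, oldest first.
        start = rows[0][0] - step
        os.system('rrdtool create "%s" --start %d --step %d '
                  'DS:sum:GAUGE:%d:U:U RRA:AVERAGE:0.5:1:%d'
                  % (rrd_path, start, step, 2 * step, len(rows)))
        for ts, val in rows:
            os.system('rrdtool update "%s" %d:%f' % (rrd_path, ts, val))

    # ...and only then can "rrdtool graph" be pointed at rrd_path.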

I think you might need insane if not impossible hardware to support such a (database) setup, but again, anyone correct me if I'm wrong.

Kind regards,
- Ramon.

Arjun wrote:
In my case the monitoring DB will be on a separate machine along with gmetad. I'm monitoring a cluster, so I can have a separate (external) machine to store data on, so I guess this will not be a performance bottleneck if I have a DB like MySQL to store and retrieve data.

thanks
Arjun

--
ing. R. Bastiaans            HPC - Systems Programmer

SARA - Computing and Networking Services
Kruislaan 415                PO Box 194613
1098 SJ Amsterdam            1090 GP Amsterdam
Tel. +31 (0) 20 592 3000     Fax. +31 (0) 20 668 3167
---
There are really only three types of people:

  Those who make things happen, those who watch things happen
  and those who say, "What happened?"







------------------------------------------------------
Martin Knoblauch
email: k n o b i AT knobisoft DOT de
www:   http://www.knobisoft.de




