Arjun,

I think the performance of this database will be a HUGE issue, depending on how many metrics, hosts and clusters you have and the timespan you wish to store.

Now please correct me if I'm wrong, but let's make an estimated calculation of what is going to be stored in the database. Let's make a few assumptions: you are only storing the default Ganglia metrics, with no custom/extra metrics, and you have a cluster of 300 hosts. Also, you don't archive any values and you use the same graph resolution/value scheme as the RRDs from gmetad (the same static number of rotating values per resolution: hours, days, weeks, months, years).

Metrics per host: 36
Hosts per cluster: 300

Now let's make a quick estimate of how many values you are going to store in MySQL. Ganglia uses about 240 rows for each of the four shorter resolutions (hours, days, weeks, months) and 370 rows for the year summary; that is 4 * 240 + 370 = 1330 rows per metric, per host.

1330 * 36 * 300 = 14364000 values.

This comes down to almost 15 million values in your database, when storing values in the same style as Ganglia currently does in RRDs.

Now if you add:
- extra hosts: 1330 * 36 = 47880 values/host
- extra metrics: 1330 * 300 = 399000 values/metric

I don't know your particular setup, but here at SARA we monitor about 1800 machines in total, with more than 50 metrics per host.
A quick estimate (1330 * 50 * 1800 = 119700000) comes to about 120 million values in the database.
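
For anyone who wants to redo this arithmetic with their own numbers, here is a minimal Python sketch. The function and constant names are made up for illustration, and the split of 1330 into four resolutions of 240 rows plus 370 rows for the year is inferred from the figures above, not taken from the gmetad source.

```python
# Rough storage estimate, assuming the RRD-style layout described above.
# All names here are hypothetical; only the numbers come from this mail.
ROWS_PER_RESOLUTION = 240   # hours, days, weeks, months
SHORT_RESOLUTIONS = 4
ROWS_YEAR_SUMMARY = 370


def rows_per_metric():
    """Rows kept for one metric on one host."""
    return ROWS_PER_RESOLUTION * SHORT_RESOLUTIONS + ROWS_YEAR_SUMMARY


def total_values(hosts, metrics_per_host):
    """Total stored values for a given number of hosts and metrics."""
    return rows_per_metric() * metrics_per_host * hosts


if __name__ == "__main__":
    print(rows_per_metric())        # 1330
    print(total_values(300, 36))    # 14364000
    print(total_values(1800, 50))   # 119700000
```

Adding one host to the 300-host example costs total_values(301, 36) - total_values(300, 36) = 47880 extra values, which matches the per-host figure above.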

Now imagine querying/selecting from such a database....
The performance would seem like hell to me, making it totally unusable from the web environment where you want the values.

Also take into account that normally, on the web frontend, graphs are generated by RRDTool itself. But now you are using a database, so if you want RRDTool to draw the graphs for you, you need to convert your database values back to some sort of RRD format. This means a lot of querying and converting of those values each time a (host/metric/whatever) graph is requested. This would also require additional hacks (or changes) to the web frontend code.
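
To give an idea of the per-graph work, here is a rough sketch of the kind of query that would run for every single graph request. It assumes a hypothetical table metric_values(host, metric, ts, value); that layout is my guess and not anything from Arjun's design. Python's built-in sqlite3 module stands in for MySQL only so that the example runs on its own, and the conversion step back to something RRDTool can draw is not shown.

```python
import sqlite3


def fetch_series(conn, host, metric, start_ts, end_ts):
    """Return (timestamp, value) pairs for one host/metric graph."""
    cur = conn.execute(
        "SELECT ts, value FROM metric_values "
        "WHERE host = ? AND metric = ? AND ts BETWEEN ? AND ? "
        "ORDER BY ts",
        (host, metric, start_ts, end_ts),
    )
    return cur.fetchall()


# Hypothetical schema; an index on (host, metric, ts) is the minimum
# needed to avoid scanning the whole table for every graph.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metric_values "
    "(host TEXT, metric TEXT, ts INTEGER, value REAL)"
)
conn.execute("CREATE INDEX idx_series ON metric_values (host, metric, ts)")

# Tiny made-up data set: 3 hosts, 2 metrics, 240 samples each.
rows = [
    ("node%03d" % h, m, t * 15, float(t))
    for h in range(3)
    for m in ("load_one", "cpu_user")
    for t in range(240)
]
conn.executemany("INSERT INTO metric_values VALUES (?, ?, ?, ?)", rows)

# One graph for one host and one metric means one such query.
series = fetch_series(conn, "node001", "load_one", 0, 239 * 15)
print(len(series))
```

A cluster overview page showing one graph per host would repeat this query once per host, on top of the conversion work.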

I think you might need insane, if not impossible, hardware to support such a (database) setup, but anyone, please correct me if I'm wrong.

Kind regards,
- Ramon.

Arjun wrote:
In my case the monitoring DB will be on a separate machine along with gmetad. I'm monitoring a cluster, so I can have a separate (external) machine to store data on, so I guess this will not be a performance bottleneck if I have a DB like MySQL to store and retrieve data.

thanks
Arjun

--
ing. R. Bastiaans            HPC - Systems Programmer

SARA - Computing and Networking Services
Kruislaan 415                PO Box 194613
1098 SJ Amsterdam            1090 GP Amsterdam
Tel. +31 (0) 20 592 3000     Fax. +31 (0) 20 668 3167
---
There are really only three types of people:

 Those who make things happen, those who watch things happen
 and those who say, "What happened?"

