Jason A. Smith wrote:
If you really want long term storage of the raw or nearly raw data then
rrdtool is probably not the right tool to use. You would be better off
writing your own Ganglia frontend client that would collect the XML data
from gmetad at the interval you need, parse it and store it in some
other database or archive. This could also be done from another
computer so it would have a negligible impact on the gmetad host.
~Jason
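A minimal sketch of the collector Jason describes, in Python. It assumes
gmetad serves its XML dump on TCP port 8651 (the usual default xml_port;
check gmetad.conf) and that metrics appear as METRIC elements with NAME
and VAL attributes nested under HOST elements. SQLite is only a stand-in
for "some other database or archive", and all function names are
hypothetical.

```python
import socket
import sqlite3
import time
import xml.etree.ElementTree as ET

# Assumption: gmetad's XML port is the default 8651 on the local machine.
GMETAD_HOST = "localhost"
GMETAD_PORT = 8651


def fetch_xml(host=GMETAD_HOST, port=GMETAD_PORT, timeout=10.0):
    """Read the complete XML dump gmetad writes when a client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmetad closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_bytes):
    """Yield (host, metric, value) for every numeric METRIC element."""
    root = ET.fromstring(xml_bytes)
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            try:
                value = float(metric.get("VAL"))
            except (TypeError, ValueError):
                continue  # skip string metrics such as os_name
            yield host.get("NAME"), metric.get("NAME"), value


def store(conn, samples, ts):
    """Append one polling round of samples, all stamped with time ts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(ts INTEGER, host TEXT, metric TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        ((ts, h, m, v) for h, m, v in samples),
    )
    conn.commit()


def poll_forever(db_path="ganglia.db", interval=15):
    """Poll gmetad every `interval` seconds and store what it reports."""
    conn = sqlite3.connect(db_path)
    while True:
        ts = int(time.time())
        store(conn, parse_metrics(fetch_xml()), ts)
        time.sleep(interval)
```

Running this on a separate machine, as Jason suggests, keeps the only
load on the gmetad host down to one TCP connection per interval.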
I have thought about this too.
The problem is that if I go to something SQL-ish or similar, I will
have to store about 25 billion rows (<43 metrics> * <275 hosts> *
<samples in 1 year at one every 15 seconds>), because I'd want to keep
about 1 year's worth of metrics at the detailed resolution, meaning a
new value every 15 seconds per host per metric.
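A quick Python check of that row count. The byte figure assumes one
8-byte double per sample (which is what RRD stores per data point) and
ignores any per-row or per-file overhead, so a real database would need
more.

```python
# Back-of-the-envelope sizing for one year of 15-second samples.
METRICS = 43
HOSTS = 275
STEP_SECONDS = 15
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (non-leap year)

samples_per_series = SECONDS_PER_YEAR // STEP_SECONDS  # 2,102,400
series = METRICS * HOSTS                               # 11,825
total_rows = series * samples_per_series               # 24,860,880,000

# Raw payload only: one 8-byte double per sample, no indexes or headers.
raw_bytes = total_rows * 8
print(f"{total_rows:,} rows, ~{raw_bytes / 1e9:.0f} GB of raw values")
# prints "24,860,880,000 rows, ~199 GB of raw values"
```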
I am already having nightmares about working with an SQL database with
some 25 billion rows; I doubt it would ever work on the hardware I have
available for the project.
It would almost be more usable (performance- and storage-wise) to just
write additional .rrd files the same way gmetad does, perhaps on a
ramdisk.
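A sketch of what one such "archive" RRD could look like: a single RRA
with one row per 15-second step and a year's worth of rows, so nothing
is consolidated away. The DS name `sum` is assumed to match gmetad's own
files (verify with `rrdtool info` on one of them), the 60-second
heartbeat is an arbitrary choice, and the helper names are hypothetical.

```python
import subprocess

STEP = 15                                # one sample every 15 seconds
ROWS_PER_YEAR = 365 * 24 * 3600 // STEP  # 2,102,400 rows
HEARTBEAT = 4 * STEP                     # assumption: allow up to 60 s without data


def rrd_create_args(path, ds_name="sum"):
    """Argument list for an RRD that keeps every 15 s sample for a year."""
    return [
        "rrdtool", "create", path,
        "--step", str(STEP),
        f"DS:{ds_name}:GAUGE:{HEARTBEAT}:U:U",
        # xff 0.5, 1 primary data point per row, a year of rows: no averaging
        f"RRA:AVERAGE:0.5:1:{ROWS_PER_YEAR}",
    ]


def create_archive(path):
    """Run rrdtool; raises CalledProcessError if creation fails."""
    subprocess.run(rrd_create_args(path), check=True)
```

Each such file holds 2,102,400 rows of 8 bytes, roughly 16.8 MB of data
per host/metric pair, which puts the whole set well beyond a typical
ramdisk.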
I agree an SQL database would be much more desirable; however, I am very
tempted to just write a tool that grabs the XML and stores it in
additional RRDs. Then again, using round robin databases for archiving
data really goes against their whole concept.
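If the round robin part is what doesn't fit, one alternative is to keep
RRD's fixed-slot idea but drop the wraparound: one flat file per
host/metric pair, where the sample taken at time ts lives at a byte
offset computed from ts, so no index is needed. This is a hypothetical
sketch; the names, the start epoch and the use of float32 precision are
all assumptions. At 4 bytes per sample it needs about 8.4 MB per series
per year.

```python
import os
import struct

STEP = 15                           # seconds between samples
RECORD = struct.Struct("<f")        # one little-endian float32 per slot (assumed precise enough)
NAN_RECORD = RECORD.pack(float("nan"))


def slot_offset(ts, start):
    """Byte offset of the slot for a sample taken at Unix time ts."""
    return ((ts - start) // STEP) * RECORD.size


def write_sample(path, start, ts, value):
    """Write one sample into its fixed slot, NaN-filling any skipped slots."""
    offset = slot_offset(ts, start)
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        end = f.seek(0, os.SEEK_END)
        if offset > end:
            # Missing samples become NaN, so they are not confused with 0.0
            f.write(NAN_RECORD * ((offset - end) // RECORD.size))
        f.seek(offset)
        f.write(RECORD.pack(value))


def read_range(path, start, t0, t1):
    """Return the samples for t0 <= ts < t1 (fewer if the file ends early)."""
    count = (t1 - t0) // STEP
    with open(path, "rb") as f:
        f.seek(slot_offset(t0, start))
        data = f.read(count * RECORD.size)
    return [v for (v,) in RECORD.iter_unpack(data)]
```

Because the offset is pure arithmetic, reading any time range is a
single seek plus one sequential read, much like an RRD lookup.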
If you have a good idea or suggestion on how to store this amount of
data efficiently, without needing an extra cluster just to store and
use the values, I would love to hear it.
- Ramon.