I've done this for a couple of large site installations with more than 100,000 targets being monitored. It works pretty well, though I've had varying levels of success when the NFS server is also doing its own work. In many ways, a dedicated NFS server seems to be more stable, with the polling and graphing functions on their own hardware as well.

As long as only one process on one box is writing to a given RRD file at a time, you don't have to worry about thread safety.

In general, the SNMP polling portion is very lightweight; you'll find that there's not much benefit to trying to split that off from the rrd-update process, as you generate just about as much load sending the results of the SNMP polls from your poller to the rrd-update box as if the rrd-update box did the SNMP request itself.

So, I'd recommend having two pools of boxes. The first is your SNMP poller/rrd-update boxes, split so that a given device is polled from just one of your polling boxes at a time, but with the files stored on a common NFS volume, served by a separate NFS server. The second is a pool of servers for your user presentation layer, whether it be 14all, mrtg.cgi, weathermap, etc.

That way, as you add more targets/devices, you can scale your polling/rrd-update layer horizontally, and as you add more viewers/users you can scale your presentation layer horizontally. And if you need additional NFS IO-ops, you can add a second NFS server and split your rrd files across two volumes, both of which are mounted by the various back-end and front-end boxes as necessary.

This is something that's worked well for me in the past, but every situation is different.

Matt

----- Original Message ----
> From: Kristoff Bonne <[email protected]>
> To: [email protected]
> Sent: Mon, January 25, 2010 6:03:32 AM
> Subject: [mrtg] "rrdupdate" on load-balancing cluster
>
> Hi,
>
> We are running a couple of servers running "mrtg" on a large number of
> devices (+20000 rrd files in total), used for both "mrtg" (14all) and
> "weathermap" applications.
>
> I'm thinking of trying to implement the concept of a computer-cluster
> for this to make this more robust and future-proof.
>
> The basic idea would be to "separate" the three different elements of
> network-monitoring:
>
> - First, a number of "polling" boxes would gather the information from
> the network-elements.
>
> - After that, these boxes would fire up a "rrd-update" command to a
> number of "rrd-servers" which would contain the rrd-files.
>
> - The RRD-files would then be made available to the "web-frontend"
> (running 14all and weathermap), probably via NFS.
>
>
> Two questions:
>
> 1: I do not think I will be the first person to think of this.
>
> Is anybody aware of any implementations like this?
> Does there exist a client-server version of the "rrd-tools"?
>
>
> 2: Looking in the librrd API documentation, I found this troublesome
> remark concerning threads:
>
> /* NOTE: rrd_update_r are only thread-safe if no at-style time
> specifications get used!!! */
>
> What exactly does this mean?
>
> If I want to write a "rrdupdate-daemon" myself, it needs to run in
> threads-mode and it must be able to use timestamps!
>
> Does this mean that this is completely impossible, or are there ways
> around this?
>
> If I would add a piece of code that implements a MUTEX based on the
> file-name of the rrd-file being updated, would this be enough to support
> rrd-updates with timestamp?
>
>
> Cheerio!
> Kr. Bonne.
>
> _______________________________________________
> mrtg mailing list
> [email protected]
> https://lists.oetiker.ch/cgi-bin/listinfo/mrtg
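
A rough illustration of the split described in the reply above: every device is handled by exactly one poller/rrd-update box, and the rrd files are spread over more than one NFS volume. This is only a sketch under assumptions. The hostnames, mount points, and function names are hypothetical and are not part of mrtg or rrdtool; hashing the device name is just one possible way to do the split, and a static list per poller would work as well.

```python
import hashlib

# Hypothetical names, for illustration only.
POLLERS = ["poller1", "poller2", "poller3"]   # poller/rrd-update boxes
VOLUMES = ["/mnt/rrd-a", "/mnt/rrd-b"]        # NFS mounts shared by all boxes


def stable_bucket(key: str, buckets: int) -> int:
    """Map a key to a bucket index that comes out the same on every host."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def owner_of(device: str) -> str:
    """The single poller that polls this device and writes its rrd files."""
    return POLLERS[stable_bucket(device, len(POLLERS))]


def rrd_path(device: str, target: str) -> str:
    """Where the rrd file for one target of a device lives on the NFS volumes."""
    volume = VOLUMES[stable_bucket(device, len(VOLUMES))]
    return f"{volume}/{device}/{target}.rrd"


def my_devices(me: str, devices: list[str]) -> list[str]:
    """The devices a given poller should put in its own mrtg config."""
    return [d for d in devices if owner_of(d) == me]
```

Each poller would call my_devices() with its own name when building its config, so no two boxes ever update the same rrd file, while the front-end boxes can compute rrd_path() for any device. Note that changing the number of pollers or volumes reshuffles the assignment, so files would have to be moved if a volume is added.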
