> -----Original Message-----
> From: Marc Powell [mailto:[email protected]]
> Sent: Sunday, September 26, 2010 11:27 AM
> To: Nagios Users List
> Subject: Re: [Nagios-users] Alleviating Nagios i/o contention problem
> 
> 
> On Sep 25, 2010, at 10:53 AM, Max wrote:
> 
> > I like the suggestions Matthias makes; those suggestions have worked
> > well for us.
> >
> > RRD updates are very expensive - I am pretty sure without knowing
> > anything more about your system that the RRD writes are causing most
> > of the I/O load.
> 
> I no longer have access to this system but my experience has been
> otherwise. We were running a nagios install with nearly 10,000 services
> received by external pollers every 5 minutes, and a cricket install on
> the same machine polling/updating 100,000+ rrd files during the same
> interval. This was on a Poweredge 6850, 5 disk RAID-5.
> RRDtool itself writes very little data to disk. I think it's 8 Bytes
> per DS per RRA updated. Linux, though, wants to write 4KB chunks at a
> time so it performs a read-modify-write of 4KB just to update those 8
> Bytes.
> 
> The OP can reduce his IO load particularly for RRD updates and help
> Linux better organize its writes to disk by ensuring that he has
> enough RAM to keep key information for each RRD file in the filesystem
> cache. The OP will need at least 8K * number of rrd files available to
> be used as filesystem buffer cache.
> 
> --
> Marc



Thanks very much to all who replied (Breandan, Marc, Max and Matthias, this 
means you! :-) ).

- I can't say exactly how many checks create perfdata (we have a very
heterogeneous set of check types).  I can see 9K files in the graph data
filesystem, so at two files per check that would be about 4,500 checks.

- I'm not running updates through syslog.  I don't have root on these machines
so that would not be helpful to me.  I will have to double-check, but I don't
believe that I have pnp4nagios logging turned on, except maybe at the
lowest level.  I don't recall it logging much of anything at that level, but as
I say, I'll check.

- According to our performance analysis team, these servers have way more RAM
than they're actually using, so I wouldn't think I'm limited by the Linux disk
cache here.  Perhaps it's just the hardware we have (the i/o rates on a
3-year-old Dell 2950 with a single RAID 5 set) that makes this particularly bad
for us.  Perhaps on faster hardware we'd not even notice.

- I would assume that rrdcached was built for a reason (i.e. this i/o issue
was observed at least somewhere), so it's definitely an avenue I want to try out.

- The ramdisk idea is also interesting.  I'm curious, though, why one
would want to rsync it back to the local disk periodically.  status.dat is just
a run-time status file, right?  Unless I misread the docs, it goes away when
Nagios is shut down.  How would having a local disk copy of status.dat benefit
me?  Also, nagios.log isn't written to that often in our case (we don't log
passive check results, for example).  I'm not sure I'd see the benefit for us
in putting that on ramdisk.  Although... we do have Splunk watching that file,
so that would be some additional read overhead, I guess.
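
For reference, Marc's rule of thumb quoted above (8K of filesystem cache per
RRD file, and a 4KB block rewritten for each 8-byte update) can be worked
through as simple arithmetic.  This is only a rough sketch: it assumes "8K"
and "4KB" mean 8 KiB and 4 KiB, that about 4,500 of the files here are RRD
files, and that one update touches a single 8-byte value.  The function names
are made up for illustration.

```python
KIB = 1024


def cache_needed_bytes(rrd_files, per_file_bytes=8 * KIB):
    """Cache suggested by the 8K-per-RRD-file rule of thumb (assumed KiB)."""
    return rrd_files * per_file_bytes


def write_amplification(payload_bytes=8, block_bytes=4 * KIB):
    """Bytes rewritten on disk per byte of RRD data actually changed."""
    return block_bytes / payload_bytes


if __name__ == "__main__":
    # File counts come from this thread; 4,500 is an estimate, not a measurement.
    for label, files in (("this install (approx.)", 4500),
                         ("Marc's cricket install", 100000)):
        mib = cache_needed_bytes(files) / (KIB * KIB)
        print(f"{label}: {files} RRD files -> {mib:.1f} MiB of cache")
    print(f"amplification per 8-byte update: {write_amplification():.0f}x")
```

By this estimate the cache requirement for a few thousand RRD files is small
next to the RAM on these servers, which fits with the suspicion that the disks
rather than memory are the limit here.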


Thanks!

Mark
