I have a pair of Linux machines graphing a lot of switch ports (~35000 ports across 1500 devices each) on 5-minute intervals. Disk I/O seems to be my biggest problem. CPU load tends to average about 60-70% busy. About once every 10-15 minutes, though, the kjournald and kupdated processes kick in and take the machine down with them for about 5 minutes at a time. While this is happening, top shows 'iowait' at 100% for both CPUs, and almost nothing else will run on the machine. A good run finishes in about 3 minutes. When it lags, it sometimes won't recover for 10+ minutes.
Each machine has a single MRTG config file that the collector is run against. The config looks something like this:

----- Begin config -----
WorkDir: /home/mrtg/workdir
LogFormat: rrdtool
PathAdd: /usr/local/rrdtool/bin
LibAdd: /usr/local/rrdtool/lib/perl
options[_]: growright,bits
icondir: /icons
Interval: 5
Forks: 30

# port id: 11536 - switch id: 922 - ifindex: 10
Directory[922.11536]: 922
Target[922.11536]: 10:[EMAIL PROTECTED]:::1::2
SetEnv[922.11536]: MRTG_INT_IP="" MRTG_INT_DESCR="2/2"
MaxBytes[922.11536]: 1250000000
Title[922.11536]: Port 2/2 -- c2948-4 (172.16.99.133)

# port id: 11537 - switch id: 922 - ifindex: 11
Directory[922.11537]: 922
Target[922.11537]: 11:[EMAIL PROTECTED]:::1::2
SetEnv[922.11537]: MRTG_INT_IP="" MRTG_INT_DESCR="2/3"
MaxBytes[922.11537]: 1250000000
Title[922.11537]: Port 2/3 -- c2948-4 (172.16.99.133)

... (many more similar entries) ...
----- End config -----

Hardware: Dell PowerEdge 2650 - Dual Xeon @ 3.06GHz, 2GB RAM, hardware RAID 5 (PERC-3, aacraid driver), 5x 10000RPM U320 SCSI drives
Software: Red Hat AS3, ext3 filesystem, MRTG 2.10.15, RRDTool 1.0.49

So, my question is this: have any of you tried running similar loads? Have you experienced similar problems? Do you have any ideas how to address them, or any other optimizations that might help?

Two things I have found so far. Sorting the individual RRD files into a directory structure (rather than keeping 35000 files in a single directory) helped a lot. Sorting by switch ID was just an arbitrary choice. Also, disabling atime updates on the files (chattr +A) made a small improvement.

Thanks.
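For anyone who wants to try the same per-switch layout, here is a rough sketch of the sorting step in Python. It is not the script I actually used. It assumes the RRD files sit flat in the work directory and are named "<switch id>.<port id>.rrd" (for example 922.11536.rrd), matching the target names in the config above; the helper names bucket_for and sort_rrds are made up for illustration. Each target also needs a matching Directory[] line in the MRTG config, as in the excerpt, or the collector will not find the moved files.

```python
import shutil
from pathlib import Path


def bucket_for(filename):
    """Return the subdirectory name for an RRD file.

    Assumption: files are named '<switch id>.<port id>.rrd', so the
    text before the first dot is the switch ID.
    """
    return filename.split(".", 1)[0]


def sort_rrds(workdir, dry_run=True):
    """Move '*.rrd' files in workdir into one subdirectory per switch ID.

    Returns a list of (source, destination) pairs. With dry_run=True
    nothing is moved, so the plan can be inspected first.
    """
    workdir = Path(workdir)
    moves = []
    # sorted() materializes the listing before any file is moved.
    for src in sorted(workdir.glob("*.rrd")):
        dest_dir = workdir / bucket_for(src.name)
        dest = dest_dir / src.name
        moves.append((src, dest))
        if not dry_run:
            dest_dir.mkdir(exist_ok=True)
            shutil.move(str(src), str(dest))
    return moves


if __name__ == "__main__":
    # Hypothetical usage: print the planned moves for the work directory.
    for src, dest in sort_rrds("/home/mrtg/workdir", dry_run=True):
        print(src, "->", dest)
```

Run it with the collector stopped, check the dry-run output, then call sort_rrds with dry_run=False. Any other bucketing key (a hash of the name, say) would work as long as the Directory[] entries agree with it.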
