Recently we have encountered some "interesting" problems while using an RRDtool derivative for large-scale data collection.
Setup:

RRDtool -- based on rrdtool 1.0.28 with several modifications: bug fixes, export to a database, percentile, STDDEV, a moving-average function, the ability to use RPN without producing graphs, and millisecond resolution. The rrd_update function is essentially the same.

22,000 interfaces. Each interface has 10 data sources (in/out octets, packets, errors, discards, avail, queue length). Each interface is stored in a separate RRD file. The RRD files have custom resolution: most have a 180s step, about a third have a 30s step. That comes to roughly 4 MB of disk space per file, ~100 GB total.

The system runs on a Sun V880 with 4 UltraSPARC III CPUs, 8 GB RAM and a 6-disk IBM Fastt2000 RAID5 controller.

Data collection is done with our own frontend, a major rewrite of Cricket with lots of cool stuff. Collection can be done either with several processes (~20) or with a smaller number of processes (3-5) with attached SNMP slaves. The usual turnaround time is 120-300 seconds for all the interfaces. You can get our version of rrdtool at http://percival.sourceforge.net

Problem:

At about 17K interfaces we found that collection had slowed to a crawl: the collection time for all interfaces became much longer than the required 300s. Further investigation revealed that we were spending most of the time waiting for disk. We performed the usual Solaris tuning -- verified the DNLC cache, inode cache, etc. -- all to no avail. A review of the source turned up the following problems:

- For every update we have to open and then close the file.
- We have to read the metadata (static head plus dynamic definitions) on every update.
- rrdtool uses buffered stdio functions, although there is absolutely no need for buffering since the I/O is random. Solaris also does not support more than 255 open files through the stdio functions.
- The number of seeks and writes per update can be drastically reduced.

We tried and tested several approaches on both Solaris and Linux. In every case the file is opened only once and closed when the collector exits (a sketch of this open-once/pwrite() path is given at the end of this message):

- Improved read/write: metadata are read once when the file is opened, and pwrite() is used to write data back to the file. We verified that pwrite() is faster than lseek() followed by write().
- Improved read/write with the metadata mmap'ed (also sketched below). We managed to get this working only on Linux. Performance-wise this solution is about 20-30% faster than pure read()/pwrite().
- Fully mmap'ed file. This proved to be the worst possible idea; again, this was tried on Linux only. A possible reason is that msync() syncs a full page, which is 4 KB, while pwrite() can write a single 512-byte sector (see the msync() note below). This was confirmed with iostat.

In the end we upgraded RRDtool, raised the number of available file descriptors, and the problem magically went away. Our estimate is that we can handle about 30-40K interfaces on the same hardware.

The bottom line is that RRDtool produces a lot of random I/O, and the collection time is bounded by the disk's average seek time multiplied by the number of interfaces. Our modifications reduced the number of seeks severalfold, but they did not overcome the fundamental problem. In my opinion, any further advance in speed will require changes to the RRD data structure.

I am also very surprised that SNMP collection / CPU usage did not become a bottleneck before the disk did. According to the RTG article this was supposed to be a major problem. On the other hand, our collector has about the same performance as RTG even though it is written in Perl.

P.S. Note that our archive size is about 10-20 times bigger than with the MRTG defaults, because we store more data at higher precision.
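For concreteness, a minimal sketch of the open-once/pwrite() path mentioned above. This is not the actual Percival code; the structure and helper names (rrd_handle, rrd_open_once, rrd_update_fast) are illustrative:

#define _XOPEN_SOURCE 500
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-file state kept by the collector: the descriptor is
 * opened once, and the metadata (static head plus dynamic definitions)
 * is read and parsed a single time, so each later update needs only a
 * single positioned write. */
struct rrd_handle {
    int   fd;           /* stays open until the collector exits        */
    off_t data_offset;  /* where the live data area starts in the file */
    /* ... parsed header/metadata would be cached here ...             */
};

static int rrd_open_once(struct rrd_handle *h, const char *path)
{
    h->fd = open(path, O_RDWR);      /* plain open(), no stdio, so no
                                        255-descriptor stdio limit     */
    if (h->fd < 0)
        return -1;
    /* read the static head and dynamic definitions here, once;
     * data_offset would be computed from them                         */
    h->data_offset = 0;              /* placeholder                    */
    return 0;
}

/* One update: a single pwrite() instead of open + read of the metadata
 * + seek + write + close for every sample.  pwrite() combines the seek
 * and the write in one call, which we measured to be faster than
 * lseek() followed by write(). */
static int rrd_update_fast(struct rrd_handle *h, off_t off,
                           const void *buf, size_t len)
{
    ssize_t n = pwrite(h->fd, buf, len, h->data_offset + off);
    return n == (ssize_t)len ? 0 : -1;
}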
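The mmap'ed-metadata variant we got working on Linux looks roughly like this. Again this is only a sketch, assuming the metadata occupies the first part of the file; map_metadata is an illustrative name:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map only the metadata region (static head + definitions) so the
 * header fields touched on every update are modified in place through
 * memory, while the bulk data area is still written with pwrite(). */
static void *map_metadata(int fd, size_t header_len)
{
    long   page   = sysconf(_SC_PAGESIZE);
    size_t maplen = ((header_len + page - 1) / page) * page;
    void  *hdr    = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    return hdr == MAP_FAILED ? NULL : hdr;
}

For us this was about 20-30% faster than pure read()/pwrite(), presumably because the per-update header bookkeeping no longer costs extra syscalls.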
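Why the fully mmap'ed file performed worst can be seen from what msync() flushes. A sketch of the hypothesis (names and offsets are illustrative, not measured code):

#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* With the whole file mapped, storing one value still dirties an entire
 * page, and msync() flushes at page granularity (4 KB here), whereas
 * pwrite() can issue a single 512-byte sector write.  iostat confirmed
 * the larger transfers. */
static void update_mapped(uint8_t *file_map, off_t cell_off, double value)
{
    long page = sysconf(_SC_PAGESIZE);

    memcpy(file_map + cell_off, &value, sizeof value);

    /* msync() must start on a page boundary, so at least one full page
     * goes to disk for this 8-byte update. */
    uint8_t *page_start = file_map + (cell_off & ~(off_t)(page - 1));
    msync(page_start, (size_t)(cell_off & (page - 1)) + sizeof value,
          MS_ASYNC);
}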
Sasha Mikheev
Avalon Net Ltd, CTO
+972-50-481787, +972-4-8210307
http://www.avalon-net.co.il
