On Mon, Apr 27, 2009 at 07:18:24PM +0300, Ghirai wrote:
> Hi,
>
> I'm running a RAID1 setup with gmirror and geli (AES-128) on top of
> that.
> While searching for ways to improve read performance, I found some
> posts (on kerneltrap, I think) about vfs.read_max.
>
> The author suggested that increasing the default value of 8 to 16
> resulted in increased read speed, and that increasing it further
> resulted in no noticeable performance gain.
>
> Results are below.
>
> Starting with vfs.read_max=32:
>
> triton# dd if=a.iso of=/dev/null bs=3M
> 1129+1 records in
> 1129+1 records out
> 3554287616 bytes transferred in 176.825898 secs (20100492 bytes/sec)
>
> triton# sysctl vfs.read_max=64
> vfs.read_max: 32 -> 64
>
> triton# dd if=a.iso of=/dev/null bs=3M
> 1129+1 records in
> 1129+1 records out
> 3554287616 bytes transferred in 162.943189 secs (21813048 bytes/sec)
>
> triton# sysctl vfs.read_max=128
> vfs.read_max: 64 -> 128
>
> triton# dd if=a.iso of=/dev/null bs=3M
> 1129+1 records in
> 1129+1 records out
> 3554287616 bytes transferred in 149.313994 secs (23804116 bytes/sec)
>
> triton# sysctl vfs.read_max=256
> vfs.read_max: 128 -> 256
>
> triton# dd if=a.iso of=/dev/null bs=3M
> 1129+1 records in
> 1129+1 records out
> 3554287616 bytes transferred in 150.466241 secs (23621828 bytes/sec)
>
> Here it seems to have hit a wall. Going down a bit to 192 results in
> almost exactly the same numbers, so the best value seems to be 128.
> As I read it, vfs.read_max means 'cluster read-ahead max block count'.
> Does it read ahead the stuff into some memory? If so, can that memory
> size be increased via sysctl?
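A sweep like the one quoted above can be scripted. A minimal sketch, not what Ghirai actually ran: it only prints the sysctl/dd command for each value (the test file path and the value list are placeholders); pipe the output to sh on the machine under test to really run it.

```shell
#!/bin/sh
# Print one benchmark step per vfs.read_max value; piping the output
# to sh actually runs the sweep (sysctl needs root).
sweep() {
    for rmax in "$@"; do
        echo "sysctl vfs.read_max=$rmax && dd if=$TESTFILE of=/dev/null bs=3M"
    done
}

TESTFILE=/tmp/a.iso   # placeholder test file
sweep 8 16 32 64 128 256
```

Remember to re-read a different large file between runs, or the cache will skew the numbers.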
IIRC, if it gets a read request, it reads up to vfs.read_max extra
clusters into the VFS cache, to improve subsequent reads. This won't do
much if you're reading a lot of small files scattered around the disk.

> Does the improvement in performance have to do with my particular setup
> (gmirror+geli)?

In my experience, gmirror is slow (see below). If you have multiple
cores, geli isn't much of an issue. On a single-core machine it can
become a bottleneck.

> I thought i'd share the results and maybe get a discussion going in
> this direction.
>
> Test was done on a pair of SATA300 HDs spinning at 7200rpm (which are
> seen as SATA150 by the OS for some reason; i couldn't fix it from the
> BIOS, so it must be the mobo), and 7.1-RELEASE, i386.

It doesn't matter much whether your disk is seen as SATA 1.5 Gbit/s or
3 Gbit/s. A current rotating hard disk cannot max out a SATA 1.5 Gbit/s
link, see http://en.wikipedia.org/wiki/Serial_ATA (a flash-based drive
can, though).

My test setup:

- Intel ICH7 SATA 3 Gbit/s controller
- WDC WD5001ABYS-01YNA0 (500,107,862,016 bytes)
- FreeBSD 7.2-PRERELEASE amd64
- no mirroring or encryption on this partition

My results:

sysctl vfs.read_max=8

dd if=/tmp/var-0-20090426.dump of=/dev/null bs=3M
69+1 records in
69+1 records out
217405440 bytes transferred in 2.762058 secs (78,711,395 bytes/sec)

(I added the commas to the bytes/sec figures for readability.)

Try it again:

dd if=/tmp/var-0-20090426.dump of=/dev/null bs=3M
69+1 records in
69+1 records out
217405440 bytes transferred in 0.119592 secs (1,817,893,575 bytes/sec)

This large figure on the second try is probably an effect of the disk's
and/or VFS cache! All following reads are done after another huge file
was read, to try and eliminate cache effects.
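To answer the memory question roughly: if vfs.read_max is counted in filesystem blocks, the read-ahead window is read_max times the block size. A quick back-of-the-envelope sketch; the 16 KiB block size is an assumption (the UFS2 default), check your own with dumpfs:

```shell
# Rough read-ahead window size for a few vfs.read_max values,
# assuming a 16 KiB filesystem block size (UFS2 default).
BLKSIZE=16384
for rmax in 8 32 128; do
    echo "vfs.read_max=$rmax -> $((rmax * BLKSIZE / 1024)) KiB read-ahead"
done
```

Even at read_max=128 that is only 2 MiB per sequential stream, so the memory footprint is not the limiting factor.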
sysctl vfs.read_max=16

dd if=/tmp/usr-0-20090426.dump.bz2 of=/dev/null bs=3M
728+1 records in
728+1 records out
2292555598 bytes transferred in 29.368194 secs (78,062,532 bytes/sec)

sysctl vfs.read_max=32

dd if=/tmp/root-0-20090426.dump of=/dev/null bs=3M
32+1 records in
32+1 records out
101068800 bytes transferred in 1.276318 secs (79,187,799 bytes/sec)

sysctl vfs.read_max=64

dd if=/tmp/usr-0-20090426.dump of=/dev/null bs=3M
1753+1 records in
1753+1 records out
5516308480 bytes transferred in 70.226765 secs (78,549,944 bytes/sec)

sysctl vfs.read_max=128

dd if=/tmp/usr-0-20090426.dump of=/dev/null bs=3M
1753+1 records in
1753+1 records out
5516308480 bytes transferred in 71.032365 secs (77,659,085 bytes/sec)

So, for large reads, not much difference. vfs.read_max=32 looks best.
Let's try a smaller block size.

sysctl vfs.read_max=8

dd if=/tmp/root-0-20090426.dump of=/dev/null bs=256k
385+1 records in
385+1 records out
101068800 bytes transferred in 1.391538 secs (72,631,008 bytes/sec)

sysctl vfs.read_max=16

dd if=/tmp/usr-0-20090426.dump.bz2 of=/dev/null bs=256k
8745+1 records in
8745+1 records out
2292555598 bytes transferred in 29.736135 secs (77,096,623 bytes/sec)

sysctl vfs.read_max=32

dd if=/tmp/var-0-20090426.dump of=/dev/null bs=256k
829+1 records in
829+1 records out
217405440 bytes transferred in 2.753552 secs (78,954,544 bytes/sec)

sysctl vfs.read_max=64

dd if=/tmp/usr-0-20090426.dump of=/dev/null bs=256k
21043+1 records in
21043+1 records out
5516308480 bytes transferred in 71.165780 secs (77,513,497 bytes/sec)

sysctl vfs.read_max=256

dd if=/tmp/var-0-20090426.dump of=/dev/null bs=256k
829+1 records in
829+1 records out
217405440 bytes transferred in 2.751325 secs (79,018,447 bytes/sec)

So for this partition, vfs.read_max=32 seems to be optimal. Negligible
CPU load.
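When eyeballing numbers like these, MiB/s is handier than raw bytes/sec. A small helper that parses dd's transfer line (the FreeBSD output format quoted above; `to_mibs` is just an illustrative name):

```shell
# Convert a dd "bytes transferred" line into MiB/s.
# Field 1 is the byte count, field 5 the elapsed seconds.
to_mibs() {
    echo "$1" | awk '{ printf "%.1f MiB/s\n", $1 / $5 / 1048576 }'
}

to_mibs "5516308480 bytes transferred in 70.226765 secs (78549944 bytes/sec)"
# -> 74.9 MiB/s
```

At these speeds the differences between read_max settings amount to only one or two MiB/s.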
Now, reading from a GELI-encrypted partition:

sysctl vfs.read_max=32

dd if=usr-0-20090426.dump.bz2 of=/dev/null bs=256k
6572+1 records in
6572+1 records out
1722903675 bytes transferred in 22.951109 secs (75,068,428 bytes/sec)

CPU load on a Core2 Quad Q9300 hovers at around 25-30%.

sysctl vfs.read_max=64

dd if=film.avi of=/dev/null bs=256k
2703+1 records in
2703+1 records out
708665574 bytes transferred in 9.892170 secs (71,639,042 bytes/sec)

sysctl vfs.read_max=256

dd if=film2.avi of=/dev/null bs=256k
3057+1 records in
3057+1 records out
801435148 bytes transferred in 11.061225 secs (72,454,466 bytes/sec)

Again, vfs.read_max=32 seems about right.

I dropped gmirror in favor of running an rsync to the second disk at
night, because gmirror is kinda slow. I saw the same performance as you
did with the combination of gmirror and geli.

Roland
-- 
R.F.Smith                           http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)