On Mon, Apr 27, 2009 at 07:18:24PM +0300, Ghirai wrote:
> Hi,
> 
> I'm running a RAID1 setup with gmirror and geli (AES-128) on top of
> that.
> While searching for ways to improve read performance, i found some
> posts (on kerneltrap i think) about vfs.read_max.
> 
> The author suggested that increasing the default value of 8 to 16
> resulted in increased read speed, and that increasing it further
> resulted in no noticeable performance gain.
> 
> Results are below.
> 
> Starting with vfs.read_max=32:
> 
> triton# dd if=a.iso of=/dev/null bs=3M
> 1129+1 records in
> 1129+1 records out
> 3554287616 bytes transferred in 176.825898 secs (20100492 bytes/sec)
> 
> triton# sysctl vfs.read_max=64
> vfs.read_max: 32 -> 64
> 
> triton# dd if=a.iso of=/dev/null bs=3M
> 1129+1 records in
> 1129+1 records out
> 3554287616 bytes transferred in 162.943189 secs (21813048 bytes/sec)
> 
> triton# sysctl vfs.read_max=128
> vfs.read_max: 64 -> 128
> 
> triton# dd if=a.iso of=/dev/null bs=3M
> 1129+1 records in
> 1129+1 records out
> 3554287616 bytes transferred in 149.313994 secs (23804116 bytes/sec)
> 
> triton# sysctl vfs.read_max=256
> vfs.read_max: 128 -> 256
> 
> triton# dd if=a.iso of=/dev/null bs=3M
> 1129+1 records in
> 1129+1 records out
> 3554287616 bytes transferred in 150.466241 secs (23621828 bytes/sec)
> 
> Here it seems to have hit a wall. Going down a bit to 192 results in
> almost exactly the same numbers, so the best value seems to be 128.
> As i read, vfs.read_max means 'cluster read-ahead max block count'.
> Does it read ahead the stuff into some memory? If so, can that memory
> size be increased via sysctl?

IIRC, if it gets a read request, it reads vfs.read_max extra clusters
into the vfs cache, to improve subsequent reads. This won't do much if
you're reading a lot of small files scattered around the disk.
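To put a number on it: the read-ahead window is read_max filesystem
blocks, so its size depends on your block size. Assuming the common
16 KiB UFS2 default (an assumption -- check yours with dumpfs), it
works out to:

```shell
# vfs.read_max counts filesystem blocks; assuming a 16 KiB UFS2 block
# size, the per-stream read-ahead window is read_max * 16 KiB:
for rm in 8 32 128 256; do
    printf 'read_max=%d -> %d KiB read-ahead\n' "$rm" $((rm * 16))
done
```

So 128 already means about 2 MiB of read-ahead per sequential stream,
which may be why going higher buys little.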
 
> Does the improvement in performance have to do with my particular setup
> (gmirror+geli)?

In my experience, gmirror is slow (see below). If you have multiple cores, geli
isn't much of an issue. On a single-core machine it can become a bottleneck.

> I thought i'd share the results and maybe get a discussion going in
> this direction.
> 
> Test was done on a pair of SATA300 HDs spinning at 7200 rpm (which are
> seen as SATA150 by the OS for some reason; i couldn't fix it from the
> BIOS, so it must be the mobo), and 7.1-RELEASE, i386.

It doesn't matter much whether your disk is seen as SATA 1.5 Gbit/s or
3 Gbit/s: a current rotating hard disk cannot max out even a SATA 1.5
Gbit/s connection (see http://en.wikipedia.org/wiki/Serial_ATA). A
flash-based drive can, though.
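For the arithmetic: SATA uses 8b/10b encoding, i.e. 10 line bits per
data byte, so the payload ceilings come out to:

```shell
# SATA payload bandwidth: line rate divided by 10 (8b/10b encoding)
awk 'BEGIN { printf "SATA 1.5 Gbit/s -> %d MB/s\n", 1.5e9 / 10 / 1e6
             printf "SATA 3.0 Gbit/s -> %d MB/s\n", 3.0e9 / 10 / 1e6 }'
```

Either way, well above the ~80 MB/s the drive below sustains.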

- Intel ICH7 SATA 3Gbit/s controller
- WDC WD5001ABYS-01YNA0 (500,107,862,016 bytes)
- FreeBSD 7.2-PRERELEASE amd64
- no mirroring or encryption on this partition.

My results:

sysctl vfs.read_max=8
dd if=/tmp/var-0-20090426.dump of=/dev/null bs=3M
69+1 records in
69+1 records out
217405440 bytes transferred in 2.762058 secs (78,711,395 bytes/sec)

(I added the commas to the bytes/sec figure for readability)
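(The bytes/sec figure dd prints is simply bytes transferred divided by
elapsed seconds, truncated to an integer; checking the run above:)

```shell
# dd's bytes/sec = bytes transferred / elapsed time, truncated
awk 'BEGIN { printf "%d bytes/sec\n", 217405440 / 2.762058 }'
# prints: 78711395 bytes/sec
```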

Try it again:
dd if=/tmp/var-0-20090426.dump of=/dev/null bs=3M
69+1 records in
69+1 records out
217405440 bytes transferred in 0.119592 secs (1,817,893,575 bytes/sec)

This large figure on the second try is probably an effect of the disk's
and/or vfs cache! All following reads are done after another huge file
was read, to try to eliminate cache effects.

sysctl vfs.read_max=16
dd if=/tmp/usr-0-20090426.dump.bz2 of=/dev/null bs=3M
728+1 records in
728+1 records out
2292555598 bytes transferred in 29.368194 secs (78,062,532 bytes/sec)

sysctl vfs.read_max=32
dd if=/tmp/root-0-20090426.dump of=/dev/null bs=3M
32+1 records in
32+1 records out
101068800 bytes transferred in 1.276318 secs (79,187,799 bytes/sec)

sysctl vfs.read_max=64
dd if=/tmp/usr-0-20090426.dump of=/dev/null bs=3M
1753+1 records in
1753+1 records out
5516308480 bytes transferred in 70.226765 secs (78,549,944 bytes/sec)

sysctl vfs.read_max=128
dd if=/tmp/usr-0-20090426.dump of=/dev/null bs=3M
1753+1 records in
1753+1 records out
5516308480 bytes transferred in 71.032365 secs (77,659,085 bytes/sec)

So, for large reads there's not much difference; vfs.read_max=32 looks
best. Let's try a smaller block size.

sysctl vfs.read_max=8
dd if=/tmp/root-0-20090426.dump of=/dev/null bs=256k
385+1 records in
385+1 records out
101068800 bytes transferred in 1.391538 secs (72,631,008 bytes/sec)

sysctl vfs.read_max=16
dd if=/tmp/usr-0-20090426.dump.bz2 of=/dev/null bs=256k
8745+1 records in
8745+1 records out
2292555598 bytes transferred in 29.736135 secs (77,096,623 bytes/sec)

sysctl vfs.read_max=32
dd if=/tmp/var-0-20090426.dump of=/dev/null bs=256k
829+1 records in
829+1 records out
217405440 bytes transferred in 2.753552 secs (78,954,544 bytes/sec)

sysctl vfs.read_max=64
dd if=/tmp/usr-0-20090426.dump of=/dev/null bs=256k
21043+1 records in
21043+1 records out
5516308480 bytes transferred in 71.165780 secs (77,513,497 bytes/sec)

sysctl vfs.read_max=256
dd if=/tmp/var-0-20090426.dump of=/dev/null bs=256k
829+1 records in
829+1 records out
217405440 bytes transferred in 2.751325 secs (79,018,447 bytes/sec)

So for this partition, vfs.read_max=32 seems to be optimal. Negligible CPU load.

Now, reading from a GELI encrypted partition:

sysctl vfs.read_max=32
dd if=usr-0-20090426.dump.bz2 of=/dev/null bs=256k
6572+1 records in
6572+1 records out
1722903675 bytes transferred in 22.951109 secs (75,068,428 bytes/sec)

CPU load on Core2 Quad Q9300 is hovering at around 25-30%.

sysctl vfs.read_max=64
dd if=film.avi of=/dev/null bs=256k
2703+1 records in
2703+1 records out
708665574 bytes transferred in 9.892170 secs (71,639,042 bytes/sec)

sysctl vfs.read_max=256
dd if=film2.avi of=/dev/null bs=256k
3057+1 records in
3057+1 records out
801435148 bytes transferred in 11.061225 secs (72,454,466 bytes/sec)

Again, vfs.read_max=32 seems about right.
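If you want to repeat this without retyping, a small /bin/sh loop does
the sweep. This is only a sketch: it needs root on FreeBSD, the file
name you pass is up to you, and the SYSCTL/DD variables exist only so
the loop can be dry-run without touching the system.

```shell
#!/bin/sh
# Sweep vfs.read_max and time one sequential read per setting.
# Ideally read a different large file per pass (or one much bigger
# than RAM) so the buffer cache doesn't skew the numbers.
sweep() {
    file=$1
    for rm in 8 16 32 64 128 256; do
        ${SYSCTL:-sysctl} vfs.read_max="$rm" >/dev/null
        printf 'read_max=%s: ' "$rm"
        # dd prints its summary on stderr; keep only the last line
        ${DD:-dd} if="$file" of=/dev/null bs=256k 2>&1 | tail -1
    done
}
```

e.g. sweep /tmp/usr-0-20090426.dump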

I dropped gmirror in favor of running an rsync to the second disk at
night because gmirror is kinda slow. I saw the same performance as you
did with the combination of gmirror and geli.
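For reference, the nightly job is nothing fancy; something like this
/etc/crontab entry does it (the paths here are made up for the example):

```
# /etc/crontab -- sync the data partition to the second disk at 03:00
0  3  *  *  *  root  rsync -a --delete /data/ /mnt/disk2/data/
```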

Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
