On 5 August 2010 22:21, Freek Dijkstra <freek.dijks...@sara.nl> wrote:
> Chris, Daniel and Mathieu,
>
> Thanks for your constructive feedback!
>
>> On Thu, Aug 05, 2010 at 04:05:33PM +0200, Freek Dijkstra wrote:
>>>              ZFS              BtrFS
>>>  1 SSD      256 MiByte/s     256 MiByte/s
>>>  2 SSDs     505 MiByte/s     504 MiByte/s
>>>  3 SSDs     736 MiByte/s     756 MiByte/s
>>>  4 SSDs     952 MiByte/s     916 MiByte/s
>>>  5 SSDs    1226 MiByte/s     986 MiByte/s
>>>  6 SSDs    1450 MiByte/s     978 MiByte/s
>>>  8 SSDs    1653 MiByte/s     932 MiByte/s
>>> 16 SSDs    2750 MiByte/s     919 MiByte/s
>>>
> [...]
>>> The above results were for Ubuntu 10.04.1 server, with BtrFS v0.19,
>>
>> Which kernels are those?
>
> For BtrFS: Linux 2.6.32-21-server #32-Ubuntu SMP x86_64 GNU/Linux
> For ZFS: FreeBSD 8.1-RELEASE (GENERIC)
>
> (Note that we currently can't easily upgrade, due to binary drivers for
> the SAS+SATA controllers :(. I'd be happy to push the vendor though, if
> you think it makes a difference.)
>
>
> Daniel J Blueman wrote:
>
>> Perhaps create a new filesystem and mount with 'nodatasum'
>
> I get an improvement: 919 MiByte/s just became 1580 MiByte/s. Not as
> fast as it could be, but most certainly an improvement.
>
>> existing extents which were previously created will be checked, so
>> need to start fresh.
>
> Indeed, and the other way around as well. I created two test files,
> while mounted with and without the -o nodatasum option:
> write w/o nodatasum; read w/o nodatasum:  919 ±  43 MiByte/s
> write w/o nodatasum; read w/  nodatasum:  922 ±  72 MiByte/s
> write w/  nodatasum; read w/o nodatasum: 1082 ±  46 MiByte/s
> write w/  nodatasum; read w/  nodatasum: 1586 ± 126 MiByte/s
>
> So even if I remount the disk in the normal way, and read a file created
> without checksums, I still get a small improvement :)
>
> (PS: the above tests were repeated 4 times, the last even 8 times. As
> you can see from the standard deviations, the results are not always
> very repeatable. The cause is unknown; CPU load is low.)
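For anyone reproducing the nodatasum comparison, the mount step is straightforward; a minimal sketch, where the device and mount point are placeholders rather than details from the thread (and keeping in mind that nodatasum only affects extents written while the option is active):

```shell
# Placeholder device and mount point -- adjust to the actual setup.
mkfs.btrfs /dev/sdd
mount -o nodatasum /dev/sdd /mnt/btrfs

# Confirm the option took effect before writing the test files:
grep /mnt/btrfs /proc/mounts
```

Files written under this mount carry no data checksums, so later reads skip checksum verification even after a normal remount, which matches the mixed-mode numbers above.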
>
>
> Chris Mason wrote:
>
>> Basically we have two different things to tune. First the block layer
>> and then btrfs.
>
>> And then we need to setup a fio job file that hammers on all the ssds at
>> once. I'd have it use aio/dio and talk directly to the drives.
>>
>> [global]
>> size=32g
>> direct=1
>> iodepth=8
>> bs=20m
>> rw=read
>>
>> [f1]
>> filename=/dev/sdd
>> [f2]
>> filename=/dev/sde
>> [f3]
>> filename=/dev/sdf
> [...]
>> [f16]
>> filename=/dev/sds
>
> Thanks. First one disk:
>
>> f1: (groupid=0, jobs=1): err= 0: pid=6273
>>   read : io=32780MB, bw=260964KB/s, iops=12, runt=128626msec
>>     clat (usec): min=74940, max=80721, avg=78449.61, stdev=923.24
>>     bw (KB/s) : min=240469, max=269981, per=100.10%, avg=261214.77,
>> stdev=2765.91
>>   cpu : usr=0.01%, sys=2.69%, ctx=1747, majf=0, minf=5153
>>   IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>      submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      issued r/w: total=1639/0, short=0/0
>>
>>      lat (msec): 100=100.00%
>>
>> Run status group 0 (all jobs):
>>    READ: io=32780MB, aggrb=260963KB/s, minb=267226KB/s, maxb=267226KB/s,
>> mint=128626msec, maxt=128626msec
>>
>> Disk stats (read/write):
>>   sdd: ios=261901/0, merge=0/0, ticks=10135270/0, in_queue=10136460,
>> util=99.30%
>
> So 255 MiByte/s.
> Out of curiosity, what is the distinction between the reported figures
> of 260964 kiB/s, 261214.77 kiB/s, 267226 kiB/s and 260963 kiB/s?
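On the differing figures: fio's job-level bw is simply total data over runtime, which is where the quoted 255 MiByte/s comes from, while the avg in the bw (KB/s) line is the mean of fio's periodic bandwidth samples, so the two differ slightly. The minb/maxb values look like the same job-average expressed in a different unit base (260964 × 1.024 ≈ 267226), though that is an inference from the numbers, not something the output states. The headline figure can be recovered directly:

```shell
# Job-average bandwidth: io=32780MB over runt=128626msec
awk 'BEGIN { printf "%.0f\n", 32780 / 128.626 }'   # -> 255 (MiByte/s)
```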
>
>
> Now 16 disks (abbreviated):
>
>> ~/fio# ./fio ssd.fio
>> Starting 16 processes
>> f1: (groupid=0, jobs=1): err= 0: pid=4756
>>   read : io=32780MB, bw=212987KB/s, iops=10, runt=157600msec
>>     clat (msec): min=75, max=138, avg=96.15, stdev= 4.47
>>      lat (msec): min=75, max=138, avg=96.15, stdev= 4.47
>>     bw (KB/s) : min=153121, max=268968, per=6.31%, avg=213181.15,
>> stdev=9052.26
>>   cpu : usr=0.00%, sys=1.71%, ctx=2737, majf=0, minf=5153
>>   IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>      submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      issued r/w: total=1639/0, short=0/0
>>
>>      lat (msec): 100=97.99%, 250=2.01%
>
> [..similar for f2 to f16..]
>
>> f1:  read : io=32780MB, bw=212987KB/s, iops=10, runt=157600msec
>>      bw (KB/s) : min=153121, max=268968, per=6.31%, avg=213181.15, stdev=9052.26
>> f2:  read : io=32780MB, bw=213873KB/s, iops=10, runt=156947msec
>>      bw (KB/s) : min=151143, max=251508, per=6.33%, avg=213987.34, stdev=8958.86
>> f3:  read : io=32780MB, bw=214613KB/s, iops=10, runt=156406msec
>>      bw (KB/s) : min=149216, max=219037, per=6.35%, avg=214779.89, stdev=9332.99
>> f4:  read : io=32780MB, bw=214388KB/s, iops=10, runt=156570msec
>>      bw (KB/s) : min=148675, max=226298, per=6.35%, avg=214576.51, stdev=8985.03
>> f5:  read : io=32780MB, bw=213848KB/s, iops=10, runt=156965msec
>>      bw (KB/s) : min=144479, max=241414, per=6.33%, avg=213935.81, stdev=10023.68
>> f6:  read : io=32780MB, bw=213514KB/s, iops=10, runt=157211msec
>>      bw (KB/s) : min=141730, max=264990, per=6.32%, avg=213656.75, stdev=10871.71
>> f7:  read : io=32780MB, bw=213431KB/s, iops=10, runt=157272msec
>>      bw (KB/s) : min=148137, max=254635, per=6.32%, avg=213493.12, stdev=9319.08
>> f8:  read : io=32780MB, bw=213099KB/s, iops=10, runt=157517msec
>>      bw (KB/s) : min=143467, max=267962, per=6.31%, avg=213267.60, stdev=11224.35
>> f9:  read :
>> io=32780MB, bw=211254KB/s, iops=10, runt=158893msec
>>      bw (KB/s) : min=149489, max=267962, per=6.25%, avg=211257.05, stdev=9370.64
>> f10: read : io=32780MB, bw=212251KB/s, iops=10, runt=158146msec
>>      bw (KB/s) : min=150865, max=225882, per=6.28%, avg=212300.50, stdev=8431.06
>> f11: read : io=32780MB, bw=212988KB/s, iops=10, runt=157599msec
>>      bw (KB/s) : min=149489, max=221007, per=6.31%, avg=213123.72, stdev=9569.27
>> f12: read : io=32780MB, bw=212788KB/s, iops=10, runt=157747msec
>>      bw (KB/s) : min=154274, max=218647, per=6.30%, avg=212957.41, stdev=8233.52
>> f13: read : io=32780MB, bw=212315KB/s, iops=10, runt=158099msec
>>      bw (KB/s) : min=153696, max=256000, per=6.29%, avg=212482.68, stdev=9203.34
>> f14: read : io=32780MB, bw=212033KB/s, iops=10, runt=158309msec
>>      bw (KB/s) : min=150588, max=267962, per=6.28%, avg=212198.76, stdev=9572.31
>> f15: read : io=32780MB, bw=211720KB/s, iops=10, runt=158543msec
>>      bw (KB/s) : min=146024, max=268968, per=6.27%, avg=211846.40, stdev=10341.58
>> f16: read : io=32780MB, bw=211637KB/s, iops=10, runt=158605msec
>>      bw (KB/s) : min=148945, max=261605, per=6.26%, avg=211618.40, stdev=9240.64
>>
>> Run status group 0 (all jobs):
>>    READ: io=524480MB, aggrb=3301MB/s, minb=216323KB/s, maxb=219763KB/s,
>> mint=156406msec, maxt=158893msec
>>
>> Disk stats (read/write):
>>   sdd: ios=261902/0, merge=0/0, ticks=12531810/0, in_queue=12532910, util=99.46%
>>   sde: ios=262221/0, merge=0/0, ticks=12494200/0, in_queue=12495300, util=99.50%
>>   sdf: ios=261867/0, merge=0/0, ticks=12427000/0, in_queue=12430530, util=99.47%
>>   sdg: ios=261983/0, merge=0/0, ticks=12462320/0, in_queue=12466060, util=99.62%
>>   sdh: ios=262184/0, merge=0/0, ticks=12487350/0, in_queue=12489960, util=99.49%
>>   sdi: ios=262193/0, merge=0/0, ticks=12524400/0, in_queue=12526580, util=99.47%
>>   sdj: ios=262044/0, merge=0/0, ticks=12511850/0, in_queue=12513840, util=99.50%
>>   sdk: ios=262055/0, merge=0/0, ticks=12526560/0,
>> in_queue=12527890, util=99.50%
>>   sdl: ios=261789/0, merge=0/0, ticks=12609230/0, in_queue=12610400, util=99.54%
>>   sdm: ios=261787/0, merge=0/0, ticks=12579000/0, in_queue=12581050, util=99.44%
>>   sdn: ios=261941/0, merge=0/0, ticks=12524530/0, in_queue=12525790, util=99.48%
>>   sdo: ios=262100/0, merge=0/0, ticks=12554650/0, in_queue=12555820, util=99.58%
>>   sdp: ios=261877/0, merge=0/0, ticks=12572220/0, in_queue=12574610, util=99.54%
>>   sdq: ios=261956/0, merge=0/0, ticks=12601480/0, in_queue=12603770, util=99.62%
>>   sdr: ios=261991/0, merge=0/0, ticks=12599680/0, in_queue=12602190, util=99.49%
>>   sds: ios=261852/0, merge=0/0, ticks=12624070/0, in_queue=12626580, util=99.58%
>
> So, the maximum for these 16 disks is 3301 MiByte/s.
>
> I also tried hardware RAID (2 sets of 8 disks), and got a similar result:
>
>> Run status group 0 (all jobs):
>>    READ: io=65560MB, aggrb=3024MB/s, minb=1548MB/s, maxb=1550MB/s,
>> mint=21650msec, maxt=21681msec
>
>
>> fio should be able to push these devices up to the line speed. If it
>> doesn't I would suggest changing elevators (deadline, cfq, noop) and
>> bumping the max request size to the max supported by the device.
>
> 3301 MiByte/s seems like a reasonable number, given the theoretical
> maximum of 16 times the single-disk performance: 16 × 256 MiByte/s =
> 4096 MiByte/s.
>
> Based on this, I have not looked at tuning. Would you recommend that I do?
>
> Our minimal goal is 2500 MiByte/s; that seems achievable, as ZFS was able
> to reach 2750 MiByte/s without tuning.
>
>> When we have a config that does so, we can tune the btrfs side of things
>> as well.
>
> Some files are created in the root folder of the mount point, but I get
> errors instead of results:
>
>> ~/fio# ./fio btrfs16.fio
>> btrfs: (g=0): rw=read, bs=20M-20M/20M-20M, ioengine=sync, iodepth=8
>> Starting 16 processes
>> btrfs: Laying out IO file(s) (1 file(s) / 32768MB)
>> btrfs: Laying out IO file(s) (1 file(s) / 32768MB)
> [...]
>
>> btrfs: Laying out IO file(s) (1 file(s) / 32768MB)
>> fio: first direct IO errored. File system may not support direct IO, or
>> iomem_align= is bad.
>> fio: first direct IO errored. File system may not support direct IO, or
>> iomem_align= is bad.
>> fio: first direct IO errored. File system may not support direct IO, or
>> iomem_align= is bad.
>> fio: pid=5958, err=22/file:engines/sync.c:62, func=xfer, error=Invalid argument
>> fio: pid=5961, err=22/file:engines/sync.c:62, func=xfer, error=Invalid argument
>> fio: pid=5962, err=22/file:engines/sync.c:62, func=xfer, error=Invalid argument
>> fio: first direct IO errored. File system may not support direct IO, or
>> iomem_align= is bad.
> [...]
>>
>> btrfs: (groupid=0, jobs=1): err=22 (file:engines/sync.c:62, func=xfer,
>> error=Invalid argument): pid=5956
>>   cpu : usr=0.00%, sys=0.00%, ctx=1, majf=0, minf=52
>>   IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>      submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      complete : 0=50.0%, 4=50.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>      issued r/w: total=1/0, short=0/0
> [no results]
>
> What could be going on here?
> (I get the same result from the github version of fio, fio 1.42, as well
> as the one that came with Ubuntu, fio 1.33.1.)
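One way to narrow this down, independent of fio, is to attempt a single direct-I/O transfer against the mounted filesystem; dd's iflag=direct/oflag=direct use the same O_DIRECT open flag that fio's direct=1 does (the path below is a placeholder, not one from the thread):

```shell
# Placeholder path on the btrfs mount -- adjust as needed.
dd if=/dev/zero of=/mnt/btrfs/dio-test bs=1M count=1 oflag=direct
dd if=/mnt/btrfs/dio-test of=/dev/null bs=1M iflag=direct
```

If these also fail with "Invalid argument", the filesystem or kernel is rejecting O_DIRECT itself, rather than fio misbehaving.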
If you are using 2.6.32 (as above), btrfs in this release doesn't support
direct I/O. It is supported in 2.6.35, so you could retry with (eg):

http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.35-maverick/linux-image-2.6.35-020635-generic_2.6.35-020635_amd64.deb
--
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html