On 5 August 2010 22:21, Freek Dijkstra <freek.dijks...@sara.nl> wrote:
> Chris, Daniel and Mathieu,
>
> Thanks for your constructive feedback!
>
>> On Thu, Aug 05, 2010 at 04:05:33PM +0200, Freek Dijkstra wrote:
>>>              ZFS             BtrFS
>>> 1 SSD      256 MiByte/s     256 MiByte/s
>>> 2 SSDs     505 MiByte/s     504 MiByte/s
>>> 3 SSDs     736 MiByte/s     756 MiByte/s
>>> 4 SSDs     952 MiByte/s     916 MiByte/s
>>> 5 SSDs    1226 MiByte/s     986 MiByte/s
>>> 6 SSDs    1450 MiByte/s     978 MiByte/s
>>> 8 SSDs    1653 MiByte/s     932 MiByte/s
>>> 16 SSDs   2750 MiByte/s     919 MiByte/s
>>>
> [...]
>>> The above results were for Ubuntu 10.04.1 server, with BtrFS v0.19,
>>
>> Which kernels are those?
>
> For BtrFS: Linux 2.6.32-21-server #32-Ubuntu SMP x86_64 GNU/Linux
> For ZFS: FreeBSD 8.1-RELEASE (GENERIC)
>
> (Note that we currently cannot easily upgrade, due to binary drivers
> for the SAS+SATA controllers :(. I'd be happy to push the vendor,
> though, if you think it makes a difference.)
>
>
> Daniel J Blueman wrote:
>
>> Perhaps create a new filesystem and mount with 'nodatasum'
>
> I get an improvement: 919 MiByte/s just became 1580 MiByte/s. Not as
> fast as it could be, but most certainly an improvement.
>
>> existing extents which were previously created will be checked, so
>> need to start fresh.
>
> Indeed, also the other way around. I created two test files, while
> mounted with and without the -o nodatasum option:
> write w/o nodatasum; read w/o nodatasum:  919 ± 43 MiByte/s
> write w/o nodatasum; read w/  nodatasum:  922 ± 72 MiByte/s
> write w/  nodatasum; read w/o nodatasum: 1082 ± 46 MiByte/s
> write w/  nodatasum; read w/  nodatasum: 1586 ± 126 MiByte/s
>
> So even if I remount the disk in the normal way, and read a file created
> without checksums, I still get a small improvement :)
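>
> In shell terms, the test procedure above boils down to roughly the
> following (a sketch only; /dev/sdd and /mnt/ssd are example names,
> not our actual devices):

```shell
# Sketch of the checksum comparison above; device and mount point are examples.
mkfs.btrfs /dev/sdd
mkdir -p /mnt/ssd

# Write a test file while mounted with -o nodatasum:
# new extents get no checksums.
mount -o nodatasum /dev/sdd /mnt/ssd
dd if=/dev/zero of=/mnt/ssd/nosum.dat bs=1M count=1024
umount /mnt/ssd

# Remount normally and read the file back: extents written without
# checksums are not verified on read, so they stay faster even now.
mount /dev/sdd /mnt/ssd
dd if=/mnt/ssd/nosum.dat of=/dev/null bs=1M
umount /mnt/ssd
```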
>
> (PS: the above tests were repeated 4 times, the last even 8 times. As
> you can see from the standard deviations, the results are not very
> reproducible. The cause is unknown; CPU load is low.)
>
>
> Chris Mason wrote:
>
>> Basically we have two different things to tune.  First the block layer
>> and then btrfs.
>
>
>> And then we need to setup a fio job file that hammers on all the ssds at
>> once.  I'd have it use adio/dio and talk directly to the drives.
>>
>> [global]
>> size=32g
>> direct=1
>> iodepth=8
>> bs=20m
>> rw=read
>>
>> [f1]
>> filename=/dev/sdd
>> [f2]
>> filename=/dev/sde
>> [f3]
>> filename=/dev/sdf
> [...]
>> [f16]
>> filename=/dev/sds
>
> Thanks. First one disk:
>
>> f1: (groupid=0, jobs=1): err= 0: pid=6273
>>   read : io=32780MB, bw=260964KB/s, iops=12, runt=128626msec
>>     clat (usec): min=74940, max=80721, avg=78449.61, stdev=923.24
>>     bw (KB/s) : min=240469, max=269981, per=100.10%, avg=261214.77, 
>> stdev=2765.91
>>   cpu          : usr=0.01%, sys=2.69%, ctx=1747, majf=0, minf=5153
>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>> >=64=0.0%
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>> >=64=0.0%
>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>> >=64=0.0%
>>      issued r/w: total=1639/0, short=0/0
>>
>>      lat (msec): 100=100.00%
>>
>> Run status group 0 (all jobs):
>>    READ: io=32780MB, aggrb=260963KB/s, minb=267226KB/s, maxb=267226KB/s, 
>> mint=128626msec, maxt=128626msec
>>
>> Disk stats (read/write):
>>   sdd: ios=261901/0, merge=0/0, ticks=10135270/0, in_queue=10136460, 
>> util=99.30%
>
> So 255 MiByte/s.
> Out of curiosity, what is the distinction between the reported figures
> of 260964 kiB/s, 261214.77 kiB/s, 267226 kiB/s and 260963 kiB/s?
>
>
> Now 16 disks (abbreviated):
>
>> ~/fio# ./fio ssd.fio
>> Starting 16 processes
>> f1: (groupid=0, jobs=1): err= 0: pid=4756
>>   read : io=32780MB, bw=212987KB/s, iops=10, runt=157600msec
>>     clat (msec): min=75, max=138, avg=96.15, stdev= 4.47
>>      lat (msec): min=75, max=138, avg=96.15, stdev= 4.47
>>     bw (KB/s) : min=153121, max=268968, per=6.31%, avg=213181.15, 
>> stdev=9052.26
>>   cpu          : usr=0.00%, sys=1.71%, ctx=2737, majf=0, minf=5153
>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>> >=64=0.0%
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>> >=64=0.0%
>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>> >=64=0.0%
>>      issued r/w: total=1639/0, short=0/0
>>
>>      lat (msec): 100=97.99%, 250=2.01%
>
> [..similar for f2 to f16..]
>
>> f1:      read : io=32780MB, bw=212987KB/s, iops=10, runt=157600msec
>>     bw (KB/s) : min=153121, max=268968, per=6.31%, avg=213181.15, 
>> stdev=9052.26
>> f2:      read : io=32780MB, bw=213873KB/s, iops=10, runt=156947msec
>>     bw (KB/s) : min=151143, max=251508, per=6.33%, avg=213987.34, 
>> stdev=8958.86
>> f3:      read : io=32780MB, bw=214613KB/s, iops=10, runt=156406msec
>>     bw (KB/s) : min=149216, max=219037, per=6.35%, avg=214779.89, 
>> stdev=9332.99
>> f4:      read : io=32780MB, bw=214388KB/s, iops=10, runt=156570msec
>>     bw (KB/s) : min=148675, max=226298, per=6.35%, avg=214576.51, 
>> stdev=8985.03
>> f5:      read : io=32780MB, bw=213848KB/s, iops=10, runt=156965msec
>>     bw (KB/s) : min=144479, max=241414, per=6.33%, avg=213935.81, 
>> stdev=10023.68
>> f6:      read : io=32780MB, bw=213514KB/s, iops=10, runt=157211msec
>>     bw (KB/s) : min=141730, max=264990, per=6.32%, avg=213656.75, 
>> stdev=10871.71
>> f7:      read : io=32780MB, bw=213431KB/s, iops=10, runt=157272msec
>>     bw (KB/s) : min=148137, max=254635, per=6.32%, avg=213493.12, 
>> stdev=9319.08
>> f8:      read : io=32780MB, bw=213099KB/s, iops=10, runt=157517msec
>>     bw (KB/s) : min=143467, max=267962, per=6.31%, avg=213267.60, 
>> stdev=11224.35
>> f9:      read : io=32780MB, bw=211254KB/s, iops=10, runt=158893msec
>>     bw (KB/s) : min=149489, max=267962, per=6.25%, avg=211257.05, 
>> stdev=9370.64
>> f10:     read : io=32780MB, bw=212251KB/s, iops=10, runt=158146msec
>>     bw (KB/s) : min=150865, max=225882, per=6.28%, avg=212300.50, 
>> stdev=8431.06
>> f11:     read : io=32780MB, bw=212988KB/s, iops=10, runt=157599msec
>>     bw (KB/s) : min=149489, max=221007, per=6.31%, avg=213123.72, 
>> stdev=9569.27
>> f12:     read : io=32780MB, bw=212788KB/s, iops=10, runt=157747msec
>>     bw (KB/s) : min=154274, max=218647, per=6.30%, avg=212957.41, 
>> stdev=8233.52
>> f13:     read : io=32780MB, bw=212315KB/s, iops=10, runt=158099msec
>>     bw (KB/s) : min=153696, max=256000, per=6.29%, avg=212482.68, 
>> stdev=9203.34
>> f14:     read : io=32780MB, bw=212033KB/s, iops=10, runt=158309msec
>>     bw (KB/s) : min=150588, max=267962, per=6.28%, avg=212198.76, 
>> stdev=9572.31
>> f15:     read : io=32780MB, bw=211720KB/s, iops=10, runt=158543msec
>>     bw (KB/s) : min=146024, max=268968, per=6.27%, avg=211846.40, 
>> stdev=10341.58
>> f16:     read : io=32780MB, bw=211637KB/s, iops=10, runt=158605msec
>>     bw (KB/s) : min=148945, max=261605, per=6.26%, avg=211618.40, 
>> stdev=9240.64
>>
>> Run status group 0 (all jobs):
>>    READ: io=524480MB, aggrb=3301MB/s, minb=216323KB/s, maxb=219763KB/s, 
>> mint=156406msec, maxt=158893msec
>>
>> Disk stats (read/write):
>>   sdd: ios=261902/0, merge=0/0, ticks=12531810/0, in_queue=12532910, 
>> util=99.46%
>>   sde: ios=262221/0, merge=0/0, ticks=12494200/0, in_queue=12495300, 
>> util=99.50%
>>   sdf: ios=261867/0, merge=0/0, ticks=12427000/0, in_queue=12430530, 
>> util=99.47%
>>   sdg: ios=261983/0, merge=0/0, ticks=12462320/0, in_queue=12466060, 
>> util=99.62%
>>   sdh: ios=262184/0, merge=0/0, ticks=12487350/0, in_queue=12489960, 
>> util=99.49%
>>   sdi: ios=262193/0, merge=0/0, ticks=12524400/0, in_queue=12526580, 
>> util=99.47%
>>   sdj: ios=262044/0, merge=0/0, ticks=12511850/0, in_queue=12513840, 
>> util=99.50%
>>   sdk: ios=262055/0, merge=0/0, ticks=12526560/0, in_queue=12527890, 
>> util=99.50%
>>   sdl: ios=261789/0, merge=0/0, ticks=12609230/0, in_queue=12610400, 
>> util=99.54%
>>   sdm: ios=261787/0, merge=0/0, ticks=12579000/0, in_queue=12581050, 
>> util=99.44%
>>   sdn: ios=261941/0, merge=0/0, ticks=12524530/0, in_queue=12525790, 
>> util=99.48%
>>   sdo: ios=262100/0, merge=0/0, ticks=12554650/0, in_queue=12555820, 
>> util=99.58%
>>   sdp: ios=261877/0, merge=0/0, ticks=12572220/0, in_queue=12574610, 
>> util=99.54%
>>   sdq: ios=261956/0, merge=0/0, ticks=12601480/0, in_queue=12603770, 
>> util=99.62%
>>   sdr: ios=261991/0, merge=0/0, ticks=12599680/0, in_queue=12602190, 
>> util=99.49%
>>   sds: ios=261852/0, merge=0/0, ticks=12624070/0, in_queue=12626580, 
>> util=99.58%
>
> So, the maximum for these 16 disks is 3301 MiByte/s.
>
> I also tried hardware RAID (2 sets of 8 disks), and got a similar result:
>
>> Run status group 0 (all jobs):
>>    READ: io=65560MB, aggrb=3024MB/s, minb=1548MB/s, maxb=1550MB/s, 
>> mint=21650msec, maxt=21681msec
>
>
>
>> fio should be able to push these devices up to the line speed.  If it
>> doesn't I would suggest changing elevators (deadline, cfq, noop) and
>> bumping the max request size to the max supported by the device.
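>
> For reference, those knobs live in sysfs, so the suggested tuning
> would amount to something like this (sdd is an example device, and
> 1024 KiB is an example size, not a recommendation):

```shell
# Sketch: per-device block-layer tuning via sysfs, as suggested above.
cat /sys/block/sdd/queue/scheduler            # e.g. "noop [deadline] cfq"
echo deadline > /sys/block/sdd/queue/scheduler

cat /sys/block/sdd/queue/max_hw_sectors_kb    # hardware limit of the device
echo 1024 > /sys/block/sdd/queue/max_sectors_kb   # raise the max request size
```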
>
> 3301 MiByte/s seems like a reasonable number, given the theoretical
> maximum of 16 times the single-disk performance: 16 × 256 MiByte/s =
> 4096 MiByte/s.
>
> Based on this, I have not looked at tuning. Would you recommend that I do?
>
> Our minimal goal is 2500 MiByte/s; that seems achievable as ZFS was able
> to reach 2750 MiByte/s without tuning.
>
>> When we have a config that does so, we can tune the btrfs side of things
>> as well.
>
> Some files are created in the root folder of the mount point, but I get
> errors instead of results:
>
>> ~/fio# ./fio btrfs16.fio
>> btrfs: (g=0): rw=read, bs=20M-20M/20M-20M, ioengine=sync, iodepth=8
>> Starting 16 processes
>> btrfs: Laying out IO file(s) (1 file(s) / 32768MB)
>> btrfs: Laying out IO file(s) (1 file(s) / 32768MB)
> [...]
>
>> btrfs: Laying out IO file(s) (1 file(s) / 32768MB)
>> fio: first direct IO errored. File system may not support direct IO, or 
>> iomem_align= is bad.
>> fio: first direct IO errored. File system may not support direct IO, or 
>> iomem_align= is bad.
>> fio: first direct IO errored. File system may not support direct IO, or 
>> iomem_align= is bad.
>> fio: pid=5958, err=22/file:engines/sync.c:62, func=xfer, error=Invalid 
>> argument
>> fio: pid=5961, err=22/file:engines/sync.c:62, func=xfer, error=Invalid 
>> argument
>> fio: pid=5962, err=22/file:engines/sync.c:62, func=xfer, error=Invalid 
>> argument
>> fio: first direct IO errored. File system may not support direct IO, or 
>> iomem_align= is bad.
> [...]
>>
>> btrfs: (groupid=0, jobs=1): err=22 (file:engines/sync.c:62, func=xfer, 
>> error=Invalid argument): pid=5956
>>   cpu          : usr=0.00%, sys=0.00%, ctx=1, majf=0, minf=52
>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
>> >=64=0.0%
>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>> >=64=0.0%
>>      complete  : 0=50.0%, 4=50.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>> >=64=0.0%
>>      issued r/w: total=1/0, short=0/0
> [no results]
>
> What could be going on here?
> (I get the same result from the github version of fio, fio 1.42, as well
> as the one that came with Ubuntu, fio 1.33.1).

If you are using 2.6.32 (as above), btrfs on that kernel doesn't
support direct I/O. Support was added in 2.6.35, so you could retry
with (e.g.):
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.35-maverick/linux-image-2.6.35-020635-generic_2.6.35-020635_amd64.deb
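
A quick way to check whether a given mount accepts direct I/O before
rerunning the fio job (/mnt/btrfs is an example mount point):

```shell
# Probe direct I/O support on a mounted filesystem (/mnt/btrfs is an example).
# On btrfs before 2.6.35 the O_DIRECT write fails with EINVAL, which
# matches the "first direct IO errored" message fio printed above.
if dd if=/dev/zero of=/mnt/btrfs/dio-probe bs=512 count=1 oflag=direct 2>/dev/null
then
    echo "direct I/O supported"
else
    echo "direct I/O rejected"
fi
rm -f /mnt/btrfs/dio-probe
```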
-- 
Daniel J Blueman