2017-12-29 22:14 GMT+03:00 Dmitrii Tcvetkov <[email protected]>:
> On Fri, 29 Dec 2017 21:44:19 +0300
> Dmitrii Tcvetkov <[email protected]> wrote:
>> > +/**
>> > + * guess_optimal - return guessed optimal mirror
>> > + *
>> > + * The optimal mirror is expected to be pid % num_stripes
>> > + *
>> > + * That's generally OK for spreading load.
>> > + * Add a balancer based on the queue length of each device.
>> > + *
>> > + * Basic ideas:
>> > + * - Sequential reads generate a low number of requests,
>> > + *   so if the drives are equally loaded, use pid % num_stripes balancing
>> > + * - For mixed rotational/non-rotational mirrors, pick the non-rotational
>> > + *   one as optimal, and repick if the other device has a "significantly"
>> > + *   shorter queue
>> > + * - Repick the optimal mirror if the queue of the other mirror is shorter
>> > + */
>> > +static int guess_optimal(struct map_lookup *map, int optimal)
>> > +{
>> > + int i;
>> > + int round_down = 8;
>> > + int num = map->num_stripes;
>>
>> num has to be initialized from map->sub_stripes if we're reading
>> RAID10, otherwise there will be NULL pointer dereference
>>
>
> Check can be like:
>         if (map->type & BTRFS_BLOCK_GROUP_RAID10)
>                 num = map->sub_stripes;
>
>> @@ -5804,10 +5914,12 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
>>  			stripe_index += mirror_num - 1;
>>  		else {
>>  			int old_stripe_index = stripe_index;
>> +			optimal = guess_optimal(map,
>> +					current->pid % map->num_stripes);
>>  			stripe_index = find_live_mirror(fs_info, map,
>>  					stripe_index,
>>  					map->sub_stripes,
>>  					stripe_index +
>> -					current->pid % map->sub_stripes,
>> +					optimal,
>>  					dev_replace_is_ongoing);
>>  			mirror_num = stripe_index - old_stripe_index + 1;
>>  		}
>> --
>> 2.15.1
>
> Also here calculation should be with map->sub_stripes too.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Why do you think we need such a check?
I.e. guess_optimal() is always called right before find_live_mirror(),
both in the same context, like this:
	if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
		u32 factor = map->num_stripes / map->sub_stripes;
		stripe_nr = div_u64_rem(stripe_nr, factor, &stripe_index);
		stripe_index *= map->sub_stripes;
		if (need_full_stripe(op))
			num_stripes = map->sub_stripes;
		else if (mirror_num)
			stripe_index += mirror_num - 1;
		else {
			int old_stripe_index = stripe_index;
			stripe_index = find_live_mirror(fs_info, map,
					stripe_index,
					map->sub_stripes, stripe_index +
					current->pid % map->sub_stripes,
					dev_replace_is_ongoing);
			mirror_num = stripe_index - old_stripe_index + 1;
		}
So it's useless to check that internally.
---
Also, fio results for an all-HDD RAID1 setup, provided by waxhead:
Original:
Disk-4k-randread-depth-32: (g=0): rw=randread, bs=(R) 4096B-512KiB,
(W) 4096B-512KiB, (T) 4096B-512KiB, ioengine=libaio, iodepth=32
Disk-4k-read-depth-8: (g=0): rw=read, bs=(R) 4096B-512KiB, (W)
4096B-512KiB, (T) 4096B-512KiB, ioengine=libaio, iodepth=8
Disk-4k-randwrite-depth-8: (g=0): rw=randwrite, bs=(R) 4096B-512KiB,
(W) 4096B-512KiB, (T) 4096B-512KiB, ioengine=libaio, iodepth=8
fio-3.1
Starting 3 processes
Disk-4k-randread-depth-32: Laying out IO file (1 file / 65536MiB)
Jobs: 3 (f=3): [r(1),R(1),w(1)][100.0%][r=120MiB/s,w=9.88MiB/s][r=998,w=96
IOPS][eta 00m:00s]
Disk-4k-randread-depth-32: (groupid=0, jobs=1): err= 0: pid=3132: Fri
Dec 29 16:16:33 2017
read: IOPS=375, BW=41.3MiB/s (43.3MB/s)(24.2GiB/600128msec)
slat (usec): min=15, max=206039, avg=88.71, stdev=990.35
clat (usec): min=357, max=3487.1k, avg=85022.93, stdev=141872.25
lat (usec): min=399, max=3487.2k, avg=85112.58, stdev=141880.31
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 7], 10.00th=[ 9], 20.00th=[ 13],
| 30.00th=[ 19], 40.00th=[ 27], 50.00th=[ 39], 60.00th=[ 56],
| 70.00th=[ 83], 80.00th=[ 127], 90.00th=[ 209], 95.00th=[ 300],
| 99.00th=[ 600], 99.50th=[ 852], 99.90th=[ 1703], 99.95th=[ 2165],
| 99.99th=[ 2937]
bw ( KiB/s): min= 392, max=75824, per=30.46%, avg=42736.09,
stdev=12019.09, samples=1186
iops : min= 3, max= 500, avg=380.24, stdev=99.50, samples=1186
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.29%, 10=12.33%, 20=19.67%, 50=24.92%
lat (msec) : 100=17.51%, 250=18.05%, 500=5.72%, 750=0.85%, 1000=0.28%
lat (msec) : 2000=0.29%, >=2000=0.07%
cpu : usr=0.67%, sys=4.62%, ctx=215716, majf=0, minf=526
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwt: total=225609,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Disk-4k-read-depth-8: (groupid=0, jobs=1): err= 0: pid=3133: Fri Dec
29 16:16:33 2017
read: IOPS=694, BW=95.8MiB/s (100MB/s)(56.1GiB/600017msec)
slat (usec): min=8, max=617652, avg=88.88, stdev=1996.00
clat (usec): min=95, max=1127.4k, avg=11424.86, stdev=10606.45
lat (usec): min=138, max=1127.5k, avg=11514.53, stdev=10796.64
clat percentiles (usec):
| 1.00th=[ 1270], 5.00th=[ 2507], 10.00th=[ 3261], 20.00th=[ 5932],
| 30.00th=[ 6783], 40.00th=[ 7701], 50.00th=[ 9896], 60.00th=[ 11076],
| 70.00th=[ 13435], 80.00th=[ 15795], 90.00th=[ 20841], 95.00th=[ 25822],
| 99.00th=[ 36963], 99.50th=[ 45351], 99.90th=[108528], 99.95th=[137364],
| 99.99th=[387974]
bw ( KiB/s): min=10720, max=131855, per=69.93%, avg=98104.00,
stdev=14476.04, samples=1200
iops : min= 78, max= 1082, avg=694.71, stdev=111.69, samples=1200
lat (usec) : 100=0.01%, 250=0.04%, 500=0.12%, 750=0.25%, 1000=0.26%
lat (msec) : 2=2.02%, 4=11.65%, 10=36.16%, 20=38.45%, 50=10.67%
lat (msec) : 100=0.27%, 250=0.11%, 500=0.01%, 750=0.01%, 2000=0.01%
cpu : usr=0.78%, sys=7.69%, ctx=264209, majf=0, minf=521
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=416698,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=8
Disk-4k-randwrite-depth-8: (groupid=0, jobs=1): err= 0: pid=3134: Fri
Dec 29 16:16:33 2017
write: IOPS=81, BW=10.6MiB/s (11.1MB/s)(6362MiB/600133msec)
slat (usec): min=16, max=429897, avg=98.81, stdev=3109.35
clat (usec): min=240, max=2206.9k, avg=98358.53, stdev=309465.43
lat (usec): min=305, max=2206.0k, avg=98458.24, stdev=309483.50
clat percentiles (usec):
| 1.00th=[ 1237], 5.00th=[ 3326], 10.00th=[ 5080],
| 20.00th=[ 7635], 30.00th=[ 10159], 40.00th=[ 12911],
| 50.00th=[ 16319], 60.00th=[ 21890], 70.00th=[ 36439],
| 80.00th=[ 91751], 90.00th=[ 166724], 95.00th=[ 287310],
| 99.00th=[2021655], 99.50th=[2038432], 99.90th=[2088764],
| 99.95th=[2105541], 99.99th=[2164261]
bw ( KiB/s): min= 8, max=91796, per=100.00%, avg=16619.21,
stdev=18128.65, samples=797
iops : min= 2, max= 640, avg=123.92, stdev=132.83, samples=797
lat (usec) : 250=0.01%, 500=0.17%, 750=0.19%, 1000=0.33%
lat (msec) : 2=1.34%, 4=5.01%, 10=22.34%, 20=28.22%, 50=15.76%
lat (msec) : 100=8.23%, 250=12.71%, 500=2.37%, 750=0.51%, 1000=0.15%
lat (msec) : 2000=1.05%, >=2000=1.62%
cpu : usr=0.20%, sys=0.72%, ctx=41618, majf=0, minf=7
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,48759,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=8
Run status group 0 (all jobs):
READ: bw=137MiB/s (144MB/s), 41.3MiB/s-95.8MiB/s
(43.3MB/s-100MB/s), io=80.3GiB (86.2GB), run=600017-600128msec
WRITE: bw=10.6MiB/s (11.1MB/s), 10.6MiB/s-10.6MiB/s
(11.1MB/s-11.1MB/s), io=6362MiB (6671MB), run=600133-600133msec
Patched:
Disk-4k-randread-depth-32: (g=0): rw=randread, bs=(R) 4096B-512KiB,
(W) 4096B-512KiB, (T) 4096B-512KiB, ioengine=libaio, iodepth=32
Disk-4k-read-depth-8: (g=0): rw=read, bs=(R) 4096B-512KiB, (W)
4096B-512KiB, (T) 4096B-512KiB, ioengine=libaio, iodepth=8
Disk-4k-randwrite-depth-8: (g=0): rw=randwrite, bs=(R) 4096B-512KiB,
(W) 4096B-512KiB, (T) 4096B-512KiB, ioengine=libaio, iodepth=8
fio-3.1
Starting 3 processes
Jobs: 3 (f=3): [r(1),R(1),w(1)][100.0%][r=67.3MiB/s,w=17.7MiB/s][r=734,w=150
IOPS][eta 00m:00s]
Disk-4k-randread-depth-32: (groupid=0, jobs=1): err= 0: pid=1755: Fri
Dec 29 22:56:57 2017
read: IOPS=613, BW=60.6MiB/s (63.5MB/s)(35.5GiB/600060msec)
slat (usec): min=12, max=237473, avg=163.70, stdev=1695.77
clat (usec): min=220, max=1152.1k, avg=51952.45, stdev=56779.39
lat (usec): min=263, max=1152.3k, avg=52117.15, stdev=56934.65
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 7], 10.00th=[ 10], 20.00th=[ 14],
| 30.00th=[ 19], 40.00th=[ 25], 50.00th=[ 33], 60.00th=[ 43],
| 70.00th=[ 57], 80.00th=[ 80], 90.00th=[ 121], 95.00th=[ 165],
| 99.00th=[ 271], 99.50th=[ 326], 99.90th=[ 456], 99.95th=[ 502],
| 99.99th=[ 651]
bw ( KiB/s): min= 7006, max=106682, per=60.69%, avg=62211.51,
stdev=14166.42, samples=1199
iops : min= 72, max= 825, avg=615.08, stdev=106.32, samples=1199
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.72%, 10=11.63%, 20=20.86%, 50=32.26%
lat (msec) : 100=20.54%, 250=12.60%, 500=1.31%, 750=0.05%, 1000=0.01%
lat (msec) : 2000=0.01%
cpu : usr=1.14%, sys=7.37%, ctx=333462, majf=0, minf=528
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwt: total=368214,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Disk-4k-read-depth-8: (groupid=0, jobs=1): err= 0: pid=1756: Fri Dec
29 22:56:57 2017
read: IOPS=285, BW=39.5MiB/s (41.4MB/s)(23.2GiB/600056msec)
slat (usec): min=7, max=523518, avg=115.85, stdev=2072.83
clat (usec): min=90, max=1880.8k, avg=27860.58, stdev=49717.09
lat (usec): min=127, max=1880.9k, avg=27977.32, stdev=49780.09
clat percentiles (usec):
| 1.00th=[ 469], 5.00th=[ 1074], 10.00th=[ 1762], 20.00th=[ 3654],
| 30.00th=[ 5866], 40.00th=[ 7767], 50.00th=[ 10159], 60.00th=[ 13829],
| 70.00th=[ 20841], 80.00th=[ 35390], 90.00th=[ 76022], 95.00th=[124257],
| 99.00th=[229639], 99.50th=[304088], 99.90th=[484443], 99.95th=[591397],
| 99.99th=[742392]
bw ( KiB/s): min= 672, max=100966, per=39.58%, avg=40570.62,
stdev=17431.49, samples=1197
iops : min= 17, max= 744, avg=286.45, stdev=126.09, samples=1197
lat (usec) : 100=0.01%, 250=0.31%, 500=0.97%, 750=1.79%, 1000=1.49%
lat (msec) : 2=7.55%, 4=8.70%, 10=28.45%, 20=19.95%, 50=15.85%
lat (msec) : 100=7.67%, 250=6.49%, 500=0.69%, 750=0.08%, 1000=0.01%
lat (msec) : 2000=0.01%
cpu : usr=0.39%, sys=3.36%, ctx=130493, majf=0, minf=524
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=171546,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=8
Disk-4k-randwrite-depth-8: (groupid=0, jobs=1): err= 0: pid=1757: Fri
Dec 29 22:56:57 2017
write: IOPS=136, BW=17.2MiB/s (18.0MB/s)(10.1GiB/600007msec)
slat (usec): min=19, max=136510, avg=114.43, stdev=1121.55
clat (usec): min=258, max=2084.5k, avg=58607.26, stdev=103204.91
lat (usec): min=334, max=2084.7k, avg=58722.67, stdev=103202.00
clat percentiles (msec):
| 1.00th=[ 3], 5.00th=[ 6], 10.00th=[ 8], 20.00th=[ 12],
| 30.00th=[ 16], 40.00th=[ 21], 50.00th=[ 27], 60.00th=[ 37],
| 70.00th=[ 53], 80.00th=[ 80], 90.00th=[ 131], 95.00th=[ 205],
| 99.00th=[ 506], 99.50th=[ 718], 99.90th=[ 1183], 99.95th=[ 1385],
| 99.99th=[ 1989]
bw ( KiB/s): min= 8, max=61572, per=100.00%, avg=18098.74,
stdev=11881.08, samples=1175
iops : min= 2, max= 495, avg=139.63, stdev=92.08, samples=1175
lat (usec) : 500=0.06%, 750=0.06%, 1000=0.13%
lat (msec) : 2=0.55%, 4=2.32%, 10=13.21%, 20=23.07%, 50=29.51%
lat (msec) : 100=16.15%, 250=11.65%, 500=2.26%, 750=0.59%, 1000=0.27%
lat (msec) : 2000=0.16%, >=2000=0.01%
cpu : usr=0.38%, sys=1.35%, ctx=77040, majf=0, minf=9
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,81727,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=8
Run status group 0 (all jobs):
READ: bw=100MiB/s (105MB/s), 39.5MiB/s-60.6MiB/s
(41.4MB/s-63.5MB/s), io=58.7GiB (62.0GB), run=600056-600060msec
WRITE: bw=17.2MiB/s (18.0MB/s), 17.2MiB/s-17.2MiB/s
(18.0MB/s-18.0MB/s), io=10.1GiB (10.8GB), run=600007-600007msec
So, as you can observe, with a mixed load this rebalances IOPS toward the
random reader and decreases random read/write latency,
but it makes the sequential thread a little bit hungry.
For systems with less random load, sequential read performance should not
change (i.e. while the load stays below the queue-length threshold for HDDs).
Thanks.
--
Have a nice day,
Timofey.