On Wed, Feb 10, 2021 at 02:52:01PM +0800, Anand Jain wrote:
> On 10/02/2021 04:30, Michal Rostecki wrote:
> > From: Michal Rostecki <mroste...@suse.com>
> >
> > This patch series adds a new raid1 read policy - roundrobin. For each
> > request, it selects the mirror which has a lower load than the queue
> > depth. Load is defined as the number of inflight requests + a penalty
> > value (if the scheduled request is not local to the last processed
> > request for a rotational disk).
> >
> > The series consists of preparatory changes which add the necessary
> > information to the btrfs_device struct, followed by the change which
> > adds the policy itself.
> >
> > This policy was tested with fio and compared with the default `pid`
> > policy.
> >
> > The single-threaded test has the following parameters:
> >
> > [global]
> > name=btrfs-raid1-seqread
> > filename=btrfs-raid1-seqread
> > rw=read
> > bs=64k
> > direct=0
> > numjobs=1
> > time_based=0
> >
> > [file1]
> > size=10G
> > ioengine=libaio
> >
> > and shows the following results:
> >
> > - raid1c3 with 3 HDDs:
> >   3 x Seagate Barracuda ST2000DM008 (2TB)
> >   * pid policy
> >     READ: bw=217MiB/s (228MB/s), 217MiB/s-217MiB/s (228MB/s-228MB/s),
> >     io=10.0GiB (10.7GB), run=47082-47082msec
> >   * roundrobin policy
> >     READ: bw=409MiB/s (429MB/s), 409MiB/s-409MiB/s (429MB/s-429MB/s),
> >     io=10.0GiB (10.7GB), run=25028-25028msec
> >
> > - raid1c3 with 2 HDDs and 1 SSD:
> >   2 x Seagate Barracuda ST2000DM008 (2TB)
> >   1 x Crucial CT256M550SSD1 (256GB)
> >   * pid policy (the worst case, when only the HDDs were chosen)
> >     READ: bw=220MiB/s (231MB/s), 220MiB/s-220MiB/s (231MB/s-231MB/s),
> >     io=10.0GiB (10.7GB), run=46577-46577msec
> >   * pid policy (the best case, when the SSD was used as well)
> >     READ: bw=513MiB/s (538MB/s), 513MiB/s-513MiB/s (538MB/s-538MB/s),
> >     io=10.0GiB (10.7GB), run=19954-19954msec
> >   * roundrobin (no noticeable differences when testing multiple times)
> >     READ: bw=541MiB/s (567MB/s), 541MiB/s-541MiB/s (567MB/s-567MB/s),
> >     io=10.0GiB (10.7GB), run=18933-18933msec
> >
> > The multithreaded test has the following parameters:
> >
> > [global]
> > name=btrfs-raid1-seqread
> > filename=btrfs-raid1-seqread
> > rw=read
> > bs=64k
> > direct=0
> > numjobs=8
> > time_based=0
> >
> > [file1]
> > size=10G
> > ioengine=libaio
> >
> > and shows the following results:
> >
> > - raid1c3 with 3 HDDs:
> >   3 x Seagate Barracuda ST2000DM008 (2TB)
> >   * pid policy
> >     READ: bw=1569MiB/s (1645MB/s), 196MiB/s-196MiB/s (206MB/s-206MB/s),
> >     io=80.0GiB (85.9GB), run=52210-52211msec
> >   * roundrobin
> >     READ: bw=1733MiB/s (1817MB/s), 217MiB/s-217MiB/s (227MB/s-227MB/s),
> >     io=80.0GiB (85.9GB), run=47269-47271msec
> >
> > - raid1c3 with 2 HDDs and 1 SSD:
> >   2 x Seagate Barracuda ST2000DM008 (2TB)
> >   1 x Crucial CT256M550SSD1 (256GB)
> >   * pid policy
> >     READ: bw=1843MiB/s (1932MB/s), 230MiB/s-230MiB/s (242MB/s-242MB/s),
> >     io=80.0GiB (85.9GB), run=44449-44450msec
> >   * roundrobin
> >     READ: bw=2485MiB/s (2605MB/s), 311MiB/s-311MiB/s (326MB/s-326MB/s),
> >     io=80.0GiB (85.9GB), run=32969-32970msec
>
> Both of the above test cases are sequential. How about some random I/O
> workload?
>
> Also, there is no seek time on non-rotational devices, so it is a good
> idea to test with SSD + NVMe, all NVMe, or all SSD as well.
>

Good idea. I will test random I/O and will also try all-NVMe, all-SSD
and mixed non-rotational setups.
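For the random I/O case I am thinking of a job file roughly like the one
below - just a sketch mirroring the sequential job above; the job/file
names are placeholders and the block size and job count are not final:

[global]
name=btrfs-raid1-randread
filename=btrfs-raid1-randread
rw=randread
bs=4k
direct=0
numjobs=1
time_based=0

[file1]
size=10G
ioengine=libaio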
> > To measure the performance of each policy and find optimal penalty
> > values, I created scripts which are available here:
> >
> > https://gitlab.com/vadorovsky/btrfs-perf
> > https://github.com/mrostecki/btrfs-perf
> >
> > Michal Rostecki (6):
> >
> >   btrfs: Add inflight BIO request counter
> >   btrfs: Store the last device I/O offset
>
> These patches look good. But only the round-robin policy requires
> monitoring the inflight count and the last offset. Could you bring them
> under if (policy == roundrobin)? Otherwise it is just a waste of CPU
> cycles when the policy != roundrobin.
>

If I bring those stats under if (policy == roundrobin), they are going
to be inaccurate if someone switches policies on a running system after
doing any I/O on that filesystem. I'm open to suggestions on how to make
those stats as lightweight as possible. Unfortunately, I don't think I
can store the last physical location without atomic_t. The BIO percpu
counter is probably the least of the concerns, though I could maybe get
rid of it entirely in favor of using part_stat_read(). (A rough,
untested sketch of the bookkeeping I have in mind is at the end of this
mail.)

> >   btrfs: Add stripe_physical function
> >   btrfs: Check if the filesystem has mixed types of devices
>
> Thanks, Anand
>
> >   btrfs: sysfs: Add directory for read policies
> >   btrfs: Add roundrobin raid1 read policy
> >
> >  fs/btrfs/ctree.h   |   3 +
> >  fs/btrfs/disk-io.c |   3 +
> >  fs/btrfs/sysfs.c   | 144 ++++++++++++++++++++++++++----
> >  fs/btrfs/volumes.c | 218 +++++++++++++++++++++++++++++++++++++++++++--
> >  fs/btrfs/volumes.h |  22 +++++
> >  5 files changed, 366 insertions(+), 24 deletions(-)

Thanks for the review,
Michal
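P.S. To make the discussion about the stats a bit more concrete, here is
a rough, untested sketch of what the per-device bookkeeping in the first
two patches boils down to: a per-CPU counter for in-flight reads plus
the physical offset of the last dispatched read. The field and helper
names below are made up for illustration and are not the ones used in
the series:

/* Illustrative sketch only - names do not match the actual patches. */
#include <linux/atomic.h>
#include <linux/percpu_counter.h>

struct btrfs_device {
	/* ... existing members ... */
	struct percpu_counter read_bios_inflight; /* reads in flight on this device */
	atomic64_t last_read_offset;              /* physical end of the last read */
};

/*
 * Called when a read bio is about to be submitted to this device.
 * read_bios_inflight would need percpu_counter_init() when the device
 * is set up.
 */
static inline void device_note_read(struct btrfs_device *dev, u64 physical,
				    u64 length)
{
	percpu_counter_inc(&dev->read_bios_inflight);
	atomic64_set(&dev->last_read_offset, physical + length);
}

/* Called from the read bio's end_io handler. */
static inline void device_read_done(struct btrfs_device *dev)
{
	percpu_counter_dec(&dev->read_bios_inflight);
}

The mirror selection would then read these with percpu_counter_sum()
(or a cheaper approximation) and atomic64_read() when computing the
load and the locality penalty.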