2017-12-28 11:06 GMT+03:00 Dmitrii Tcvetkov <demfl...@demfloro.ru>:
> On Thu, 28 Dec 2017 01:39:31 +0300
> Timofey Titovets <nefelim...@gmail.com> wrote:
>
>> Currently the btrfs raid1/10 balancer distributes read requests across
>> mirrors based on pid % number of mirrors.
>>
>> Update the logic so that it takes into account whether the underlying devices are non-rotational.
>>
>> If one of the mirrors is non-rotational, all read requests are moved to the
>> non-rotational device.
>>
>> If both mirrors are non-rotational, calculate the sum of pending and
>> in-flight requests in the queue of each bdev and use the device with the
>> shortest queue.
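The selection policy described above can be modelled in userspace to check its corner cases. This is a minimal sketch under stated assumptions: struct mirror_model and pick_mirror() are hypothetical names, the fields only stand in for the bdev presence check, blk_queue_nonrot() and the queue length, and the least-queue step follows the description rather than the (truncated) patch body.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for one mirror; not a kernel structure. */
struct mirror_model {
	bool present;   /* models map->stripes[i].dev->bdev != NULL */
	bool nonrot;    /* models blk_queue_nonrot(bdev_get_queue(bdev)) */
	int queue_len;  /* models pending + in-flight request count */
};

/* Model of the read-balancing policy described in the patch. */
static int pick_mirror(const struct mirror_model *m, int num, int pid)
{
	int optimal;
	bool all_nonrot = true;
	int i;

	assert(num > 0); /* pid % num is undefined for num == 0 */
	optimal = pid % num; /* old behaviour, kept for all-rotational sets */

	/* Like the patch's loop: the last non-rotational mirror wins. */
	for (i = 0; i < num; i++) {
		if (!m[i].present)
			continue;
		if (m[i].nonrot)
			optimal = i;
		else
			all_nonrot = false;
	}

	/* All mirrors non-rotational: pick the shortest queue. */
	if (all_nonrot) {
		int best = INT_MAX;

		for (i = 0; i < num; i++) {
			if (!m[i].present)
				continue;
			/* Strict '<': on a tie the lower index is kept. */
			if (m[i].queue_len < best) {
				best = m[i].queue_len;
				optimal = i;
			}
		}
	}
	return optimal;
}
```

With one SSD and one HDD the SSD is chosen for every pid; with two HDDs the old pid % num spread is preserved; with two SSDs the shorter queue wins.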
>>
>> P.S.
>> Inspired by md-raid1 read balancing
>>
>> Signed-off-by: Timofey Titovets <nefelim...@gmail.com>
>> ---
>>  fs/btrfs/volumes.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 59 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 9a04245003ab..98bc2433a920 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -5216,13 +5216,30 @@ int btrfs_is_parity_mirror(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
>>       return ret;
>>  }
>>
>> +static inline int bdev_get_queue_len(struct block_device *bdev)
>> +{
>> +     int sum = 0;
>> +     struct request_queue *rq = bdev_get_queue(bdev);
>> +
>> +     sum += rq->nr_rqs[BLK_RW_SYNC] + rq->nr_rqs[BLK_RW_ASYNC];
>> +     sum += rq->in_flight[BLK_RW_SYNC] + rq->in_flight[BLK_RW_ASYNC];
>> +
>
> This won't work as expected if the bdev is controlled by blk-mq: these
> counters will be zero. AFAIK, to get this info in a block-layer-agnostic
> way, part_in_flight[1] has to be used. It extracts these counters appropriately.
>
> But it needs to be EXPORT_SYMBOL()'ed in block/genhd.c so that we can
> continue to build btrfs as a module.
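The blk-mq point can be illustrated with a small userspace model. This is only a sketch: struct queue_model and both functions are hypothetical names, and the single legacy_count / mq_count fields stand in for the real counters. It does not reproduce part_in_flight()'s actual interface; it only shows why reading the legacy counters alone reports an idle queue under blk-mq, while checking the queue type first (which is what the reviewer says part_in_flight() takes care of) does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a request queue; field names are illustrative. */
struct queue_model {
	bool is_mq;                /* queue is driven by blk-mq */
	unsigned int legacy_count; /* legacy-path counters; never updated under blk-mq */
	unsigned int mq_count;     /* what a blk-mq aware accounting would report */
};

/* Reads only the legacy counters, as the patch does. */
static unsigned int qlen_legacy_only(const struct queue_model *q)
{
	assert(q != 0);
	return q->legacy_count;
}

/* Picks the counter source according to the queue type first. */
static unsigned int qlen_agnostic(const struct queue_model *q)
{
	assert(q != 0);
	return q->is_mq ? q->mq_count : q->legacy_count;
}
```

For a blk-mq queue with 4 requests in flight, qlen_legacy_only() returns 0, so every blk-mq SSD would look equally idle to the least-queue comparison; qlen_agnostic() returns 4.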
>
>> +     /*
>> +      * Try to prevent switching on every sneeze
>> +      * by rounding the output up to a multiple of 2
>> +      */
>> +     return ALIGN(sum, 2);
>> +}
>> +
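The effect of ALIGN(sum, 2) in bdev_get_queue_len() can be checked with a small userspace sketch. The kernel's ALIGN() rounds up to a power-of-two alignment; the macro below is a local re-implementation under the hypothetical name ALIGN_UP, not the kernel header, and rounded_qlen() is likewise an illustrative name.

```c
#include <assert.h>

/*
 * Userspace round-up to a power-of-two alignment `a`; named ALIGN_UP
 * to avoid clashing with the kernel's ALIGN().
 */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Models the rounding step at the end of bdev_get_queue_len(). */
static int rounded_qlen(int sum)
{
	assert(sum >= 0); /* queue lengths are never negative */
	return ALIGN_UP(sum, 2);
}
```

Queue lengths 3 and 4 both become 4, so if the comparison that follows uses a strict '<' it will not switch mirrors between them. Lengths 4 and 5 still differ (4 vs 6), so this damping only absorbs differences within each pair {2k-1, 2k}.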
>>  static int find_live_mirror(struct btrfs_fs_info *fs_info,
>>                           struct map_lookup *map, int first, int num,
>>                           int optimal, int dev_replace_is_ongoing)
>>  {
>>       int i;
>>       int tolerance;
>> +     struct block_device *bdev;
>>       struct btrfs_device *srcdev;
>> +     bool all_bdev_nonrot = true;
>>
>>       if (dev_replace_is_ongoing &&
>>           fs_info->dev_replace.cont_reading_from_srcdev_mode ==
>> @@ -5231,6 +5248,48 @@ static int find_live_mirror(struct btrfs_fs_info
>> *fs_info, else
>>               srcdev = NULL;
>>
>> +     /*
>> +      * Optimal is expected to be pid % num.
>> +      * That's generally OK for spinning rust drives,
>> +      * but if one of the mirrors is non-rotating,
>> +      * that bdev can show better performance.
>> +      *
>> +      * If one of the disks is non-rotating:
>> +      *  - set optimal to the non-rotating device
>> +      * If both disks are non-rotating:
>> +      *  - set optimal to the bdev with the shortest queue
>> +      * If both disks are spinning rust:
>> +      *  - leave the old pid % num
>> +      */
>> +     for (i = 0; i < num; i++) {
>> +             bdev = map->stripes[i].dev->bdev;
>> +             if (!bdev)
>> +                     continue;
>> +             if (blk_queue_nonrot(bdev_get_queue(bdev)))
>> +                     optimal = i;
>> +             else
>> +                     all_bdev_nonrot = false;
>> +     }
>> +
>> +     if (all_bdev_nonrot) {
>> +             int qlen;
>> +             /* Force the following logic's choice by initializing with a big number */
>> +             int optimal_dev_rq_count = 1 << 24;
>
> Probably better to use the INT_MAX macro instead.
>
> [1] https://elixir.free-electrons.com/linux/v4.15-rc5/source/block/genhd.c#L68
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thank you very much!

-- 
Have a nice day,
Timofey.