On Wed, Feb 10, 2021 at 07:23:04PM +0000, Michal Rostecki wrote:
> On Wed, Feb 10, 2021 at 01:58:15PM +0100, Michał Mirosław wrote:
> > On Wed, Feb 10, 2021 at 12:29:25PM +0000, Michal Rostecki wrote:
> > > On Wed, Feb 10, 2021 at 05:24:28AM +0100, Michał Mirosław wrote:
> > > > This looks like it effectively decreases queue depth for non-last
> > > > device. After all devices are filled to queue_depth-penalty, only
> > > > a single mirror will be selected for next reads (until a read on
> > > > some other one completes).
> > > >
> > >
> > > Good point. And if all devices are going to be filled for a longer time,
> > > this function will keep selecting the last one. Maybe I should select
> > > last+1 in that case. Would that address your concern or did you have any
> > > other solution in mind?
> >
> > The best would be to postpone the selection until one device becomes free
> > again. But if that's not doable, then yes, you could make sure it stays
> > round-robin after filling the queues (the scheduling will lose the
> > "penalty"-driven adjustment though).
>
> Or another idea - when all the queues are filled, return the mirror
> which has the lowest load (inflight + penalty), even though it's greater
> than queue depth. In that case the scheduling will not lose the penalty
> adjustment and the load is going to be spread more fairly.
>
> I'm not sure if postponing the selection is that good an idea. I think it's
> better if the request is added to the iosched queue anyway, even if the
> disks' queues are filled, and let the I/O scheduler handle that. The
> length of the iosched queue (nr_requests, attribute of the iosched) is
> usually greater than queue depth (attribute of the device), which means
> that it's fine to schedule more requests for iosched to handle.
>
> IMO btrfs should use the information given by iosched only for heuristic
> mirror selection, rather than implement its own throttling logic.
>
> Does it make sense to you?
>
> Another idea could be an additional iteration in regard to
> nr_requests, if all load values are greater than queue depths, though it
> might be overkill. I would prefer to stick to my first idea if
> everyone agrees.

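To make the "lowest load when all queues are full" fallback concrete, here
is a minimal userspace sketch. It is not the code from the patch: the struct,
its field names and pick_mirror() are hypothetical, and the policy for
mirrors that still have room (lowest load among them) is my assumption.

```c
#include <stddef.h>

/* Hypothetical per-mirror state; names are illustrative, not from btrfs. */
struct mirror_load {
	unsigned int inflight;    /* requests currently in flight */
	unsigned int penalty;     /* extra cost added to the load */
	unsigned int queue_depth; /* queue depth of the device */
};

static unsigned int load_of(const struct mirror_load *m)
{
	return m->inflight + m->penalty;
}

/*
 * Prefer the least loaded mirror whose load is still below its queue
 * depth. If every mirror is at or above its queue depth, fall back to
 * the least loaded mirror overall, so the penalty still influences the
 * choice and the request is left for the I/O scheduler to queue.
 */
static size_t pick_mirror(const struct mirror_load *m, size_t n)
{
	size_t best = 0;      /* least loaded mirror overall */
	size_t best_free = n; /* least loaded mirror with room; n = none */

	for (size_t i = 0; i < n; i++) {
		unsigned int load = load_of(&m[i]);

		if (load < m[i].queue_depth &&
		    (best_free == n || load < load_of(&m[best_free])))
			best_free = i;
		if (load < load_of(&m[best]))
			best = i;
	}
	return best_free != n ? best_free : best;
}
```

With two mirrors at loads 40 and 38 and a queue depth of 32 each, no mirror
has room, so the second one (load 38) is returned instead of always falling
back to the last device.
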
What if iosched could provide an estimate of a request's latency? Then
btrfs could always select the mirror with the lowest estimate. For reads
from NVMe/SSD I would normally expect something simple:
speed_factor * (pending_bytes + req_bytes). For HDDs this could do more
computation, like looking into what is already in the queue. This would
deviate from the simple round-robin scheme, though.

Best Regards
Michał Mirosław
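
A rough sketch of the estimate above, for illustration only. No such
interface exists in iosched today as far as this thread goes; the struct,
est_latency() and pick_lowest_latency() are hypothetical names, and the unit
of speed_factor (time per byte) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mirror data an I/O scheduler could expose. */
struct mirror_est {
	uint64_t speed_factor;  /* assumed unit: nanoseconds per byte */
	uint64_t pending_bytes; /* bytes already queued on the device */
};

/*
 * Simple NVMe/SSD-style estimate from the mail:
 * speed_factor * (pending_bytes + req_bytes).
 * Overflow is ignored in this sketch.
 */
static uint64_t est_latency(const struct mirror_est *m, uint64_t req_bytes)
{
	return m->speed_factor * (m->pending_bytes + req_bytes);
}

/* Return the index of the mirror with the lowest estimated latency. */
static size_t pick_lowest_latency(const struct mirror_est *m, size_t n,
				  uint64_t req_bytes)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++) {
		if (est_latency(&m[i], req_bytes) <
		    est_latency(&m[best], req_bytes))
			best = i;
	}
	return best;
}
```

A slower but idle device can win here: with speed factors 1 and 4, 1 MiB
pending on the first mirror and nothing on the second, a 4 KiB read goes to
the second mirror.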