Hello,

On Thu, Nov 12, 2020 at 09:07:52AM -0500, Rachit Agarwal wrote:
> From: Rachit Agarwal <[email protected]>
> 
> Hi All,
> 
> I/O batching is beneficial for optimizing IOPS and throughput for various
> applications. For instance, several kernel block drivers would benefit from
> batching, including mmc [1] and tcp-based storage drivers like nvme-tcp [2,3].
> While we have support for batching dispatch [4], we need an I/O scheduler to
> efficiently enable batching. Such a scheduler is particularly interesting for
> disaggregated storage, where the access latency of remote disaggregated
> storage may be higher than local storage access; thus, batching can
> significantly help in amortizing the remote access latency while increasing
> the throughput.
> 
> This patch introduces the i10 I/O scheduler, which performs batching per hctx
> in terms of #requests, #bytes, and timeouts (at microsecond granularity). i10
> starts dispatching only when #requests or #bytes is larger than a default
> threshold or when a timer expires. After that, batching dispatch [3] would
> happen, allowing batching at device drivers along with "bd->last" and
> ".commit_rqs".
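
Just to make sure I am reading the batching policy correctly: is the
dispatch trigger roughly the following? (Standalone sketch, field and
function names are mine, not necessarily i10's.)

#include <stdbool.h>

/* Hypothetical per-hctx batching state, restating the cover letter. */
struct i10_batch {
	unsigned int nr_reqs;     /* requests queued in the current batch   */
	unsigned int nr_bytes;    /* bytes queued in the current batch      */
	bool         timer_fired; /* per-hctx timer (e.g. 50us) has expired */
};

/* Dispatch the whole batch once any threshold is crossed or the timer
 * fires; otherwise keep queueing to amortize the (remote) access latency. */
static bool i10_should_dispatch(const struct i10_batch *b,
				unsigned int req_thresh,  /* e.g. 16   */
				unsigned int byte_thresh) /* e.g. 64KB */
{
	return b->nr_reqs >= req_thresh ||
	       b->nr_bytes >= byte_thresh ||
	       b->timer_fired;
}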

blk-mq actually has a built-in batching mechanism (or sort of), which is enabled
when the hw queue is busy (hctx->dispatch_busy > 0). We use an EWMA to compute
hctx->dispatch_busy, and it is adaptive, even though the implementation is quite
coarse. But there should be much room for improvement, IMO.
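
For reference, the busy estimate is basically an EWMA update along the lines
of the standalone toy below; the in-tree code around blk_mq_update_dispatch_busy()
may differ in constants and details, so take this only as a sketch.

#include <stdbool.h>
#include <stdio.h>

#define EWMA_WEIGHT 8   /* decay: keep 7/8 of the old value       */
#define EWMA_FACTOR 4   /* a "busy" sample adds 1 << 4 = 16 units */

struct hw_ctx {
	unsigned int dispatch_busy; /* 0 means idle; grows while dispatch keeps failing */
};

/* Called after each dispatch attempt; 'busy' means the driver pushed back,
 * so subsequent dispatch should start batching. */
static void update_dispatch_busy(struct hw_ctx *hctx, bool busy)
{
	unsigned int ewma = hctx->dispatch_busy;

	if (!busy && !ewma)
		return;

	ewma *= EWMA_WEIGHT - 1;
	if (busy)
		ewma += 1 << EWMA_FACTOR;
	ewma /= EWMA_WEIGHT;

	hctx->dispatch_busy = ewma;
}

int main(void)
{
	struct hw_ctx hctx = { 0 };

	/* A few busy samples ramp the estimate up quickly... */
	for (int i = 0; i < 4; i++)
		update_dispatch_busy(&hctx, true);
	printf("after busy samples: %u\n", hctx.dispatch_busy);

	/* ...and idle samples decay it gradually back toward zero. */
	for (int i = 0; i < 8; i++)
		update_dispatch_busy(&hctx, false);
	printf("after idle samples: %u\n", hctx.dispatch_busy);

	return 0;
}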

It has been reported that this approach improves single-queue (SQ) high-end
SCSI SSD performance very much [1], and MMC performance gets improved too [2].

[1] https://lore.kernel.org/linux-block/[email protected]/
[2] https://lore.kernel.org/linux-block/CADBw62o9eTQDJ9RvNgEqSpXmg6Xcq=2TxH0Hfxhp29uF2W=t...@mail.gmail.com/

> 
> The i10 I/O scheduler builds upon the recent work in [6]. We have tested the
> i10 I/O scheduler with nvme-tcp optimizations [2,3] and batching dispatch [4],
> varying the number of cores, read/write ratios, and request sizes, with NVMe
> SSDs and a RAM block device. For NVMe SSDs, the i10 I/O scheduler achieves
> ~60% improvements in terms of IOPS per core over the "noop" I/O scheduler.
> These results are available at [5], and many additional results are presented
> in [6].

With the none scheduler, the nvme driver basically won't provide any queue-busy
feedback, so the built-in batching dispatch simply doesn't kick in.

The kyber scheduler uses I/O latency feedback to throttle and build I/O batches;
can you compare i10 with kyber on nvme/nvme-tcp?

> 
> While other schedulers may also batch I/O (e.g., mq-deadline), the
> optimization target in the i10 I/O scheduler is throughput maximization.
> Hence there is no latency target nor a need for a global tracking context, so
> a new scheduler is needed rather than building this functionality into an
> existing scheduler.
> 
> We currently use fixed default values as batching thresholds (e.g., 16 for
> #requests, 64KB for #bytes, and 50us for timeout). These default values are
> based on sensitivity tests in [6]. For our future work, we plan to support
> adaptive batching according to

Frankly speaking, hardcoding 16 #requests or 64KB may not work everywhere,
and production environments can be much more complicated than your sensitivity
tests. If possible, please start with adaptive batching.
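
This is obviously not a design, but just to illustrate the direction I mean:
something as simple as the toy tuner below could adjust the #requests threshold
from observed completion latency. Purely illustrative; nothing here reflects
actual i10 or blk-mq code, and all names/constants are made up.

/* Toy additive increase/decrease tuner for the batching threshold:
 * probe a bigger batch while latency stays flat, back off when the
 * extra queueing delay shows up in the completion latency. */
struct batch_tuner {
	unsigned int thresh;       /* current #requests threshold            */
	unsigned int min_thresh;   /* e.g. 1                                 */
	unsigned int max_thresh;   /* e.g. 64                                */
	unsigned long last_lat_ns; /* latency seen at the previous threshold */
};

static void batch_tuner_update(struct batch_tuner *t, unsigned long lat_ns)
{
	/* Latency regressed by more than ~12%: shrink the batch. */
	if (t->last_lat_ns && lat_ns > t->last_lat_ns + t->last_lat_ns / 8) {
		if (t->thresh > t->min_thresh)
			t->thresh--;
	} else if (t->thresh < t->max_thresh) {
		/* Otherwise probe a slightly larger batch. */
		t->thresh++;
	}
	t->last_lat_ns = lat_ns;
}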

The 50us timeout can contribute to I/O latency, so I'd like to see I/O latency
data with i10, especially i10 vs. vanilla none.


Thanks, 
Ming
