Thanks for the detailed explanation! As both you and Alexandre hinted,
I was using the wrong tool. dd(1) indeed yields the expected speed
on rsd1c. At least for this particular SD card, throughput levels off
at a block size of 64k:

$ doas dd if=/dev/rsd1c of=/dev/null bs=32k count=16k
16384+0 records in
16384+0 records out
536870912 bytes transferred in 14.867 secs (36110418 bytes/sec)

$ doas dd if=/dev/rsd1c of=/dev/null bs=64k count=8k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 8.494 secs (63202126 bytes/sec)

$ doas dd if=/dev/rsd1c of=/dev/null bs=128k count=4k
4096+0 records in
4096+0 records out
536870912 bytes transferred in 8.434 secs (63649774 bytes/sec)
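
For anyone who wants to repeat the sweep, the three runs above can be
wrapped in a small loop. This is only a rough sketch: sweep_bs and its
arguments are names I made up, and it assumes the caller already has
read access to the device (e.g. the script is run under doas). It does
not replug the card between runs, so I would only trust it on the raw
device.

```shell
#!/bin/sh
# Sketch: read the same total amount from a device at several block sizes.
# sweep_bs and its arguments are illustrative names, not an existing tool.
sweep_bs() {
	dev=$1		# device or file to read, e.g. /dev/rsd1c
	total=$2	# total bytes to read per run, e.g. 536870912
	shift 2		# remaining arguments: block sizes in KiB
	for kb in "$@"; do
		bs=$((kb * 1024))
		count=$((total / bs))
		echo "bs=${kb}k count=${count}"
		# dd writes its transfer summary to stderr; keep the last line
		dd if="$dev" of=/dev/null bs="$bs" count="$count" 2>&1 | tail -n 1
	done
}
```

With the numbers used above this would be called as
"sweep_bs /dev/rsd1c 536870912 32 64 128".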

On Fri, Jun 15, 2018 at 9:30 AM, Joseph Mayer wrote:
> /dev/sd*
>
>  * are cached (cache has a 3GB cap presently since the DMA pushback
>    diff not was experienced as stable and therefore rolled back), which
>    may make access appear faster

I've been careful to un- and replug before every test to avoid any
effects of caching.

>  * I think the underlying hardware access is always split to 512B (or
>    in certain cases 4096B) accesses by the file/IO subsystem,

Indeed, at least for this particular SD card, the optimal block size
when reading via sd1c is 4096B. This is also the block size at which
read speeds from sd1c and rsd1c roughly coincide, which is consistent
with your statement.

> this
>    together with serialization via the kernel biglock and subsystem
>    design gives you a system-global cap of about 120MB/sec independent
>    of actual hardware.

What does this 120MB/s limit apply to? I'm getting sustained 400MB/s
read speeds from files on my internal (encrypted) SSD, even with files
much larger than RAM.

> Modern SSD:s all give you about 80MB/sec access in normal sequential
> read mode. As you parallelize it you will see an linear speed increase
> with higher number of threads up to approx 10 threads, where a SATA SSD
> will give you about 500MB/sec and a PCIe NVMe SSD will give you about
> 900MB/sec.

I'm not sure I understand this. When reading from rsd1c with dd(1) I
get 500MB/s. Does that mean that dd or the kernel is parallelizing
requests?
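
One way I could imagine testing the parallel case is to start several
dd processes at different offsets and time the whole batch. A minimal
sketch, assuming plain sh and dd; parallel_read and its arguments are
hypothetical names, and the device path is whatever disk is under test:

```shell
#!/bin/sh
# Sketch: start several dd readers at different offsets and wait for all.
# parallel_read and its arguments are illustrative names, not an existing tool.
parallel_read() {
	dev=$1		# device or file to read (a raw disk device, for example)
	jobs=$2		# number of concurrent dd processes
	bs=$3		# block size in bytes
	count=$4	# blocks requested by each process
	i=0
	while [ "$i" -lt "$jobs" ]; do
		# skip= is counted in input blocks of size bs, so reader i
		# starts where reader i-1 stops
		dd if="$dev" of=/dev/null bs="$bs" count="$count" \
		    skip=$((i * count)) 2>/dev/null &
		i=$((i + 1))
	done
	wait	# block until every background dd has exited
	echo "requested $((jobs * count * bs)) bytes with $jobs readers"
}
```

Wrapping the call in time(1) for 1, 2, 4, ... readers should show
whether throughput actually scales with the number of readers.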
