Thanks for the detailed explanation! As both you and Alexandre hinted, I was using the wrong tool. dd(1) indeed yields the expected speed on rsd1c. At least for this particular SD card, a block size of 64k appears to be optimal:
$ doas dd if=/dev/rsd1c of=/dev/null bs=32k count=16k
16384+0 records in
16384+0 records out
536870912 bytes transferred in 14.867 secs (36110418 bytes/sec)
$ doas dd if=/dev/rsd1c of=/dev/null bs=64k count=8k
8192+0 records in
8192+0 records out
536870912 bytes transferred in 8.494 secs (63202126 bytes/sec)
$ doas dd if=/dev/rsd1c of=/dev/null bs=128k count=4k
4096+0 records in
4096+0 records out
536870912 bytes transferred in 8.434 secs (63649774 bytes/sec)

On Fri, Jun 15, 2018 at 9:30 AM, Joseph Mayer <[email protected]> wrote:
> /dev/sd*
>
> * are cached (cache has a 3GB cap presently since the DMA pushback
> diff not was experienced as stable and therefore rolled back), which
> may make access appear faster

I've been careful to unplug and replug the card before every test to
avoid any effects of caching.

> * I think the underlying hardware access is always split to 512B (or
> in certain cases 4096B) accesses by the file/IO subsystem,

Indeed, at least for this particular SD card, the optimal block size
when reading via sd1c is 4096B. This is also the block size at which
reading speeds from sd1c and rsd1c roughly coincide, which confirms
your statement.

> this
> together with serialization via the kernel biglock and subsystem
> design gives you a system-global cap of about 120MB/sec independent
> of actual hardware.

What does this 120MB/s limit apply to? I'm getting sustained 400MB/s
read speeds from files on my internal (encrypted) SSD, even with files
much larger than RAM.

> Modern SSD:s all give you about 80MB/sec access in normal sequential
> read mode. As you parallelize it you will see an linear speed increase
> with higher number of threads up to approx 10 threads, where a SATA SSD
> will give you about 500MB/sec and a PCIe NVMe SSD will give you about
> 900MB/sec.

I'm not sure I understand this. When reading from rsd1c with dd(1) I
get 500MB/s. Does that mean that dd or the kernel is parallelizing
requests?
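
In case anyone wants to repeat the block-size comparison from the top of this mail on their own card, here is a minimal sketch of how it could be scripted. The function name bench_read and its arguments are my own invention, not anything from this thread. It assumes the total size is a multiple of every block size given, and it needs to run with enough privileges to read the raw device.

```shell
#!/bin/sh
# Sketch: read the same total amount of data at several block sizes.
# bench_read is a hypothetical helper, not part of any existing tool.
# usage: bench_read <device-or-file> <total-KiB> <bs-KiB>...
bench_read() {
    src=$1
    total_kb=$2
    shift 2
    for bs_kb in "$@"; do
        # Keep the amount of data constant: count = total / block size.
        count=$((total_kb / bs_kb))
        echo "bs=${bs_kb}k count=${count}"
        # dd prints its statistics on stderr; keep only the summary line.
        dd if="$src" of=/dev/null bs="${bs_kb}k" count="$count" 2>&1 | tail -n 1
    done
}
```

Calling `bench_read /dev/rsd1c 524288 32 64 128` as root would issue the same three reads of 512MB as above. Unplugging and replugging the card between runs still has to be done by hand.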

