Yes, me too. I've also been in the IT industry since 1992, and I understand that product spec sheets give theoretical maximums.

""dd" will always give you a sequential read or write workload which will always trigger optimisation functions in the various controllers and HDD/SSD firmware. The only thing it will show you is that your numbers will somewhat align to the numbers in the datasheet."

Which is what I'm initially trying to validate, but not seeing.

"Furthermore, reading from /dev/zero to push a workload will for sure skew numbers. The controllers are very smart and have been for a long time. If they "see" a certain data characteristic they can change the write behaviour to the physical platter or the NAND cells."

I had considered this, and if this were the case, wouldn't it give *better* performance? If dedup identifies a pattern, for instance, wouldn't I get *better* values in my tests than the hardware is actually capable of? In which circumstances would I see worse numbers when controller logic kicks in?

/AH

-----Original Message-----
From: Erwin van Londen <er...@erwinvanlonden.net>
Sent: 24 July 2025 15:57
To: Henry, Andrew <andrew.he...@eon.se>; Zdenek Kabelac <zdenek.kabe...@gmail.com>; linux-lvm@lists.linux.dev
Subject: Re: striped LV and expected performance

Having worked in the storage industry since around 1995 with DEC, Compaq, Hewlett Packard and Hitachi Data Systems (Vantara), I've seen a fair number of spec sheets. Be aware that what you see on these sheets are indeed optimum values measured against optimum characteristics for that piece of hardware. These sheets are only partially written by engineers and will have a marketing sauce added, resulting in some potentially skewed information. Engineering information will most often outline the conditions behind these numbers, whereas marketing people will most often remove them as it simply looks better.

"dd" will always give you a sequential read or write workload which will always trigger optimisation functions in the various controllers and HDD/SSD firmware. The only thing it will show you is that your numbers will somewhat align to the numbers in the datasheet. Various OS settings on the scheduler, dm and filesystem can have a significant influence on these numbers. I'm pretty sure that the information in datasheets is about the maximum you can suck out of a piece of hardware. Everything you do on your side in the various OS layers will only negatively impact the raw performance numbers, let alone having a representative application workload pushed onto it.

"fio" gives you a few more options and parameters; however, this will also depend significantly on how the kernel and its IO layers, such as the device mapper and filesystems, interact with the hardware. Using zones, for example, would require insight into the way the hardware is built, especially on HDDs. If you don't have that you may as well put a wet finger in the air and go for a trial and error run.

Furthermore, reading from /dev/zero to push a workload will for sure skew numbers. The controllers are very smart and have been for a long time. If they "see" a certain data characteristic they can change the write behaviour to the physical platter or the NAND cells.

Everything you do that does not reflect a real life workload is a "just for shits and giggles" exercise but will not give any real meaningful outcome.

Believe me, I've been through this discussion more than once.

>
>
>
> -----Original Message-----
> From: Erwin van Londen <er...@erwinvanlonden.net>
> Sent: 21 July 2025 05:26
> To: Zdenek Kabelac <zdenek.kabe...@gmail.com>; Henry, Andrew
> <andrew.he...@eon.se>; linux-lvm@lists.linux.dev
> Subject: Re: striped LV and expected performance
>
>
> 3. Unreal cache optimisations. Using dd is by far the worst option to use for
> performance tests as it will never (OK, almost never) align with real
> workloads. If you use dd for performance tests you will find that this will
> backfire in most cases when a normal workload is applied. The main reason is
> that dd will always have a sequential workload unless you start a large
> number of dd instances to the same disk at once with different offsets. Even
> then you will see an obscure number coming back.
>
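
The point above about dd always being sequential unless many instances run at different offsets can be sketched in a few lines of Python. This is only an illustration and not something from the thread: the helper names (sequential_offsets, random_offsets, read_pattern), the 4 KiB block size and the scratch file are all assumptions. Because the reads go through the page cache, the sketch shows the difference in access pattern only and says nothing about device performance.

```python
import os
import random
import tempfile

BLOCK = 4096   # assumed block size, for illustration only
BLOCKS = 256   # assumed number of blocks in the scratch file


def sequential_offsets(n_blocks, block=BLOCK):
    """Offsets in the order a single dd-style pass would touch them."""
    return [i * block for i in range(n_blocks)]


def random_offsets(n_blocks, block=BLOCK, seed=0):
    """The same block-aligned offsets, visited in a shuffled order."""
    offsets = sequential_offsets(n_blocks, block)
    random.Random(seed).shuffle(offsets)
    return offsets


def read_pattern(path, offsets, block=BLOCK):
    """Read one block at each offset and return the total bytes read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return sum(len(os.pread(fd, block, off)) for off in offsets)
    finally:
        os.close(fd)


# Hypothetical scratch file filled with random data rather than zeros.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(BLOCK * BLOCKS))
    path = f.name

seq = sequential_offsets(BLOCKS)
rnd = random_offsets(BLOCKS)

# Both patterns read exactly the same blocks; only the order differs.
seq_bytes = read_pattern(path, seq)
rnd_bytes = read_pattern(path, rnd)
os.unlink(path)

print("sequential:", seq[:4], "... total bytes", seq_bytes)
print("random:    ", rnd[:4], "... total bytes", rnd_bytes)
```

Both passes move the same amount of data, which is why a sequential-only tool can look good on paper while telling you little about a mixed workload. An actual measurement would still need a purpose-built tool such as fio run against the real device.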