Yes, likewise: I've been in the IT industry since 1992, and I understand that 
product spec sheets quote theoretical maximums.

""dd" will always give you a sequential read or write workload which will 
always trigger optimisation functions in the various controllers and HDD/SSD 
firmware. The only thing it will show you is that your numbers will somewhat 
align to the numbers in the datasheet."

Which is what I'm initially trying to validate, but not seeing.

" Furthermore, reading from /dev/zero to push a workload will for sure skew 
numbers. The controllers are very smart and have been for a long time. If they 
"see" a certain data characteristic they can change the write behaviour to the 
physical platter or the NAND cells."

I had considered this, and if that were the case, wouldn't it give *better* 
performance?  If dedup identified a pattern, for instance, wouldn't I get 
*better* values in my tests than what the hardware was capable of?  In what 
circumstances would I see worse numbers when controller logic kicked in?
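
One way I suppose I could test that is to write the same region twice, once 
with all-zero buffers and once with incompressible data, and compare the two 
runs. A rough sketch with fio; the LV path below is only a placeholder for my 
striped LV, and both runs overwrite it, so scratch volume only:

    # all-zero buffers: easy for a controller to compress or dedup
    fio --name=zerofill --filename=/dev/mapper/vg-stripe_lv --rw=write \
        --bs=1M --size=4G --direct=1 --ioengine=libaio --iodepth=16 \
        --zero_buffers

    # incompressible buffers, refilled with fresh random data for every IO
    fio --name=randfill --filename=/dev/mapper/vg-stripe_lv --rw=write \
        --bs=1M --size=4G --direct=1 --ioengine=libaio --iodepth=16 \
        --refill_buffers

If the zero-fill run came out noticeably faster, that would at least tell me 
the controller is treating the data pattern specially.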

/AH


-----Original Message-----
From: Erwin van Londen <er...@erwinvanlonden.net> 
Sent: 24 July 2025 15:57
To: Henry, Andrew <andrew.he...@eon.se>; Zdenek Kabelac 
<zdenek.kabe...@gmail.com>; linux-lvm@lists.linux.dev
Subject: Re: striped LV and expected performance


Having worked in the storage industry since around 1995 with DEC, Compaq, 
Hewlett Packard and Hitachi Data Systems (Vantara), I've seen a fair number of 
spec sheets. Be aware: what you see on these sheets are indeed optimum values 
measured against optimum characteristics for that piece of hardware. These 
sheets are only partially written by engineers but will have a marketing sauce 
added, resulting in some potentially skewed information. Engineering 
information will most often outline the conditions behind these numbers, 
whereas marketing people will mostly remove them as it simply looks better.

  "dd" will always give you a sequential read or write workload which will 
always trigger optimisation functions in the various controllers and HDD/SSD 
firmware. The only thing it will show you is that your numbers will somewhat 
align to the numbers in the datasheet. Various OS settings on the scheduler, dm 
and filesystem can have a significant influence on these numbers. I'm pretty 
sure that the information in datasheets is about the maximum you can suck out 
of a piece of hardware. Everything you do on your side in the various OS 
layers will only negatively impact the raw performance numbers, let alone 
having a representative application workload pushed onto it.
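
For illustration, the kind of thing that at least takes the page cache out of 
the equation is running dd with direct IO and checking which scheduler the 
block device is actually using; the device names below are only placeholders:

    # which IO scheduler is active for this device (placeholder name)
    cat /sys/block/sda/queue/scheduler

    # sequential write that bypasses the page cache; this overwrites the device
    dd if=/dev/zero of=/dev/mapper/vg-stripe_lv bs=1M count=4096 oflag=direct

Even then, /dev/zero as a source has its own caveats, as noted below.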

"fio" gives you a bit more options and parameters however this will also depend 
significantly on how the kernel and it's IO layers such as the device mapper 
and filesystems interact with the hardware. Using zones for example would 
require insight into the way the hardware is build, especially on HDD's. If you 
don't have that you may as well put a wet finger in the air and go for a trial 
and error run.
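
As a rough sketch, a small random-read job against the raw device could look 
like this; the device path, queue depth and run time are placeholders to adapt:

    # 4k random reads straight at the device, page cache bypassed, 60 seconds
    fio --name=randread --filename=/dev/mapper/vg-stripe_lv --rw=randread \
        --bs=4k --direct=1 --ioengine=libaio --iodepth=32 --numjobs=4 \
        --group_reporting --runtime=60 --time_based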

Furthermore, reading from /dev/zero to push a workload will for sure skew 
numbers. The controllers are very smart and have been for a long time. If they 
"see" a certain data characteristic they can change the write behaviour to the 
physical platter or the NAND cells.

Everything you do that does not reflect a real-life workload is a "just for 
shits and giggles" exercise and will not give any real meaningful outcome. 
Believe me, I've been through this discussion more than once.

>
>
>
> -----Original Message-----
> From: Erwin van Londen <er...@erwinvanlonden.net>
> Sent: 21 July 2025 05:26
> To: Zdenek Kabelac <zdenek.kabe...@gmail.com>; Henry, Andrew 
> <andrew.he...@eon.se>; linux-lvm@lists.linux.dev
> Subject: Re: striped LV and expected performance
>
>
> 3. Unreal cache optimisations. Using dd is by far the worst option to use for 
> performance tests as it will never (OK, almost never) align with real 
> workloads. If you use dd for performance tests you will find that this will 
> backfire in most cases when a normal workload is applied. The main reason is 
> that dd will always have a sequential workload unless you start a large 
> number of dd instances to the same disk at once with different offsets. Even 
> then you will see an obscure number coming back.
>
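
For illustration, a multi-instance dd run of the kind described above might 
look roughly like this; the device name, block size and offsets are 
placeholders:

    # four concurrent sequential readers, each starting 10 GiB further in
    for i in 0 1 2 3; do
        dd if=/dev/mapper/vg-stripe_lv of=/dev/null bs=1M count=4096 \
           skip=$((i * 10240)) iflag=direct &
    done
    wait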

