On 24/7/25 17:09, Henry, Andrew wrote:
> I've been thinking about the point below, about using dd as a performance 
> test.  Although I do wholeheartedly agree that real-life workload tests must 
> be conducted (ideally where a production workload is recorded, and then 
> played back in a test environment), I still think that dd and fio provide 
> valuable insights into disk performance.
>
> On real physical hardware, where I know all of the variables, the product 
> spec sheets, and the theoretical performance ceiling of all the components, I 
> actually see results that are close to the physical capabilities of the 
> hardware involved when I use dd or fio.  To be clear, fio is showing the same 
> results as dd when using direct writes, which indicates that dd isn't that 
> bad of a quick test and is a viable replacement for fio to be able to get a 
> quick idea of where performance lies.
>
> In these cases, testing a system with dd or fio, what I'm after is a 
> validation of the physical capabilities of the hardware, as a starting point 
> in performance tuning.  If these results aren't in line with expectations, 
> then it's pointless tuning any other layer until this is resolved.
>
> Unless someone can provide evidence that dd or fio do not and simply can not 
> show true hardware speeds when testing in a virtualized environment, then I 
> can only continue to use these tools in benchmarking.
>
> Because when it comes down to it, we buy hardware for their performance 
> characteristics, and want to verify that we are seeing that same performance 
> in our environment after purchase.
>
> /AH

Having worked in the storage industry since around 1995 with DEC, 
Compaq, Hewlett Packard and Hitachi Data Systems (Vantara), I've seen a 
fair number of spec sheets. Be aware that what you see on these sheets 
are indeed optimum values, measured under optimum conditions for that 
piece of hardware. These sheets are only partially written by engineers; 
a marketing sauce gets added on top, which can result in skewed 
information. The engineering material will most often state the 
conditions under which those numbers were obtained, whereas the 
marketing people will mostly remove them because it simply looks better.

  "dd" will always give you a sequential read or write workload, which 
will always trigger optimisation functions in the various controllers 
and HDD/SSD firmware. The only thing it will show you is whether your 
numbers somewhat align with the numbers in the datasheet. Various OS 
settings for the scheduler, the device mapper and the filesystem can 
have a significant influence on these numbers. I'm pretty sure the 
information in the datasheets is about the maximum you can squeeze out 
of a piece of hardware. Everything you do on your side in the various OS 
layers will only drag the raw performance numbers down, let alone 
pushing a representative application workload onto it.

"fio" gives you a few more options and parameters, but the results will 
also depend significantly on how the kernel and its I/O layers, such as 
the device mapper and filesystems, interact with the hardware. Using 
zones, for example, requires insight into the way the hardware is built, 
especially on HDDs. If you don't have that, you may as well put a wet 
finger in the air and go for a trial-and-error run.
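For comparison, a sketch of an fio job that at least breaks away from 
the purely sequential pattern dd imposes; the target file, size, engine 
and queue depth are placeholders to adapt to the environment under test:

```ini
; Illustrative fio job: 4 KiB random reads with O_DIRECT so the page
; cache stays out of the measurement. All values below are placeholders.
[randread-sketch]
filename=/path/to/testfile
rw=randread
bs=4k
size=1g
direct=1
ioengine=libaio
iodepth=16
runtime=60
time_based=1
```

Even then, as noted above, the numbers only tell you about this 
synthetic pattern, not about how a real application will behave.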

Furthermore, reading from /dev/zero to generate a workload will 
certainly skew the numbers. The controllers are very smart, and have 
been for a long time: if they "see" a certain data characteristic, they 
can change the write behaviour to the physical platters or the NAND 
cells.
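One way to sidestep that, sketched below: pre-generate incompressible 
data and replay it, instead of streaming zeros that a controller can 
compress or deduplicate away. Paths and sizes are illustrative; fio 
users can get a similar effect with its refill_buffers and 
buffer_compress_percentage options.

```shell
# Build a payload of random (hence incompressible) data once...
dd if=/dev/urandom of=./random_payload bs=1M count=64

# ...then use that payload as the source for the write test, so the
# controller cannot shortcut an all-zero stream.
dd if=./random_payload of=./write_testfile bs=1M conv=fsync

rm -f ./random_payload ./write_testfile
```

Generating from /dev/urandom directly during the timed run would be 
wrong the other way: you would be benchmarking the kernel's random 
number generator rather than the disk.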

Everything you do that does not reflect a real-life workload is a "just 
for shits and giggles" exercise and will not give any meaningful 
outcome. Believe me, I've been through this discussion more than once.

>
>
>
> -----Original Message-----
> From: Erwin van Londen <er...@erwinvanlonden.net>
> Sent: 21 July 2025 05:26
> To: Zdenek Kabelac <zdenek.kabe...@gmail.com>; Henry, Andrew 
> <andrew.he...@eon.se>; linux-lvm@lists.linux.dev
> Subject: Re: striped LV and expected performance
>
>
> 3. Unreal cache optimisations. Using dd is by far the worst option for 
> performance tests, as it will never (OK, almost never) align with real 
> workloads. If you use dd for performance tests, you will find that this will 
> backfire in most cases when a normal workload is applied. The main reason is 
> that dd will always have a sequential workload unless you start a large 
> number of dd instances against the same disk at once with different offsets. 
> Even then you will see an obscure number coming back.
>


