On 24/7/25 17:09, Henry, Andrew wrote:
> I've been thinking about the point below, about using dd as a performance
> test. Although I wholeheartedly agree that real-life workload tests must
> be conducted (ideally where a production workload is recorded and then
> played back in a test environment), I still think that dd and fio provide
> valuable insights into disk performance.
>
> On real physical hardware, where I know all of the variables, the product
> spec sheets, and the theoretical performance ceiling of all the components,
> I actually see results that are close to the physical capabilities of the
> hardware involved when I use dd or fio. To be clear, fio shows the same
> results as dd when using direct writes, which indicates that dd isn't that
> bad as a quick test and is a viable replacement for fio when all I want is
> a quick idea of where performance lies.
>
> In these cases, testing a system with dd or fio, what I'm after is
> validation of the physical capabilities of the hardware, as a starting
> point in performance tuning. If these results aren't in line with
> expectations, then it's pointless tuning any other layer until this is
> resolved.
>
> Unless someone can provide evidence that dd or fio do not and simply cannot
> show true hardware speeds when testing in a virtualized environment, I can
> only continue to use these tools in benchmarking.
>
> Because when it comes down to it, we buy hardware for its performance
> characteristics, and we want to verify that we are seeing that same
> performance in our environment after purchase.
>
> /AH
Having worked in the storage industry since around 1995, with DEC, Compaq, Hewlett Packard and Hitachi Data Systems (now Vantara), I've seen a fair number of spec sheets. Be aware that the numbers on these sheets are indeed optimum values, measured under optimum conditions for that piece of hardware. These sheets are only partially written by engineers; a marketing sauce gets added, resulting in potentially skewed information. Engineering material will most often outline the conditions under which those numbers were obtained, whereas marketing people will most often remove them because the figures simply look better without.

"dd" will always give you a sequential read or write workload, which will always trigger optimisation functions in the various controllers and in HDD/SSD firmware. The only thing it will show you is that your numbers somewhat align with the numbers in the datasheet. Various OS settings for the scheduler, device mapper and filesystem can have a significant influence on these numbers. I'm pretty sure the information in the datasheets is about the maximum you can suck out of a piece of hardware. Everything you do on your side in the various OS layers will only negatively impact the raw performance numbers, let alone pushing a representative application workload onto it.

"fio" gives you a few more options and parameters; however, the results will also depend significantly on how the kernel and its IO layers, such as the device mapper and filesystems, interact with the hardware. Using zones, for example, would require insight into the way the hardware is built, especially on HDDs. If you don't have that, you may as well put a wet finger in the air and go for a trial-and-error run.

Furthermore, reading from /dev/zero to push a workload will for sure skew the numbers. The controllers are very smart and have been for a long time. If they "see" a certain data characteristic, they can change the write behaviour to the physical platter or the NAND cells.
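To make the /dev/zero point concrete, here is a small illustrative sketch (file names and sizes are arbitrary examples). It uses gzip compressibility as a rough stand-in for what a compressing/deduplicating controller or SSD firmware sees: an all-zero stream collapses to almost nothing, so "writing" it can be nearly free, while random data cannot be reduced at all.

```shell
# Create two 8 MiB samples: one all-zero (what dd if=/dev/zero produces)
# and one of random data that firmware cannot compress or dedupe.
dd if=/dev/zero    of=/tmp/zero.bin bs=1M count=8 2>/dev/null
dd if=/dev/urandom of=/tmp/rand.bin bs=1M count=8 2>/dev/null

# Compress both; the size difference approximates how differently a
# data-reducing controller can treat the two streams.
gzip -c /tmp/zero.bin > /tmp/zero.bin.gz
gzip -c /tmp/rand.bin > /tmp/rand.bin.gz

echo "zero sample compressed to: $(stat -c %s /tmp/zero.bin.gz) bytes"
echo "rand sample compressed to: $(stat -c %s /tmp/rand.bin.gz) bytes"
```

The zero sample shrinks by several orders of magnitude while the random sample stays essentially full size, which is exactly why a dd-from-/dev/zero throughput number can be meaningless on data-reducing hardware.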
Everything you do that does not reflect a real-life workload is a "just for shits and giggles" exercise and will not give any meaningful outcome. Believe me, I've been through this discussion more than once.

> -----Original Message-----
> From: Erwin van Londen <er...@erwinvanlonden.net>
> Sent: 21 July 2025 05:26
> To: Zdenek Kabelac <zdenek.kabe...@gmail.com>; Henry, Andrew
> <andrew.he...@eon.se>; linux-lvm@lists.linux.dev
> Subject: Re: striped LV and expected performance
>
> 3. Unreal cache optimisations. Using dd is by far the worst option for
> performance tests as it will never (OK, almost never) align with real
> workloads. If you use dd for performance tests you will find that this
> backfires in most cases when a normal workload is applied. The main reason
> is that dd will always produce a sequential workload unless you start a
> large number of dd instances against the same disk at once with different
> offsets. Even then you will see an obscure number coming back.
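For contrast with dd's single sequential stream, a sketch of an fio job that looks more like an OLTP-style workload follows. Every value here is an illustrative assumption, not a recommendation — block size, read/write mix, queue depth and the target device should come from your own recorded production workload, and the filename shown is a hypothetical test LV.

```ini
; Illustrative fio job only -- derive real values from a recorded workload.
[global]
ioengine=libaio                 ; asynchronous IO on Linux
direct=1                        ; bypass the page cache
refill_buffers=1                ; regenerate buffer contents per submission
buffer_compress_percentage=0    ; incompressible data, so inline
                                ; compression/dedupe in the controller
                                ; cannot inflate the results
runtime=300
time_based=1

[oltp-like]
filename=/dev/mapper/testlv     ; hypothetical test LV -- data is destroyed
rw=randrw                       ; mixed random reads and writes
rwmixread=70                    ; 70% reads / 30% writes
bs=8k                           ; small-block IO, unlike dd's large streams
iodepth=32
numjobs=4
```

Run with `fio job.fio`. Note how `refill_buffers` and `buffer_compress_percentage=0` address the same data-characteristic problem described above for /dev/zero, while `randrw` avoids the sequential-only pattern that triggers controller read-ahead and write-coalescing optimisations.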