Ok, well this is interesting. Basically it comes down to whether we want to starve read operations or whether we want to starve write operations.
The FreeBSD results starve read operations, while the DragonFly results starve write operations. That's the entirety of the difference between the two tests.

The final numbers don't do justice to this... if you look at the raw numbers, though, it is apparent. When the blogbench test blows out the system caches, the read activity on FreeBSD drops into the ~600 range while on DragonFly the read activity drops to the ~25000 range. At the same time FreeBSD's write activity stays in the ~4000 range while DragonFly's write activity drops into the ~50s.

I tracked down the reason for the DragonFly write activity dropping. It basically comes down to the backlog of inodes in HAMMER needing reclamation. Due to the heavy concurrent read load, the HAMMER flusher is constantly stuck on B-Tree locks and cannot flush inode meta-data out quickly enough to keep up with blogbench. Once it hits the inode backlog limit (25000), write throughput goes down drastically.

While one can increase the limit (vfs.hammer.limit_reclaim), all that happens is that HAMMER takes a little longer before it hits it, at least in the blogbench test. For more bursty bulk write operations, increasing the limit would be a good tuning parameter.

Frankly, both FreeBSD's and DragonFly's results are incorrect. FreeBSD is killing read performance way way way too much while DragonFly is killing write performance way way way too much. I'm not sure how it could be fixed, though. I can definitely reduce B-Tree deadlocks in HAMMER by unlocking B-Tree nodes during synchronous read I/O (for meta-data), but the result that we really want is more balanced read vs write performance, not these extreme tilts that we see.

Also note that blogbench's 'final' results are worthless. The read performance is mostly counting the pre-cache-blowout numbers. DragonFly's read performance is 41x FreeBSD's once the caches are blown out, while FreeBSD's write performance is 80x DragonFly's write performance once the caches are blown out.

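As a rough check on those two ratios, here is a minimal sketch that uses only the approximate post-blowout rates quoted above. The variable names are made up for illustration and are not blogbench output fields, and the inputs are the rounded "~" figures, so the results are approximate too.

```python
# Approximate post-cache-blowout rates quoted in this message.
# These are the rounded "~" figures, not exact blogbench output.
freebsd_reads = 600       # FreeBSD read activity, ~600 range
dragonfly_reads = 25000   # DragonFly read activity, ~25000 range
freebsd_writes = 4000     # FreeBSD write activity, ~4000 range
dragonfly_writes = 50     # DragonFly write activity, ~50s

# How far each system tilts relative to the other.
read_ratio = dragonfly_reads / freebsd_reads
write_ratio = freebsd_writes / dragonfly_writes

print(f"DragonFly reads / FreeBSD reads:   {read_ratio:.1f}x")
print(f"FreeBSD writes / DragonFly writes: {write_ratio:.1f}x")
```

With these inputs the read ratio comes out a little under 42 and the write ratio is 80, in line with the 41x and 80x figures above.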
Reads tend to be less localized than writes so, generally speaking, the disk bandwidth *IS* being used fairly efficiently in both cases. But neither result is really acceptable IMHO.

This is all with swapcache turned off. The only way to test in a fair manner with swapcache turned on (with an SSD) is if the FreeBSD test used a similar setup w/ZFS.

-Matt
Matthew Dillon <dil...@backplane.com>