Hi Thomas,

LEIBOVICI Thomas wrote:
You will find attached to this mail some benchmarks we ran on Linux with PIOS over the ZFS DMU.
There are some interesting findings about ZFS tuning, along with some ideas for breaking a bottleneck we identified in the DMU.

Those are some very, very interesting benchmarks.

Regarding the "ZFS striping performance" results, you noticed that increasing the number of threads beyond a certain point didn't improve performance; in fact, it actually decreased it.
I think that was to be expected: the DMU already has a well-streamlined I/O pipeline that parallelizes I/O internally, so increasing the number of threads far beyond the number of CPUs increases contention, which in turn decreases I/O throughput.

It is unfortunate, however, that adding more LUNs didn't improve performance. I wonder if you were hitting a CPU wall?
In a previous benchmark I ran, I noticed that PIOS was not getting improved throughput with more disks, even though a significant percentage of CPU time was still available.
So I suspect we still have good opportunities for optimization.

The "ZIO threads" setting is also something I strongly suspected had an impact on throughput, which is why I increased it from 8 to 24 when I benchmarked the DMU on the Thumper. It is good to have hard data confirming this.

The section about parallelizing checksums is something that the ZFS team appears to have solved already.
You can see this code section: http://www.wizy.org/mercurial/zfs-lustre/file/49c2aaa6a859/src/lib/libzfscommon/include/sys/zio_impl.h#101

If you take a look at "ZIO_WRITE_COMMON_STAGES", you will notice that just before the "checksum generate" stage there is an "issue async" stage. The "issue async" stage dispatches the I/O (ZIO) to the ZIO thread pool, which effectively parallelizes the checksum computations across those threads. I/O dependencies are tracked automatically by the ZIO pipeline.

All in all, this was a very good report.

So far we have only done very limited benchmarking and optimization, but we are already starting to work on performance improvements. One of the tasks of our next development cycle will be doing this kind of analysis but, of course, the sooner we see this, the better :)

Great work and thanks for sharing this with us!

Best regards,
Ricardo

--


Ricardo Manuel Correia

Lustre Engineering
Sun Microsystems, Inc.
Portugal

_______________________________________________
Lustre-devel mailing list
Lustre-devel@clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-devel
