On Tue, Feb 22, 2011 at 5:49 PM, Mark Howison <[email protected]> wrote:
> Hi Leigh,
>
> It is true that you need to align writes to Lustre stripe boundaries
> to get reasonable performance to a single shared file. If you use
> collective I/O, as Rob and Quincey have suggested, it will handle
> this automatically (since mpt/3.2) by aggregating your data on a
> subset of "writer" MPI tasks, then packaging the data into
> stripe-sized writes. It will also try to set the number of writers to
> the number of stripes.
>
> Alternatively, if you are writing the same amount of data from every
> task, you can use an independent I/O approach that combines the HDF5
> chunking and alignment properties to guarantee stripe-sized writes.
> The caveat is that your chunks will be padded with empty data out to
> the stripe size, so this potentially wastes space on disk. In some
> cases, though, we have seen very good performance with independent I/O
> even with up to thousands of tasks, for instance with our GCRM I/O
> benchmark (based on a climate code) on Franklin and Jaguar (both Cray
> XTs). You can read more about that in our "Tuning HDF5 for Lustre"
> paper that you referenced in a previous email. If you go this route,
> you will also want to use two other optimizations we describe in that
> paper: disabling an ftruncate() call at file close that leads to
> catastrophic delays on Lustre, and suspending metadata flushes until
> file close (since the chunk indexing will generate considerable
> metadata activity).

Do I assume correctly that, when using collective I/O, phdf5 will (quoting the "Tuning HDF5 for Lustre" document) both "select the correct stripe count" and "align operations to stripe boundaries"? Will this apply even if I use subcommunicators to write several (or hundreds of) hdf5 files at the same time? I just want to be sure. It seems that collective I/O is the easy way to go if it takes care of the underlying decisions to optimize writing.

However, do any assumptions go into this, or is HDF able to query the lfs parameters? On Kraken, you can set the following parameters: the number of bytes on each OST, the index of the first stripe, and the number of OSTs to stripe across. It seems the only parameter really in question is the number of bytes per OST; the OST index of the first stripe should just be left at the default, and the number of OSTs should be set to the maximum value (160 on Kraken).

What strategy should I use to decide the number of bytes per OST? Should I try to make it roughly the chunk size I am using for 3D data? Or... ? You can set it anywhere from the kB to GB range.

Leigh

--
Leigh Orf
Associate Professor of Atmospheric Science
Department of Geology and Meteorology
Central Michigan University
Currently on sabbatical at the National Center
for Atmospheric Research in Boulder, CO
NCAR office phone: (303) 497-8200
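For reference, here is a minimal sketch of the collective-I/O path discussed above, assuming HDF5 1.8 and an MPI-IO layer that honors Lustre striping hints (e.g. Cray MPT/ROMIO). The file name, dataset layout, stripe count, and stripe size are illustrative assumptions, not values taken from this thread; on most systems the hints only take effect when the file is first created.

/* Hypothetical sketch: collective write to a single shared HDF5 file,
 * passing Lustre striping hints through MPI-IO. */
#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Striping hints are honored only at file creation time. */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "160");     /* number of OSTs (assumed) */
    MPI_Info_set(info, "striping_unit",   "1048576"); /* 1 MB stripe size (assumed) */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);

    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One row of 1M floats per task, written collectively. */
    hsize_t n = 1 << 20;
    hsize_t dims[2] = { (hsize_t)nprocs, n };
    hid_t filespace = H5Screate_simple(2, dims, NULL);
    hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_FLOAT, filespace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t start[2] = { (hsize_t)rank, 0 };
    hsize_t count[2] = { 1, n };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    /* Collective transfer: MPI-IO aggregates data onto "writer" tasks and
     * issues stripe-sized writes. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    float *buf = malloc(n * sizeof(float));
    for (hsize_t i = 0; i < n; i++) buf[i] = (float)rank;
    H5Dwrite(dset, H5T_NATIVE_FLOAT, memspace, filespace, dxpl, buf);

    free(buf);
    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

The same pattern applies when writing many files over subcommunicators: pass the subcommunicator instead of MPI_COMM_WORLD to H5Pset_fapl_mpio, one file per subcommunicator.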
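And a hedged sketch of the independent-I/O route Mark describes, combining chunking with the HDF5 alignment property so that each chunk occupies whole stripes. The 1 MB stripe size, file name, and per-task layout are assumptions; the ftruncate() and metadata-flush optimizations from the paper were library-level changes at the time and are not shown here.

/* Hypothetical sketch: independent I/O with stripe-aligned chunks.
 * Every task writes one equally sized chunk of its own. */
#include <hdf5.h>
#include <mpi.h>

void write_independent(MPI_Comm comm, int rank, int nprocs,
                       const float *buf, hsize_t n)
{
    hsize_t stripe = 1048576;  /* assumed 1 MB stripe size; match lfs settings */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);

    /* Align object allocations to stripe boundaries so chunks start on their
     * own stripe; the padding is the wasted disk space noted above. */
    H5Pset_alignment(fapl, 0, stripe);

    hid_t file = H5Fcreate("independent.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One chunk per task, all the same size, as this approach requires. */
    hsize_t dims[2]  = { (hsize_t)nprocs, n };
    hsize_t chunk[2] = { 1, n };
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);

    hid_t filespace = H5Screate_simple(2, dims, NULL);
    hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_FLOAT, filespace,
                           H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Independent (non-collective) transfer: each task writes its own chunk. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);

    hsize_t start[2] = { (hsize_t)rank, 0 };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, chunk, NULL);
    hid_t memspace = H5Screate_simple(2, chunk, NULL);

    H5Dwrite(dset, H5T_NATIVE_FLOAT, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Pclose(dcpl); H5Pclose(fapl); H5Fclose(file);
}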
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
