Hi,

With a stripe count of 1, all access to a single shared file goes through one OSS. Unlike the multiple-file case, you won't use the whole system's bandwidth, so the poor performance is to be expected. From what I gather, to get maximum performance you should write to your file system in chunks of "stripe size", aligned on "stripe size" boundaries, from "stripe count" processes.
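For example, a wider stripe can be requested when the shared file is created, either with lfs setstripe on the output directory before the run (something like "lfs setstripe -c 32 -S 1m <output_dir>", option names vary a bit between Lustre versions), or through MPI-IO hints passed to the HDF5 MPI-IO driver. The sketch below is just that, a sketch: it assumes a ROMIO-based MPI implementation that honors the "striping_factor" / "striping_unit" hints, the values 32 and 1 MiB are illustrative, and the striping only takes effect when the file is newly created.

/* Sketch: requesting a wider Lustre stripe for a shared HDF5 file.
 * Assumes a ROMIO-based MPI-IO layer that honors the striping hints;
 * they are only applied at file creation time, and the exact behaviour
 * depends on the MPI implementation and the Lustre setup. */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    /* Stripe over (for example) 32 OSTs with a 1 MiB stripe size.
     * These values are illustrative; tune them to your file system. */
    MPI_Info_set(info, "striping_factor", "32");
    MPI_Info_set(info, "striping_unit",   "1048576");

    /* Pass the hints to the HDF5 MPI-IO file driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);

    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* ... create datasets and write as usual ... */

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
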
Cheers,

Matthieu

2013/9/2 Daniel Langr <[email protected]>:
> Hi Albert and Mohamad,
>
> I haven't received e-mails with your replies :( so I cannot reply to them
> specifically (or do not know how). So, replying to my original...
>
> @Albert:
>
> All of the clusters used run the Lustre file system, and I believe this
> file system should be scalable, at least to some extent. Apparently, it is
> scalable for the single-file-per-process strategy.
>
> I understand the note about memory-to-kernel writes. However, again, I am
> comparing the single-file and multiple-file strategies, and they give quite
> different results. Moreover, the multiple-file case corresponds, within my
> measurements, to the listed peak I/O bandwidth of the storage systems. The
> single-file case is much, much worse. Obviously, it is not limited by
> memory-to-kernel copying.
>
> Thanks for the link to h5perf, I will try it.
>
> @Mohamad:
>
> Thanks for the hint. All the file systems are indeed Lustre-based, with the
> default stripe count of 1. I will rerun my measurements with different
> stripe sizes/counts and post the results.
>
> Daniel
>
>
> On 30. 8. 2013 16:05, Daniel Langr wrote:
>
>> I've run a benchmark where, within an MPI program, each process wrote
>> 3 plain 1D arrays to 3 datasets of an HDF5 file. I've used the following
>> writing strategies:
>>
>> 1) each process writes to its own file,
>> 2) each process writes to the same file, to its own dataset,
>> 3) each process writes to the same file, to a shared dataset.
>>
>> I've tested 1)-3) for both fixed and chunked datasets (chunk size 1024),
>> and I've tested 2)-3) with both the independent and collective options of
>> the MPI driver. I've also used 3 different clusters for the measurements
>> (all quite modern).
>>
>> As a result, the running (storage) times of the same-file strategies, i.e.
>> 2) and 3), were orders of magnitude longer than the running times of the
>> separate-files strategy. For illustration:
>>
>> cluster #1, 512 MPI processes, each process stores 100 MB of data, fixed
>> datasets:
>>
>> 1) separate files: 2.73 [s]
>> 2) single file, independent calls, separate datasets: 88.54 [s]
>>
>> cluster #2, 256 MPI processes, each process stores 100 MB of data,
>> chunked datasets (chunk size 1024):
>>
>> 1) separate files: 10.40 [s]
>> 2) single file, independent calls, shared datasets: 295 [s]
>> 3) single file, collective calls, shared datasets: 3275 [s]
>>
>> Any idea why the single-file strategy gives such poor writing performance?
>>
>> Daniel

-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/
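For reference, a minimal sketch of strategy 3 from the quoted benchmark (every rank writing its contiguous block of one shared 1-D dataset through the HDF5 MPI-IO driver, with collective transfers). The file name, dataset name and per-rank size are illustrative and not taken from Daniel's actual benchmark code.

/* Sketch of "strategy 3": all ranks write one shared 1-D dataset
 * collectively. Names and sizes are illustrative only. */
#include <mpi.h>
#include <hdf5.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const hsize_t n_local = 1 << 20;               /* elements per rank (example) */
    double *buf = malloc(n_local * sizeof(double));
    for (hsize_t i = 0; i < n_local; i++) buf[i] = (double)rank;

    /* Open one shared file through the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One global dataset holding every rank's block. */
    hsize_t n_global = n_local * (hsize_t)nprocs;
    hid_t filespace = H5Screate_simple(1, &n_global, NULL);
    hid_t dset = H5Dcreate2(file, "x", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its contiguous hyperslab ... */
    hsize_t offset = n_local * (hsize_t)rank;
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &n_local, NULL);
    hid_t memspace = H5Screate_simple(1, &n_local, NULL);

    /* ... and writes it with a collective transfer. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
    H5Fclose(file);
    H5Pclose(fapl);
    free(buf);
    MPI_Finalize();
    return 0;
}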
