On Tue, Sep 17, 2013 at 11:15:02AM +0200, Daniel Langr wrote:
> separate files: 1.36 [s]
> single file, 1 stripe: 133.6 [s]
> single file, best result: 17.2 [s]
>
> (I did multiple runs with various combinations of stripe count and
> size, presenting the best results I have obtained.)
>
> Increasing the number of stripes obviously helped a lot, but compared
> with the separate-files strategy, writing is still more than ten
> times slower. Do you think it is "normal"?
It might be "normal" for Lustre, but it's not good. I wish I had more
experience tuning the Cray/MPI-IO/Lustre stack, but I do not. The ADIOS
folks report that tuned HDF5 writing to a single shared file runs about
60% slower than ADIOS writing to multiple files, not 10x slower, so it
seems there is room for improvement. I've asked them what kinds of
things "tuned HDF5" entails, and they didn't know (!).

There are quite a few settings documented in the intro_mpi(3) man page.
MPICH_MPIIO_CB_ALIGN will probably be the most important thing for you
to try. I'm sorry to report that in my limited experience, the
documentation and reality are sometimes out of sync, especially with
respect to which settings are enabled by default.

==rob

> Thanks,
> Daniel
>
> On 30. 8. 2013 16:05, Daniel Langr wrote:
> >I ran a benchmark in which, within an MPI program, each process wrote
> >3 plain 1D arrays to 3 datasets of an HDF5 file. I used the following
> >writing strategies:
> >
> >1) each process writes to its own file,
> >2) each process writes to its own dataset in a single shared file,
> >3) each process writes to a shared dataset in a single shared file.
> >
> >I tested 1)-3) for both fixed and chunked datasets (chunk size 1024),
> >and I tested 2)-3) with both the independent and collective options
> >of the MPI driver. I also used 3 different clusters for the
> >measurements (all quite modern).
> >
> >As a result, the running (storage) times of the same-file strategies,
> >i.e. 2) and 3), were orders of magnitude longer than the running
> >times of the separate-files strategy. For illustration:
> >
> >cluster #1, 512 MPI processes, each process stores 100 MB of data,
> >fixed datasets:
> >
> >1) separate files: 2.73 [s]
> >2) single file, independent calls, separate datasets: 88.54 [s]
> >
> >cluster #2, 256 MPI processes, each process stores 100 MB of data,
> >chunked datasets (chunk size 1024):
> >
> >1) separate files: 10.40 [s]
> >2) single file, independent calls, shared datasets: 295 [s]
> >3) single file, collective calls, shared datasets: 3275 [s]
> >
> >Any idea why the single-file strategy gives such poor writing
> >performance?
> >
> >Daniel

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
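
For reference, a minimal sketch (not from the thread) of how the Lustre
stripe count and stripe size tuned in the quoted experiments could be
requested from the application when the shared HDF5 file is created.
"striping_factor" and "striping_unit" are standard ROMIO hint names and
are generally honored only at file-creation time; the helper name and
the values (64 stripes of 4 MiB) are purely illustrative:

/* Hypothetical helper: create an HDF5 file through the MPI-IO driver
 * with Lustre striping hints attached.  Error checking omitted. */
#include <mpi.h>
#include <hdf5.h>

hid_t create_striped_file(const char *filename, MPI_Comm comm)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "64");     /* stripe count (illustrative) */
    MPI_Info_set(info, "striping_unit", "4194304");  /* stripe size in bytes (4 MiB) */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, info);              /* parallel HDF5 via MPI-IO */

    hid_t file = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    H5Pclose(fapl);
    MPI_Info_free(&info);
    return file;
}

The same effect can usually be had outside the application by striping
the output directory beforehand, e.g. something like
"lfs setstripe -c 64 -S 4m output_dir" (option spelling varies across
Lustre versions), so that newly created files inherit the layout.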
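
And a minimal sketch (again not from the thread) of the collective-write
path described in strategy 3) above: each rank selects its hyperslab of
a shared 1-D dataset and writes it with a collective dataset-transfer
property. The helper name and the assumption that the dataset holds
nprocs * local_n doubles are illustrative; MPICH_MPIIO_CB_ALIGN itself
is an environment variable, so it would be set in the job script (see
intro_mpi(3) for the meaning of each value), not in the code:

/* Hypothetical helper: write this rank's contiguous slice of a shared
 * 1-D dataset of doubles using collective MPI-IO transfers.  Assumes
 * the dataset was created with nprocs * local_n elements.  Error
 * checking omitted. */
#include <hdf5.h>

void write_slice_collective(hid_t dset, int rank, hsize_t local_n,
                            const double *buf)
{
    hsize_t start = (hsize_t)rank * local_n;
    hsize_t count = local_n;

    hid_t filespace = H5Dget_space(dset);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &start, NULL, &count, NULL);

    hid_t memspace = H5Screate_simple(1, &count, NULL);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);   /* collective transfer mode */

    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Sclose(filespace);
}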
