I've run some benchmarks in which, within an MPI program, each process wrote 3 plain 1D arrays to 3 datasets of an HDF5 file. I used the following writing strategies:

1) each process writes to its own file,
2) each process writes to its own dataset in a single shared file,
3) each process writes to a shared dataset in a single shared file.

I tested 1)-3) with both fixed-size and chunked datasets (chunk size 1024), and I tested 2)-3) with both the independent and collective options of the MPI-IO driver. I ran the measurements on 3 different clusters (all quite modern).
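In case it helps to pin down what I'm doing, here is a minimal sketch of strategy 3) (shared file, shared chunked dataset, one dataset instead of three). The file name "shared.h5", the dataset name "data", and the element count N are placeholders, not the actual benchmark values (the real runs write 100 MB per process):

#include <mpi.h>
#include <hdf5.h>
#include <stdlib.h>

#define N 1024  /* elements per process; placeholder, not 100 MB */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *buf = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) buf[i] = rank;

    /* open one shared file with the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* one shared chunked dataset spanning all processes */
    hsize_t dims[1]  = { (hsize_t)nprocs * N };
    hsize_t chunk[1] = { 1024 };
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* each process selects its own contiguous hyperslab */
    hsize_t start[1] = { (hsize_t)rank * N };
    hsize_t count[1] = { N };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    /* this is where I switch between collective and independent I/O */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);  /* or H5FD_MPIO_INDEPENDENT */

    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Pclose(dcpl); H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    free(buf);
    MPI_Finalize();
    return 0;
}

Strategy 2) differs only in that each process creates (collectively) and writes its own dataset instead of selecting a hyperslab of a shared one; strategy 1) uses H5P_DEFAULT file access and a per-rank file name.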

As a result, the write (storage) times of the same-file strategies, i.e. 2) and 3), were orders of magnitude longer than those of the separate-files strategy. For illustration:

cluster #1, 512 MPI processes, each process stores 100 MB of data, fixed-size datasets:

1) separate files: 2.73 [s]
2) single file, independent calls, separate datasets: 88.54 [s]

cluster #2, 256 MPI processes, each process stores 100 MB of data, chunked datasets (chunk size 1024):

1) separate files: 10.40 [s]
2) single file, independent calls, shared datasets: 295 [s]
3) single file, collective calls, shared datasets: 3275 [s]

Any idea why the single-file strategies give such poor write performance?

Daniel
