Hi again,

To help understand what is happening, I've implemented a standalone example that uses the HDF5 library directly.
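In essence, the example boils down to the sequence below: open the file
through the MPI-IO driver, then let every task write its own contiguous
hyperslab collectively. This is a simplified sketch with placeholder names
and sizes, not the exact code from the repository:

    program collective_write_sketch
        use mpi
        use hdf5
        implicit none
        integer(HID_T)   :: file_id, fapl_id, filespace, memspace
        integer(HID_T)   :: dset_id, xfer_id
        integer(HSIZE_T) :: global_dims(1), local_dims(1), offset(1)
        integer          :: rank, nranks, mpierr, hdferr
        real(8)          :: values(1000)

        call MPI_Init(mpierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, mpierr)
        call MPI_Comm_size(MPI_COMM_WORLD, nranks, mpierr)
        call h5open_f(hdferr)

        ! Open the file with the MPI-IO driver so all tasks share it
        call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
        call h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL, hdferr)
        call h5fcreate_f('perf_test.h5', H5F_ACC_TRUNC_F, file_id, hdferr, &
                         access_prp=fapl_id)

        ! Contiguous hyperslab layout: each task owns one slice
        values(:)      = real(rank, 8)
        local_dims(1)  = int(size(values), HSIZE_T)
        global_dims(1) = local_dims(1) * nranks
        offset(1)      = local_dims(1) * rank

        call h5screate_simple_f(1, global_dims, filespace, hdferr)
        call h5dcreate_f(file_id, 'data', H5T_NATIVE_DOUBLE, filespace, &
                         dset_id, hdferr)
        call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, offset, &
                                   local_dims, hdferr)
        call h5screate_simple_f(1, local_dims, memspace, hdferr)

        ! Collective transfer: this is the write that fails in the large runs
        call h5pcreate_f(H5P_DATASET_XFER_F, xfer_id, hdferr)
        call h5pset_dxpl_mpio_f(xfer_id, H5FD_MPIO_COLLECTIVE_F, hdferr)
        call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, values, local_dims, &
                        hdferr, file_space_id=filespace, &
                        mem_space_id=memspace, xfer_prp=xfer_id)

        ! Release all handles and shut down
        call h5pclose_f(xfer_id, hdferr)
        call h5sclose_f(memspace, hdferr)
        call h5sclose_f(filespace, hdferr)
        call h5dclose_f(dset_id, hdferr)
        call h5pclose_f(fapl_id, hdferr)
        call h5fclose_f(file_id, hdferr)
        call h5close_f(hdferr)
        call MPI_Finalize(mpierr)
    end program collective_write_sketch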
The complete example is available here:
https://github.com/victorsndvg/XH5For/blob/master/src/examples/hdf5_performance_test/ch_unstructured_hexahedron_perf.f90

I compiled and launched it, reproducing the runs from my previous mail (quoted below), and I get the same behaviour/errors with the new code. I also compiled and linked against HDF5 1.8.17 built by myself, with the same results.

I'm not using any tuning hint ... could this be the problem?

Thanks!
Víctor

2016-07-19 11:08 GMT+02:00 victor sv <[email protected]>:

> Hi all,
>
> I'm developing XH5For <https://github.com/victorsndvg/XH5For>, an OO
> Fortran lightweight XDMF/HDF5 layer, and now I would like to test its
> scalability, but I'm stuck on an issue I'm hitting.
>
> I'm going to try to explain it as best as I can.
>
> I'm performing some weak scalability tests on Marenostrum III
> <http://www.bsc.es/user-support/mn3.php> (GPFS file system) using
> collective writing with the contiguous hyperslab strategy. I'm running
> tests with 1, 16, 32, 64, 128, 256, 512, 1024 and 2048 MPI tasks.
>
> Everything seems to work as expected except for the 2048-task test,
> where I think I'm getting an MPI deadlock: the job keeps running
> without doing anything until the wall-clock limit is exceeded and the
> job is killed.
>
> After that, I tried to reproduce the error with a number of MPI tasks
> between 1024 and 2048, and I got the following error message while
> launching a smaller job, with 1164 MPI tasks:
>
>> HDF5-DIAG: Error detected in HDF5 (1.8.16) MPI-process 1009:
>>   #000: H5Dio.c line 271 in H5Dwrite(): can't prepare for writing data
>>     major: Dataset
>>     minor: Write failed
>>   #001: H5Dio.c line 352 in H5D__pre_write(): can't write data
>>     major: Dataset
>>     minor: Write failed
>>   #002: H5Dio.c line 789 in H5D__write(): can't write data
>>     major: Dataset
>>     minor: Write failed
>>   #003: H5Dmpio.c line 529 in H5D__contig_collective_write(): couldn't
>>     finish shared collective MPI-IO
>>     major: Low-level I/O
>>     minor: Write failed
>
> I used the following libraries/versions during the compilation stage:
>
> - intel/16.0.1
> - impi/5.1.2.150
> - HDF5/1.8.16-mpi
>
> Here you can see how I open the HDF5 file for collective writing:
> https://github.com/victorsndvg/XH5For/blob/master/src/lib/hdf5_handler/hdf5_handler.f90#L531
>
> And here, how I write hyperslabs:
> https://github.com/victorsndvg/XH5For/blob/master/src/lib/hdf5_handler/contiguous_hyperslab/hdf5_contiguous_hyperslab_handler.f90#L102
>
> Note: the ENABLE_MPI, ENABLE_HDF5 and ENABLE_PARALLEL_HDF5 definition
> flags are enabled.
>
> Could anyone shed some light on this?
>
> I would greatly appreciate your help!
>
> Thank you in advance,
> Víctor.
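P.S. About the tuning hints: if they turn out to matter here, my
understanding is that they would be passed to HDF5 through an MPI_Info
object, roughly as sketched below. The hint keys are common ROMIO-style
examples and just an assumption on my part; I have not verified which
hints Intel MPI honours on the Marenostrum GPFS:

    ! Untested fragment (inside a routine that uses mpi and hdf5)
    integer        :: info, mpierr, hdferr
    integer(HID_T) :: fapl_id

    ! Build an MPI_Info object carrying the MPI-IO hints
    call MPI_Info_create(info, mpierr)
    ! Force collective buffering on writes (ROMIO hint, assumed relevant)
    call MPI_Info_set(info, 'romio_cb_write', 'enable', mpierr)
    ! Size of the collective aggregation buffer, 16 MiB here (assumption)
    call MPI_Info_set(info, 'cb_buffer_size', '16777216', mpierr)

    ! Hand the hints to the HDF5 file access property list
    call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
    call h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, info, hdferr)
    ! ... then h5fcreate_f/h5fopen_f with access_prp=fapl_id as usual ...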
