Hi all,

I'm developing XH5For <https://github.com/victorsndvg/XH5For>, an OO
Fortran lightweight XDMF/HDF5 layer, and now I would like to test its
scalability, but I'm stuck on an issue I'm hitting.

I'll try to explain it as best I can.

I'm performing some weak scalability tests on Marenostrum III
<http://www.bsc.es/user-support/mn3.php> (GPFS file system) using
Collective Writing with the Contiguous HyperSlab strategy. I'm running
tests with 1, 16, 32, 64, 128, 256, 512, 1024 and 2048 MPI tasks.

Everything seems to work as expected except for the 2048-task test, where I
think I'm hitting an MPI deadlock (the job keeps running without making any
progress until it exceeds its wall-clock limit and is killed).

After that, I tried to reproduce the error with a number of MPI tasks
between 1024 and 2048, and I got the following error message when launching
a smaller job with 1164 MPI tasks:

HDF5-DIAG: Error detected in HDF5 (1.8.16) MPI-process 1009:
  #000: H5Dio.c line 271 in H5Dwrite(): can't prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 352 in H5D__pre_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 789 in H5D__write(): can't write data
    major: Dataset
    minor: Write failed
  #003: H5Dmpio.c line 529 in H5D__contig_collective_write(): couldn't
finish shared collective MPI-IO
    major: Low-level I/O
    minor: Write failed

I used the following libraries/versions at compile time:

   - intel/16.0.1
   - impi/5.1.2.150
   - HDF5/1.8.16-mpi

Here you can see how I open the HDF5 file for Collective Writing:
https://github.com/victorsndvg/XH5For/blob/master/src/lib/hdf5_handler/hdf5_handler.f90#L531
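
In short, the pattern that routine follows is the standard one for parallel
HDF5: create a file access property list with the MPI-IO driver and open the
file through it. A minimal, self-contained sketch (file name and error
handling here are illustrative, not XH5For's actual code):

```fortran
program open_parallel_file
  use mpi
  use hdf5
  implicit none
  integer        :: mpierr, hdferr
  integer(HID_T) :: fapl_id, file_id

  call MPI_Init(mpierr)
  call h5open_f(hdferr)

  ! File access property list using the MPI-IO driver,
  ! so all ranks share the same file collectively
  call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
  call h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL, hdferr)

  ! Every rank must call this: file creation is collective
  call h5fcreate_f('test.h5', H5F_ACC_TRUNC_F, file_id, hdferr, &
                   access_prp=fapl_id)

  call h5pclose_f(fapl_id, hdferr)
  call h5fclose_f(file_id, hdferr)
  call h5close_f(hdferr)
  call MPI_Finalize(mpierr)
end program open_parallel_file
```

(Compile with the parallel HDF5 Fortran wrapper, e.g. `h5pfc`, and run under
`mpirun`.)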

And here, how I write HyperSlabs:
https://github.com/victorsndvg/XH5For/blob/master/src/lib/hdf5_handler/contiguous_hyperslab/hdf5_contiguous_hyperslab_handler.f90#L102
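
Roughly, the collective contiguous-hyperslab write looks like the sketch
below (again illustrative, with my own variable names, assuming a 1-D real
dataset where each rank owns a contiguous slab): each rank selects its
offset/count in the file space and all ranks call h5dwrite_f with a
collective dataset transfer property list.

```fortran
program collective_hyperslab
  use mpi
  use hdf5
  implicit none
  integer          :: mpierr, hdferr, rank, nprocs
  integer(HID_T)   :: fapl_id, file_id, filespace, memspace, dset_id, dxpl_id
  integer(HSIZE_T), parameter :: local_n = 4
  integer(HSIZE_T) :: global_dims(1), local_dims(1), offset(1)
  real             :: buffer(local_n)

  call MPI_Init(mpierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, mpierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, mpierr)
  call h5open_f(hdferr)

  ! Open the file with the MPI-IO driver
  call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
  call h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL, hdferr)
  call h5fcreate_f('slabs.h5', H5F_ACC_TRUNC_F, file_id, hdferr, &
                   access_prp=fapl_id)
  call h5pclose_f(fapl_id, hdferr)

  ! Global dataset = nprocs contiguous slabs of local_n reals each
  global_dims = int(nprocs, HSIZE_T) * local_n
  local_dims  = local_n
  offset      = int(rank, HSIZE_T) * local_n
  buffer      = real(rank)

  call h5screate_simple_f(1, global_dims, filespace, hdferr)
  call h5dcreate_f(file_id, 'data', H5T_NATIVE_REAL, filespace, &
                   dset_id, hdferr)

  ! Each rank selects its own contiguous hyperslab in the file space
  call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, offset, &
                             local_dims, hdferr)
  call h5screate_simple_f(1, local_dims, memspace, hdferr)

  ! Collective transfer: ALL ranks must reach this h5dwrite_f call,
  ! otherwise the remaining ranks block (a classic source of hangs)
  call h5pcreate_f(H5P_DATASET_XFER_F, dxpl_id, hdferr)
  call h5pset_dxpl_mpio_f(dxpl_id, H5FD_MPIO_COLLECTIVE_F, hdferr)
  call h5dwrite_f(dset_id, H5T_NATIVE_REAL, buffer, local_dims, hdferr, &
                  file_space_id=filespace, mem_space_id=memspace, &
                  xfer_prp=dxpl_id)

  call h5pclose_f(dxpl_id, hdferr)
  call h5sclose_f(memspace, hdferr)
  call h5sclose_f(filespace, hdferr)
  call h5dclose_f(dset_id, hdferr)
  call h5fclose_f(file_id, hdferr)
  call h5close_f(hdferr)
  call MPI_Finalize(mpierr)
end program collective_hyperslab
```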

Note: the ENABLE_MPI, ENABLE_HDF5 and ENABLE_PARALLEL_HDF5 definition flags
are enabled.

Could anyone shed some light on this?

I would greatly appreciate your help!

Thank you in advance,
VĂ­ctor.
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
