Hi again,

To help understand what is happening, I've implemented an example using
the HDF5 library.

You can see it at the following link:
https://github.com/victorsndvg/XH5For/blob/master/src/examples/hdf5_performance_test/ch_unstructured_hexahedron_perf.f90

I've compiled and launched it, reproducing the setup from my previous mail,
and I'm getting the same behavior/errors with the new code.

I also compiled HDF5 1.8.17 myself and linked against it, with the same
results.

I'm not using any tuning hints ... could this be the problem?
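
By "tuning hints" I mean something along the lines of the sketch below,
where an MPI_Info object carrying ROMIO/GPFS hints would be attached to the
file access property list. The hint names and values are only illustrative
examples (I haven't tested them); right now I pass no hints at all:

    ! Sketch: attaching MPI-IO tuning hints to the HDF5 file access property list.
    ! Assumes 'use mpi', 'use hdf5' and that h5open_f has already been called.
    integer        :: info, mpierr, hdferr
    integer(HID_T) :: fapl_id

    call MPI_Info_create(info, mpierr)
    call MPI_Info_set(info, 'romio_cb_write', 'enable',   mpierr)  ! force collective buffering
    call MPI_Info_set(info, 'cb_buffer_size', '16777216', mpierr)  ! 16 MiB collective buffer (example)

    call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
    call h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, info, hdferr) ! info instead of MPI_INFO_NULL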

Thanks!
Víctor



2016-07-19 11:08 GMT+02:00 victor sv <[email protected]>:

> Hi all,
>
> I'm developing XH5For <https://github.com/victorsndvg/XH5For>, a
> lightweight object-oriented Fortran XDMF/HDF5 layer, and now I would like
> to test its scalability, but I'm stuck on an issue I'm running into.
>
> I'll try to explain it as best I can.
>
> I'm performing some weak scalability tests on MareNostrum III
> <http://www.bsc.es/user-support/mn3.php> (GPFS file system) using
> Collective Writing with the Contiguous HyperSlab strategy. I'm running
> tests with 1, 16, 32, 64, 128, 256, 512, 1024 and 2048 MPI tasks.
>
> Everything seems to work as expected except for the 2048-MPI-task test,
> where I think I'm hitting an MPI deadlock (the job keeps running without
> making any progress until the wall-clock limit is exceeded and it is
> killed).
>
> After that, I tried to reproduce the error with a number of MPI tasks
> between 1024 and 2048, and I got the following error message when
> launching a smaller job with 1164 MPI tasks:
>
> HDF5-DIAG: Error detected in HDF5 (1.8.16) MPI-process 1009:
>>   #000: H5Dio.c line 271 in H5Dwrite(): can't prepare for writing data
>>     major: Dataset
>>     minor: Write failed
>>   #001: H5Dio.c line 352 in H5D__pre_write(): can't write data
>>     major: Dataset
>>     minor: Write failed
>>   #002: H5Dio.c line 789 in H5D__write(): can't write data
>>     major: Dataset
>>     minor: Write failed
>>   #003: H5Dmpio.c line 529 in H5D__contig_collective_write(): couldn't
>> finish shared collective MPI-IO
>>     major: Low-level I/O
>>     minor: Write failed
>>
>
> I used the following libraries/versions during the compilation stage:
>
>    - intel/16.0.1
>    - impi/5.1.2.150
>    - HDF5/1.8.16-mpi
>
> Here you can see how I open the HDF5 file for Collective Writing:
>
> https://github.com/victorsndvg/XH5For/blob/master/src/lib/hdf5_handler/hdf5_handler.f90#L531
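>
> In essence, the open path boils down to something like this (a simplified
> sketch of that routine; variable names are illustrative):
>
>     ! Create the file collectively through the MPI-IO file driver.
>     ! (Assumes 'use mpi', 'use hdf5' and that h5open_f has been called.)
>     integer(HID_T) :: fapl_id, file_id
>     integer        :: hdferr
>
>     call h5pcreate_f(H5P_FILE_ACCESS_F, fapl_id, hdferr)
>     call h5pset_fapl_mpio_f(fapl_id, MPI_COMM_WORLD, MPI_INFO_NULL, hdferr)
>     call h5fcreate_f('output.h5', H5F_ACC_TRUNC_F, file_id, hdferr, access_prp=fapl_id)
>     call h5pclose_f(fapl_id, hdferr)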
>
> And here is how I write the hyperslabs:
>
> https://github.com/victorsndvg/XH5For/blob/master/src/lib/hdf5_handler/contiguous_hyperslab/hdf5_contiguous_hyperslab_handler.f90#L102
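>
> The write itself is essentially a per-task contiguous hyperslab selection
> followed by a collective dataset transfer, along these lines (a simplified
> sketch; the names, the 1-D layout and the double-precision type are only
> illustrative):
>
>     ! Each task writes its contiguous slab of a shared 1-D dataset.
>     ! global_dims, local_dims, offset and the buffer are set per task.
>     integer(HID_T)       :: filespace, memspace, dset_id, xfer_id
>     integer(HSIZE_T)     :: global_dims(1), local_dims(1), offset(1)
>     integer              :: hdferr
>     real(8), allocatable :: values(:)
>
>     call h5screate_simple_f(1, global_dims, filespace, hdferr)
>     call h5dcreate_f(file_id, 'Coordinates', H5T_NATIVE_DOUBLE, filespace, dset_id, hdferr)
>     call h5screate_simple_f(1, local_dims, memspace, hdferr)
>     ! Select this task's contiguous region in the file dataspace
>     call h5sselect_hyperslab_f(filespace, H5S_SELECT_SET_F, offset, local_dims, hdferr)
>     ! Ask for a collective MPI-IO transfer
>     call h5pcreate_f(H5P_DATASET_XFER_F, xfer_id, hdferr)
>     call h5pset_dxpl_mpio_f(xfer_id, H5FD_MPIO_COLLECTIVE_F, hdferr)
>     call h5dwrite_f(dset_id, H5T_NATIVE_DOUBLE, values, local_dims, hdferr, &
>                     file_space_id=filespace, mem_space_id=memspace, xfer_prp=xfer_id)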
>
> Note: the ENABLE_MPI, ENABLE_HDF5 and ENABLE_PARALLEL_HDF5 definition
> flags are all enabled.
>
> Could anyone shed some light on this?
>
> I would greatly appreciate your help!
>
> Thank you in advance,
> Víctor.
>
>
>
