Re: [Hdf-forum] HDF5 library hangs in call to H5DWrite

Håkon Strandenes Sun, 14 Oct 2012 13:54:22 -0700

Thanks for that. Everything is much clearer now.

As I said in my previous post, fixing this issue immediately caused another issue to appear (we are still in the same code). However, whereas the previous issue appeared after a few processes and was quite irregular, this one is at least deterministic.

The problem is much like the previous one, except that it is now a call to H5Dcreate2 that hangs. I have tried this on a few problem sizes (in number of cells), and it seems that the crucial factor is the number of processes. I have made these tests:

1: 16 processes: successfully wrote the mesh + 100 time steps, a total of approx. 12 GB. No issues.

2: 32 processes: writes the mesh, then hangs when creating datasets for time step no. 4 (same problem size as above).

3: 64 processes: writes the mesh, then hangs when creating datasets for time step no. 1 (same problem size as above).

With more than 64 processes, the code already hangs while the mesh is written (the mesh datasets are created in the same way as the field data datasets).

Again, no error messages appear. If I attach GDB to the hanging processes, I find that they stop when creating exactly the same dataset each time.

I have double- and triple-checked that I close all resources after use, and I found no errors.


Do any of you have any idea what the error could be?

Thanks,
Håkon
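Besides resource leaks, one thing worth ruling out for a hang inside H5Dcreate2: in parallel HDF5, dataset creation is a collective operation, so every rank has to call H5Dcreate2 for every dataset, in the same order and with the same arguments. If some rank skips a create (for example, because a loop bound depends on rank-local data) or issues the creates in a different order, the other ranks can block waiting for it. The sketch below is a hypothetical debugging aid. It is not an HDF5 API and not what the thread settled on. Each rank hashes the ordered list of dataset names it is about to create (64-bit FNV-1a), so the values can be compared across ranks before any create is issued. The `/step_NNNN/rank_NNNNN` naming scheme and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NAME_LEN 48 /* enough for "/step_<int>/rank_<int>" with any int values */

/* Hypothetical naming scheme: one dataset per owning rank per time step. */
static void dataset_name(char *buf, int step, int owner)
{
    snprintf(buf, NAME_LEN, "/step_%04d/rank_%05d", step, owner);
}

/* Continue a 64-bit FNV-1a hash h over the bytes of string s. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL; /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Fingerprint of the ordered list of datasets a rank is about to create.
   Dataset creation is collective, so all ranks must hold the same list in
   the same order and therefore compute the same fingerprint. */
static uint64_t plan_fingerprint(char names[][NAME_LEN], int n)
{
    uint64_t h = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
    for (int i = 0; i < n; ++i) {
        h = fnv1a(h, names[i]);
        h = fnv1a(h, "\n"); /* separator: {"ab","c"} must differ from {"a","bc"} */
    }
    return h;
}

/* The plan every rank should build: every rank's dataset, in rank order,
   derived only from (step, nprocs) and never from rank-local data. */
static int build_collective_plan(char names[][NAME_LEN], int step, int nprocs)
{
    assert(nprocs > 0);
    for (int r = 0; r < nprocs; ++r)
        dataset_name(names[r], step, r);
    return nprocs;
}
```

In an MPI program, each rank could pass its fingerprint to MPI_Allreduce twice (MPI_MIN and MPI_MAX, as MPI_UINT64_T). If the two results differ, at least one rank is about to issue a different sequence of collective creates. This only checks the names. Dimensions, datatype and property lists have to match across ranks as well.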


On 14. okt. 2012 17:02, Mohamad Chaarawi wrote:

Hi Håkon,

On 10/14/2012 3:54 AM, Håkon Strandenes wrote:

Thanks, I had a suspicion about that. Some more problems have appeared
(H5Dcreate2 freezes/hangs after a few writes), but I will try to debug
some more before I ask you...

Anyway, I have a few questions about how HDF5 does the writes. Now that
I do independent I/O (each process writes to its own dataset), will
the actual data transfer happen in parallel, assuming the underlying
file system is parallel (Lustre)?


Yes, independent transfer just translates to independent MPI-I/O
read/write operations. So if your file system supports parallel file
access, the HDF5 operations would occur in parallel. There is of course
the issue with Lustre that accesses to the same OST are serialized,
but if your datasets are large enough, it would not be a huge issue.
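The OST remark can be made concrete. With Lustre's default striping, a file is cut into stripe_size chunks that are dealt round-robin over stripe_count OSTs. A contiguous write therefore touches as many OSTs as the number of stripes it spans, capped at the stripe count. A small per-rank dataset may sit entirely on one OST, which is where serialization bites, while a large one spreads out. The sketch below assumes exactly that simple layout with a fixed stripe size. It ignores the file's starting OST and any non-default layout, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Which of the file's stripe_count OSTs (numbered 0..stripe_count-1 in the
   file's own layout order) holds byte `offset`, assuming plain round-robin
   (RAID-0) striping with a fixed stripe size. */
static unsigned stripe_index(uint64_t offset, uint64_t stripe_size,
                             unsigned stripe_count)
{
    assert(stripe_size > 0 && stripe_count > 0);
    return (unsigned)((offset / stripe_size) % stripe_count);
}

/* How many distinct OSTs a contiguous write of `len` bytes starting at
   `offset` touches under the same assumptions. */
static unsigned osts_touched(uint64_t offset, uint64_t len,
                             uint64_t stripe_size, unsigned stripe_count)
{
    assert(stripe_size > 0 && stripe_count > 0);
    if (len == 0)
        return 0;
    uint64_t first = offset / stripe_size;
    uint64_t last = (offset + len - 1) / stripe_size;
    uint64_t stripes = last - first + 1; /* consecutive stripes -> consecutive OSTs */
    return stripes >= stripe_count ? stripe_count : (unsigned)stripes;
}
```

For example, with illustrative 1 MiB stripes over 4 OSTs, a 512 KiB dataset touches one OST, while a 64 MiB dataset touches all four.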

The reason why I ask is, of course, that I have a large parallel
simulation (with thousands of processes), and if each rank had to wait
for the lower ranks to finish (i.e. rank 4 must wait for rank 3 to
finish, rank 3 must wait for rank 2, etc.), the I/O operations could
take a tremendous amount of time.

Since the data is domain decomposed, I also thought it would be
easiest to write each domain to its own dataset, instead of trying to
"stitch" the domains together before writing (which would require quite
a bit of communication and CPU cycles).


Yes, this is one use case that we are considering supporting better
in terms of non-collective metadata access (so you don't have to call
H5Dcreate n times). Another ongoing (but separate) effort is what I
mentioned earlier: H5Dread/write_multi, where you can access several
datasets in one call, collectively or independently.

Thanks,
Mohamad


Håkon


On 14. okt. 2012 03:31, Mohamad Chaarawi wrote:

Yes, Mark is correct. Your program is erroneous.
The current interface for reading and writing datasets collectively
requires all processes to call the operation for each read/write
operation. You can correct your program by having each process
participate in the read/write on every dataset, with a NULL selection
for all datasets except the one that belongs to that process, or just
use independent I/O.
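The bookkeeping behind the NULL-selection fix can be sketched as follows. The sketch assumes one dataset per rank, with dataset d owned by rank d, and the function name is hypothetical. In the real write loop, an entry with count 0 would correspond to calling H5Sselect_none on both the memory and the file dataspace before the collective H5Dwrite on that dataset. The nonzero entry is the rank's normal selection.

```c
#include <assert.h>
#include <stddef.h>

/* Dataset d is assumed to be owned by rank d (one dataset per rank).
   Fill counts[0..nprocs-1] with the number of elements `rank` contributes to
   each dataset's collective write: its own element count for its own dataset,
   and 0 (an empty selection) for every other one. Every rank therefore makes
   exactly nprocs collective write calls, in the same dataset order. */
static void collective_write_counts(int rank, int nprocs, size_t local_n,
                                    size_t *counts)
{
    assert(rank >= 0 && rank < nprocs);
    for (int d = 0; d < nprocs; ++d)
        counts[d] = (d == rank) ? local_n : 0;
}
```

Because every rank gets an array of nprocs entries, no rank can skip a collective call. The flip side is that with thousands of ranks this means thousands of collective writes per time step, which is the cost that the independent-I/O alternative avoids.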

We are working on a new interface that would allow collective access to
multiple datasets simultaneously, so stay tuned :-)

Thanks,
Mohamad

On 10/13/2012 10:44 AM, Mark Miller wrote:

I think the problem may be that you are trying to execute a collective
write to separate datasets. That would explain why collective hangs and
independent succeeds.

I am a bit rusty on HDF5's parallel I/O semantics, but AFAIK a
collective write can only be done to the same dataset. That does NOT
mean each processor has to have an identical dsetID (e.g.
memcmp(&proc_1_dsetID, &proc_2_dsetID, sizeof(hid_t)) may be nonzero),
but it does mean that the dataset object each processor's dsetID
references in the file has to be the same. In other words, the name (or
path) of the dataset used in the create/open call needs to have been
the same.
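When all ranks write collectively into one shared dataset, the data itself does not have to be exchanged. Only sizes need to be communicated. Each rank needs the offset where its block starts, to use as the start of an H5Sselect_hyperslab on the file dataspace, and every rank needs the global total to size the dataspace passed to H5Dcreate2. Below is a minimal 1-D sketch. It assumes each rank's domain is a contiguous block of counts[rank] elements and that the blocks are laid out in rank order. Both are assumptions, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Exclusive prefix sum over per-rank element counts:
   offsets[r] = counts[0] + ... + counts[r-1], and the return value is the
   total, i.e. the length of the shared 1-D dataset. In an MPI program each
   rank would compute only its own offset, e.g. with MPI_Exscan(MPI_SUM);
   MPI_Exscan leaves the result on rank 0 undefined, so rank 0 must use 0
   itself. */
static size_t exclusive_offsets(const size_t *counts, size_t *offsets,
                                int nprocs)
{
    assert(nprocs > 0);
    size_t running = 0;
    for (int r = 0; r < nprocs; ++r) {
        offsets[r] = running;
        running += counts[r];
    }
    return running;
}
```

For per-rank counts {3, 5, 0, 2}, the offsets come out as {0, 3, 8, 8} and the dataset length is 10. The rank with zero elements still joins the collective write, using an empty selection.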

To issue writes to different datasets simultaneously in parallel, I
think your only option is independent.

I wonder if you're aiming to do collective writes to different datasets
because you expect that collective will be more easily 'coordinated' by
the underlying filesystem and therefore has a higher chance of better
performance than independent. If so, I don't know if that very often
turns out to be true/possible in practice.

I hope others with a little more parallel I/O experience might chime
in ;)

Mark


On Sat, 2012-10-13 at 10:48 +0200, Håkon Strandenes wrote:

Hi,

I have (yet) another problem with the HDF5 library. I am trying to
write some data in parallel to a file, where each process writes its
data to its own dataset. The datasets are first created (as collective
operations), and then H5Dwrite hangs when the data are to be written.
No error messages are printed; the processes just hang. I have used GDB
on the hanging processes (all processes) and confirmed that it is
actually H5Dwrite that hangs.

The strange thing is that this does not always happen; sometimes it
works fine. To make it even stranger, it seems that the probability of
failure increases with problem size and number of processes (or is
that really strange?). These writes are in a time loop, and sometimes
a few steps finish before one write hangs.

I have also found that if I set the transfer mode to
H5FD_MPIO_INDEPENDENT, everything seems to work fine.

I have tried this on two computers, one workstation and one cluster.
The workstation uses OpenMPI with HDF5 1.8.4, and the cluster uses SGI's
MPT-MPI with HDF5 1.8.7. Given the completely different MPI packages
and systems, I think MPI and other system issues can be ruled out. The
remaining sources of error are then my code (probably) and HDF5 (not
so sure about that).

I have attached example code that shows how I am doing the HDF5
stuff. Unfortunately it is not runnable, but at least you can see how
I create and write to the datasets.

Thanks in advance for all help.

Best regards,
Håkon Strandenes
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org


