Some things to watch out for. . .

Are you by chance accidentally leaving one or more objects in the file 'open' 
(e.g. did you forget some H5Xclose() call somewhere)? I cannot attest to that 
causing actual hangs in H5Fclose, but I know HDF5 has some logic to detect a 
possible infinite loop in the sym-link/group structure, for which it sometimes 
actually outputs a message along the lines of "…infinite loop detected while 
closing file 'foo.h5' . . .". I sometimes wind up calling H5Fget_obj_count just 
prior to H5Fclose to try to debug this when it has (occasionally) happened to 
me.

You say you are running in parallel. Is the file on an actual parallel 
filesystem? Are you by chance mucking with the filesystem's metadata via calls 
to stat or mkdir or chdir at any time before or after you create or close the 
HDF5 file? If so, are you ensuring parallel sync via MPI_Barrier before 
proceeding after such calls?

The core counts you mention are small, so you might be able to raise(SIGSTOP) 
just before H5Fclose and then attach gdb (or TotalView) to several of the 
processes to see what's happening. Likewise, you might be able to run valgrind 
on each process (sending output to separate files) to help debug too.
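
A sketch of the SIGSTOP trick, gated on an environment variable so normal runs are unaffected. The function name and the idea of using an environment variable are my own assumptions for illustration; raise, getenv and getpid are plain C/POSIX.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: if the environment variable named `env_var` is set
 * to something other than "" or "0...", print this process's pid and stop
 * it with SIGSTOP. Returns 1 once the process has been resumed, or 0 if it
 * never stopped because the variable was not set. */
static int stop_for_debugger_if_requested(const char *env_var)
{
    const char *flag = getenv(env_var);
    if (flag == NULL || flag[0] == '\0' || flag[0] == '0')
        return 0;

    fprintf(stderr, "pid %ld stopped; attach with: gdb -p %ld\n",
            (long)getpid(), (long)getpid());
    fflush(stderr);

    /* Execution continues from here after the debugger resumes the
     * process or it receives SIGCONT. */
    raise(SIGSTOP);
    return 1;
}
```

You would call it on every rank immediately before H5Fclose, attach to a few of the stopped pids, continue them, and then look at the backtraces once they hang.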

Sorry I don't have any other ideas. Good luck.

Mark



From: Wolf Dapp <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Tuesday, April 7, 2015 9:30 AM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] parallel HDF5: H5Fclose hangs when not using a power of 2 
number of processes

Dear hdf-forum members,

I have a problem I am hoping someone can help me with. I have a program
that outputs a 2D-array (contiguous, indexed linearly) using parallel
HDF5. When I choose a number of processors that is not a power of 2
(1,2,4,8,...) H5Fclose() hangs, inexplicably. I'm using HDF5 v.1.8.14,
and OpenMPI 1.7.2, on top of GCC 4.8 with Linux.

Can someone help me pinpoint my mistake?

I have searched the forum, and the first hit [searching for "h5fclose
hangs"] was a user mistake that I didn't make (to the best of my
knowledge). The second didn't go on beyond the initial problem
description, and didn't offer a solution.

Attached is a (maybe insufficiently bare-boned, apologies) demonstrator
program. Strangely, the hang only happens if nx >= 32. The code is
adapted from an HDF5 example program.

The demonstrator is compiled with
h5pcc test.hangs.cpp -DVERBOSE -lstdc++

( on my system, for some strange reason, MPI has been compiled with the
deprecated C++ bindings. I need to include -lmpi_cxx also, but that
shouldn't be necessary for anyone else. I hope that's not the reason for
the hang-ups. )

Thanks in advance for your help!

Wolf Dapp


