Re: [Hdf-forum] problems with parallel I/O

Rob Latham Wed, 27 Oct 2010 07:17:19 -0700

On Tue, Oct 26, 2010 at 04:52:22PM -0600, Dave Wade-Stein wrote:

> We have a customer for whom parallel I/O is hanging, and they are
> using -id as described above. We're trying to pinpoint why parallel
> I/O is not working on their system, which is a CentOS 5.5 cluster.


It would be really helpful to see the state of these processes when a
hang occurs.  Are they stuck in an I/O call?  Stuck in a collective
because not every process participated?  If they are stuck in a
collective, is it an I/O collective or a messaging collective?

How parallel is this program?  If we're talking 4-way or 8-way
parallelism, then maybe one can run it under gdb and collect a backtrace
from every process?  (mpiexec -np 8 xterm -e gdb ...)
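
If the job is already hung, or opening one xterm per rank is impractical, a possible alternative is to attach gdb to each running process in batch mode and save the stacks. The following is only a minimal sketch under stated assumptions: the helper names (`gdb_backtrace_cmd`, `classify_backtrace`, `dump_backtraces`) are hypothetical, the process IDs are supplied by hand on the command line, and the classification is a rough heuristic based on MPI function names seen in the stack, not a definitive diagnosis. Attaching to a process may also require appropriate ptrace permissions on the node.

```python
import re
import subprocess
import sys

# Heuristic patterns (assumption): MPI-IO routines start with MPI_File_,
# and the list of messaging collectives below is illustrative, not complete.
IO_CALL = re.compile(r"\bP?MPI_File_\w+")
MSG_COLLECTIVE = re.compile(
    r"\bP?MPI_(?:Barrier|Bcast|Reduce|Allreduce|Gatherv?|Allgatherv?|"
    r"Scatterv?|Alltoallv?)\b"
)


def gdb_backtrace_cmd(pid):
    """Build a gdb command that attaches to pid, prints all thread stacks, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def classify_backtrace(text):
    """Guess where a process is stuck from the text of its backtrace."""
    # Check MPI-IO first: an I/O routine may call messaging collectives internally.
    if IO_CALL.search(text):
        return "MPI-IO call"
    if MSG_COLLECTIVE.search(text):
        return "messaging collective"
    return "other"


def dump_backtraces(pids):
    """Run gdb once per PID and return {pid: backtrace text}."""
    traces = {}
    for pid in pids:
        result = subprocess.run(
            gdb_backtrace_cmd(pid), capture_output=True, text=True
        )
        traces[pid] = result.stdout
    return traces


if __name__ == "__main__":
    # Usage (hypothetical script name): python collect_bt.py PID [PID ...]
    for pid, trace in dump_backtraces(sys.argv[1:]).items():
        print(f"=== pid {pid}: {classify_backtrace(trace)} ===")
        print(trace)
```

Run on each node against the PIDs of the hung ranks, this shows whether all processes are waiting in the same call or whether some never reached it.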

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
