We use hdf5 for parallel I/O in VORPAL, our laser plasma simulation code. For the most part, it works fine, but on certain machines (e.g., early Cray and BG/P) and certain types of filesystems, we've noticed that parallel I/O hangs, so we instituted a -id (individual dump) option which causes each MPI rank to dump its own hdf5 file, and once the simulation is complete, we merge the individual dump files.
We have a customer for whom parallel I/O is hanging, and they are using -id as described above. We're trying to pinpoint why parallel I/O is not working on their system, which is CentOS 5.5 cluster. In the past we ourselves have had problems with parallel I/O failing on ext3 filesystems, so we reformatted as XFS and the problem went away. Our customer did this, but the problem still persists. Anyone have any words of wisdom as to what other things could cause parallel I/O to hang? Thanks for any help! Dave _______________________________________________ Hdf-forum is for HDF software users discussion. [email protected] http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
