On Mon, May 06, 2013 at 11:02:31AM -0400, Mehmet Belgin wrote:
> Rob,
>
> Thanks a lot for your suggestions, I will try them and keep the list updated.
> And yes, I meant mvapich2... I too heard about unconfirmed mvapich sightings
> in the wild so thanks for checking :)
>
> Oh, also, I installed a no-romio version of mvapich2 and recompiled HDF5 with
> it, but still seeing the same problem :(

no-romio? You'll need *some* implementation of MPI-IO for parallel HDF5 to
work, and the problems you are seeing are manifesting themselves in
ROMIO-specific error messages.

I think you might have to get Panasas involved here, since the problem seems
to be below the HDF5 layer in the software stack. As the ROMIO maintainer,
I'll be happy to work with Panasas to fix any bugs in their ad_panfs driver.

If you do contact Panasas, please keep me CCed. I'll be happy to incorporate
any fixes they think are needed.

==rob

>
> -Mehmet
>
>
> On May 6, 2013, at 10:47 AM, Rob Latham wrote:
>
> > On Fri, Apr 19, 2013 at 12:47:40PM -0400, Mehmet Belgin wrote:
> >> Hello everyone,
> >>
> >> We cannot use parallel HDF5 on any of our systems. The processes either
> >> crash or hang (and they work with sequential HDF5).
> >>
> >> On NFS, we are getting:
> >>
> >> ADIOI_Set_lock:: No locks available
> >> ADIOI_Set_lock:offset 69744, length 256
> >> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 124
> >> File locking failed in ADIOI_Set_lock(fd 25,cmd F_SETLKW/7,type
> >> F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 25.
> >> If the file system is NFS, you need to use NFS version 3, ensure that the
> >> lockd daemon is running on all the machines, and mount the directory with
> >> the 'noac' option (no attribute caching).
> >
> > NFS is tricky to get right, and often requires turning off any and all
> > caching. Let's set the NFS issue aside for now.
> >
> >> On Panasas:
> >>
> >> ADIOI_PANFS_RESIZE: Rank 13: Resize failed: requested=46996328
> >> actual=9187464.
> >>
> >> We are using intel 12.1.4, mvapich1.6 (tested with 1.8 and 1.9 as well)
> >> and HDF5 1.8.10.
> >>
> >> Is this a known problem, and do you know any workarounds without turning
> >> off the parallel capabilities of HDF5?
> >
> > Parallel HDF5 works on a lot of other environments, but I don't have
> > any experience with Panasas or the Panasas-contributed ADIO driver.
> >
> >> Any suggestions you may have will be appreciated!
> >
> > The simplest workaround will be to select other ROMIO drivers.
> > - When accessing the Panasas file system, try prefixing the file name
> >   you pass to HDF5 with "ufs:". This will turn off any
> >   panasas-specific optimizations, unfortunately, but lots of folks use
> >   the default "unix file system" driver.
> >
> > Also, mvapich1.6 is based on an ancient version of ROMIO. If you've
> > got any way to use mvapich2, there are undoubtedly some fixes that
> > might make your life better.
> >
> > Perhaps you meant 'mvapich2' .. I've seen mvapich 1 "in the wild"
> > enough times, though, that I thought I should double-check.
> >
> > ==rob
> >
> > --
> > Rob Latham
> > Mathematics and Computer Science Division
> > Argonne National Lab, IL USA

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
