Hi everyone!

I'm trying to run simulations on a Nehalem cluster that uses a Lustre file system for I/O. My code uses parallel HDF5 and writes timestep groups containing 3D data into a single shared file, so a finished simulation leaves one big output file.
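Roughly, the output pattern is the following (a minimal sketch only; the file name, group/dataset names, and dimensions are placeholders, not my real code):

    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* All ranks open one shared file through the MPI-IO driver */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fcreate("run.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* One group per timestep, each holding a 3D dataset */
        hid_t grp = H5Gcreate(file, "/timestep_0000",
                              H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        hsize_t dims[3] = {128, 128, 128};
        hid_t space = H5Screate_simple(3, dims, NULL);
        hid_t dset  = H5Dcreate(grp, "field", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Collective write; each rank would select its own hyperslab
           of the 3D dataset before calling H5Dwrite (omitted here) */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
        /* H5Sselect_hyperslab(space, ...); H5Dwrite(dset, ..., dxpl, data); */

        H5Pclose(dxpl);
        H5Dclose(dset);
        H5Sclose(space);
        H5Gclose(grp);
        H5Fclose(file);
        H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }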
The problem is that parallel HDF5 needs file locking, which Lustre has to provide via a lock daemon (or something similar). That daemon reportedly costs the Lustre system up to 50% of its performance, which is why the cluster admins refuse to enable file locking. As a consequence I cannot write anything; every write attempt stops with:

    File locking failed in ADIOI_Set_lock(fd 18, cmd F_SETLKW/7,
    type F_WRLCK/1, whence 0) with return value FFFFFFFF and errno 26.
    If the file system is NFS, you need to use NFS version 3, ensure
    that the lockd daemon is running on all the machines, and mount the
    directory with the 'noac' option (no attribute caching).
    ADIOI_Set_lock:: Function not implemented
    ADIOI_Set_lock: offset 6488, length 96

and so on. Is there any workaround for this, or does parallel HDF5 fundamentally rely on file locking? If the answer is an MPI-IO hint of some kind, I've sketched below where I would set it. Otherwise I will not be able to use this cluster.
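For reference, this is where hints could be passed to the MPI-IO driver. The two ROMIO hints shown are only guesses on my part (data sieving is, as far as I understand, what issues the fcntl() locks); I don't know whether they actually avoid the locking path:

    MPI_Info info;
    MPI_Info_create(&info);
    /* Guesses: disable data sieving for writes and use collective
       buffering instead -- I have not verified that this avoids
       ADIOI_Set_lock on Lustre */
    MPI_Info_set(info, "romio_ds_write", "disable");
    MPI_Info_set(info, "romio_cb_write", "enable");

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);  /* instead of MPI_INFO_NULL */
    MPI_Info_free(&info);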
Thanks and best regards,
Sebastian