We have a distributed application that uses HDF5 files on an NFS host. Multiple 
hosts all connect to the single NFS host and read and write HDF5 files. We 
restrict write access so that only one host writes to a given file at a time. 
Even so, we have encountered several instances of low-level HDF5 file 
corruption. My initial determination was that network interruptions or 
partitions cause the corruption. Updating a dataset or adding an attribute to 
one involves multiple seek and write operations in the low-level sec2 driver, 
and there is no transactional support or atomicity in any of these mutations.

My question is this: is there an option in the HDF5 API that supports 
transactions and eliminates corruption in the event of a network interruption 
(basically, if write() returns -1 at any point)? Whether through atomic write 
operations or a carefully ordered set of writes that leaves the file consistent 
after every step? Or maybe I should be using a different file driver for 
access.

Has anyone else experienced these kinds of issues with HDF5 on top of NFS? I 
spoke with the h5py developer, and he told me that a common solution is to copy 
the file to a local hard disk, make the changes, copy it back to the NFS host 
under a temporary name, and then rename it into place (rename being atomic).
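
For what it's worth, here is a minimal sketch of that copy-modify-rename 
workaround in Python. The function name and the `mutate` callback are my own 
invention for illustration; the callback would typically open the local copy 
with h5py and make the edits. It assumes the staging file and the target live 
on the same NFS filesystem, so that os.replace() is a single atomic rename:

```python
import os
import shutil
import tempfile

def update_hdf5_safely(nfs_path, mutate):
    """Copy an HDF5 file to local disk, mutate it there, and atomically
    swap the result back into place on the NFS host.

    ``mutate`` is a callback taking the path of the local copy, e.g. a
    function that opens it with h5py and updates datasets/attributes.
    """
    local_dir = tempfile.mkdtemp()  # scratch space on local disk
    local_copy = os.path.join(local_dir, os.path.basename(nfs_path))
    try:
        shutil.copy2(nfs_path, local_copy)   # pull the file off NFS
        mutate(local_copy)                   # edit the local copy only
        staging = nfs_path + ".tmp"          # staging file on the same NFS fs
        shutil.copy2(local_copy, staging)    # push the edited copy back
        os.replace(staging, nfs_path)        # atomic rename over the original
    finally:
        shutil.rmtree(local_dir)             # always clean up local scratch
```

A partial failure while copying to the staging file leaves the original file 
untouched; readers only ever see the old version or the complete new one, never 
a half-written file.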

Thanks!


Luke Campbell
Software Engineer
RPS ASA
55 Village Square Drive
South Kingstown  RI  02879-8248  USA
Tel: +1 (401) 789-6224 ext 359
Cell: (860) 381-0387

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org