We have a distributed application that uses HDF5 files on an NFS host. Multiple hosts all connect to the single NFS host and read and write HDF5 files, and we have limited concurrent write access to any single file. What we have encountered is several instances of low-level HDF5 file corruption. My initial determination is that network interruptions or partitions cause the corruption. In the sequence of events for updating a dataset or adding an attribute to a dataset there are multiple seek and write operations at the low-level sec2 driver, and there is no transactional support or atomicity in any of the data mutations that occur.
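To make the failure mode concrete, here is roughly the kind of mutation I am talking about (h5py, with an illustrative path and attribute name, not our actual code):

    import h5py

    # Illustrative only: adding an attribute to an existing dataset that lives
    # on the NFS mount. Internally HDF5 performs several seek/write calls
    # (object header updates, the attribute message itself, possibly heap
    # blocks), and none of them are applied atomically, so an interruption
    # partway through can leave the file structurally inconsistent.
    with h5py.File('/mnt/nfs/data/observations.h5', 'a') as f:  # path is made up
        dset = f['/measurements']                                # assumes this dataset exists
        dset.attrs['calibration_date'] = '2013-05-01'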
My question is this: is there an option in the HDF5 API that supports transactions and eliminates corruption in the event of a network interruption (basically, if write() returns -1 at any point)? That could be atomic write operations, or a cleverly ordered set of writes that leaves the file consistent no matter where the sequence is interrupted. Or maybe I should be using a different driver for access. Has anyone else experienced these kinds of issues with HDF5 on top of NFS?

I spoke with the h5py developer and he told me that a common solution is to copy the file to a local hard disk, make the changes, copy it back to the NFS host under a temporary name, and then move it into place (the move being the atomic step).
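For reference, here is roughly what I understand that workaround to look like. This is a sketch only; the function name, paths, and arguments are mine for illustration, not anything from the HDF5 or h5py APIs:

    import os
    import shutil
    import tempfile
    import h5py

    def update_attr_via_local_copy(nfs_path, dataset, attr, value, scratch_dir='/tmp'):
        # 1. Copy the file from NFS to a local scratch disk.
        fd, local_path = tempfile.mkstemp(suffix='.h5', dir=scratch_dir)
        os.close(fd)
        shutil.copy2(nfs_path, local_path)
        try:
            # 2. Modify the local copy; a failure here leaves the NFS original untouched.
            with h5py.File(local_path, 'a') as f:
                f[dataset].attrs[attr] = value
            # 3. Copy the result back to a temporary name on the same NFS filesystem...
            nfs_tmp = nfs_path + '.tmp'
            shutil.copy2(local_path, nfs_tmp)
            # 4. ...and rename it over the original. rename() within one filesystem is
            # atomic on the server, so readers see either the old file or the new one,
            # never a partially written one.
            os.rename(nfs_tmp, nfs_path)
        finally:
            os.remove(local_path)

On the driver question, the only thing I have found so far is the in-memory "core" driver (h5py.File(path, driver='core', backing_store=True)), which, as I understand it, buffers the whole file in memory and writes it back on close; that narrows the window for a torn write but the final write-out is still not atomic.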
Thanks!

Luke Campbell
Software Engineer
RPS ASA
55 Village Square Drive
South Kingstown RI 02879-8248 USA
Tel: +1 (401) 789-6224 ext 359
Cell: (860) 381-0387