On Wed, 2011-02-23 at 14:41, Quincey Koziol wrote:

> Ah, yes, that may be a good segue into this two-pass feature. I've
> been thinking about this feature and wondering about how to implement
> it. Something that occurs to me would be to construct it like a
> "transaction", where the application opens a transaction, the HDF5
> library just records those operations performed with API routines,
> then when the application closes the transaction, they are replayed
> twice: once to record the results of all the operations, and then a
> second pass that actually performs all the I/O. That would also help
> to reduce the collective metadata modification overhead.
>
> BTW, if we go down this "transaction" path, it allows the HDF5
> library to push the fault tolerance up to the application level - the
> library could guarantee that the atomicity of what was "visible" in
> the file was an entire checkpoint, rather than the atomicity being on
> a per-API call basis.
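If I'm following you, the calling pattern would be something like the
sketch below. H5TXbegin()/H5TXcommit() are made-up names for the sake
of the sketch - no such routines exist in the HDF5 API today - and
assume dset_a/dset_b, the dataspaces, and the buffers were set up
earlier in the usual way:

    /* Sketch only: H5TXbegin()/H5TXcommit() are hypothetical. */
    hid_t file = H5Fopen("checkpoint.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    hid_t tx   = H5TXbegin(file);   /* open the transaction */

    /* Between begin and commit, the library only *records* these
     * operations; no metadata changes or raw-data I/O happen yet. */
    H5Dwrite(dset_a, H5T_NATIVE_DOUBLE, memspace_a, filespace_a,
             H5P_DEFAULT, buf_a);
    H5Dwrite(dset_b, H5T_NATIVE_DOUBLE, memspace_b, filespace_b,
             H5P_DEFAULT, buf_b);

    /* Commit replays the recorded operations twice: pass 1 performs
     * all the (collective) metadata modifications, pass 2 does the
     * actual raw-data I/O. The whole checkpoint becomes visible in
     * the file atomically, or not at all. */
    H5TXcommit(tx);
    H5Fclose(file);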
Hmm. That's only true if the 'transaction' has whole-file scope,
right? I mean, aren't you going to allow the application to decide
what 'granularity' a transaction should be: a single dataset, a bunch
of datasets in a group in the file, etc.? If the scope of a
'transaction' is only whole-file, then...

I may be misunderstanding your notions here, but I don't think you'd
want to design this around the assumption that a 'transaction' could
embody all the buffer pointers passed into HDF5 by the caller, so that
HDF5 could automagically FINISH the transaction on behalf of the
application without returning control back to the application. I think
there are going to be too many situations where applications unwind
their own internal data structures into temporary buffers that are
then handed off to HDF5 for I/O and freed. And, for a given HDF5 file,
this likely happens again and again as different parts of the
application's internal data are spit out to HDF5.

But, not to worry. My idea included the notion that the application
would have to re-engage in all such 'data prep for I/O' processes a
second time. I assume the time to complete such a process, relative to
actual I/O time, is small enough that it doesn't matter to the
application that it has to do it twice. I think for most applications
that would be true, and it would be relatively easy to engineer the
work into two passes, along the lines of the sketch below.
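Concretely, the pattern I have in mind looks something like this,
using only existing HDF5 calls. marshal_part() is of course a made-up
stand-in for whatever traversal the application uses to unwind its
internal structures into a temporary buffer; only the pass-1/pass-2
split matters:

    /* Application-driven two-pass checkpoint sketch. */
    #include <hdf5.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NPARTS 4
    #define NELEMS 1024

    /* App-specific data prep: fill buf with NELEMS doubles for
     * part i. This is the work the application repeats per pass. */
    extern void marshal_part(int i, double *buf);

    static void dump(hid_t file, int pass)
    {
        hsize_t dims[1] = { NELEMS };
        double *buf = malloc(NELEMS * sizeof *buf);

        for (int i = 0; i < NPARTS; i++) {
            char name[32];
            snprintf(name, sizeof name, "/part%04d", i);
            if (pass == 1) {
                /* Pass 1: metadata only - create every dataset. */
                hid_t space = H5Screate_simple(1, dims, NULL);
                hid_t dset  = H5Dcreate2(file, name,
                                         H5T_NATIVE_DOUBLE, space,
                                         H5P_DEFAULT, H5P_DEFAULT,
                                         H5P_DEFAULT);
                H5Dclose(dset);
                H5Sclose(space);
            } else {
                /* Pass 2: re-do the data prep, hand the temporary
                 * buffer to HDF5, then it's free to be reused. */
                marshal_part(i, buf);
                hid_t dset = H5Dopen2(file, name, H5P_DEFAULT);
                H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                         H5P_DEFAULT, buf);
                H5Dclose(dset);
            }
        }
        free(buf);
    }

    /* The application drives both passes itself:
     *     dump(file, 1);
     *     dump(file, 2);
     */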
Mark

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[email protected]  urgent: [email protected]
T:8-6 (925)-423-5901  M/W/Th:7-12,2-7 (530)-753-8511