On Wed, 2011-02-23 at 14:41, Quincey Koziol wrote:

> Ah, yes, that may be a good segue into this two-pass feature. I've
> been thinking about this feature and wondering about how to implement
> it. Something that occurs to me would be to construct it like a
> "transaction", where the application opens a transaction, the HDF5
> library just records those operations performed with API routines,
> then when the application closes the transaction, they are replayed
> twice: once to record the results of all the operations, and then a
> second pass that actually performs all the I/O. That would also help
> to reduce the collective metadata modification overhead.
>
> BTW, if we go down this "transaction" path, it allows the HDF5
> library to push the fault tolerance up to the application level - the
> library could guarantee that the atomicity of what was "visible" in
> the file was an entire checkpoint, rather than the atomicity being on
> a per-API call basis.
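If I'm following you, the calling pattern would be something like the
sketch below. H5TXbegin()/H5TXcommit() are made-up names for the sake
of the sketch - no such routines exist in the HDF5 API today - and
assume dset_a/dset_b, the dataspaces, and the buffers were set up
earlier in the usual way:

    /* Sketch only: H5TXbegin()/H5TXcommit() are hypothetical. */
    hid_t file = H5Fopen("checkpoint.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    hid_t tx   = H5TXbegin(file);   /* open the transaction */

    /* Between begin and commit, the library only *records* these
     * operations; no metadata changes or raw-data I/O happen yet. */
    H5Dwrite(dset_a, H5T_NATIVE_DOUBLE, memspace_a, filespace_a,
             H5P_DEFAULT, buf_a);
    H5Dwrite(dset_b, H5T_NATIVE_DOUBLE, memspace_b, filespace_b,
             H5P_DEFAULT, buf_b);

    /* Commit replays the recorded operations twice: pass 1 performs
     * all the (collective) metadata modifications, pass 2 does the
     * actual raw-data I/O. The whole checkpoint becomes visible in
     * the file atomically, or not at all. */
    H5TXcommit(tx);
    H5Fclose(file);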
Hmm. That's only true if the 'transaction' has whole-file scope,
right? I mean, aren't you going to allow the application to decide
what 'granularity' a transaction should be: a single dataset, a bunch
of datasets in a group in the file, etc.? If the scope of a
'transaction' is only whole-file, then...

I may be misunderstanding your notions here, but I don't think you'd
want to design this around the assumption that a 'transaction' could
embody all the buffer pointers passed into HDF5 by the caller, so that
HDF5 could automagically FINISH the transaction on behalf of the
application without returning control back to the application. I think
there are going to be too many situations where applications unwind
their own internal data structures into temporary buffers that are
then handed off to HDF5 for I/O and freed. And, for a given HDF5 file,
this likely happens again and again as different parts of the
application's internal data are spit out to HDF5.

But, not to worry. My idea included the notion that the application
would have to re-engage in all such 'data prep for I/O' processes a
second time. I assume the time to complete such a process, relative to
actual I/O time, is small enough that it doesn't matter to the
application that it has to do it twice. I think for most applications
that would be true, and it would be relatively easy to engineer the
work into two passes, along the lines of the sketch below.
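Concretely, the pattern I have in mind looks something like this,
using only existing HDF5 calls. marshal_part() is of course a made-up
stand-in for whatever traversal the application uses to unwind its
internal structures into a temporary buffer; only the pass-1/pass-2
split matters:

    /* Application-driven two-pass checkpoint sketch. */
    #include <hdf5.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NPARTS 4
    #define NELEMS 1024

    /* App-specific data prep: fill buf with NELEMS doubles for
     * part i. This is the work the application repeats per pass. */
    extern void marshal_part(int i, double *buf);

    static void dump(hid_t file, int pass)
    {
        hsize_t dims[1] = { NELEMS };
        double *buf = malloc(NELEMS * sizeof *buf);

        for (int i = 0; i < NPARTS; i++) {
            char name[32];
            snprintf(name, sizeof name, "/part%04d", i);
            if (pass == 1) {
                /* Pass 1: metadata only - create every dataset. */
                hid_t space = H5Screate_simple(1, dims, NULL);
                hid_t dset  = H5Dcreate2(file, name,
                                         H5T_NATIVE_DOUBLE, space,
                                         H5P_DEFAULT, H5P_DEFAULT,
                                         H5P_DEFAULT);
                H5Dclose(dset);
                H5Sclose(space);
            } else {
                /* Pass 2: re-do the data prep, hand the temporary
                 * buffer to HDF5, then it's free to be reused. */
                marshal_part(i, buf);
                hid_t dset = H5Dopen2(file, name, H5P_DEFAULT);
                H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                         H5P_DEFAULT, buf);
                H5Dclose(dset);
            }
        }
        free(buf);
    }

    /* The application drives both passes itself:
     *     dump(file, 1);
     *     dump(file, 2);
     */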
Mark

--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[email protected]  urgent: [email protected]
T:8-6 (925)-423-5901  M/W/Th:7-12,2-7 (530)-753-8511