Hi Mark,
On Feb 22, 2011, at 4:42 PM, Mark Miller wrote:
> On Tue, 2011-02-22 at 14:06, Quincey Koziol wrote:
>
>>
>> Well, as I say above, with this approach, you push the space
>> allocation problem to the dataset creation step (which has its own
>> set of problems),
>
> Yeah, but those 'problems' aren't new to parallel I/O issues. Anyone
> who is currently doing concurrent parallel I/O with HDF5 has already
> had to deal with this part of the problem -- space allocation at
> dataset creation -- right? The point is that the caller of HDF5 then
> knows how big the data will be after it's been compressed, and HDF5
> doesn't have to 'discover' that during H5Dwrite. Hmm, puzzling...
True, yes.
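For reference, here's a sketch of the piece that parallel applications
already handle today: allocating all the space for a chunked dataset up
front, at creation time, so every process agrees on the file layout
before any H5Dwrite. Everything in this sketch is existing API:

#include "hdf5.h"

hid_t create_with_early_alloc(hid_t file_id, hid_t space_id)
{
    hid_t   dcpl = H5Pcreate(H5P_DATASET_CREATE);
    hsize_t chunk[1] = {1024};
    hid_t   dset;

    H5Pset_chunk(dcpl, 1, chunk);
    /* Allocate all chunk space at H5Dcreate time, collectively, rather
     * than incrementally during H5Dwrite. */
    H5Pset_alloc_time(dcpl, H5D_ALLOC_TIME_EARLY);

    dset = H5Dcreate2(file_id, "data", H5T_NATIVE_DOUBLE, space_id,
                      H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    return dset;
}

Of course, this only works when the chunk sizes are known up front,
which is exactly what compression breaks.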
> I am recalling my suggestion of a '2-pass-planning' VFD where the caller
> executes a slew of HDF5 operations on a file TWICE. On the first pass,
> HDF5
> doesn't do any of the actual raw data I/O but just records all the
> information about it for a 'repeat performance' second pass. In the
> second pass, HDF5 knows everything about what is 'about to happen' and
> then can plan accordingly.
Ah, yes, that may be a good segue into this two-pass feature. I've
been thinking about this feature and wondering how to implement it.
Something that occurs to me would be to structure it like a "transaction":
the application opens a transaction, the HDF5 library just records the
operations performed with API routines, and then, when the application
closes the transaction, the recorded operations are replayed in two
passes: a first pass that computes the results of all the operations,
and a second pass that actually performs all the I/O. That would also
help to reduce the collective metadata modification overhead.
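Roughly, I picture it looking something like this from the
application's side -- the H5TXbegin()/H5TXcommit() names are just
invented for this sketch, nothing like them exists yet:

#include "hdf5.h"

void write_in_transaction(hid_t file_id, hid_t dset, hid_t mem_space,
                          hid_t file_space, const double *buf)
{
    /* Hypothetical: open a transaction; subsequent API calls on this
     * file are recorded rather than executed. */
    hid_t tx = H5TXbegin(file_id);

    /* Recorded, not executed: the library can apply the filters and
     * note the compressed chunk sizes without touching the file. */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mem_space, file_space,
             H5P_DEFAULT, buf);

    /* Hypothetical: replay pass 1 (compute sizes, allocate space
     * collectively), then pass 2 (perform the raw data I/O). */
    H5TXcommit(tx);
}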
> What about maybe doing that on a dataset-at-a-time basis? I mean, what
> if you set dxpl props to indicate either 'pass 1' or 'pass 2' of a
> 2-pass H5Dwrite operation. On pass 1, between H5Dopen and H5Dclose,
> H5Dwrites don't do any of the raw data I/O but do apply filters and
> compute the sizes of the things they will eventually write. On
> H5Dclose of pass 1, all the information about chunk sizes is
> recorded. Caller then does
> everything again, a second time but sets 'pass' to 'pass 2' in dxpl for
> H5Dwrite calls and everything 'works' because all processors know
> everything they need to know.
Ah, I like this also!
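From the caller's side, I picture something like the sketch below --
the H5Pset_two_pass() property setter is invented for illustration;
everything else is existing API:

#include "hdf5.h"

void two_pass_write(hid_t file_id, hid_t space_id, const double *buf)
{
    hid_t   dxpl = H5Pcreate(H5P_DATASET_XFER);
    hid_t   dcpl = H5Pcreate(H5P_DATASET_CREATE);
    hsize_t chunk[1] = {1024};
    hid_t   dset;

    H5Pset_chunk(dcpl, 1, chunk);
    H5Pset_deflate(dcpl, 6);

    /* Pass 1: apply filters and compute chunk sizes, no raw data I/O.
     * (The 'two pass' dxpl property is hypothetical.) */
    dset = H5Dcreate2(file_id, "data", H5T_NATIVE_DOUBLE, space_id,
                      H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pset_two_pass(dxpl, 1);                    /* hypothetical */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf);
    H5Dclose(dset);            /* chunk sizes recorded at close */

    /* Pass 2: every process now knows every chunk size, so space can
     * be allocated collectively and the raw data actually written. */
    dset = H5Dopen2(file_id, "data", H5P_DEFAULT);
    H5Pset_two_pass(dxpl, 2);                    /* hypothetical */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf);
    H5Dclose(dset);

    H5Pclose(dxpl);
    H5Pclose(dcpl);
}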
>> Maybe HDF5 could expose an API routine that the application could
>> call, to pre-compress the data by passing it through the I/O filters?
>
> I think that could be useful in any case. Just as it's now possible
> to apply type conversion to a buffer of bytes, it probably ought to
> be possible to apply any 'filter' to a buffer of bytes. The second
> half of this, though, would involve smartening HDF5 to 'pass through'
> pre-filtered data so the result is 'as if' HDF5 had done the filtering
> work itself during H5Dwrite. Not sure how easy that would be ;) But,
> you asked for
> comments/input.
Yes, that's the direction I was thinking about going.
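For example, an application could run the deflate filter itself with
zlib and then hand HDF5 the pre-filtered buffer. The
H5Pset_prefiltered() call below, which would make H5Dwrite pass the
buffer through untouched, is invented for this sketch; the zlib calls
are real:

#include <stdlib.h>
#include <zlib.h>
#include "hdf5.h"

herr_t write_prefiltered(hid_t dset, hid_t mem_space, hid_t file_space,
                         const unsigned char *raw, size_t raw_len)
{
    /* Compress the buffer ourselves, as H5Z_FILTER_DEFLATE would. */
    uLongf comp_len = compressBound(raw_len);
    Bytef *comp = malloc(comp_len);
    herr_t ret = -1;

    if (comp && compress(comp, &comp_len, raw, raw_len) == Z_OK) {
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);

        /* Hypothetical: tell the library this buffer is already
         * filtered, so the write proceeds 'as if' HDF5 had done the
         * filtering work itself. */
        H5Pset_prefiltered(dxpl, comp_len);      /* hypothetical */
        ret = H5Dwrite(dset, H5T_NATIVE_OPAQUE, mem_space, file_space,
                       dxpl, comp);
        H5Pclose(dxpl);
    }
    free(comp);
    return ret;
}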
I think the transaction idea I mentioned above might be the most
general approach and have the highest payoff. It could even be
implemented with poor man's parallel I/O when the transaction is
concluded.
Quincey