Hi John,
On Feb 23, 2011, at 6:54 AM, Biddiscombe, John A. wrote:
> Replying to multiple comments at once.
>
> Quincey : "multiple processes may be writing into each chunk, which MPI-I/O
> can handle when the data is not compressed, but since compressed data is
> context-sensitive"
> My initial use case would be much simpler. A chunk would be aligned with the
> boundaries of the domain decomposition and each process would write one chunk
> - one at a time - a compression filter would be applied by the process owning
> the data and then it would be written to disk (much like Mark's suggestion).
> a) Lossless: problem understood; chunks varying in size, nasty metadata
> synchronization, sparse files, and related issues.
> b) Lossy: seems feasible. We were in fact considering a wavelet-type
> compression as a first pass (pun intended). "It's great from the perspective
> that it completely eliminates the space allocation problem". Absolutely. All
> chunks are known to be of size X beforehand, so nothing changes except for
> the indexing and actual chunk storage/retrieval + de/compression.
Yup. (Although it's not impossible for collective I/O)
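For reference, a minimal sketch of the layout John describes - one chunk per
rank, chunk boundaries matching the domain decomposition, written collectively.
Compression is deliberately left out, since writing filtered chunks in parallel
is exactly the open problem here; the file/dataset names and chunk dimensions
are invented for illustration, and error checking is omitted.

#include <hdf5.h>
#include <mpi.h>

#define CHUNK_X 64
#define CHUNK_Y 64

void write_one_chunk_per_rank(MPI_Comm comm, const double *local_data)
{
    int rank, nranks;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nranks);

    /* Open the file with the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate("decomp.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Global dataspace: ranks stacked along the first dimension. */
    hsize_t dims[2]  = {(hsize_t)nranks * CHUNK_X, CHUNK_Y};
    hsize_t chunk[2] = {CHUNK_X, CHUNK_Y};
    hid_t filespace  = H5Screate_simple(2, dims, NULL);

    /* Chunked layout whose chunk boundaries match the decomposition. */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);
    hid_t dset = H5Dcreate2(file, "field", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Each rank selects exactly its own chunk in the file. */
    hsize_t start[2] = {(hsize_t)rank * CHUNK_X, 0};
    hsize_t count[2] = {CHUNK_X, CHUNK_Y};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    /* Collective raw-data write of one (uncompressed) chunk per rank. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, local_data);

    H5Pclose(dxpl); H5Sclose(memspace); H5Dclose(dset);
    H5Pclose(dcpl); H5Sclose(filespace); H5Fclose(file); H5Pclose(fapl);
}

The question under discussion is how compression fits into this picture once
the chunks are no longer a fixed size on disk.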
> I also like the idea of using a lossless compression and having the I/O
> operation fail if the data doesn't fit. It would give the user the chance to
> compress as best they can, with some knowledge of the data type, and to abort
> if the result doesn't fit the allocated space.
OK, at least one other person thinks this is reasonable. :-)
> Mark : Multi-pass VFD.
> I like this too. It potentially allows a very flexible approach where, even if
> collective I/O is writing to the same chunk, the collection/compression phase
> can do the sums and transmit the info into the HDF5 metadata layer. We'd
> certainly need to extend the chunking interface to handle variable-sized
> chunks to allow for more/less compression in different areas of the data
> (actually this would be true for any option involving lossless compression).
> I think the chunk hashing relies on all chunks being the same size, so any
> change to that is going to be a huge compatibility breaker. Also, the
> chunking layer sits on top of the VFD, so I'm not sure the VFD would be
> able to manipulate the chunks in the way desired. Perhaps I'm mistaken and
> the VFD does see the chunks. Correct me either way.
If we go with the multi-pass/transaction idea, I don't think we need to
worry about the chunks being different sizes.
You are correct in that the VFD layer doesn't see the chunk
information. (And I think it would be bad to make it so :-)
> Quincey : One idea I had, and which I think Mark also expounded on, is ... each
> process takes its own data and compresses it as it sees fit, then the
> processes do a synchronization step to tell each other how much (newly
> compressed) data they have - and then a dataset create is called, using
> the size of the compressed data. Now each process creates a hyperslab for its
> piece of compressed data and writes into the file using collective I/O. We then
> add an array of extent information and compression algorithm info to the
> dataset as an attribute, where each entry has the start and end index of each
> process's data.
>
> Now the only trouble is that reading the data back requires a double step of
> reading the attributes and then decompressing the desired piece - quite nasty
> when odd slices are being requested.
Maybe. (Icky if so)
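For concreteness, a rough sketch of this scheme (which is also essentially the
three-step recipe a little further down) might look like the following. Each
rank is assumed to have already run its data through whatever filter it likes,
yielding comp_buf of comp_len bytes; the names "compressed" and "extents" and
all other details are invented for illustration, and error checking is omitted.

#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

void write_compressed_blob(MPI_Comm comm, hid_t file,
                           const unsigned char *comp_buf, hsize_t comp_len)
{
    int rank, nranks;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nranks);

    /* Synchronization step: everyone learns everyone's compressed size. */
    unsigned long long mylen = (unsigned long long)comp_len;
    unsigned long long *lens = malloc(nranks * sizeof(*lens));
    MPI_Allgather(&mylen, 1, MPI_UNSIGNED_LONG_LONG,
                  lens, 1, MPI_UNSIGNED_LONG_LONG, comm);

    /* Prefix sum gives this rank's byte offset and the total size. */
    hsize_t offset = 0, total = 0;
    for (int i = 0; i < nranks; i++) {
        if (i < rank) offset += lens[i];
        total += lens[i];
    }

    /* Dataset create, sized to the compressed total (a plain byte array). */
    hid_t filespace = H5Screate_simple(1, &total, NULL);
    hid_t dset = H5Dcreate2(file, "compressed", H5T_NATIVE_UCHAR, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank writes its own compressed bytes via a hyperslab, collectively. */
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL,
                        &comp_len, NULL);
    hid_t memspace = H5Screate_simple(1, &comp_len, NULL);
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_UCHAR, memspace, filespace, dxpl, comp_buf);

    /* Record the per-rank (offset, length) pairs so a reader can locate and
       decompress each piece; compression-algorithm info could go alongside. */
    unsigned long long *extents = malloc(2 * nranks * sizeof(*extents));
    hsize_t running = 0;
    for (int i = 0; i < nranks; i++) {
        extents[2*i]   = running;
        extents[2*i+1] = lens[i];
        running += lens[i];
    }
    hsize_t adims[2] = {(hsize_t)nranks, 2};
    hid_t aspace = H5Screate_simple(2, adims, NULL);
    hid_t attr = H5Acreate2(dset, "extents", H5T_NATIVE_ULLONG, aspace,
                            H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(attr, H5T_NATIVE_ULLONG, extents);

    H5Aclose(attr); H5Sclose(aspace); H5Pclose(dxpl); H5Sclose(memspace);
    H5Dclose(dset); H5Sclose(filespace);
    free(extents); free(lens);
}

Reading a given piece back then means reading the extents attribute, reading
that byte range, and decompressing it - exactly the double step John mentions.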
> Now I start to think that Mark's two-pass VFD suggestion would do basically this
> (in one way or another), but maintaining the normal data layout rather than
> writing a special dataset representing the compressed data.
> step 1 : Data is collected into chunks (if already aligned with domain
> decomposition, no-op), chunks are compressed.
> step 2 : Sizes of chunks are exchanged and space is allocated in the file for
> all the chunks.
> step 3 : chunks of compressed data are written
> Not sure two passes are actually needed, as long as the three steps are followed.
>
> ...but variable chunk sizes are not allowed in HDF5 (true or false?) - this
> seems like a showstopper.
> Aha. I understand. The actual written data can/could vary in size, as long as
> the chunk indices, as they refer to the original dataspace, stay regular. Yes?
Yes.
> JB
> Please forgive my thinking out loud.
Not a problem - please continue to participate!
Quincey
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Mark Miller
> Sent: 22 February 2011 23:43
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] New "Chunking in HDF5" document
>
> On Tue, 2011-02-22 at 14:06, Quincey Koziol wrote:
>
>>
>> Well, as I say above, with this approach, you push the space
>> allocation problem to the dataset creation step (which has its own
>> set of problems),
>
> Yeah, but those 'problems' aren't new to parallel I/O. Anyone
> who is currently doing concurrent parallel I/O with HDF5 has already
> had to deal with this part of the problem -- space allocation at
> dataset creation -- right? The point is that the caller of HDF5 then knows
> how big it will be after it's been compressed, and HDF5 doesn't have to
> 'discover' that during H5Dwrite. Hmm, puzzling...
>
> I am recalling my suggestion of a '2-pass-planning' VFD where the caller
> executes a slew of HDF5 operations on a file TWICE. On the first pass, HDF5
> doesn't do any of the actual raw data I/O but just records all the
> information about it for a 'repeat performance' second pass. On the
> second pass, HDF5 knows everything about what is 'about to happen' and
> can plan accordingly.
>
> What about maybe doing that on a dataset-at-a-time basis? I mean, what
> if you set dxpl props to indicate either 'pass 1' or 'pass 2' of a
> 2-pass H5Dwrite operation? On pass 1, between H5Dopen and H5Dclose,
> H5Dwrites don't do any of the raw data I/O but do apply filters and
> compute the sizes of things that will eventually be written. On H5Dclose of
> pass 1, all the chunk size information is recorded. The caller then does
> everything again, a second time, but sets 'pass' to 'pass 2' in the dxpl for
> the H5Dwrite calls, and everything 'works' because all processors know
> everything they need to know.
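
The calling structure being described might look roughly like the sketch below.
Note the 'pass' transfer property is purely hypothetical - no such dxpl
property exists in HDF5 today - so it appears only as a placeholder stub, and
the dataset name is invented.

#include <hdf5.h>

/* Hypothetical: pass 1 would mean "apply filters, record chunk sizes, skip
 * raw I/O"; pass 2 would mean "do the real writes".  Placeholder only. */
static void set_write_pass(hid_t dxpl, int pass)
{
    (void)dxpl;
    (void)pass;
}

/* The application's normal write sequence, parameterized by the dxpl. */
static void write_sequence(hid_t file, hid_t dxpl, const double *buf)
{
    hid_t dset = H5Dopen2(file, "field", H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf);
    H5Dclose(dset);
}

void two_pass_write(hid_t file, const double *buf)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    for (int pass = 1; pass <= 2; pass++) {
        set_write_pass(dxpl, pass);   /* plan on pass 1, write on pass 2 */
        write_sequence(file, dxpl, buf);
    }
    H5Pclose(dxpl);
}

All the real work would be inside HDF5's handling of the two passes; the
sketch only shows what the caller's side of the repeat performance looks like.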
>
>> Maybe HDF5 could expose an API routine that the application could
>> call, to pre-compress the data by passing it through the I/O filters?
>
> I think that could be useful in any case. Just as it's now possible to apply
> type conversion to a buffer of bytes, it probably ought to be possible
> to apply any 'filter' to a buffer of bytes. The second half of this,
> though, would involve smartening HDF5 to 'pass through' pre-filtered
> data so the result is 'as if' HDF5 had done the filtering work itself during
> H5Dwrite. Not sure how easy that would be ;) But, you asked for
> comments/input.
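
There is no public HDF5 routine here that pushes a caller's buffer through a
registered H5Z filter, so as a stand-in the sketch below compresses a buffer
directly with zlib - the library behind the deflate filter - which is roughly
the operation such a routine would perform for that particular filter. The
function name and compression level are illustrative choices.

#include <stdlib.h>
#include <zlib.h>

/* Deflate a buffer "by hand", the way an application might pre-compress its
 * data before handing it to HDF5.  Returns a malloc'd buffer (caller frees)
 * and sets *out_len to the compressed size, or returns NULL on failure. */
unsigned char *pre_compress(const unsigned char *in, size_t in_len,
                            size_t *out_len)
{
    uLongf dest_len = compressBound((uLong)in_len);
    unsigned char *out = malloc(dest_len);
    if (out == NULL)
        return NULL;

    /* Level 6 is a reasonable middle ground between speed and ratio. */
    if (compress2(out, &dest_len, in, (uLong)in_len, 6) != Z_OK) {
        free(out);
        return NULL;
    }
    *out_len = (size_t)dest_len;
    return out;
}

The missing half, as noted above, is a supported way to tell H5Dwrite that the
buffer is already filtered, so it can be passed through untouched.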
>
>>
>> Quincey
>>
>>
> --
> Mark C. Miller, Lawrence Livermore National Laboratory
> ================!!LLNL BUSINESS ONLY!!================
> [email protected] urgent: [email protected]
> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org