Hi John,
On Feb 23, 2011, at 6:54 AM, Biddiscombe, John A. wrote:
> Replying to multiple comments at once.
>
> Quincey : "multiple processes may be writing into each chunk, which MPI-I/O
> can handle when the data is not compressed, but since compressed data is
> context-sensitive"
> My initial use case would be much simpler. A chunk would be aligned with the
> boundaries of the domain decomposition and each process would write one chunk
> - one at a time - a compression filter would be applied by the process owning
> the data and then it would be written to disk (much like Mark's suggestion).
> a) Lossless: problem understood; chunks varying in size, nasty metadata
> synchronization, sparse files, and related issues.
> b) Lossy: seems feasible. We were in fact considering a wavelet-type
> compression as a first pass (pun intended). "It's great from the perspective
> that it completely eliminates the space allocation problem". Absolutely. All
> chunks are known to be of size X beforehand, so nothing changes except for
> the indexing and actual chunk storage/retrieval + de/compression.
Yup. (Although it's not impossible for collective I/O)
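For reference, a minimal sketch of the layout John describes - one chunk per
rank, chunk boundaries matching the domain decomposition, written collectively.
Compression is deliberately left out, since writing filtered chunks in parallel
is exactly the open problem here; the file/dataset names and chunk dimensions
are invented for illustration, and error checking is omitted.

#include <hdf5.h>
#include <mpi.h>

#define CHUNK_X 64
#define CHUNK_Y 64

void write_one_chunk_per_rank(MPI_Comm comm, const double *local_data)
{
    int rank, nranks;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nranks);

    /* Open the file with the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate("decomp.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Global dataspace: ranks stacked along the first dimension. */
    hsize_t dims[2]  = {(hsize_t)nranks * CHUNK_X, CHUNK_Y};
    hsize_t chunk[2] = {CHUNK_X, CHUNK_Y};
    hid_t filespace  = H5Screate_simple(2, dims, NULL);

    /* Chunked layout whose chunk boundaries match the decomposition. */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);
    hid_t dset = H5Dcreate2(file, "field", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Each rank selects exactly its own chunk in the file. */
    hsize_t start[2] = {(hsize_t)rank * CHUNK_X, 0};
    hsize_t count[2] = {CHUNK_X, CHUNK_Y};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    /* Collective raw-data write of one (uncompressed) chunk per rank. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, local_data);

    H5Pclose(dxpl); H5Sclose(memspace); H5Dclose(dset);
    H5Pclose(dcpl); H5Sclose(filespace); H5Fclose(file); H5Pclose(fapl);
}

The question under discussion is how compression fits into this picture once
the chunks are no longer a fixed size on disk.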
> I also like the idea of using a lossless compression and having the I/O
> operation fail if the data doesn't fit. It would give the user the chance to
> compress as best they can, with some knowledge of the data type, and to abort
> if the result doesn't fit the allocated space.
OK, at least one other person thinks this is reasonable. :-)
> Mark : Multi-pass VFD.
> I like this too. It potentially allows a very flexible approach where, even if
> collective I/O is writing to the same chunk, the collection/compression phase
> can do the sums and transmit the info into the HDF5 metadata layer. We'd
> certainly need to extend the chunking interface to handle variable-sized
> chunks to allow for more/less compression in different areas of the data
> (actually this would be true for any option involving lossless compression).
> I think the chunk hashing relies on all chunks being the same size, so any
> change to that is going to be a huge compatibility breaker. Also, the
> chunking layer sits on top of the VFD, so I'm not sure the VFD would be
> able to manipulate the chunks in the way desired. Perhaps I'm mistaken and
> the VFD does see the chunks. Correct me either way.
If we go with the multi-pass/transaction idea, I don't think we need to
worry about the chunks being different sizes.
You are correct in that the VFD layer doesn't see the chunk
information. (And I think it would be bad to make it so :-)
> Quincey : One idea I had, and which I think Mark also expounded on, is ... each
> process takes its own data and compresses it as it sees fit, then the
> processes do a synchronization step to tell each other how much (newly
> compressed) data they have - and then a dataset create is called, using
> the size of the compressed data. Now each process creates a hyperslab for its
> piece of compressed data and writes into the file using collective I/O. We then
> add an array of extent information and compression algorithm info to the
> dataset as an attribute, where each entry has the start and end index of each
> process's data.
>
> Now the only trouble is that reading the data back requires a double step of
> reading the attributes and then decompressing the desired piece - quite nasty
> when odd slices are being requested.
Maybe. (Icky if so)
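For concreteness, a rough sketch of this scheme (which is also essentially the
three-step recipe a little further down) might look like the following. Each
rank is assumed to have already run its data through whatever filter it likes,
yielding comp_buf of comp_len bytes; the names "compressed" and "extents" and
all other details are invented for illustration, and error checking is omitted.

#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

void write_compressed_blob(MPI_Comm comm, hid_t file,
                           const unsigned char *comp_buf, hsize_t comp_len)
{
    int rank, nranks;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nranks);

    /* Synchronization step: everyone learns everyone's compressed size. */
    unsigned long long mylen = (unsigned long long)comp_len;
    unsigned long long *lens = malloc(nranks * sizeof(*lens));
    MPI_Allgather(&mylen, 1, MPI_UNSIGNED_LONG_LONG,
                  lens, 1, MPI_UNSIGNED_LONG_LONG, comm);

    /* Prefix sum gives this rank's byte offset and the total size. */
    hsize_t offset = 0, total = 0;
    for (int i = 0; i < nranks; i++) {
        if (i < rank) offset += lens[i];
        total += lens[i];
    }

    /* Dataset create, sized to the compressed total (a plain byte array). */
    hid_t filespace = H5Screate_simple(1, &total, NULL);
    hid_t dset = H5Dcreate2(file, "compressed", H5T_NATIVE_UCHAR, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank writes its own compressed bytes via a hyperslab, collectively. */
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL,
                        &comp_len, NULL);
    hid_t memspace = H5Screate_simple(1, &comp_len, NULL);
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_UCHAR, memspace, filespace, dxpl, comp_buf);

    /* Record the per-rank (offset, length) pairs so a reader can locate and
       decompress each piece; compression-algorithm info could go alongside. */
    unsigned long long *extents = malloc(2 * nranks * sizeof(*extents));
    hsize_t running = 0;
    for (int i = 0; i < nranks; i++) {
        extents[2*i]   = running;
        extents[2*i+1] = lens[i];
        running += lens[i];
    }
    hsize_t adims[2] = {(hsize_t)nranks, 2};
    hid_t aspace = H5Screate_simple(2, adims, NULL);
    hid_t attr = H5Acreate2(dset, "extents", H5T_NATIVE_ULLONG, aspace,
                            H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(attr, H5T_NATIVE_ULLONG, extents);

    H5Aclose(attr); H5Sclose(aspace); H5Pclose(dxpl); H5Sclose(memspace);
    H5Dclose(dset); H5Sclose(filespace);
    free(extents); free(lens);
}

Reading a given piece back then means reading the extents attribute, reading
that byte range, and decompressing it - exactly the double step John mentions.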
> Now I start to think that Mark's two-pass VFD suggestion would do basically this
> (in one way or another), but maintaining the normal data layout rather than
> writing a special dataset representing the compressed data.
> step 1 : Data is collected into chunks (if already aligned with domain
> decomposition, no-op), chunks are compressed.
> step 2 : Sizes of chunks are exchanged and space is allocated in the file for
> all the chunks.
> step 3 : chunks of compressed data are written
> Not sure two passes are actually needed, as long as the three steps are followed.
>
> ...but variable chunk sizes are not allowed in HDF5 (true or false?) - this
> seems like a showstopper.
> Aha. I understand. The actual written data can/could vary in size, as long as
> the chunk indices, as they refer to the original dataspace, stay regular. Yes?
Yes.
> JB
> Please forgive my thinking out loud.
Not a problem - please continue to participate!
Quincey
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Mark Miller
> Sent: 22 February 2011 23:43
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] New "Chunking in HDF5" document
>
> On Tue, 2011-02-22 at 14:06, Quincey Koziol wrote:
>
>>
>> Well, as I say above, with this approach, you push the space
>> allocation problem to the dataset creation step (which has its own
>> set of problems),
>
> Yeah, but those 'problems' aren't new to parallel I/O. Anyone
> who is currently doing concurrent parallel I/O with HDF5 has already
> had to deal with this part of the problem -- space allocation at
> dataset creation -- right? The point is that the caller of HDF5 then knows
> how big it will be after it's been compressed, and HDF5 doesn't have to
> 'discover' that during H5Dwrite. Hmm, puzzling...
>
> I am recalling my suggestion of a '2-pass-planning' VFD where the caller
> executes a slew of HDF5 operations on a file TWICE. On the first pass, HDF5
> doesn't do any of the actual raw data I/O but just records all the
> information about it for a 'repeat performance' second pass. On the
> second pass, HDF5 knows everything about what is 'about to happen' and
> can plan accordingly.
>
> What about maybe doing that on a dataset-at-a-time basis? I mean, what
> if you set dxpl props to indicate either 'pass 1' or 'pass 2' of a
> 2-pass H5Dwrite operation? On pass 1, between H5Dopen and H5Dclose,
> H5Dwrites don't do any of the raw data I/O but do apply filters and
> compute the sizes of things that will eventually be written. On H5Dclose of
> pass 1, all the chunk size information is recorded. The caller then does
> everything again, a second time, but sets 'pass' to 'pass 2' in the dxpl for
> the H5Dwrite calls, and everything 'works' because all processors know
> everything they need to know.
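
The calling structure being described might look roughly like the sketch below.
Note the 'pass' transfer property is purely hypothetical - no such dxpl
property exists in HDF5 today - so it appears only as a placeholder stub, and
the dataset name is invented.

#include <hdf5.h>

/* Hypothetical: pass 1 would mean "apply filters, record chunk sizes, skip
 * raw I/O"; pass 2 would mean "do the real writes".  Placeholder only. */
static void set_write_pass(hid_t dxpl, int pass)
{
    (void)dxpl;
    (void)pass;
}

/* The application's normal write sequence, parameterized by the dxpl. */
static void write_sequence(hid_t file, hid_t dxpl, const double *buf)
{
    hid_t dset = H5Dopen2(file, "field", H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf);
    H5Dclose(dset);
}

void two_pass_write(hid_t file, const double *buf)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    for (int pass = 1; pass <= 2; pass++) {
        set_write_pass(dxpl, pass);   /* plan on pass 1, write on pass 2 */
        write_sequence(file, dxpl, buf);
    }
    H5Pclose(dxpl);
}

All the real work would be inside HDF5's handling of the two passes; the
sketch only shows what the caller's side of the repeat performance looks like.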
>
>> Maybe HDF5 could expose an API routine that the application could
>> call, to pre-compress the data by passing it through the I/O filters?
>
> I think that could be useful in any case. Just as it's now possible to apply
> type conversion to a buffer of bytes, it probably ought to be possible
> to apply any 'filter' to a buffer of bytes. The second half of this,
> though, would involve smartening HDF5 to 'pass through' pre-filtered
> data so the result is 'as if' HDF5 had done the filtering work itself during
> H5Dwrite. Not sure how easy that would be ;) But, you asked for
> comments/input.
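
There is no public HDF5 routine here that pushes a caller's buffer through a
registered H5Z filter, so as a stand-in the sketch below compresses a buffer
directly with zlib - the library behind the deflate filter - which is roughly
the operation such a routine would perform for that particular filter. The
function name and compression level are illustrative choices.

#include <stdlib.h>
#include <zlib.h>

/* Deflate a buffer "by hand", the way an application might pre-compress its
 * data before handing it to HDF5.  Returns a malloc'd buffer (caller frees)
 * and sets *out_len to the compressed size, or returns NULL on failure. */
unsigned char *pre_compress(const unsigned char *in, size_t in_len,
                            size_t *out_len)
{
    uLongf dest_len = compressBound((uLong)in_len);
    unsigned char *out = malloc(dest_len);
    if (out == NULL)
        return NULL;

    /* Level 6 is a reasonable middle ground between speed and ratio. */
    if (compress2(out, &dest_len, in, (uLong)in_len, 6) != Z_OK) {
        free(out);
        return NULL;
    }
    *out_len = (size_t)dest_len;
    return out;
}

The missing half, as noted above, is a supported way to tell H5Dwrite that the
buffer is already filtered, so it can be passed through untouched.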
>
>>
>> Quincey
>>
>>
> --
> Mark C. Miller, Lawrence Livermore National Laboratory
> ================!!LLNL BUSINESS ONLY!!================
> [email protected] urgent: [email protected]
> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org