Replying to multiple comments at once.

Quincey : "multiple processes may be writing into each chunk, which MPI-I/O can 
handle when the data is not compressed, but since compressed data is 
context-sensitive" 
My initial use case would be much simpler. Chunks would be aligned with the 
boundaries of the domain decomposition and each process would write exactly 
one chunk - one at a time. A compression filter would be applied by the 
process owning the data before it is written to disk (much like Mark's 
suggestion).
a) lossless. Problem understood: chunks vary in size, which brings nasty 
metadata synchronization, sparse files, and related issues.
b) lossy. Seems feasible. We were in fact considering a wavelet-type 
compression as a first pass (pun intended). "It's great from the perspective 
that it completely eliminates the space allocation problem". Absolutely. All 
chunks are known to be of size X beforehand, so nothing changes except for 
the indexing and the actual chunk storage/retrieval plus de/compression. A 
minimal sketch of that write pattern follows.
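To make (b) concrete, here is a minimal sketch of why fixed-size compressed 
chunks make the parallel write trivial: each rank's file offset is simply 
rank*X, so a plain collective hyperslab write suffices. The wavelet_compress() 
routine, the COMP_CHUNK_BYTES value, and the packed byte-dataset layout are 
all assumptions of mine for illustration, not HDF5's filtered-chunk machinery.

#include <hdf5.h>
#include <mpi.h>

#define COMP_CHUNK_BYTES 4096   /* fixed compressed size X, known a priori */

/* hypothetical lossy compressor: raw block in, exactly X bytes out */
extern void wavelet_compress(const double *raw, size_t n,
                             unsigned char out[COMP_CHUNK_BYTES]);

void write_fixed_size_chunks(MPI_Comm comm, const double *raw, size_t n)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    unsigned char buf[COMP_CHUNK_BYTES];
    wavelet_compress(raw, n, buf);              /* every chunk is size X */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate("lossy.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* space allocation is trivial: nprocs chunks of exactly X bytes */
    hsize_t dims = (hsize_t)nprocs * COMP_CHUNK_BYTES;
    hid_t fspace = H5Screate_simple(1, &dims, NULL);
    hid_t dset = H5Dcreate(file, "compressed", H5T_NATIVE_UCHAR, fspace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* each process writes its own fixed-size slab, collectively */
    hsize_t start = (hsize_t)rank * COMP_CHUNK_BYTES;
    hsize_t count = COMP_CHUNK_BYTES;
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);
    hid_t mspace = H5Screate_simple(1, &count, NULL);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_UCHAR, mspace, fspace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(mspace); H5Sclose(fspace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
}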

I also like the idea of using lossless compression and having the I/O 
operation fail if the data doesn't fit. That would give the user the chance 
to compress as well as they can with some knowledge of the data type and, 
if the result doesn't fit the allocated space, to abort.
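A minimal sketch of that fail-if-it-doesn't-fit contract using plain zlib 
(which, as I understand it, is what HDF5's deflate filter wraps): compress2() 
reports an error when the output buffer - here the space pre-allocated for 
the chunk - is too small. The CHUNK_BUDGET value is an assumption.

#include <zlib.h>
#include <stdio.h>

#define CHUNK_BUDGET 4096          /* bytes pre-allocated for this chunk */

int compress_into_budget(const unsigned char *raw, size_t raw_len,
                         unsigned char out[CHUNK_BUDGET], size_t *out_len)
{
    uLongf dest_len = CHUNK_BUDGET;
    int rc = compress2(out, &dest_len, raw, (uLong)raw_len,
                       Z_BEST_COMPRESSION);
    if (rc != Z_OK) {
        /* Z_BUF_ERROR: the data didn't fit; caller may retry or abort */
        fprintf(stderr, "chunk exceeds %d-byte budget\n", CHUNK_BUDGET);
        return -1;
    }
    *out_len = (size_t)dest_len;
    return 0;
}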

Mark : Multi-pass VFD. I like this too. It potentially allows a very 
flexible approach where, even if collective I/O is writing to the same 
chunk, the collection/compression phase can do the sums and transmit the 
information into the HDF5 metadata layer. We'd certainly need to extend the 
chunking interface to handle variable-sized chunks to allow for more/less 
compression in different areas of the data (this would actually be true for 
any option involving lossless compression). I think the chunk hashing relies 
on all chunks being the same size, so any change to that is going to be a 
huge compatibility breaker. Also, the chunking layer sits on top of the VFD, 
so I'm not sure the VFD would be able to manipulate the chunks in the way 
desired. Perhaps I'm mistaken and the VFD does see the chunks. Correct me 
either way.

Quincey : One idea I had, and which I think Mark also expounded on, is this: 
each process takes its own data and compresses it as it sees fit, then the 
processes do a synchronization step to tell each other how much (newly 
compressed) data they each have, and then a dataset create is called using 
the size of the compressed data. Now each process creates a hyperslab for 
its piece of compressed data and writes into the file using collective I/O. 
Finally, we add an array of extent information and compression-algorithm 
info to the dataset as an attribute, where each entry has the start and end 
index of the data for each process.
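A minimal sketch of exactly that sequence, assuming each rank's compressed 
bytes already sit in cbuf; the one-dimensional packed layout and the 
"extents" attribute name are my own illustration, not any established HDF5 
convention.

#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

void write_compressed(MPI_Comm comm, const unsigned char *cbuf, hsize_t csize)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    /* synchronization step: everyone learns everyone's compressed size */
    unsigned long long mine = (unsigned long long)csize;
    unsigned long long *sizes = malloc(nprocs * sizeof *sizes);
    MPI_Allgather(&mine, 1, MPI_UNSIGNED_LONG_LONG,
                  sizes, 1, MPI_UNSIGNED_LONG_LONG, comm);

    /* per-rank [start,end) extents and the total dataset size */
    unsigned long long *extents = malloc(2 * nprocs * sizeof *extents);
    unsigned long long off = 0;
    for (int r = 0; r < nprocs; r++) {
        extents[2*r] = off;
        off += sizes[r];
        extents[2*r + 1] = off;
    }

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate("packed.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* dataset create uses the size of the compressed data */
    hsize_t total = (hsize_t)off;
    hid_t fspace = H5Screate_simple(1, &total, NULL);
    hid_t dset = H5Dcreate(file, "compressed", H5T_NATIVE_UCHAR, fspace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* each rank writes its slab of compressed bytes, collectively */
    hsize_t start = (hsize_t)extents[2*rank], count = csize;
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);
    hid_t mspace = H5Screate_simple(1, &count, NULL);
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_UCHAR, mspace, fspace, dxpl, cbuf);

    /* record the per-rank extents as an attribute for the read side */
    hsize_t adims[2] = { (hsize_t)nprocs, 2 };
    hid_t aspace = H5Screate_simple(2, adims, NULL);
    hid_t attr = H5Acreate(dset, "extents", H5T_NATIVE_ULLONG, aspace,
                           H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(attr, H5T_NATIVE_ULLONG, extents);

    H5Aclose(attr); H5Sclose(aspace); H5Pclose(dxpl);
    H5Sclose(mspace); H5Sclose(fspace); H5Dclose(dset);
    H5Fclose(file); H5Pclose(fapl);
    free(sizes); free(extents);
}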

Now the only trouble is that reading the data back requires a double step of 
reading the attributes and then decompressing the desired piece, which is 
quite nasty when odd slices are being requested.
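For concreteness, the read-back counterpart of the sketch above: the extra 
step is fetching the extents attribute before any slab can even be located. 
decompress() and the attribute name are the same assumptions as before.

#include <hdf5.h>
#include <stdlib.h>

extern void decompress(const unsigned char *in, size_t n);   /* assumed */

void read_piece(hid_t file, int piece, int nprocs)
{
    hid_t dset = H5Dopen(file, "compressed", H5P_DEFAULT);

    /* step 1: read the per-rank extents attribute */
    unsigned long long *extents = malloc(2 * nprocs * sizeof *extents);
    hid_t attr = H5Aopen(dset, "extents", H5P_DEFAULT);
    H5Aread(attr, H5T_NATIVE_ULLONG, extents);
    H5Aclose(attr);

    /* step 2: read exactly that slab of compressed bytes and decompress */
    hsize_t start = extents[2*piece];
    hsize_t count = extents[2*piece + 1] - extents[2*piece];
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);
    hid_t mspace = H5Screate_simple(1, &count, NULL);

    unsigned char *cbuf = malloc(count);
    H5Dread(dset, H5T_NATIVE_UCHAR, mspace, fspace, H5P_DEFAULT, cbuf);
    decompress(cbuf, count);

    free(cbuf); free(extents);
    H5Sclose(mspace); H5Sclose(fspace); H5Dclose(dset);
}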

Now I start to think that Mark's double-VFD suggestion would do basically 
this (in one way or another), but maintaining the normal data layout rather 
than writing a special dataset representing the compressed data.
step 1 : data is collected into chunks (a no-op if already aligned with the 
domain decomposition) and the chunks are compressed.
step 2 : chunk sizes are exchanged and space is allocated in the file for 
all the chunks (see the sketch after this list).
step 3 : the chunks of compressed data are written.
I'm not sure two passes are actually needed, as long as the three steps are 
followed.
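In fact, if only step 2's bookkeeping is in question, a single prefix sum is 
enough for each rank to learn its offset without a second pass. A sketch 
(MPI only, with the HDF5 allocation itself elided):

#include <mpi.h>

void plan_offsets(MPI_Comm comm, unsigned long long csize,
                  unsigned long long *offset, unsigned long long *total)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* prefix sum of compressed sizes = this rank's write offset */
    MPI_Exscan(&csize, offset, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, comm);
    if (rank == 0)
        *offset = 0;   /* MPI_Exscan leaves rank 0's result undefined */

    /* total compressed size across all ranks, for the file allocation */
    MPI_Allreduce(&csize, total, 1, MPI_UNSIGNED_LONG_LONG, MPI_SUM, comm);
}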

...but variable chunk sizes are not allowed in HDF5 (true or false?) - this 
seems like a showstopper.
Aha. I understand. The actual written data can/could vary in size, as long 
as the chunk indices, as they refer to the original dataspace, are regular. 
Yes?

JB
Please forgive my thinking out loud 





-----Original Message-----
From: [email protected] [mailto:[email protected]] On 
Behalf Of Mark Miller
Sent: 22 February 2011 23:43
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] New "Chunking in HDF5" document

On Tue, 2011-02-22 at 14:06, Quincey Koziol wrote:

> 
>       Well, as I say above, with this approach, you push the space
> allocation problem to the dataset creation step (which has its own
> set of problems),

Yeah, but those 'problems' aren't new to parallel I/O issues. Anyone
who is currently doing concurrent parallel I/O with HDF5 has already had
to deal with this part of the problem -- space allocation at dataset
creation -- right? The point is that the caller of HDF5 then knows how
big the data will be after it's been compressed, and HDF5 doesn't have to
'discover' that during H5Dwrite. Hmm, puzzling...

I am recalling my suggestion of a '2-pass-planning' VFD where the caller
executes a slew of HDF5 operations on a file TWICE. On the first pass, HDF5
doesn't do any of the actual raw data I/O but just records all the
information about it for a 'repeat performance' second pass. On the
second pass, HDF5 knows everything about what is 'about to happen' and
can then plan accordingly.

What about maybe doing that on a dataset-at-a-time basis? I mean, what
if you set dxpl props to indicate either 'pass 1' or 'pass 2' of a
2-pass H5Dwrite operation? On pass 1, between H5Dopen and H5Dclose,
H5Dwrite doesn't do any of the raw data I/O but does apply filters and
compute the sizes of the things it will eventually write. On H5Dclose of
pass 1, all the chunk-size information is recorded. The caller then does
everything again, a second time, but sets 'pass' to 'pass 2' in the dxpl
for the H5Dwrite calls, and everything 'works' because all processors
know everything they need to know.
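[A purely hypothetical sketch of what that could look like from the
caller's side: the '2pass_phase' property name is invented here and no
VFD actually interprets it, although H5Pinsert2/H5Pset themselves are
real generic-property routines.]

#include <hdf5.h>

static void two_pass_write(hid_t dset, const double *data)
{
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    int phase = 1;
    /* attach the invented phase flag as a generic dxpl property */
    H5Pinsert2(dxpl, "2pass_phase", sizeof phase, &phase,
               NULL, NULL, NULL, NULL, NULL, NULL);

    /* pass 1: the imagined VFD would apply filters and record chunk
     * sizes, but skip the raw data I/O */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, data);

    /* pass 2: all processors now know all chunk sizes, so the same
     * write is replayed with the space already planned */
    phase = 2;
    H5Pset(dxpl, "2pass_phase", &phase);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, data);

    H5Pclose(dxpl);
}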

>   Maybe HDF5 could expose an API routine that the application could
> call, to pre-compress the data by passing it through the I/O filters?

I think that could be useful in any case. Just as it's now possible to
apply type conversion to a buffer of bytes, it probably ought to be
possible to apply any 'filter' to a buffer of bytes. The second half of
this, though, would involve smartening HDF5 to 'pass through' pre-filtered
data so the result is 'as if' HDF5 had done the filtering work itself
during H5Dwrite. Not sure how easy that would be ;) But, you asked for
comments/input.
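[If HDF5 grew such a routine, one possible - entirely hypothetical - shape,
mirroring the existing H5Z_func_t callback signature that registered
filters already implement; no such routine exists in the library.]

herr_t H5Zapply_filter(H5Z_filter_t filter_id,  /* e.g. H5Z_FILTER_DEFLATE */
                       unsigned flags,          /* H5Z_FLAG_REVERSE to unfilter */
                       size_t cd_nelmts,
                       const unsigned cd_values[],
                       size_t *buf_size,  /* in: raw size, out: filtered size */
                       void **buf);       /* buffer, reallocated in place */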

> 
>       Quincey
> 
> 
-- 
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[email protected]      urgent: [email protected]
T:8-6 (925)-423-5901    M/W/Th:7-12,2-7 (530)-753-8511

