Hi Mark,
On Feb 22, 2011, at 3:01 PM, Mark Miller wrote:
> On Tue, 2011-02-22 at 12:29, Quincey Koziol wrote:
>
>> The problem with the collective I/O [write] operations is that
>> multiple processes may be writing into each chunk, which MPI-I/O can
>> handle when the data is not compressed, but since compressed data is
>> context-sensitive, straightforward collective I/O won't work for
>> compressed chunks. Perhaps a two-phase approach would work, where the
>> data for each chunk is shipped to a single process, which updates the
>> data in the chunk and compresses it, followed by one or more passes of
>> collective writes of the compressed chunks.
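For concreteness, a rough sketch of that gather-then-compress step (assuming zlib for the compression and MPI_Gatherv to ship each chunk's pieces to an owner rank; the function and variable names here are just illustrative):

/* Sketch only: phase one gathers the pieces of one chunk onto an owner
 * rank and compresses them there; the collective write of the compressed
 * chunks would follow as a second phase.  Assumes zlib and that each
 * rank's piece is a contiguous run of bytes. */
#include <mpi.h>
#include <zlib.h>
#include <stdlib.h>

void gather_and_compress_chunk(MPI_Comm comm, int chunk_owner,
                               const unsigned char *my_piece, int my_nbytes,
                               unsigned char **compressed, uLongf *comp_nbytes)
{
    int rank, nranks, total = 0;
    int *counts = NULL, *displs = NULL;
    unsigned char *whole_chunk = NULL;

    *compressed = NULL;          /* only the owner rank produces output */
    *comp_nbytes = 0;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nranks);

    if (rank == chunk_owner) {
        counts = malloc(nranks * sizeof(int));
        displs = malloc(nranks * sizeof(int));
    }

    /* Owner learns how much each rank is contributing to this chunk. */
    MPI_Gather(&my_nbytes, 1, MPI_INT, counts, 1, MPI_INT, chunk_owner, comm);

    if (rank == chunk_owner) {
        for (int i = 0; i < nranks; i++) {
            displs[i] = total;
            total += counts[i];
        }
        whole_chunk = malloc(total);
    }

    /* Ship every rank's piece of the chunk to the owner. */
    MPI_Gatherv(my_piece, my_nbytes, MPI_BYTE,
                whole_chunk, counts, displs, MPI_BYTE, chunk_owner, comm);

    /* Owner updates/compresses the assembled chunk. */
    if (rank == chunk_owner) {
        *comp_nbytes = compressBound(total);
        *compressed = malloc(*comp_nbytes);
        compress2(*compressed, comp_nbytes, whole_chunk, total,
                  Z_DEFAULT_COMPRESSION);
        free(whole_chunk);
    }
    free(counts);
    free(displs);
}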
>>
>> The problem with independent I/O [write] operations is that
>> compressed chunks [almost always] change size when the data in the
>> chunk is written (either initially, or when the data is overwritten),
>> and since not all of the processes are available, communicating the space
>> allocation is a problem. Each process needs to allocate space in the
>> file, but since the other processes aren't "listening", it can't let
>> them know that some space in the file has been used. A possible
>> solution to this might involve just appending data to the end of the
>> file, but that's prone to race conditions between processes (although
>> maybe the "shared file pointer" I/O mode in MPI-I/O would help this).
>> Also, if each process moves a chunk around in the file (because it
>> resized it), how will other processes learn where that chunk is, if
>> they need to read from it?
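The shared file pointer would at least keep the appends from racing; something like the sketch below (which assumes the compressed buffer is already in hand, and glosses over how the resulting offset ever gets communicated back to the other processes or into the file's metadata):

/* Sketch only: append one compressed chunk through MPI-I/O's shared file
 * pointer, so independent writers never race for the same offset.  The file
 * would be opened with something like
 *   MPI_File_open(comm, name, MPI_MODE_WRONLY | MPI_MODE_APPEND,
 *                 MPI_INFO_NULL, &fh);
 * Note this only serializes the raw appends; it does nothing to tell the
 * other ranks (or the HDF5 metadata) where this chunk actually landed. */
#include <mpi.h>

void append_compressed_chunk(MPI_File fh, const void *comp_buf, int comp_nbytes)
{
    MPI_File_write_shared(fh, comp_buf, comp_nbytes, MPI_BYTE,
                          MPI_STATUS_IGNORE);
}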
>
> Something that puzzles me here is that if my parallel app applies szip,
> gzip, or whatever compression to my data on each processor BEFORE ever
> passing it to HDF5, I can then successfully engage in write operations
> to HDF5, treating the data as an opaque array of bytes of known size,
> using collective or independent parallel I/O just as with any other
> 'ordinary' HDF5 dataset (using either chunked or contiguous layouts).
Yes, but each process would have to write to a different dataset, with the
corresponding overhead (along with the lack of self-description that you
mention below). You are just pushing the space allocation problem to the
dataset creation step, in this case. Also, this approach would only work for
independent I/O, and possibly only for a subset of those operations...
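For reference, the workaround you describe would look roughly like this sketch (assuming zlib and one opaque byte dataset per process; the dataset naming and helper routine are made up). Note where the coordination comes back in: the dataset creation call is collective in parallel HDF5, and each process's compressed size is only known at runtime:

/* Sketch only: compress in the application with zlib, then hand HDF5 an
 * opaque 1-D array of bytes of known size.  HDF5 never sees the real
 * datatype, or the fact that the buffer is compressed at all.  In parallel
 * HDF5 the H5Dcreate2() call below is collective, so every process would
 * need to know every other process's compressed size up front; that's where
 * the space allocation problem reappears. */
#include <hdf5.h>
#include <zlib.h>
#include <stdio.h>
#include <stdlib.h>

void write_precompressed(hid_t file_id, int mpi_rank,
                         const unsigned char *raw, uLong raw_nbytes)
{
    /* Compress locally; compressBound() gives a worst-case output size. */
    uLongf comp_nbytes = compressBound(raw_nbytes);
    unsigned char *comp = malloc(comp_nbytes);
    compress2(comp, &comp_nbytes, raw, raw_nbytes, Z_DEFAULT_COMPRESSION);

    /* One opaque byte dataset per process, sized to the compressed length. */
    char name[64];
    snprintf(name, sizeof(name), "compressed_rank_%d", mpi_rank);
    hsize_t dims[1] = { comp_nbytes };
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file_id, name, H5T_NATIVE_UCHAR, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_UCHAR, H5S_ALL, H5S_ALL, H5P_DEFAULT, comp);

    H5Dclose(dset);
    H5Sclose(space);
    free(comp);
}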
> The problem, of course, is that the HDF5 library would not be 'aware' of
> the data's true nature (either its original pre-compressed type or the
> fact that it had been compressed and by which algorithm(s) etc.).
> Subsequent readers would have to 'know' what to do with it, etc.
>
> So, why can't we fix the second half of this problem and invent a way to
> hand HDF5 'pre-filtered' data, and bypass any subsequent attempts in
> HDF5 to filter it (or chunks thereof) on write. On the read end, enough
> information would be available to the library to 'do the right thing'.
>
> I guess another way of saying this is that HDF5's chunking is specified
> in terms of the dataset's 'native shape'. For compressed data, why not
> 'turn that around' and handle chunking as buckets holding a fixed number
> of compressed bytes of the dataset (where the number of bytes is chosen
> to match the number of bytes in a chunk of the dataset's 'native'
> layout), each of which, when uncompressed, yields a variable-sized
> 'chunk' in the native layout?
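If I understand the proposal, the on-disk bookkeeping would look something like this (purely a sketch; the struct and the bucket size are made up):

/* Sketch of the 'fixed-size compressed bucket' layout, as I read it: the
 * file holds equal-sized buckets of compressed bytes, and a small index
 * records which (variable-sized) stretch of the native, uncompressed data
 * each bucket expands to. */
#include <stdint.h>

#define BUCKET_NBYTES (1024 * 1024)   /* fixed size of each compressed bucket */

typedef struct {
    uint64_t file_offset;      /* where this bucket's compressed bytes live     */
    uint64_t native_offset;    /* first byte of linearized native data covered  */
    uint64_t native_nbytes;    /* variable: how many native bytes it expands to */
} bucket_index_entry_t;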
Well, as I say above, with this approach you push the space allocation
problem to the dataset creation step (which has its own set of problems), and
then performing the I/O would be equivalent to using a lossless compressor
with a fixed upper limit on the size of the data [pre-computed by compressing
the data]. I'm concerned about having the application perform the
compression directly... Maybe HDF5 could expose an API routine that the
application could call to pre-compress the data by passing it through the I/O
filters?
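Purely as a strawman, such a routine (plus a companion write call that bypasses the filter pipeline) might look like the declarations below. Neither exists in the library today; the names and signatures are hypothetical:

/* Purely hypothetical; no such routines exist in the library today.  One
 * possible shape for "run my buffer through the dataset's filter pipeline
 * without writing it", plus a write call that trusts the bytes it is given
 * and skips the filters on the way out. */
#include <hdf5.h>
#include <stddef.h>

/* Hypothetical: apply dset_id's filter pipeline (from its DCPL) to buf. */
herr_t H5Dapply_filters(hid_t dset_id,
                        const void *buf,          /* one chunk of raw data    */
                        size_t buf_size,          /* raw size in bytes        */
                        void **filtered_buf,      /* out: filtered bytes      */
                        size_t *filtered_nbytes); /* out: filtered byte count */

/* Hypothetical: write an already-filtered chunk at the given chunk offset,
 * bypassing the pipeline entirely. */
herr_t H5Dwrite_prefiltered(hid_t dset_id, hid_t dxpl_id,
                            const hsize_t *chunk_offset,
                            const void *filtered_buf, size_t filtered_nbytes);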
Quincey
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org