Re: [Hdf-forum] multi-pass IO (was chunking)

Biddiscombe, John A. Mon, 07 Mar 2011 06:48:43 -0800

Quincey, Mark


> Mark and I have kicked around the idea of creating a "virtual" dataset,
> which is composed of other datasets in the file, stitched together and
> presented as a single dataset to the application.  That way, applications
> could access the underlying pieces (either directly, by reading from the
> underlying dataset, or through a selection of the virtual dataset) or
> access the virtual dataset as if it were a single large dataset.  This
> would be a looser form of chunking, in an abstract sense.
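
To make the stitching idea concrete, here is a minimal sketch in plain Python.
It is not HDF5 code and not a proposed API: the class name VirtualDataset and
its methods are hypothetical, and the "datasets" are just 1-D lists. It only
shows how a global index could be mapped onto (piece, local offset).

```python
from bisect import bisect_right
from itertools import accumulate


class VirtualDataset:
    """Hypothetical 1-D view that stitches several pieces end to end."""

    def __init__(self, pieces):
        self.pieces = list(pieces)
        # starts[i] is the global index of the first element of piece i;
        # the final entry is the total length.
        self.starts = [0] + list(accumulate(len(p) for p in self.pieces))

    def __len__(self):
        return self.starts[-1]

    def locate(self, global_index):
        """Map a global index to (piece number, offset inside that piece)."""
        if not 0 <= global_index < len(self):
            raise IndexError(global_index)
        piece = bisect_right(self.starts, global_index) - 1
        return piece, global_index - self.starts[piece]

    def read(self, start, stop):
        """Read [start, stop) as if the pieces were one large dataset."""
        return [self.pieces[p][o]
                for p, o in (self.locate(i) for i in range(start, stop))]


# Three underlying pieces of different lengths (made-up data).
vds = VirtualDataset([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
```

The pieces stay individually readable through vds.pieces, while vds.read()
gives the single-dataset view; unlike chunks, the pieces need not be the same
size.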

Suppose I modify my H5MB utility to create one dataset per process, compress 
each one individually, and then write them out. What I'd really like to do is:

a)      ensure all blocks are the same size, padding where necessary;

b)      promote these blocks from datasets to chunks, so that the HDF library 
is responsible for the virtual addressing and does all the real work at 
retrieval time.
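
Steps a) and b) can be sketched outside HDF altogether. The following is a
pure-Python illustration using only zlib; it does not touch HDF5 internals,
and the helper names (pad_block, pack_blocks, read_chunk) and the byte-string
"blocks" are assumptions made up for the example. The index it builds stands
in for the chunk index the library would maintain.

```python
import zlib


def pad_block(block, chunk_bytes, fill=b"\x00"):
    """Pad a block to exactly chunk_bytes (step a)."""
    if len(block) > chunk_bytes:
        raise ValueError("block is larger than the chunk size")
    return block + fill * (chunk_bytes - len(block))


def pack_blocks(blocks, chunk_bytes):
    """Pad and compress each block on its own, then lay them out back to back.

    Returns (payload, index) where index[i] = (offset, compressed_length),
    i.e. the lookup a chunk index would have to provide (step b).
    """
    payload = bytearray()
    index = []
    for block in blocks:
        compressed = zlib.compress(pad_block(block, chunk_bytes))
        index.append((len(payload), len(compressed)))
        payload += compressed
    return bytes(payload), index


def read_chunk(payload, index, i):
    """Fetch and decompress chunk i using only the index."""
    offset, length = index[i]
    return zlib.decompress(payload[offset:offset + length])


# Hypothetical per-process blocks of unequal size.
blocks = [b"a" * 10, b"b" * 7, b"c" * 10]
payload, index = pack_blocks(blocks, chunk_bytes=10)
```

Because every block decompresses to the same size, a reader can find any
element from its position alone; only the compressed lengths vary, which is
why the index has to record them.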

It seems that HDF already does everything we want once b) is in place. Once 
the chunks are on disk and indexed correctly, a user selecting a slab will 
trigger retrieval of the chunks and, as long as the decompression filter is 
available, the library will handle decompression too. There would be no need 
for a virtual dataset to map access to the sub-datasets underneath.
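
As a rough picture of what happens at retrieval time, here is a 1-D sketch in
plain Python of a slab read over compressed, fixed-size chunks. It only mimics
the behaviour described above and is not how the HDF library is implemented;
chunks_for_slab, read_slab and the tiny data set are invented for the example.

```python
import zlib


def chunks_for_slab(start, count, chunk_len):
    """Indices of the chunks that overlap elements [start, start + count)."""
    if count <= 0:
        return range(0)
    first = start // chunk_len
    last = (start + count - 1) // chunk_len
    return range(first, last + 1)


def read_slab(stored_chunks, start, count, chunk_len):
    """Decompress only the chunks the slab touches and copy out the overlap."""
    out = bytearray()
    for i in chunks_for_slab(start, count, chunk_len):
        data = zlib.decompress(stored_chunks[i])
        base = i * chunk_len
        lo = max(start, base) - base
        hi = min(start + count, base + chunk_len) - base
        out += data[lo:hi]
    return bytes(out)


CHUNK_LEN = 4
raw = bytes(range(12))
# Each chunk is compressed independently, as in the per-process scheme.
stored = [zlib.compress(raw[i:i + CHUNK_LEN])
          for i in range(0, len(raw), CHUNK_LEN)]
```

A slab starting at element 3 with 6 elements touches all three chunks here,
while a slab that fits inside one chunk decompresses only that chunk.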

As you (Quincey) know, I already have some practice messing about with the 
HDF internals. If I wanted to do b), would it be feasible, instead of having 
each process write a dataset, to get hold of the metadata directly and 
manipulate it so that the pieces are written as chunks?

I can spend some time on this if it gets me decent compression working with 
parallel IO.

Regards

JB

PS. Mark, I followed the links to your Silo/HDF5 wiki pages. Interesting: it 
looks like we're both looking at very similar problems. I will have to play 
with your PMPIO stuff too. Are you also looking at other libraries, such as 
ADIOS? (Off topic; you can reply off list if you don't feel it's appropriate 
for here.)

