Re: [Hdf-forum] multi-pass IO (was chunking)

Quincey Koziol Mon, 07 Mar 2011 19:50:07 -0800

Hi John,

On Mar 7, 2011, at 8:47 AM, Biddiscombe, John A. wrote:


> Quincey, Mark
>  
>  
>>             Mark and I have kicked around the idea of creating a "virtual" 
>> dataset, which is composed of other datasets in the file, stitched together 
>> and presented as a single dataset to the application.  That way, 
>> applications could access the underlying pieces (either directly, by reading 
>> from the underlying dataset, or through a selection of the virtual dataset) 
>> or access the virtual dataset as if it were a single large dataset.  This 
>> would be a looser form of chunking, in an abstract sense.
>  
> Suppose I modify my H5MB utility to create one dataset per process, compress 
> each one individually, and then write them out. What I’d really like to do 
> is:
> a)      ensure all blocks are the same size, padding where necessary
> b)      promote these blocks from datasets to chunks, so that the HDF library 
> is responsible for the virtual addressing and does all the real work at 
> retrieval time.
>  
> It seems like HDF already does everything we want if we had b) in place. Once 
> the chunks are on disk and indexed correctly, a user selecting a slab will 
> trigger retrieval of the chunks and, as long as the decompression filter is 
> available, decompression is handled too. There’d be no need for a virtual 
> dataset to map access to the sub-datasets underneath.

        Hmm, so you'd have some new "bind" operation that took as input a bunch 
of datasets and bound them together as a new dataset?
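
To make the question concrete, here is a toy sketch of such a "bind" in plain
Python, not HDF5 code. The class name VirtualDataset and its read() method are
hypothetical, the source datasets are stood in for by ordinary lists, and
everything is 1-D; the only point is the address mapping from a selection on
the bound dataset to the pieces underneath.

```python
from bisect import bisect_right


class VirtualDataset:
    """Hypothetical 1-D 'bound' view over several source datasets.

    Each source is any sequence (a plain list here); nothing is copied
    until a selection is read.
    """

    def __init__(self, sources):
        self.sources = list(sources)
        # starts[i] is the virtual offset of the first element of sources[i].
        self.starts = []
        total = 0
        for src in self.sources:
            self.starts.append(total)
            total += len(src)
        self.length = total

    def read(self, start, count):
        """Return `count` elements beginning at virtual offset `start`."""
        if start < 0 or count < 0 or start + count > self.length:
            raise IndexError("selection lies outside the virtual dataset")
        out = []
        if count == 0:
            return out
        # Locate the source that holds the first requested element.
        i = bisect_right(self.starts, start) - 1
        while count > 0:
            src = self.sources[i]
            local = start - self.starts[i]          # offset inside this source
            take = min(count, len(src) - local)     # stay inside this source
            out.extend(src[local:local + take])
            start += take
            count -= take
            i += 1
        return out


# One "dataset" per process, of differing sizes, bound into one view.
per_process = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
vds = VirtualDataset(per_process)
selection = vds.read(2, 4)   # spans all three underlying pieces
```

The pieces stay individually readable, and a selection on the bound view only
touches the pieces it overlaps.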

> As you (Quincey) know, I already have some practice messing about with the 
> HDF internals. If I wanted to do b), is it feasible that, instead of each 
> process writing a dataset, I could get hold of the metadata directly and 
> manipulate it to write the pieces as chunks?
>  
> I can spend some time on this if I can get decent compression working on 
> parallel IO.

        I think it could be done, but it's going to be a fairly intensive bit 
of coding...
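
For illustration only, the bookkeeping behind a) and b) can be modelled in a
few lines of plain Python. This is not the HDF5 chunk index or its file
format: the function names, the 4-element chunk size, the zero fill value and
the flat (offset, size) index are all assumptions made for the sketch, with
zlib standing in for whatever compression filter is used.

```python
import struct
import zlib

CHUNK_ELEMS = 4   # assumed fixed chunk size, in 32-bit integer elements
FILL = 0          # assumed padding value for blocks shorter than a chunk


def pack_chunk(block):
    """Pad one process's block to CHUNK_ELEMS and compress it on its own."""
    if len(block) > CHUNK_ELEMS:
        raise ValueError("block larger than the chunk size")
    padded = list(block) + [FILL] * (CHUNK_ELEMS - len(block))
    return zlib.compress(struct.pack(f"<{CHUNK_ELEMS}i", *padded))


def build_file(blocks):
    """Concatenate compressed chunks.

    Returns (data, index), where index[i] = (byte offset, byte size) of
    chunk i inside data.
    """
    data = bytearray()
    index = []
    for block in blocks:
        chunk = pack_chunk(block)
        index.append((len(data), len(chunk)))
        data += chunk
    return bytes(data), index


def read_slab(data, index, start, count):
    """Read elements [start, start + count), decompressing only the chunks
    the selection overlaps. Returns (values, chunk numbers touched)."""
    out, touched = [], []
    first = start // CHUNK_ELEMS
    last = (start + count - 1) // CHUNK_ELEMS
    for c in range(first, last + 1):
        off, size = index[c]
        raw = zlib.decompress(data[off:off + size])
        values = struct.unpack(f"<{CHUNK_ELEMS}i", raw)
        base = c * CHUNK_ELEMS
        lo = max(start, base) - base
        hi = min(start + count, base + CHUNK_ELEMS) - base
        out.extend(values[lo:hi])
        touched.append(c)
    return out, touched


# Three processes; the last block is short and gets padded.
blocks = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
data, index = build_file(blocks)
values, touched = read_slab(data, index, 3, 3)
```

Because every block is padded to the same size, the chunk number is just the
element offset divided by the chunk size, which is why a) matters for b). The
padding is also visible to readers as fill values at the end of a short block.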

                Quincey

> Regards
>  
> JB
> PS. Mark, I followed links to your silo/hdf5 wiki stuff. Interesting. It 
> looks like we’re both looking at very similar problems. I will have to play 
> with your PMPIO stuff too. Are you also looking at other libraries, like 
> ADIOS for example? (Off topic; you can reply off list if you don’t feel it’s 
> appropriate for here.)
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

