Hi John,

On Feb 25, 2011, at 4:11 AM, Biddiscombe, John A. wrote:

> Matthieu
>  
> > what would be the best way of writing one file from all the processes 
> > together (in terms of write latency), knowing the data layout (regular 2D 
> > arrays),
>  
> if all processes are writing one piece of a single dataset (assuming I 
> understood your question correctly), then the usual collective create of the 
> dataset, followed by a hyperslab selection on each process and a write of 
> individual pieces.
> > write by hyperslabs/chunks/patterns
> is I think what you want.
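A minimal sketch of that pattern (collective create, then each rank writing its own hyperslab of one shared dataset). The file name, dataset name, sizes, and the row-wise split are illustrative only, and error checking is omitted; it also assumes NX divides evenly by the number of ranks:

```c
#include <hdf5.h>
#include <mpi.h>
#include <stdlib.h>

#define NX 1024   /* global rows    */
#define NY 512    /* global columns */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Open one shared file with the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Collective create of the single global dataset. */
    hsize_t gdims[2] = {NX, NY};
    hid_t filespace = H5Screate_simple(2, gdims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its own row block of the file dataspace. */
    hsize_t count[2] = {NX / nprocs, NY};
    hsize_t start[2] = {rank * count[0], 0};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    double *buf = malloc(count[0] * count[1] * sizeof(double));
    for (hsize_t i = 0; i < count[0] * count[1]; i++)
        buf[i] = (double)rank;

    /* Collective write: every rank writes its piece in one call. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    free(buf);
    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```

The collective transfer property is what lets MPI-IO aggregate the per-rank pieces into large contiguous writes.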
>  
> Writing one dataset per process was something I wanted – an example of why
> might be the most illustrative way to explain...

>  
> Suppose I’m working in paraview and have done some work in parallel on 
> multi-block data, each process has a different block from the multi-block 
> structure. They might be geometrically diverse (eg. tetrahedral on one 
> process, prisms on another). I want to write out my current state, but don’t 
> want to do a collective write to one dataset. I really want to write each 
> block out independently, but all to the same file.
> Because each process has no idea what the others have got, I needed a way to 
> gather the info and create the ‘structure’, then write.
>  
> In the general case it’ll be slower (physically more writes to disk), but for 
> the purposes of organisation, much tidier.

        Mark and I have kicked around the idea of creating a "virtual" dataset, 
which is composed of other datasets in the file, stitched together and 
presented as a single dataset to the application.  That way, applications could 
access each underlying piece (either directly, by reading from the underlying 
dataset, or through a selection on the virtual dataset), or access the virtual 
dataset as if it were a single large dataset.  This would be a looser form of 
chunking, in an abstract sense.
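For reference, an idea along these lines later shipped as the Virtual Dataset (VDS) feature in HDF5 1.10 (well after this thread). A rough sketch of stitching two per-process source datasets into one virtual view, with purely illustrative file and dataset names:

```c
/* Sketch: map two source datasets into one "virtual" dataset using
 * H5Pset_virtual (HDF5 >= 1.10).  Names and extents are illustrative. */
#include <hdf5.h>

int main(void)
{
    hsize_t src_dims[1] = {100};   /* each source piece            */
    hsize_t vds_dims[1] = {200};   /* the stitched, single-dataset view */

    hid_t src_space = H5Screate_simple(1, src_dims, NULL);
    hid_t vds_space = H5Screate_simple(1, vds_dims, NULL);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);

    /* Map /proc0/data onto elements 0..99 of the virtual dataset. */
    hsize_t start[1] = {0}, count[1] = {100};
    H5Sselect_hyperslab(vds_space, H5S_SELECT_SET, start, NULL, count, NULL);
    H5Pset_virtual(dcpl, vds_space, "pieces.h5", "/proc0/data", src_space);

    /* Map /proc1/data onto elements 100..199. */
    start[0] = 100;
    H5Sselect_hyperslab(vds_space, H5S_SELECT_SET, start, NULL, count, NULL);
    H5Pset_virtual(dcpl, vds_space, "pieces.h5", "/proc1/data", src_space);

    hid_t file = H5Fcreate("view.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t vds  = H5Dcreate2(file, "whole", H5T_NATIVE_DOUBLE, vds_space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(vds); H5Fclose(file); H5Pclose(dcpl);
    H5Sclose(vds_space); H5Sclose(src_space);
    return 0;
}
```

Reading "whole" then behaves as if it were a single 200-element dataset, while each piece remains an ordinary dataset readable on its own.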

        Quincey

> JB
>  
>  
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of Matthieu Dorier
> Sent: 25 February 2011 10:10
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] multi-pass IO (was chunking)
>  
> Hello John (and others, since maybe other people can answer the following 
> questions)
> 
> Your library seems very interesting and I will probably use it in my project. 
> Yet I have a question: what would be the best way of writing one file from 
> all the processes together (in terms of write latency), knowing the data 
> layout (regular 2D arrays), 
> - using the classic PHDF5 library and write by hyperslabs/chunks/patterns 
> - or using your library to split a dataset into "/procNNN/dataset"? It seems 
> to me that writing regular patterns can benefit from MPI-IO's particular 
> optimizations, but maybe I misunderstood the goal of your library?
> 
> Thank you,
> 
> Matthieu
> 
> 2011/2/25 Mark Miller <[email protected]>
> John,
> 
> This is awesome! Thanks so much for putting it up.
> 
> I really wish the HDF5 Group had decided a long while ago to make this
> kind of thing available UNDER the HDF5 API via...
>    a) adding either an H5Xcreate_deferred for a part, X, of the API or
>       adding a property to X's create property list to indicate a
>       desire for deferred creation
>       Any object so created cannot be acted upon until subsequent
>       H5Xsync_deferred()...
>    b) H5Xsync_deferred() function to synchronize all deferred created
>       objects.
> But, in spite of numerous suggestions over many years that it'd be good
> for parallel applications to be able to do this, it still hasn't found
> its way into the HDF5 library proper ;)
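In calling-pattern terms, Mark's proposal might look something like this; note that H5Dcreate_deferred and H5Dsync_deferred are hypothetical names following his sketch and do not exist in the HDF5 library:

```c
/* Hypothetical use of the proposed deferred-create API.  Neither
 * H5Dcreate_deferred nor H5Dsync_deferred exists in HDF5; this only
 * illustrates the calling pattern described above. */

/* Independent, local call: each process queues its own create. */
hid_t dset = H5Dcreate_deferred(file, "/proc42/data", H5T_NATIVE_DOUBLE,
                                space, H5P_DEFAULT);

/* ... other processes queue their own deferred creates ... */

/* Collective call: all deferred objects become real in the file. */
H5Dsync_deferred(file);

/* Only now may the deferred dataset be acted upon. */
H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
```

The point of the split is that metadata creation, which normally must be collective in parallel HDF5, could be declared independently and reconciled in one synchronization step.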
> 
> It's so nice to see someone offer a suitable alternative ;)
> 
> Mark
> 
> 
> 
> On Thu, 2011-02-24 at 14:39, Biddiscombe, John A. wrote:
> > The discussion about chunking and two pass VFDs reminded me that I intended 
> > to make a small library for doing independent dataset creates, on a per 
> > process basis, available. It was created some time ago and used extensively 
> > on one project, but is currently not in use.
> >
> > I've tidied the code up a bit and uploaded it to the following page
> > https://hpcforge.org/plugins/mediawiki/wiki/libh5mb/index.php/Main_Page
> > the source code is available via the SCM link.
> >
> > Some brief notes on the library are shown on the wiki page, but the actual 
> > API is probably best described in the H5MButil.h file. I created the wiki 
> > page very quickly so apologies if the content is unclear, please let me 
> > know if it needs improvement.
> >
> > Hopefully someone will find the code useful.
> >
> > JB
> --
> Mark C. Miller, Lawrence Livermore National Laboratory
> ================!!LLNL BUSINESS ONLY!!================
> [email protected]      urgent: [email protected]
> T:8-6 (925)-423-5901    M/W/Th:7-12,2-7 (530)-753-8511
> 
> 
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
> 
> 
> 
> -- 
> Matthieu Dorier
> ENS Cachan, antenne de Bretagne
> Département informatique et télécommunication
> http://perso.eleves.bretagne.ens-cachan.fr/~mdori307/wiki/
