Hi John,
On Feb 25, 2011, at 4:11 AM, Biddiscombe, John A. wrote:
> Matthieu
>
> > what would be the best way of writing one file from all the processes
> > together (in terms of write latency), knowing the data layout (regular 2D
> > arrays),
>
> if all processes are writing one piece of a single dataset (assuming I
> understood your question correctly), then the usual collective create of the
> dataset, followed by a hyperslab selection on each process and a write of
> individual pieces.
> > write by hyperslabs/chunks/patterns
> is, I think, what you want.
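
A minimal sketch of that collective-create + hyperslab pattern (assuming, for simplicity, a 1-D float dataset split evenly across ranks; file and dataset names are illustrative):

#include <mpi.h>
#include <hdf5.h>

#define N_PER_RANK 1024

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* All ranks open one shared file through the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Collective create of a single dataset sized for every rank. */
    hsize_t dims[1] = { (hsize_t)nprocs * N_PER_RANK };
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_FLOAT, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its own hyperslab of the file space... */
    hsize_t start[1] = { (hsize_t)rank * N_PER_RANK };
    hsize_t count[1] = { N_PER_RANK };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    /* ...and writes its piece; a collective transfer lets MPI-IO optimize. */
    float buf[N_PER_RANK];
    for (int i = 0; i < N_PER_RANK; i++) buf[i] = (float)rank;
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_FLOAT, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
    MPI_Finalize();
    return 0;
}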
>
> Writing one dataset per process was something I wanted; an example of why
> I wanted it is probably most illustrative...
>
> Suppose I’m working in paraview and have done some work in parallel on
> multi-block data, each process has a different block from the multi-block
> structure. They might be geometrically diverse (e.g. tetrahedra on one
> process, prisms on another). I want to write out my current state, but don’t
> want to do a collective write to one dataset. I really want to write each
> block out independently, but all to the same file.
> Because each process has no idea what the others have got, I needed a way to
> gather the info and create the ‘structure’, then write.
>
> In the general case it’ll be slower (physically more writes to disk), but for
> the purposes of organisation, much tidier.
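
A rough sketch of that gather-then-create approach (this is not the libh5mb API; the names and the flat one-dataset-per-rank layout are just for illustration). Every rank learns all the block sizes, all ranks create every dataset collectively, and each rank then writes only its own block independently:

#include <mpi.h>
#include <hdf5.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank holds a block of a different size, unknown to the others. */
    int mycount = 100 + 10 * rank;
    int *counts = malloc(nprocs * sizeof(int));
    MPI_Allgather(&mycount, 1, MPI_INT, counts, 1, MPI_INT, MPI_COMM_WORLD);

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("blocks.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Dataset creation is collective in parallel HDF5, so every rank
       must create every dataset, including ones it will never write. */
    hid_t *dsets = malloc(nprocs * sizeof(hid_t));
    for (int p = 0; p < nprocs; p++) {
        char name[64];
        snprintf(name, sizeof(name), "proc%03d_dataset", p);
        hsize_t dims[1] = { (hsize_t)counts[p] };
        hid_t space = H5Screate_simple(1, dims, NULL);
        dsets[p] = H5Dcreate2(file, name, H5T_NATIVE_FLOAT, space,
                              H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
        H5Sclose(space);
    }

    /* Raw-data writes can then be independent: each rank writes its own. */
    float *buf = malloc(mycount * sizeof(float));
    for (int i = 0; i < mycount; i++) buf[i] = (float)rank;
    H5Dwrite(dsets[rank], H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL,
             H5P_DEFAULT, buf);

    for (int p = 0; p < nprocs; p++) H5Dclose(dsets[p]);
    H5Fclose(file); H5Pclose(fapl);
    free(buf); free(counts); free(dsets);
    MPI_Finalize();
    return 0;
}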
Mark and I have kicked around the idea of creating a "virtual" dataset,
which is composed of other datasets in the file, stitched together and
presented as a single dataset to the application. That way, applications could
access the underlying pieces (either directly, by reading from the underlying
datasets, or through a selection on the virtual dataset), or access the virtual
dataset as if it were a single large dataset. This would be a looser form of
chunking, in an abstract sense.
Quincey
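
(This idea did eventually ship, as the Virtual Dataset (VDS) feature in HDF5 1.10, years after this thread. Purely for illustration, a sketch in that later API, stitching two same-file pieces into one virtual dataset; the file and dataset names are made up:)

#include <hdf5.h>

int main(void)
{
    /* Two existing 100-element source datasets become one 200-element
       virtual dataset that applications can read as a single array. */
    hsize_t src_dims[1] = { 100 };
    hsize_t vds_dims[1] = { 200 };

    hid_t file   = H5Fopen("pieces.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    hid_t vspace = H5Screate_simple(1, vds_dims, NULL);
    hid_t sspace = H5Screate_simple(1, src_dims, NULL);
    hid_t dcpl   = H5Pcreate(H5P_DATASET_CREATE);

    const char *pieces[2] = { "/piece0", "/piece1" };
    for (int i = 0; i < 2; i++) {
        hsize_t start[1] = { (hsize_t)i * 100 };
        H5Sselect_hyperslab(vspace, H5S_SELECT_SET, start, NULL,
                            src_dims, NULL);
        /* "." means the source dataset lives in the same file. */
        H5Pset_virtual(dcpl, vspace, ".", pieces[i], sspace);
    }

    /* Readers see /stitched as one contiguous dataset; the library
       forwards selections to the underlying pieces. */
    hid_t vds = H5Dcreate2(file, "/stitched", H5T_NATIVE_FLOAT, vspace,
                           H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(vds); H5Pclose(dcpl);
    H5Sclose(sspace); H5Sclose(vspace); H5Fclose(file);
    return 0;
}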
> JB
>
>
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Matthieu Dorier
> Sent: 25 February 2011 10:10
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] multi-pass IO (was chunking)
>
> Hello John (and others, since maybe other people can answer the following
> questions)
>
> Your library seems very interesting and I will probably use it in my project.
> Yet I have a question: what would be the best way of writing one file from
> all the processes together (in terms of write latency), knowing the data
> layout (regular 2D arrays),
> - using the classic PHDF5 library and writing by hyperslabs/chunks/patterns,
> - or using your library to split a dataset into "/procNNN/dataset"? It seems
> to me that writing regular patterns can benefit from MPI-IO's particular
> optimizations, but maybe I misunderstood the goal of your library?
>
> Thank you,
>
> Matthieu
>
> 2011/2/25 Mark Miller <[email protected]>
> John,
>
> This is awesome! Thanks so much for putting it up.
>
> I really wish the HDF Group had decided a long while ago to make this
> kind of thing available UNDER the HDF5 API via...
> a) adding either an H5Xcreate_deferred for any part, X, of the API, or
>    adding a property to X's create property list to indicate a
>    desire for deferred creation. Any object so created cannot be
>    acted upon until a subsequent H5Xsync_deferred()...
> b) an H5Xsync_deferred() function to synchronize all deferred-created
>    objects.
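
To make the proposal concrete, a hypothetical usage fragment (none of these *_deferred calls exist in HDF5; the names simply follow Mark's H5X pattern with X = D for datasets, and file, rank, my_block_size, and buf are assumed to be set up elsewhere):

/* HYPOTHETICAL: sketches the proposed API; these *_deferred
   functions are not part of the real HDF5 library. */
char name[32];
snprintf(name, sizeof(name), "proc%03d", rank);
hsize_t dims[1] = { my_block_size };   /* can differ on every rank */
hid_t space = H5Screate_simple(1, dims, NULL);

/* Queue the create locally: no communication, no collective call. */
hid_t dset = H5Dcreate_deferred(file, name, H5T_NATIVE_FLOAT, space,
                                H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

/* One collective sync exchanges all queued creates so every rank's
   metadata agrees; only after this may dset be acted upon. */
H5Dsync_deferred(file);
H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);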
> But, in spite of numerous suggestions over many years that it'd be good
> for parallel applications to be able to do this, it still hasn't found
> its way into the HDF5 library proper ;)
>
> It's so nice to see someone offer a suitable alternative ;)
>
> Mark
>
>
>
> On Thu, 2011-02-24 at 14:39, Biddiscombe, John A. wrote:
> > The discussion about chunking and two-pass VFDs reminded me that I intended
> > to make available a small library for doing independent dataset creates on a
> > per-process basis. It was created some time ago and used extensively
> > on one project, but is currently not in use.
> >
> > I've tidied the code up a bit and uploaded it to the following page
> > https://hpcforge.org/plugins/mediawiki/wiki/libh5mb/index.php/Main_Page
> > the source code is available via the SCM link.
> >
> > Some brief notes on the library are shown on the wiki page, but the actual
> > API is probably best described in the H5MButil.h file. I created the wiki
> > page very quickly, so apologies if the content is unclear; please let me
> > know if it needs improvement.
> >
> > Hopefully someone will find the code useful.
> >
> > JB
> --
> Mark C. Miller, Lawrence Livermore National Laboratory
> ================!!LLNL BUSINESS ONLY!!================
> [email protected] urgent: [email protected]
> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>
> --
> Matthieu Dorier
> ENS Cachan, antenne de Bretagne
> Département informatique et télécommunication
> http://perso.eleves.bretagne.ens-cachan.fr/~mdori307/wiki/
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org