Hi Mark,

All dataspaces are 1D. The datasets are currently created with contiguous storage, and the size of each dataset is known before the writes occur.
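For reference, here is roughly how each task creates and writes a dataset (a simplified sketch, not our actual code: the function, file and dataset names, and the double type are placeholders, and it assumes the 1.8-style H5Dcreate signature). Since the sizes are known up front, we could also try requesting early space allocation, along the lines of the pre-allocation question in my first message:

    #include <hdf5.h>
    #include <mpi.h>

    /* Sketch only: one 1D contiguous dataset in a per-task file. */
    void write_one(const char *fname, const char *dname,
                   const double *buf, hsize_t nelems)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_SELF, MPI_INFO_NULL); /* one file per task */
        hid_t file = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        hsize_t dims[1] = { nelems };                   /* size known before the write */
        hid_t space = H5Screate_simple(1, dims, NULL);

        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);     /* contiguous layout is the default */
        H5Pset_alloc_time(dcpl, H5D_ALLOC_TIME_EARLY);  /* reserve space at create time */

        hid_t dset = H5Dcreate(file, dname, H5T_NATIVE_DOUBLE, space,
                               H5P_DEFAULT, dcpl, H5P_DEFAULT);
        /* default transfer property list => independent I/O */
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

        H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space);
        H5Fclose(file); H5Pclose(fapl);
    }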
There is a later phase in which a large MPI communicator performs parallel reads of the data, which is why we are using the parallel version of the library. I think the VFDs you are suggesting are only available in the serial library, but I could be mistaken.

Thanks,
Mark

On Tue, May 11, 2010 at 4:33 PM, Mark Miller <[email protected]> wrote:
> Hi Mark,
>
> Since you didn't explicitly describe the H5Dcreate/H5Dwrite calls, I'll
> probably wind up asking some silly questions, but...
>
> How big are the dataspaces being written in H5Dwrite?
>
> Are the datasets being created with chunked or contiguous storage?
>
> Why are you even bothering with MPI-IO in this case? Since each
> processor is writing to its own file, why not use the sec2 VFD, or maybe
> even stdio or mpiposix? Or, you could try the split VFD and use the
> 'core' VFD for metadata and either sec2, stdio, or mpiposix for raw data.
> That results in two actual 'files' on disk for every 'file' a task
> creates, but if this is for out-of-core, you'll soon be deleting them
> anyway. Using the split VFD in this way means that all metadata is held
> in memory (in the core VFD) until the file is closed, and then it gets
> written in one large I/O request. Raw data is handled as usual.
>
> Well, those are some options to try at least.
>
> Good luck.
>
> Mark
>
> What version of HDF5 is this?
>
> On Tue, 2010-05-11 at 16:23 -0700, Mark Howison wrote:
>> Hi,
>>
>> I'm helping a user at NERSC modify an out-of-core matrix calculation
>> code to use HDF5 for temporary storage. Each of his 30 MPI tasks is
>> writing to its own file using the MPI-IO VFD in independent mode with
>> the MPI_COMM_SELF communicator. He is creating about 20,000 datasets
>> and writing anywhere from 4KB to 32MB to each one. In I/O profiles, we
>> are seeing a huge spike in <1KB writes (about 100,000). My questions
>> are:
>>
>> * Are these small writes we are seeing associated with dataset metadata?
>>
>> * Is there a "best practice" for handling this number of datasets? For
>> instance, is it better to pre-allocate the datasets before writing to
>> them?
>>
>> Thanks
>> Mark
>>
> --
> Mark C. Miller, Lawrence Livermore National Laboratory
> ================!!LLNL BUSINESS ONLY!!================
> [email protected]  urgent: [email protected]
> T:8-6 (925)-423-5901  M/W/Th:7-12,2-7 (530)-753-8511
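P.S. For the archive, here is roughly what I understand your split-VFD suggestion to look like, in case those drivers do turn out to be usable with our parallel build (a sketch only: the "-m.h5"/"-r.h5" extensions, the 1 MiB core increment, and the base file name are my guesses, not tested code):

    hid_t meta_fapl = H5Pcreate(H5P_FILE_ACCESS);
    /* hold all metadata in memory (1 MiB growth increments);
       backing_store=1 writes it to disk in one go when the file closes */
    H5Pset_fapl_core(meta_fapl, 1024 * 1024, 1);

    hid_t raw_fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_sec2(raw_fapl);                  /* raw data handled as usual */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    /* two files on disk per logical file: <base>-m.h5 (metadata), <base>-r.h5 (raw) */
    H5Pset_fapl_split(fapl, "-m.h5", meta_fapl, "-r.h5", raw_fapl);

    hid_t file = H5Fcreate("task_00", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);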
