Hi Mark,

All dataspaces are 1D. The datasets are currently created with contiguous storage, and the size of each dataset is known before the writes occur.
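For reference, here is roughly how each task creates and writes a dataset (a simplified sketch, not our actual code: the function, file and dataset names, and the double type are placeholders, and it assumes the 1.8-style H5Dcreate signature). Since the sizes are known up front, we could also try requesting early space allocation, along the lines of the pre-allocation question in my first message:

    #include <hdf5.h>
    #include <mpi.h>

    /* Sketch only: one 1D contiguous dataset in a per-task file. */
    void write_one(const char *fname, const char *dname,
                   const double *buf, hsize_t nelems)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_SELF, MPI_INFO_NULL); /* one file per task */
        hid_t file = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        hsize_t dims[1] = { nelems };                   /* size known before the write */
        hid_t space = H5Screate_simple(1, dims, NULL);

        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);     /* contiguous layout is the default */
        H5Pset_alloc_time(dcpl, H5D_ALLOC_TIME_EARLY);  /* reserve space at create time */

        hid_t dset = H5Dcreate(file, dname, H5T_NATIVE_DOUBLE, space,
                               H5P_DEFAULT, dcpl, H5P_DEFAULT);
        /* default transfer property list => independent I/O */
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

        H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space);
        H5Fclose(file); H5Pclose(fapl);
    }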
There is a later phase in which a large MPI communicator performs parallel reads of the data, which is why we are using the parallel version of the library. I think the VFDs you are suggesting are only available in the serial library, but I could be mistaken.

Thanks,
Mark

On Tue, May 11, 2010 at 4:33 PM, Mark Miller <[email protected]> wrote:
> Hi Mark,
>
> Since you didn't explicitly describe the H5Dcreate/H5Dwrite calls, I'll
> probably wind up asking some silly questions, but...
>
> How big are the dataspaces being written in H5Dwrite?
>
> Are the datasets being created with chunked or contiguous storage?
>
> Why are you even bothering with MPI-IO in this case? Since each
> processor is writing to its own file, why not use the sec2 VFD, or maybe
> even stdio or mpiposix? Or, you could try the split VFD and use the
> 'core' VFD for metadata and either sec2, stdio, or mpiposix for raw data.
> That results in two actual 'files' on disk for every 'file' a task
> creates, but if this is for out-of-core, you'll soon be deleting them
> anyway. Using the split VFD in this way means that all metadata is held
> in memory (in the core VFD) until the file is closed, and then it gets
> written in one large I/O request. Raw data is handled as usual.
>
> Well, those are some options to try at least.
>
> Good luck.
>
> Mark
>
> What version of HDF5 is this?
>
> On Tue, 2010-05-11 at 16:23 -0700, Mark Howison wrote:
>> Hi,
>>
>> I'm helping a user at NERSC modify an out-of-core matrix calculation
>> code to use HDF5 for temporary storage. Each of his 30 MPI tasks is
>> writing to its own file using the MPI-IO VFD in independent mode with
>> the MPI_COMM_SELF communicator. He is creating about 20,000 datasets
>> and writing anywhere from 4KB to 32MB to each one. In I/O profiles, we
>> are seeing a huge spike in <1KB writes (about 100,000). My questions
>> are:
>>
>> * Are these small writes we are seeing associated with dataset metadata?
>>
>> * Is there a "best practice" for handling this number of datasets? For
>> instance, is it better to pre-allocate the datasets before writing to
>> them?
>>
>> Thanks
>> Mark
>>
> --
> Mark C. Miller, Lawrence Livermore National Laboratory
> ================!!LLNL BUSINESS ONLY!!================
> [email protected]  urgent: [email protected]
> T:8-6 (925)-423-5901  M/W/Th:7-12,2-7 (530)-753-8511
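P.S. For the archive, here is roughly what I understand your split-VFD suggestion to look like, in case those drivers do turn out to be usable with our parallel build (a sketch only: the "-m.h5"/"-r.h5" extensions, the 1 MiB core increment, and the base file name are my guesses, not tested code):

    hid_t meta_fapl = H5Pcreate(H5P_FILE_ACCESS);
    /* hold all metadata in memory (1 MiB growth increments);
       backing_store=1 writes it to disk in one go when the file closes */
    H5Pset_fapl_core(meta_fapl, 1024 * 1024, 1);

    hid_t raw_fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_sec2(raw_fapl);                  /* raw data handled as usual */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    /* two files on disk per logical file: <base>-m.h5 (metadata), <base>-r.h5 (raw) */
    H5Pset_fapl_split(fapl, "-m.h5", meta_fapl, "-r.h5", raw_fapl);

    hid_t file = H5Fcreate("task_00", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);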
