Hi Mark & Mark, :-)
On May 12, 2010, at 2:13 PM, Mark Miller wrote:
> Hi Mark,
>
>
> On Wed, 2010-05-12 at 12:01, Mark Howison wrote:
>> Hi Mark,
>>
>> All dataspaces are 1D. Currently, the datasets are contiguous. The
>> size of each dataset is available before the writes occur.
>>
>> There is a phase later where a large MPI communicator performs
>> parallel reads of the data, which is why we are using the parallel
>> version of the library. I think that the VFDs you are suggesting are
>> only available in the serial library, but I could be mistaken.
>
> Well, for any given libhdf5.a, the other VFDs are generally available. I
> think the direct and MPI-related VFDs are the only ones whose availability
> depends on how HDF5 was configured before installation. So, if they suit
> your needs, you should be able to use those other VFDs, even from a
> parallel application.
Yes, parallel HDF5 is a superset of serial HDF5 and all the VFDs are
available.
Is each individual file created in the first phase accessed in parallel
later? If so, it might be reasonable to use the core VFD for creating the
files, then close all the files and re-open them with the MPI-IO VFD.
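Roughly, that could look like this (an untested sketch; the file name,
increment size, and placeholder dataset calls are mine, not from your code):

    #include <hdf5.h>
    #include <mpi.h>

    void write_then_reopen(const char *fname, MPI_Comm read_comm)
    {
        /* Phase 1: build the per-task file in memory with the core VFD.
         * backing_store = 1 flushes the whole image to disk at close. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_core(fapl, (size_t)1 << 20 /* 1 MiB increments */, 1);

        hid_t file = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
        /* ... H5Dcreate / H5Dwrite the ~20,000 datasets here ... */
        H5Fclose(file);   /* the file hits disk here, in large writes */
        H5Pclose(fapl);

        /* Phase 2: re-open the same file with the MPI-IO VFD for the
         * parallel read phase. */
        hid_t mpio_fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(mpio_fapl, read_comm, MPI_INFO_NULL);

        hid_t pfile = H5Fopen(fname, H5F_ACC_RDONLY, mpio_fapl);
        /* ... parallel H5Dread calls ... */
        H5Fclose(pfile);
        H5Pclose(mpio_fapl);
    }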
Quincey
> Mark
>
>
>>
>> Thanks,
>> Mark
>>
>> On Tue, May 11, 2010 at 4:33 PM, Mark Miller <[email protected]> wrote:
>>> Hi Mark,
>>>
>>> Since you didn't explicitly describe the H5Dcreate/H5Dwrite calls, I'll
>>> probably wind up asking some silly questions, but...
>>>
>>> How big are the dataspaces being written in H5Dwrite?
>>>
>>> Are the datasets being created with chunked or contiguous storage?
>>>
>>> Why are you even bothering with MPI-IO in this case? Since each
>>> processor is writing to its own file, why not use the sec2 VFD, or maybe
>>> even the stdio or mpiposix VFD? Or, you could try the split VFD and use
>>> the 'core' VFD for metadata and either sec2, stdio, or mpiposix for raw
>>> data. That results in two actual 'files' on disk for every 'file' a task
>>> creates, but if this is for out-of-core work, you'll soon be deleting
>>> them anyway. Using the split VFD this way means that all metadata is
>>> held in memory (in the core VFD) until the file is closed, and then it
>>> is written out in one large I/O request. Raw data is handled as usual.
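A rough, untested sketch of the split-VFD setup Mark describes (the
extensions and increment size are just examples):

    #include <hdf5.h>

    hid_t make_split_fapl(void)
    {
        /* Metadata side: core VFD, backing_store = 1 so the metadata file
         * is actually written to disk when the HDF5 file is closed. */
        hid_t meta_fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_core(meta_fapl, (size_t)1 << 20, 1);

        /* Raw-data side: plain sec2 (POSIX) VFD. */
        hid_t raw_fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_sec2(raw_fapl);

        /* Combine them with the split VFD; this produces two files on
         * disk (metadata and raw) for each HDF5 "file" a task creates. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_split(fapl, "-m.h5", meta_fapl, "-r.h5", raw_fapl);

        /* The child property lists are copied, so they can be closed. */
        H5Pclose(meta_fapl);
        H5Pclose(raw_fapl);
        return fapl;   /* pass to H5Fcreate; H5Pclose it when done */
    }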
>>>
>>> Well, those are some options to try, at least.
>>>
>>> Good luck.
>>>
>>> Mark
>>>
>>> What version of HDF5 is this?
>>> On Tue, 2010-05-11 at 16:23 -0700, Mark Howison wrote:
>>>> Hi,
>>>>
>>>> I'm helping a user at NERSC modify an out-of-core matrix calculation
>>>> code to use HDF5 for temporary storage. Each of his 30 MPI tasks is
>>>> writing to its own file using the MPI-IO VFD in independent mode with
>>>> the MPI_COMM_SELF communicator. He is creating about 20,000 datasets
>>>> and writing anywhere from 4KB to 32MB to each one. In IO profiles, we
>>>> are seeing a huge spike in <1KB writes (about 100,000). My questions
>>>> are:
>>>>
>>>> * Are these small writes we are seeing associated with dataset metadata?
>>>>
>>>> * Is there a "best practice" for handling this number of datasets? For
>>>> instance, is it better to pre-allocate the datasets before writing to
>>>> them?
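For reference, a minimal, untested sketch of the per-task setup described
above (the file name and the single placeholder dataset stand in for the
real ~20,000 datasets):

    #include <stdio.h>
    #include <hdf5.h>
    #include <mpi.h>

    void write_own_file(int rank)
    {
        char fname[64];
        snprintf(fname, sizeof fname, "scratch_%04d.h5", rank);

        /* Each task opens its own file through the MPI-IO VFD with
         * MPI_COMM_SELF. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_SELF, MPI_INFO_NULL);
        hid_t file = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* One contiguous 1D dataset (placeholder for the ~20,000). */
        hsize_t dims[1] = {1024};
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset  = H5Dcreate2(file, "/dset_00000", H5T_NATIVE_DOUBLE,
                                 space, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        /* Independent (non-collective) write. */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT);
        double buf[1024] = {0};
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, dxpl, buf);

        H5Pclose(dxpl); H5Dclose(dset); H5Sclose(space);
        H5Fclose(file); H5Pclose(fapl);
    }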
>>>>
>>>> Thanks
>>>> Mark
>>>>
>>> --
>>> Mark C. Miller, Lawrence Livermore National Laboratory
>>> ================!!LLNL BUSINESS ONLY!!================
>>> [email protected] urgent: [email protected]
>>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>>>
>>>
>>
> --
> Mark C. Miller, Lawrence Livermore National Laboratory
> ================!!LLNL BUSINESS ONLY!!================
> [email protected] urgent: [email protected]
> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>
>
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org