Hi Xunlei,
On May 13, 2010, at 8:43 AM, Dr. X wrote:
> Hi Quincey,
> My understanding of parallel HDF5 is that it depends on the availability of a
> parallel file system, e.g. GPFS. For instance, I am out of luck whether I am
> using Windows XP/7 or Windows Server 2008, right?
Yes - we don't support the parallel I/O VFDs (MPI-IO and MPI-POSIX) on
Windows currently.
> As for Linux (kernel > 2.4), according to
> ftp://ftp.hdfgroup.org/HDF5/current/src/unpacked/release_docs/INSTALL_parallel
> even on a multi-core laptop, I should be able to access PHDF5 functionality.
> Is this correct? Thanks a lot.
Yes, I test parallel I/O on my MacBookPro all the time. :-)
Quincey
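For anyone who wants to try this on a laptop, here is a minimal sketch (not a definitive recipe) of a parallel HDF5 program, assuming HDF5 was configured with --enable-parallel and an MPI implementation is installed; the file name, dataset name, and sizes below are just placeholders:

    /* Build (roughly): h5pcc -o phdf5_test phdf5_test.c
       Run:             mpirun -np 4 ./phdf5_test */
    #include <mpi.h>
    #include "hdf5.h"

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* File access property list selecting the MPI-IO VFD on MPI_COMM_WORLD */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

        /* All ranks collectively create one shared file */
        hid_t file = H5Fcreate("phdf5_test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* Each rank writes its rank number into one element of a shared dataset */
        hsize_t dims[1]  = { (hsize_t)nprocs };
        hid_t   space    = H5Screate_simple(1, dims, NULL);
        hid_t   dset     = H5Dcreate(file, "ranks", H5T_NATIVE_INT, space,
                                     H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        hsize_t start[1] = { (hsize_t)rank }, count[1] = { 1 };
        hid_t   memspace = H5Screate_simple(1, count, NULL);
        H5Sselect_hyperslab(space, H5S_SELECT_SET, start, NULL, count, NULL);
        H5Dwrite(dset, H5T_NATIVE_INT, memspace, space, H5P_DEFAULT, &rank);

        H5Sclose(memspace); H5Sclose(space); H5Dclose(dset);
        H5Fclose(file); H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }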
> Best,
> xunlei
>
>
> On 5/13/2010 8:08 AM, Quincey Koziol wrote:
>> Hi Mark & Mark, :-)
>>
>> On May 12, 2010, at 2:13 PM, Mark Miller wrote:
>>
>>
>>> Hi Mark,
>>>
>>>
>>> On Wed, 2010-05-12 at 12:01, Mark Howison wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> All dataspaces are 1D. Currently, the datasets are contiguous. The
>>>> size of each dataset is available before the writes occur.
>>>>
>>>> There is a phase later where a large MPI communicator performs
>>>> parallel reads of the data, which is why we are using the parallel
>>>> version of the library. I think that the VFDs you are suggesting are
>>>> only available in the serial library, but I could be mistaken.
>>>>
>>> Well, for any given libhdf5.a, the other VFDs are generally
>>> available. I think the direct and MPI-related VFDs are the only ones that
>>> might not be available, depending on how HDF5 was configured prior to
>>> installation. So, if they are suitable for your needs, you should be
>>> able to use those other VFDs, even from a parallel application.
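To illustrate the point above, a rough sketch of an MPI program linked against parallel HDF5 in which each rank opens its own file through the serial sec2 VFD (file names are placeholders and error checking is omitted):

    #include <stdio.h>
    #include <mpi.h>
    #include "hdf5.h"

    /* Each MPI rank writes its own file through the (serial) sec2 VFD.
       No MPI communicator is handed to HDF5 at all. */
    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char name[64];
        snprintf(name, sizeof(name), "task_%04d.h5", rank);

        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_sec2(fapl);              /* plain POSIX read/write VFD */

        hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* ... H5Dcreate/H5Dwrite calls as usual ... */

        H5Fclose(file);
        H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }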
>>>
>> Yes, parallel HDF5 is a superset of serial HDF5 and all the VFDs are
>> available.
>>
>> Is each individual file created in the first phase accessed in parallel
>> later? If so, it might be reasonable to use the core VFD for creating the
>> files, then close all the files and re-open them with the MPI-IO VFD.
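A rough sketch of that two-phase approach (the function names and the 64 MB increment are just placeholders, and error checking is omitted):

    #include <mpi.h>
    #include "hdf5.h"

    /* Phase 1: create the file entirely in memory with the core VFD;
       backing_store = 1 writes it out to disk in one sweep at H5Fclose(). */
    hid_t create_with_core_vfd(const char *name)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_core(fapl, 64 * 1024 * 1024 /* 64 MB increments */, 1);
        hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
        H5Pclose(fapl);
        return file;   /* ... create/write datasets, then H5Fclose(file) ... */
    }

    /* Phase 2: once the file is on disk, re-open it with the MPI-IO VFD
       on whatever communicator will perform the parallel reads. */
    hid_t reopen_with_mpio(const char *name, MPI_Comm comm)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);
        hid_t file = H5Fopen(name, H5F_ACC_RDONLY, fapl);
        H5Pclose(fapl);
        return file;   /* ... parallel H5Dread calls ... */
    }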
>>
>> Quincey
>>
>>
>>> Mark
>>>
>>>
>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>> On Tue, May 11, 2010 at 4:33 PM, Mark Miller <[email protected]> wrote:
>>>>
>>>>> Hi Mark,
>>>>>
>>>>> Since you didn't explicitly describe the H5Dcreate/H5Dwrite calls, I'll
>>>>> probably wind up asking some silly questions, but...
>>>>>
>>>>> How big are the dataspaces being written in H5Dwrite?
>>>>>
>>>>> Are the datasets being created with chunked or contiguous storage?
>>>>>
>>>>> Why are you even bothering with MPI-IO in this case? Since each
>>>>> processor is writing to its own file, why not use the sec2 VFD, or maybe
>>>>> even the stdio or mpiposix VFD? Or, you could try the split VFD and use
>>>>> the 'core' VFD for metadata and sec2, stdio, or mpiposix for raw data. That
>>>>> results in two actual 'files' on disk for every 'file' a task creates,
>>>>> but if this is for out-of-core storage, you'll soon be deleting them anyway.
>>>>> Using the split VFD in this way means that all metadata will be held in
>>>>> memory (in the core VFD) until the file is closed, and then it will be
>>>>> written in one large I/O request. Raw data is handled as usual.
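For reference, a minimal sketch of that split-VFD setup (the file name, extensions, and core-VFD increment are placeholders, and error handling is omitted):

    #include "hdf5.h"

    /* Split VFD: metadata goes through the core VFD (held in memory and
       flushed at close), raw data goes through the sec2 VFD. This produces
       two files on disk, e.g. task_0000.h5-m and task_0000.h5-r. */
    int main(void)
    {
        /* Property list for the metadata side: core VFD with backing store */
        hid_t meta_fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_core(meta_fapl, 1024 * 1024 /* 1 MB increments */, 1);

        /* Property list for the raw-data side: plain sec2 VFD */
        hid_t raw_fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_sec2(raw_fapl);

        /* Combine them with the split VFD */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_split(fapl, "-m", meta_fapl, "-r", raw_fapl);

        hid_t file = H5Fcreate("task_0000.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* ... H5Dcreate/H5Dwrite as usual; metadata stays in memory until
           H5Fclose(), then is written in one large request ... */

        H5Fclose(file);
        H5Pclose(fapl); H5Pclose(raw_fapl); H5Pclose(meta_fapl);
        return 0;
    }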
>>>>>
>>>>> Well, those are some options to try, at least.
>>>>>
>>>>> Good luck.
>>>>>
>>>>> Mark
>>>>>
>>>>> What version of HDF5 is this?
>>>>> On Tue, 2010-05-11 at 16:23 -0700, Mark Howison wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm helping a user at NERSC modify an out-of-core matrix calculation
>>>>>> code to use HDF5 for temporary storage. Each of his 30 MPI tasks is
>>>>>> writing to its own file using the MPI-IO VFD in independent mode with
>>>>>> the MPI_COMM_SELF communicator. He is creating about 20,000 datasets
>>>>>> and writing anywhere from 4 KB to 32 MB to each one. In I/O profiles, we
>>>>>> are seeing a huge spike in <1 KB writes (about 100,000). My questions
>>>>>> are:
>>>>>>
>>>>>> * Are these small writes we are seeing associated with dataset metadata?
>>>>>>
>>>>>> * Is there a "best practice" for handling this number of datasets? For
>>>>>> instance, is it better to pre-allocate the datasets before writing to
>>>>>> them?
>>>>>>
>>>>>> Thanks
>>>>>> Mark
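For context, here is a rough sketch of the file-per-task setup described above, together with one possible way to pre-allocate dataset storage at creation time via H5D_ALLOC_TIME_EARLY (the file and dataset names and sizes are placeholders, and this is only an illustration of the question, not a recommended fix):

    #include <stdio.h>
    #include <mpi.h>
    #include "hdf5.h"

    /* One file per MPI task via MPI_COMM_SELF, as in the setup above.
       H5D_ALLOC_TIME_EARLY asks HDF5 to allocate each (contiguous) dataset's
       space at H5Dcreate time rather than at first write. */
    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char name[64];
        snprintf(name, sizeof(name), "scratch_%02d.h5", rank);

        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_SELF, MPI_INFO_NULL);
        hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* Dataset creation property list requesting early allocation */
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_alloc_time(dcpl, H5D_ALLOC_TIME_EARLY);

        hsize_t dims[1] = { 1024 };   /* placeholder size */
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dset  = H5Dcreate(file, "block_00000", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        /* ... H5Dwrite later, repeated for the ~20,000 datasets ... */

        H5Dclose(dset); H5Sclose(space); H5Pclose(dcpl);
        H5Fclose(file); H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }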
>>>>>>
>>>>>>
>>>>> --
>>>>> Mark C. Miller, Lawrence Livermore National Laboratory
>>>>> ================!!LLNL BUSINESS ONLY!!================
>>>>> [email protected] urgent: [email protected]
>>>>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>>>>>
>>>>>
>>> --
>>> Mark C. Miller, Lawrence Livermore National Laboratory
>>> ================!!LLNL BUSINESS ONLY!!================
>>> [email protected] urgent: [email protected]
>>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>>>
>>>
>
>
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org