Hi Leigh,
        As Mark says below, you currently need to perform all metadata 
modifications from all processes.  That includes writing new data values to 
attributes, as well as other, more obvious operations like creating or deleting 
objects.
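
        For example, in the F90 API an object creation would be issued 
identically on every rank, with no "if (rank == 0)" guard around it.  A rough 
sketch (file_id, group_id, and the group name here are just placeholders):

    ! All ranks make the same call with the same arguments; HDF5 treats
    ! this metadata operation as collective.
    CALL h5gcreate_f(file_id, "/params", group_id, error)
    CALL h5gclose_f(group_id, error)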

        Quincey

On Jan 19, 2011, at 7:27 PM, Mark Howison wrote:

> Hi Leigh,
> 
> I'm not familiar with the F90 API, but here is an example in C that only 
> writes from rank 0:
> 
> -----
> 
> if (rank == 0) {
>    H5Sselect_all(diskspace);
> } else {
>    H5Sselect_none(diskspace);
> }
> 
> H5Dwrite(dataset, TYPE, memspace, diskspace, dxpl, buffer);
> 
> -----
> 
> Notice that all tasks call H5Dwrite (as required for a collective write) even 
> though only rank 0 has actually selected a region to write to in the disk 
> space.
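> 
> The same pattern in the F90 API should look roughly like this (untested 
> sketch; dset_id, memspace, filespace, plist_id, data, and dims are assumed to 
> be set up as in the tutorial code, and note that the memory-space selection 
> probably has to be emptied as well so the element counts match):
> 
> -----
> 
> if (rank == 0) then
>    CALL h5sselect_all_f(filespace, error)
> else
>    ! Non-writing ranks empty both selections.
>    CALL h5sselect_none_f(filespace, error)
>    CALL h5sselect_none_f(memspace, error)
> end if
> 
> ! Every rank still calls h5dwrite_f (required for a collective write),
> ! even though only rank 0 has selected anything.
> CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, dims, error, &
>                 file_space_id = filespace, mem_space_id = memspace, &
>                 xfer_prp = plist_id)
> 
> -----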
> 
> If you have a single integer, you probably want to write it as an attribute. 
> Almost all calls except H5Dwrite, including attribute and metadata 
> operations, are assumed to be collective, and expect the same values across 
> all tasks. There is a handy reference here to confirm this for individual 
> calls:
> 
> http://www.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html
> 
> So you don't have to manually tell HDF5 to only write an attribute from rank 
> 0, for instance. I believe that all metadata is cached across all ranks, so 
> each rank will need the actual value anyway (otherwise it would have to be 
> broadcast from rank 0 if you only wrote from there).
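> 
> For the single-integer case, the F90 calls would be roughly as follows 
> (sketch; file_id stands for an already-open file or group, nx is the integer 
> to store, and every rank passes the same value):
> 
> -----
> 
> INTEGER(HID_T) :: aspace_id, attr_id
> INTEGER(HSIZE_T), DIMENSION(1) :: adims = (/1/)
> 
> ! All ranks create and write the attribute with the same value.
> CALL h5screate_simple_f(1, adims, aspace_id, error)
> CALL h5acreate_f(file_id, "nx", H5T_NATIVE_INTEGER, aspace_id, attr_id, error)
> CALL h5awrite_f(attr_id, H5T_NATIVE_INTEGER, nx, adims, error)
> CALL h5aclose_f(attr_id, error)
> CALL h5sclose_f(aspace_id, error)
> 
> -----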
> 
> The metadata is written to disk as it is evicted from the metadata cache. 
> This used to be done only from rank 0, which holds a copy of the metadata 
> cache identical to every other task's. But we recently collaborated with the 
> HDF Group to round-robin these writes across MPI tasks, which improves 
> performance on parallel file systems that expect many-to-one file access 
> patterns (such as Lustre or GPFS). The eventual goal is to have a
> paging mechanism that will aggregate metadata into large chunks that align to 
> file system boundaries, then write only from rank 0 or a subset of writers 
> (as in collective buffering algorithms found in MPI-IO implementations). 
> Quincey knows more about that and how it will be implemented, but it will 
> probably require MPI communication to maintain cache coherency.
> 
> So anyway, the point is that you only have to worry about empty selections 
> for dataset writes.
> 
> Hope that helps,
> 
> Mark
> 
> On Wed, Jan 19, 2011 at 6:36 PM, Leigh Orf <[email protected]> wrote:
> Mark,
> 
> Could you give me an example of a call to H5Dwrite (Fortran 90 API) where an 
> "empty selection" is passed? I don't know which argument you mean.
> 
> There are many cases (with metadata, for instance) where I need only one 
> member of a group to write the metadata. I am seeing weird behavior in some 
> of my code as I work with pHDF5, but I think it's because I don't entirely 
> understand what pHDF5 expects.
> 
> For instance, if I have a single integer that is common amongst all ranks in 
> a collective group writing to one file, do I just pick the root rank to do 
> the write and have all other ranks pass some dummy variable?
> 
> I can understand the paradigm where you are writing data that is different on 
> each rank and you need to specify dims, offsets, etc. (the example codes show 
> this), but the "easier" case is throwing me.
> 
> Thanks,
> 
> Leigh
> 
> On Tue, Jan 18, 2011 at 5:15 AM, Mark Howison <[email protected]> wrote:
> Hi Leigh,
> 
> Yes, there is only a small difference in code between collective and 
> independent mode for the MPI-IO VFD. To enable collective I/O, you pass a 
> dataset transfer property list to H5Dwrite like this:
> dxpl_id = H5Pcreate(H5P_DATASET_XFER);
> H5Pset_dxpl_mpio(dxpl_id, H5FD_MPIO_COLLECTIVE);
> 
> H5Dwrite(dset_id, H5T_NATIVE_FLOAT, memspace, filespace, dxpl_id, somedata0);
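> 
> The F90 equivalent should be roughly as follows (untested sketch; filespace, 
> memspace, somedata0, and dims follow the usual tutorial setup):
> 
> ! Create a dataset-transfer property list and request collective I/O.
> CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error)
> CALL h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error)
> 
> CALL h5dwrite_f(dset_id, H5T_NATIVE_REAL, somedata0, dims, error, &
>                 file_space_id = filespace, mem_space_id = memspace, &
>                 xfer_prp = plist_id)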
> 
> One additional constraint with collective I/O, though, is that all MPI tasks 
> must call H5Dwrite. If not, your program will stall in a barrier. In 
> contrast, with independent I/O you can execute writes with no coordination 
> among MPI tasks.
> 
> If you do want only a subset of MPI tasks to write in collective mode, you 
> can pass an empty selection to H5Dwrite for the non-writing tasks.
> 
> Mark
> 
> 
> On Tue, Jan 18, 2011 at 12:45 AM, Leigh Orf <[email protected]> wrote:
> Elena,
> 
> That is good news; indeed, this was with 1.8.5-patch1.
> 
> Is code written using independent I/O structured significantly differently 
> from code using collective I/O? I would like to get moving with pHDF5, and 
> since I am currently not too familiar with it, I want to make sure that I am 
> not going to have to do a rewrite after the collective code works. It does 
> all seem to occur behind the scenes with the h5dwrite command, so I presume I 
> am safe.
> 
> Thanks,
> 
> Leigh
> 
> On Mon, Jan 17, 2011 at 4:59 PM, Elena Pourmal <[email protected]> wrote:
> Leigh,
> 
> I am writing to confirm that the bug you reported does exist in 1.8.5-patch1, 
> but is fixed in 1.8.6 (coming soon).
> 
> Elena
> On Jan 16, 2011, at 3:47 PM, Leigh Orf wrote:
> 
>> I managed to build pHDF5 on blueprint.ncsa.uiuc.edu (IBM AIX Power 6). I 
>> compiled the hyperslab_by_chunk.f90 test program found at 
>> http://www.hdfgroup.org/HDF5/Tutor/phypechk.html without error. When I run 
>> it, however, I get the following output:
>> 
>> ATTENTION: 0031-408  4 tasks allocated by LoadLeveler, continuing...
>> ERROR: 0032-110 Attempt to free a predefined datatype  (2) in MPI_Type_free, 
>> task 0
>> ERROR: 0032-110 Attempt to free a predefined datatype  (2) in MPI_Type_free, 
>> task 1
>> ERROR: 0032-110 Attempt to free a predefined datatype  (2) in MPI_Type_free, 
>> task 2
>> ERROR: 0032-110 Attempt to free a predefined datatype  (2) in MPI_Type_free, 
>> task 3
>> HDF5: infinite loop closing library
>>       
>> D,S,T,D,S,F,D,G,S,T,F,AC,FD,P,FD,P,FD,P,E,E,SL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL,FL
>> HDF5: infinite loop closing library
>> 
>> The line that causes the grief is:
>> 
>>     CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, dimsfi, error, &
>>                     file_space_id = filespace, mem_space_id = memspace, &
>>                     xfer_prp = plist_id)
>> 
>> If I replace that call with the one that is commented out in the program, it 
>> runs without a problem. That line is:
>> 
>>     CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, dimsfi, error, &
>>                     file_space_id = filespace, mem_space_id = memspace)
>> 
>> Any ideas? I definitely want to take advantage of doing collective I/O if 
>> possible.
>> 
>> Leigh
>> 
>> --
>> Leigh Orf
>> Associate Professor of Atmospheric Science
>> Department of Geology and Meteorology
>> Central Michigan University
>> Currently on sabbatical at the National Center for Atmospheric Research in 
>> Boulder, CO
>> NCAR office phone: (303) 497-8200
>> 

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
