Hi Mohamad,
First, thank you for the info. Below are a few follow-up questions.

On 2015-01-28 11:24, Mohamad Chaarawi wrote:
Hi Maxime,

H5Dwrite is for writing raw data, and unlike HDF5 metadata operations, the
library does not require raw data writes to be collective unless you ask for it.
For a list of HDF5 function calls which are required to be collective, look 
here:
http://www.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html
Thanks. Is there such a list for the "lite interface" (H5LT)?
For raw data, we do not detect whether you are writing to the same position in 
the file or not, and so we just pass the data down onto MPI to write 
collectively or independently. The concept of collective I/O in MPI is to have 
all processes work together to write different portions of the file, not the 
same portion. So if you have 2 processes writing values X and Y to the same 
offset in the file, both writes will happen and the result is really undefined 
in MPI semantics.  Now if you do collective I/O with 2 processes writing X and 
Y to two adjacent positions in the file, MPI-IO would internally have one rank 
combine the two writes and execute them itself rather than issue two smaller writes 
from different processes to the parallel file system. This is a very simple
example of what collective I/O is. Of course things get more complicated with 
more processes and more data :-)
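Just to check that I follow, here is how I picture that two-rank case at the MPI-IO level (a sketch; the file name, offsets and values are made up for illustration):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Open a shared file; every rank participates.
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "example.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Each rank writes one double at its own, adjacent offset.  Because the
    // call is collective, MPI-IO is free to merge the requests into a single
    // larger write issued by one rank.
    double value = (rank == 0) ? 1.0 : 2.0;              // "X" on rank 0, "Y" elsewhere
    MPI_Offset offset = rank * (MPI_Offset)sizeof(double);
    MPI_File_write_at_all(fh, offset, &value, 1, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}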

Not that it matters here, but note that in your code, you set your dxpl to use 
independent I/O and not collective:
H5Pset_dxpl_mpio(md_plist_id, H5FD_MPIO_INDEPENDENT);
For the big datasets, I do use H5FD_MPIO_COLLECTIVE. I also create the file with these MPI info parameters:

void HDF5DataStore::createMPIInfo()
{
    MPI_Info_create(&info);
    int comm_size;
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

    // Lustre hints: stripe the file over comm_size/2 OSTs and set ROMIO's
    // Lustre collective I/O threshold to 32 MiB.  MPI_Info_set copies the
    // strings, so the temporaries are safe; the const_cast is only needed
    // for pre-MPI-3 signatures.
    MPI_Info_set(info, "striping_factor",
                 const_cast<char *>(std::to_string(comm_size / 2).c_str()));
    MPI_Info_set(info, "romio_lustre_coll_threshold",
                 const_cast<char *>(std::to_string(32 * 1024 * 1024).c_str()));
}

Maybe I should revisit those parameters, but they resulted in good enough performance for the main dataset during my tests.
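For reference, the write path for the big datasets looks roughly like this on my side (a simplified sketch; the dataset name, the 1-D layout and the sizes are placeholders, `info` is the member filled in by createMPIInfo(), and std::vector stands in for the real buffer):

int rank, nprocs;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

// File access property list: MPI-IO driver with the hints from createMPIInfo().
hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);
hid_t file = H5Fcreate("out.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

// A 1-D dataset split into one contiguous block per rank (sizes are made up).
const hsize_t block = 1024 * 1024;
hsize_t dims[1] = {block * nprocs};
hid_t filespace = H5Screate_simple(1, dims, NULL);
hid_t dset = H5Dcreate(file, "/big_dataset", H5T_NATIVE_DOUBLE, filespace,
                       H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

// Each rank selects its own block in the file and writes it collectively.
hsize_t start[1] = {block * rank};
hsize_t count[1] = {block};
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
hid_t memspace = H5Screate_simple(1, count, NULL);

hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

std::vector<double> buffer(block, static_cast<double>(rank));   // placeholder data
H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buffer.data());

H5Pclose(dxpl);
H5Sclose(memspace);
H5Sclose(filespace);
H5Dclose(dset);
H5Fclose(file);
H5Pclose(fapl);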

As for the attribute, it depends on what you need. I don’t know what sort of metadata you 
want to store. If that metadata is related to every large dataset that you write, then 
you should create that attribute on every large dataset. If it is metadata for the entire 
file, then you can just create it on the root group "/" (note this is not a 
dataset but a group object; those are two different HDF5 objects. See the HDF5 user 
guide if you need more information). Note that attribute operations are regarded as HDF5 
metadata operations, unlike H5Dread and H5Dwrite: they are always required to be 
collective and should be called with the same parameters and values from all processes. 
HDF5 internally manages the metadata cache operations to the file system in that case, so 
you don't end up writing multiple times to the file as happened with your raw data 
writes through H5Dwrite.
Note also that if you call H5LTset_attribute_string() twice with the same 
attribute name, the older value is overwritten. So it really depends on what you 
want to store as metadata and how.
I will create the attributes on the root groups from now on and eliminate the fake dataset. The metadata basically contains the values of the input parameters, that is, mostly the name of the input file plus a few integers.
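Concretely, I have something like this in mind (a sketch; the attribute names and values are just examples of those parameters):

#include <string>
#include <hdf5.h>
#include <hdf5_hl.h>

// All ranks call this collectively with the same values, since attribute
// writes are HDF5 metadata operations.
void writeRunMetadata(hid_t file_id, const std::string &input_file, int n_steps)
{
    H5LTset_attribute_string(file_id, "/", "input_file", input_file.c_str());
    H5LTset_attribute_int(file_id, "/", "n_steps", &n_steps, 1);
}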

Thanks a lot. Everything makes a lot more sense now.

Maxime

