Hi Mohamad,

Thanks for the tip. I already open the file in parallel and set up the datasets collectively with the proper dimensions and sizes, so at least on that part I should be OK! My only worry is writing (and, in the future, reading) chunks independently from the different ranks that opened the file. All the slides and tutorials tell me this is not the most efficient scheme, and yet it is the only one I can actually use reliably.
Regards,
Matthieu

2013/7/17 Mohamad Chaarawi <[email protected]>:
> Hi Matthieu,
>
> You probably know this already, but since HDF5 structures your data, there
> is no such thing as accessing a raw offset in the file in HDF5. You
> create/open objects in the HDF5 file and access raw data in HDF5 datasets
> through dataspace selections.
>
> Now to your question: you can create/open the file using a sub-communicator
> containing only the I/O processes. Then each I/O process can write its
> buffer, in independent mode, as soon as it is ready.
>
> One thing you have to note is that creating the HDF5 objects that organize
> your file and hold your data (groups, datasets, etc.) has to be done
> collectively (i.e. all processes have to participate in those calls). If
> you know that information beforehand, you can do all of it at
> initialization, when you create the file. If you don't, then there has to
> be a synchronization phase where all I/O processes get together and create
> those objects whenever needed. Writing the raw data itself can be done
> independently or collectively.
>
> Hope this helps,
> Mohamad
>
> -----Original Message-----
> From: Hdf-forum [mailto:[email protected]] On Behalf Of
> Matthieu Brucher
> Sent: Wednesday, July 17, 2013 4:51 AM
> To: [email protected]
> Subject: [Hdf-forum] Parallel HDF5 and independent I/O
>
> Hi,
>
> I'm starting to look into HDF5 for structured output files, and I have
> seen a lot of slides showing that collective I/O is far better than
> independent I/O. My application is a little different and doesn't fit the
> collective I/O pattern. I have an unstructured grid split across several
> nodes, and when I write a result, I need to reorder the data I want to
> write so that it follows growing node indices (a requirement of the
> viewers and of the post-processing).
> As the grid is unstructured and partitioned with ParMETIS, I can't write
> the data to disk directly, so I do asynchronous gathers (using NBC) of all
> the data to a subset of processes that will actually write. Because of how
> the gathering is done (all ranks fill a chunk of indices with the data
> they have, and the chunks are reduced at the writing rank), I can't keep
> buffers for all gathers alive at the same time. Instead, I cycle through a
> list of buffers and, as soon as a gather completes, a callback on the
> corresponding rank writes everything out.
>
> This works really well when I write one piece of data to a binary file
> with pwrite, and even though the asynchronous gather process is not
> optimized yet, I already see better I/O bandwidth as soon as I use several
> blades (2 to 3 times better with 2 or 4 blades).
>
> Now I'd like to do the same with HDF5. Each time a gather completes, I
> will get new chunks together with their offsets in the file. If the chunks
> are properly organized, I will have overlapping communication and
> overlapping I/O, as each chunk lands at a different offset in the HDF5
> file.
>
> Will the HDF5 library behave properly here, i.e. will it write to disk
> directly without waiting for the other processes?
>
> I can provide the test case by private email if someone has a clue!
>
> Regards,
>
> Matthieu Brucher
> --
> Information System Engineer, Ph.D.
> Blog: http://matt.eifelle.com
> LinkedIn: http://www.linkedin.com/in/matthieubrucher
> Music band: http://liliejay.com/
>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org

--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/
