Thanks a lot, Mark. Collective communication may not always be possible, as two processors may need access to the same data. Are you suggesting that one needs to separate I/O requests into collective and independent modes? Can one do collective I/O even when the data is non-contiguous?
Regards
Suman

On Tue, Apr 9, 2013 at 11:06 AM, Miller, Mark C. <[email protected]> wrote:

> Hi Suman,
>
> I think you are hinting at the fact that unstructured data is harder to manage with true concurrent parallel I/O to a single, shared file, right? Yeah, it is.
>
> Honestly, I don't have much experience with that. My impression has always been that the more information you can give HDF5 in the H5Dwrite/H5Dread calls, the better it and the layers below it (MPI-IO, Lustre/GPFS, etc.) can take advantage of things and 'do the right/best thing'.
>
> So, I guess one piece of advice is to try to characterize the application's I/O needs in requests that are 'as large as possible'. And, if it's practical to do so, try to allow processors to 'coordinate' their I/O by issuing collective instead of independent requests. If parts of some large array are being read onto many processors, let each processor define its piece(s) of that array using HDF5 data selections and then have all of them do a collective H5Dread. In theory, that lets MPI-IO and Lustre do whatever 'magic' is possible to aggregate all the data into a few I/O requests to the filesystem but buffer and disperse it efficiently to whatever processors need it. At least that is how I think HDF5/MPI-IO/Lustre will operate in theory. There may be a lot of issues to making it efficient in practice.
>
> Good luck.
>
> Mark
>
> --
> Mark C. Miller, Lawrence Livermore National Laboratory
> ================!!LLNL BUSINESS ONLY!!================
> [email protected] urgent: [email protected]
> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>
> From: Suman Vajjala <[email protected]>
> Date: Monday, April 8, 2013 10:25 PM
> To: HDF Users Discussion List <[email protected]>, "Miller, Mark C." <[email protected]>
> Subject: Re: [Hdf-forum] Performance query
>
> Hi Mark,
>
> Thank you for answering my question. You are right about the practicality issue. This isn't a serious issue if the data is stored in a distributed sense with some indicators specifying which processor has the data. This data is treated as a background database and it doesn't move during the transfer process.
>
> The reason I asked this question is that I've seen HDF5 take quite some time to read data in an unstructured manner while the MPI calls just zip through. On another note, are there any guidelines for parallel I/O in the case of unstructured data?
>
> Regards
> Suman
>
>
> On Tue, Apr 9, 2013 at 10:39 AM, Miller, Mark C. <[email protected]> wrote:
>
>> Hmm. If I understand the question, I really cannot imagine a scenario where parallel I/O would be "faster" than MPI send/recv calls.
>>
>> However, there may be a practicality issue here. It may be the case that the data processor k needs is on processor j, but processor k doesn't know that processor j has it and processor j doesn't know that processor k needs it. So, there has to be some communication for the processors to learn that.
>>
>> And, if those processors are 'somewhere else' in their execution, then you have a significant issue in programming to take advantage of the fact that you could use MPI send/recv to move the data anyway. In the end, it just might be more practical to read the data from the file, even if it is quite a bit slower.
>>
>> I am not sure I answered the question you asked, though ;)
>>
>> Mark
>>
>> --
>> Mark C. Miller, Lawrence Livermore National Laboratory
>> ================!!LLNL BUSINESS ONLY!!================
>> [email protected] urgent: [email protected]
>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>>
>> From: Suman Vajjala <[email protected]>
>> Reply-To: HDF Users Discussion List <[email protected]>
>> Date: Monday, April 8, 2013 9:38 PM
>> To: HDF Users Discussion List <[email protected]>
>> Subject: [Hdf-forum] Performance query
>>
>> Hi,
>>
>> I have a question regarding the performance of parallel I/O vs MPI communication based calls. I have data which needs to be accessed by different processors. If the data is in memory, then MPI calls (Send/Recv) do the job. In another scenario the data is written to an H5 file and different processors access the respective data using parallel I/O. Would MPI calls be faster than HDF5 parallel I/O? (Data access could be unstructured.)
>>
>> Regards
>> Suman Vajjala
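
To make Mark's suggestion above concrete (per-rank HDF5 data selections plus a single collective H5Dread), here is a minimal C sketch against the HDF5 1.8 API. The file name mesh.h5, the dataset name /pressure, and the block size are made-up placeholders, and the dataset is assumed to be 1-D with at least 2 * nprocs * 1000 doubles. Each rank ORs two disjoint hyperslabs into its file selection, i.e. a non-contiguous access pattern, and all ranks then issue one collective read so that MPI-IO and the file system can try to aggregate the requests.

/* Minimal sketch: each rank selects two non-contiguous blocks of a
 * 1-D dataset and all ranks read them with one collective H5Dread. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Open the file through the MPI-IO driver. "mesh.h5" and "/pressure"
     * are hypothetical names; the dataset is assumed to hold at least
     * 2 * nprocs * BLOCK doubles. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fopen("mesh.h5", H5F_ACC_RDONLY, fapl);
    hid_t dset = H5Dopen2(file, "/pressure", H5P_DEFAULT);

    /* Non-contiguous file selection: one block in the first half of the
     * dataset and one in the second half, OR'ed together. */
    const hsize_t BLOCK = 1000;
    hsize_t count[1]  = { BLOCK };
    hsize_t start1[1] = { (hsize_t)rank * BLOCK };
    hsize_t start2[1] = { (hsize_t)(nprocs + rank) * BLOCK };
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start1, NULL, count, NULL);
    H5Sselect_hyperslab(fspace, H5S_SELECT_OR,  start2, NULL, count, NULL);

    /* Contiguous memory buffer and matching memory dataspace. */
    hsize_t nelem = 2 * BLOCK;
    hid_t mspace = H5Screate_simple(1, &nelem, NULL);
    double *buf = (double *)malloc(nelem * sizeof(double));

    /* Request a collective transfer so MPI-IO can aggregate the I/O. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    herr_t status = H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, buf);
    if (status < 0)
        fprintf(stderr, "rank %d: collective H5Dread failed\n", rank);

    free(buf);
    H5Pclose(dxpl);
    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

Whether the MPI-IO layer actually manages to aggregate such irregular selections depends on the HDF5 and MPI-IO versions and on file-system hints; recent 1.8 releases (1.8.10 and later, if I remember right) also let you query what happened after the read via H5Pget_mpio_actual_io_mode on the transfer property list.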
