Thanks once again for the advice.

Regards
Suman
On Tue, Apr 9, 2013 at 11:23 AM, Miller, Mark C. <mille...@llnl.gov> wrote:

> > Thanks a lot Mark. Collective communication may not always be possible,
> > as two processors may access the same data. Are you suggesting that one
> > needs to separate I/O requests into collective and independent modes?
>
> Yes, I guess I am. I mean, if it's practical to do so.
>
> > Can one do a collective I/O even when the data is non-contiguous?
>
> You mean non-contiguous in the file? Yes, you can do collective I/O even
> if the data is not contiguous in the file. That's what data selections are
> good at. But every processor (in the communicator used to open the file)
> must participate in a collective operation.
>
> > Regards
> > Suman
> >
> > On Tue, Apr 9, 2013 at 11:06 AM, Miller, Mark C. <mille...@llnl.gov> wrote:
> >
>> Hi Suman,
>>
>> I think you are hinting at the fact that unstructured data is harder to
>> manage with true concurrent parallel I/O to a single, shared file, right?
>> Yeah, it is.
>>
>> Honestly, I don't have much experience with that. My impression has
>> always been that the more information you can give HDF5 in the
>> H5Dwrite/H5Dread calls, the better it and the layers below it (MPI-IO,
>> Lustre/GPFS, etc.) can take advantage of things and 'do the right/best
>> thing'.
>>
>> So, I guess one piece of advice is to try to characterize the
>> application's I/O needs in requests that are 'as large as possible'. And,
>> if it's practical to do so, try to allow processors to 'coordinate' their
>> I/O by issuing collective instead of independent requests. If parts of some
>> large array are being read onto many processors, let each processor define
>> its piece(s) of that array using HDF5 data selections and then have all
>> of them do a collective H5Dread.
>> In theory, that lets MPI-IO and Lustre do
>> whatever 'magic' is possible to aggregate all the data into a few I/O
>> requests to the filesystem, but buffer and disperse it to whatever
>> processors need it efficiently. At least, that is how I think
>> HDF5/MPI-IO/Lustre will operate in theory. There may be a lot of issues in
>> making it efficient in practice.
>>
>> Good luck.
>>
>> Mark
>>
>> --
>> Mark C. Miller, Lawrence Livermore National Laboratory
>> ================!!LLNL BUSINESS ONLY!!================
>> mille...@llnl.gov urgent: markcmille...@gmail.com
>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>>
>> From: Suman Vajjala <suman.g...@gmail.com>
>> Date: Monday, April 8, 2013 10:25 PM
>> To: HDF Users Discussion List <hdf-forum@hdfgroup.org>, "Miller, Mark C." <mille...@llnl.gov>
>> Subject: Re: [Hdf-forum] Performance query
>>
>> Hi Mark,
>>
>> Thank you for answering my question. You are right about the
>> practicality issue. This isn't a serious issue if the data is stored in a
>> distributed sense, with some indicators specifying which processor has the
>> data. This data is treated as a background database and it doesn't move
>> during the transfer process.
>>
>> The reason I asked this question is that I've seen HDF5 take
>> quite some time to read data in an unstructured manner while the MPI calls
>> just zip through. On another note, are there any guidelines for parallel
>> I/O in the case of unstructured data?
>>
>> Regards
>> Suman
>>
>> On Tue, Apr 9, 2013 at 10:39 AM, Miller, Mark C. <mille...@llnl.gov> wrote:
>>
>>> Hmm. If I understand the question, I really cannot imagine a scenario
>>> where parallel I/O would be "faster" than MPI send/recv calls.
>>>
>>> However, there may be a practicality issue here. It may be the case
>>> that the data processor k needs is on processor j, but processor k doesn't
>>> know that processor j has it and processor j doesn't know that processor k
>>> needs it.
>>> So, there has to be some communication for the processors to
>>> learn that.
>>>
>>> And, if those processors are 'somewhere else' in their execution, then
>>> you have a significant issue in programming to take advantage of the fact
>>> that you could use MPI send/recv to move the data anyway. In the end, it
>>> just might be more practical to just read the data from the file, even if it is
>>> quite a bit slower.
>>>
>>> I am not sure I answered the question you asked, though ;)
>>>
>>> Mark
>>>
>>> --
>>> Mark C. Miller, Lawrence Livermore National Laboratory
>>> ================!!LLNL BUSINESS ONLY!!================
>>> mille...@llnl.gov urgent: markcmille...@gmail.com
>>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
>>>
>>> From: Suman Vajjala <suman.g...@gmail.com>
>>> Reply-To: HDF Users Discussion List <hdf-forum@hdfgroup.org>
>>> Date: Monday, April 8, 2013 9:38 PM
>>> To: HDF Users Discussion List <hdf-forum@hdfgroup.org>
>>> Subject: [Hdf-forum] Performance query
>>>
>>> Hi,
>>>
>>> I have a question regarding the performance of parallel I/O vs. MPI
>>> communication-based calls. I have data which needs to be accessed by
>>> different processors. If the data is in memory, then MPI calls (Send/Recv)
>>> do the job. In another scenario, the data is written to an HDF5 file and
>>> different processors access the respective data using parallel I/O. Would
>>> MPI calls be faster than HDF5 parallel I/O? (Data access could be
>>> unstructured.)
>>>
>>> Regards
>>> Suman Vajjala
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> Hdf-forum@hdfgroup.org
>>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org