The overhead of cleanup doesn't go away; the MPI runtime would need to create a similar cleanup list and process it. It looks to me like the performance problem might actually be caused by the Ibarrier not making asynchronous progress while the application is doing its own work.
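A minimal sketch of that progress behavior, assuming an implementation without a progress thread: unless the application re-enters MPI (here via MPI_Test), the Ibarrier may make no progress at all. The application_work_step() function is a hypothetical placeholder, not anything from HDF5 or MPI.

===============================

#include <mpi.h>

void application_work_step(void);  /* hypothetical application work */

void overlap_with_ibarrier(MPI_Comm comm)
{
    MPI_Request req;
    int done = 0;

    MPI_Ibarrier(comm, &req);
    while (!done) {
        application_work_step();
        /* Each MPI_Test call gives the implementation a chance to
         * advance the barrier; without such calls (or a progress
         * thread), the Ibarrier can stall until the eventual wait. */
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    }
}

===============================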
~Jim.

On Fri, Aug 14, 2020 at 12:33 PM Quincey Koziol via mpi-forum <mpi-forum@lists.mpi-forum.org> wrote:

Hi Dan,

I believe that Pavan was referring to my conversation with him about MPI_Request_free. Here's my situation: I'd like to use MPI_Ibarrier as a form of "memory fence" between some of the metadata reads and writes in HDF5. Here's some [very] simplified pseudocode for what I'd like to do:

===============================

<open HDF5 file>   // sets up a communicator for internal HDF5
                   // communication about this file

do {
    MPI_Ibarrier(<file's communicator>, &request);

    <application stuff>

    // HDF5 operation:
    if (<operation is read or write>) {
        MPI_Wait(&request, MPI_STATUS_IGNORE);
        <perform read / write>
    }
    else {  // operation is a file close
        MPI_Request_free(&request);
        MPI_File_close(...);
        MPI_Comm_free(<file's communicator>);
    }
} while (<file is open>);

===============================

What I am really trying to avoid is calling MPI_Wait at file close, since it is semantically unnecessary and only increases the latency from the application's perspective. If I can't call MPI_Request_free on a nonblocking collective operation's request (and it looks like I can't, right now), I will have to put the request and the file's communicator into a "cleanup" list that is polled periodically [on each rank] with MPI_Test and disposed of when the nonblocking barrier completes locally.

So, I'd really like to be able to call MPI_Request_free on the nonblocking barrier's request.

Thoughts?

Quincey

On Aug 13, 2020, at 9:07 AM, HOLMES Daniel via mpi-forum <mpi-forum@lists.mpi-forum.org> wrote:

Hi Jim,

To be clear, I think that MPI_CANCEL is evil and should be removed from the MPI Standard entirely at the earliest convenience. I am certainly not arguing that it be permitted for more MPI operations. I thought the discussion was focused on MPI_REQUEST_FREE and whether or not it can/should be used on an active request.

If a particular MPI implementation does not keep a reference to the request between MPI_RPUT and MPI_REQUEST_FREE, but needs that reference to process the completion event, then that MPI implementation would be required to keep a reference to the request from MPI_REQUEST_FREE until that important task had been done, perhaps until the close-epoch call. This requires no new memory, because the user is giving up their reference to the request, so MPI can safely use the request it is passed in MPI_REQUEST_FREE without copying it. As you say, MPI takes over the responsibility for processing the completion event.

Your question about why the implementation should be required to take on this complexity is a good one. That, I guess, is why freeing any active request is a bad idea. MPI is required to differentiate completion of individual operations (so it can implement MPI_WAIT), which means something must process completion at some point for each individual operation. In RMA, that responsibility can be discharged earlier than in other parts of the MPI interface, but the real question is "why should MPI offer to take on this responsibility in the first place?"

Thanks, that helps (me at least).

Cheers,
Dan.

—
Dr Daniel Holmes PhD
Architect (HPC Research)
d.hol...@epcc.ed.ac.uk
Phone: +44 (0) 131 651 3465
Mobile: +44 (0) 7940 524 088
Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
—
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
—
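For concreteness, a sketch of the "cleanup list" fallback Quincey describes, assuming a simple linked list polled on each rank; the names are illustrative placeholders, not actual HDF5 internals.

===============================

#include <mpi.h>
#include <stdlib.h>

typedef struct cleanup_entry {
    MPI_Request           req;   /* pending MPI_Ibarrier request       */
    MPI_Comm              comm;  /* file's communicator, freed at end  */
    struct cleanup_entry *next;
} cleanup_entry;

static cleanup_entry *cleanup_list = NULL;

/* Called at file close instead of MPI_Wait + MPI_Comm_free. */
void defer_cleanup(MPI_Request req, MPI_Comm comm)
{
    cleanup_entry *e = malloc(sizeof(*e));
    e->req  = req;
    e->comm = comm;
    e->next = cleanup_list;
    cleanup_list = e;
}

/* Polled periodically on each rank; disposes of entries whose
 * nonblocking barrier has completed locally. */
void poll_cleanup_list(void)
{
    cleanup_entry **p = &cleanup_list;
    while (*p) {
        int done = 0;
        MPI_Test(&(*p)->req, &done, MPI_STATUS_IGNORE);
        if (done) {  /* MPI_Test has set req to MPI_REQUEST_NULL */
            cleanup_entry *e = *p;
            MPI_Comm_free(&e->comm);
            *p = e->next;
            free(e);
        } else {
            p = &(*p)->next;
        }
    }
}

===============================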
On 13 Aug 2020, at 14:43, Jim Dinan <james.di...@gmail.com> wrote:

The two cases you mentioned would have the same behavior at the application level. However, there may be important differences in the implementation of each operation. For example, an MPI_Put operation may be configured not to generate a completion event, whereas an MPI_Rput would. The library may be relying on the user to make a call on the request to process the event and clean up resources. The implementation can take over this responsibility if the user cancels the request, but why should we ask implementers to take on this complexity and overhead?

My $0.02 is that MPI_Cancel is subtle and complicated, and we should be very careful about where we allow it. I don't see the benefit to the programming model outweighing the complexity and overhead in the MPI runtime for the case of MPI_Rput. I also don't know that we were careful enough in specifying the RMA memory model that a canceled request-based RMA operation will still have well-defined behavior. My understanding is that MPI_Cancel is required primarily for canceling receive requests to meet MPI's quiescent shutdown requirement.

~Jim.

On Thu, Aug 13, 2020 at 8:11 AM HOLMES Daniel via mpi-forum <mpi-forum@lists.mpi-forum.org> wrote:

Hi all,

To increase my own understanding of RMA, what is the difference (if any) between a request-based RMA operation, where the request is freed without being completed and before the epoch is closed, and a "normal" RMA operation?

MPI_LOCK()    ! or any other "open epoch at origin" procedure call
doUserWorkBefore()
MPI_RPUT(&req)
MPI_REQUEST_FREE(&req)
doUserWorkAfter()
MPI_UNLOCK()  ! or the matching "close epoch at origin" procedure call

vs:

MPI_LOCK()    ! or any other "open epoch at origin" procedure call
doUserWorkBefore()
MPI_PUT()
doUserWorkAfter()
MPI_UNLOCK()  ! or the matching "close epoch at origin" procedure call

Is this a source-to-source translation that is always safe in either direction?

In RMA, in contrast to the rest of MPI, there are two opportunities for MPI to "block" and do non-local work to complete an RMA operation: 1) during MPI_WAIT for the request (if any; the user may not be given a request, or may choose to free the request without calling MPI_WAIT, or might call nonblocking MPI_TEST), and 2) during the close-epoch procedure, which is always permitted to be sufficiently non-local to guarantee that the RMA operation is complete and its freeing stage has been done. It seems that a request-based RMA operation becomes identical to a "normal" RMA operation if the user calls MPI_REQUEST_FREE on the request. This is like "freeing" the request from a nonblocking point-to-point operation, but without the guarantee of a later synchronisation procedure that can actually complete the operation and actually do the freeing stage of the operation.

In collectives, there is no "ensure all operations so far are now done" procedure call because there is no concept of epoch for collectives.
In point-to-point, there is no "ensure all operations so far are now done" procedure call because there is no concept of epoch for point-to-point.

In file operations, there is no "ensure all operations so far are now done" procedure call because there is no concept of epoch for file operations. (There is MPI_FILE_SYNC, but it is optional, so MPI cannot rely on it being called.)

In these cases, the only non-local procedure that is guaranteed to happen is MPI_FINALIZE, hence all outstanding non-local work needed by the "freed" operation might be delayed until that procedure is called.

The issue with copying parameters is also moot, because all of them are passed by value (implicitly copied) or are data buffers covered by the "conflicting accesses" RMA rules.

Thus, it seems to me that RMA is a very special case: it could support different semantics, but that does not provide a good basis for claiming that the rest of the MPI Standard can support those different semantics, unless we introduce an epoch concept into the rest of the MPI Standard. This is not unreasonable: the notifications in GASPI, for example, guarantee completion of not just the operation they are attached to but *all* operations issued in the "queue" they represent since the last notification. Their queue concept serves the purpose of an epoch. I'm sure there are other examples in other APIs. It seems to me likely that the proposal for MPI_PSYNC for partitioned communication operations is moving in the direction of an epoch, although limited to remote completion of all the partitions in a single operation, which accidentally guarantees that the operation can be freed locally using a local procedure.

Cheers,
Dan.

On 13 Aug 2020, at 01:40, Skjellum, Anthony via mpi-forum <mpi-forum@lists.mpi-forum.org> wrote:

FYI, one argument (also used to force us to add the restriction that MPI persistent collective initialization be blocking): MPI_Request_free on an NBC poses a problem for operations that take array arguments (e.g., Alltoallv/w). The application cannot know whether those arrays are still in use by MPI after the free on an active request. We do *not* currently mandate that the MPI implementation copy such arrays, so they are effectively "held as unfreeable" by the MPI implementation till MPI_Finalize. The user cannot deallocate them in a correct program till after MPI_Finalize.

Another effect of releasing an active NBC request, IMHO, is that you don't know when send buffers or receive buffers are free to be deallocated, since you don't know when the transfer is complete OR when the buffers are no longer used by MPI (till after MPI_Finalize).

Tony

Anthony Skjellum, PhD
Professor of Computer Science and Chair of Excellence
Director, SimCenter
University of Tennessee at Chattanooga (UTC)
tony-skjel...@utc.edu [or skjel...@gmail.com]
cell: 205-807-4968
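A sketch of the hazard Tony describes, assuming MPI_Ialltoallv with heap-allocated count and displacement arrays; if an active request could simply be freed, there would be no point before MPI_Finalize at which these arrays are known to be reusable. The function and buffer names are illustrative.

===============================

#include <mpi.h>
#include <stdlib.h>

void ialltoallv_example(MPI_Comm comm, int nprocs,
                        double *sendbuf, double *recvbuf)
{
    int *scounts = malloc(nprocs * sizeof(int));
    int *sdispls = malloc(nprocs * sizeof(int));
    int *rcounts = malloc(nprocs * sizeof(int));
    int *rdispls = malloc(nprocs * sizeof(int));
    /* ... fill counts and displacements ... */

    MPI_Request req;
    MPI_Ialltoallv(sendbuf, scounts, sdispls, MPI_DOUBLE,
                   recvbuf, rcounts, rdispls, MPI_DOUBLE,
                   comm, &req);

    /* If MPI_Request_free(&req) were legal here while the operation
     * is still active, MPI might still be reading scounts/sdispls/
     * rcounts/rdispls (no copy is mandated), so the free() calls
     * below would be erroneous until after MPI_Finalize. Completing
     * the request first removes the ambiguity. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    free(scounts); free(sdispls); free(rcounts); free(rdispls);
}

===============================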
From: mpi-forum <mpi-forum-boun...@lists.mpi-forum.org> on behalf of Jeff Hammond via mpi-forum <mpi-forum@lists.mpi-forum.org>
Sent: Saturday, August 8, 2020 12:07 PM
To: Main MPI Forum mailing list <mpi-forum@lists.mpi-forum.org>
Cc: Jeff Hammond <jeff.scie...@gmail.com>
Subject: Re: [Mpi-forum] MPI_Request_free restrictions

We should fix the RMA chapter with an erratum. I care less about NBC but share your ignorance of why it was done that way.

Sent from my iPhone

On Aug 8, 2020, at 6:51 AM, Balaji, Pavan via mpi-forum <mpi-forum@lists.mpi-forum.org> wrote:

Folks,

Does someone remember why we disallowed users from calling MPI_Request_free on nonblocking collective requests? I remember the reasoning for not allowing cancel (i.e., the operation might have completed on some processes, but not all), but not for Request_free. AFAICT, allowing users to free the request doesn't make any difference to the MPI library. The MPI library would simply maintain its own refcount to the request and continue forward till the operation completes. One of our users would like to free NBC requests so they don't have to wait for the operation to complete in some situations.

Unfortunately, when I added the Rput/Rget operations in the RMA chapter, I copy-pasted that text into RMA as well without thinking too hard about it. My bad! Either the RMA committee missed it too, or they thought of a reason that I can't think of now.

Can someone clarify or remind me what the reason was?

Regards,
— Pavan

MPI-3.1 standard, page 197, lines 26-27:

"It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request associated with a nonblocking collective operation."
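For concreteness, a sketch (purely illustrative, implementation side) of the refcounting Pavan describes: MPI_Request_free on an active NBC request would drop only the user's reference, while the progress engine keeps its own until the operation completes. All names are hypothetical, not any actual MPI library's internals.

===============================

#include <stdlib.h>

typedef struct nbc_request {
    int refcount;   /* user handle + progress-engine reference   */
    int complete;   /* set by the progress engine at completion  */
    /* ... operation state: buffers, schedule, etc. ...          */
} nbc_request;

static void nbc_request_release(nbc_request *r)
{
    if (--r->refcount == 0)
        free(r);   /* last reference gone: safe to reclaim */
}

/* What a hypothetical MPI_Request_free on an active NBC request
 * could do: drop the user's reference and return immediately. */
void user_request_free(nbc_request *r)
{
    nbc_request_release(r);
}

/* Called by the progress engine once the collective completes;
 * only then does it give up its own reference, so the operation's
 * state outlives the user's free. */
void progress_on_completion(nbc_request *r)
{
    r->complete = 1;
    nbc_request_release(r);
}

===============================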