Hi Dan, I believe that Pavan was referring to my conversation with him about MPI_Request_free. Here’s my situation: I’d like to use MPI_Ibarrier as a form of “memory fence” between some of the metadata reads and writes in HDF5. Here’s some [very] simplified pseudocode for what I’d like to do:
===============================
<open HDF5 file>    // sets up a communicator for internal HDF5 communication about this file

do {
    MPI_Ibarrier(<file’s communicator>, &request);

    <application stuff>

    // HDF5 operation:
    if (<operation is read or write>) {
        MPI_Wait(&request, MPI_STATUS_IGNORE);
        <perform read / write>
    }
    else {
        // operation is a file close
        MPI_Request_free(&request);
        MPI_File_close(…);
        MPI_Comm_free(<file’s communicator>);
    }
} while (<file is open>);
===============================

What I am really trying to avoid is calling MPI_Wait at file close, since it is semantically unnecessary and only increases the latency from the application’s perspective. If I can’t call MPI_Request_free on a nonblocking collective operation’s request (and it looks like I can’t, right now), I will have to put the request and the file’s communicator into a “cleanup” list that is polled periodically [on each rank] with MPI_Test and disposed of when the nonblocking barrier completes locally, along the lines of the sketch below.
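For concreteness, a minimal sketch of that fallback might look like the following. All of the names here (cleanup_entry_t, cleanup_list_add, cleanup_list_poll) are hypothetical, not actual HDF5 internals; it only shows the shape of the mechanism:

===============================
#include <mpi.h>
#include <stdlib.h>

typedef struct cleanup_entry {
    MPI_Request           request;   /* pending MPI_Ibarrier request       */
    MPI_Comm              comm;      /* file's communicator, freed on reap */
    struct cleanup_entry *next;
} cleanup_entry_t;

static cleanup_entry_t *cleanup_list = NULL;

/* Called at file close instead of MPI_Wait: park the request and comm. */
void cleanup_list_add(MPI_Request request, MPI_Comm comm)
{
    cleanup_entry_t *e = malloc(sizeof(*e));
    e->request   = request;
    e->comm      = comm;
    e->next      = cleanup_list;
    cleanup_list = e;
}

/* Polled periodically on each rank, e.g. from other HDF5 entry points. */
void cleanup_list_poll(void)
{
    cleanup_entry_t **p = &cleanup_list;
    while (*p != NULL) {
        int done = 0;
        MPI_Test(&(*p)->request, &done, MPI_STATUS_IGNORE);
        if (done) {                       /* barrier completed locally */
            cleanup_entry_t *e = *p;
            MPI_Comm_free(&e->comm);      /* now safe to free the comm */
            *p = e->next;
            free(e);
        } else {
            p = &(*p)->next;              /* still pending; check again later */
        }
    }
}
===============================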
So, I’d really like to be able to call MPI_Request_free on the nonblocking barrier’s request. Thoughts?

	Quincey

> On Aug 13, 2020, at 9:07 AM, HOLMES Daniel via mpi-forum
> <mpi-forum@lists.mpi-forum.org> wrote:
>
> Hi Jim,
>
> To be clear, I think that MPI_CANCEL is evil and should be removed from the
> MPI Standard entirely at the earliest convenience. I am certainly not arguing
> that it be permitted for more MPI operations. I thought the discussion was
> focused on MPI_REQUEST_FREE and whether or not it can/should be used on an
> active request.
>
> If a particular MPI implementation does not keep a reference to the request
> between MPI_RPUT and MPI_REQUEST_FREE, but needs that reference to process
> the completion event, then that MPI implementation would be required to keep
> a reference to the request from MPI_REQUEST_FREE until that important task
> had been done, perhaps until the close-epoch call. This requires no new
> memory because the user is giving up their reference to the request, so MPI
> can safely use the request it is passed in MPI_REQUEST_FREE without copying
> it. As you say, MPI takes over the responsibility for processing the
> completion event.
>
> Your question about why the implementation should be required to take on this
> complexity is a good one. That, I guess, is why freeing any active request is
> a bad idea. MPI is required to differentiate completion of individual
> operations (so it can implement MPI_WAIT), which means something must
> process completion at some point for each individual operation. In RMA, that
> responsibility can be discharged earlier than in other parts of the MPI
> interface, but the real question is “why should MPI offer to take on this
> responsibility in the first place?”
>
> Thanks, that helps (me at least).
>
> Cheers,
> Dan.
> —
> Dr Daniel Holmes PhD
> Architect (HPC Research)
> d.hol...@epcc.ed.ac.uk
> Phone: +44 (0) 131 651 3465
> Mobile: +44 (0) 7940 524 088
> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
> —
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
> —
>
>> On 13 Aug 2020, at 14:43, Jim Dinan <james.di...@gmail.com> wrote:
>>
>> The two cases you mentioned would have the same behavior at an application
>> level. However, there may be important differences in the implementation of
>> each operation. For example, an MPI_Put operation may be configured not to
>> generate a completion event, whereas an MPI_Rput would. The library may be
>> relying on the user to make a call on the request to process the event and
>> clean up resources. The implementation can take over this responsibility if
>> the user cancels the request, but why should we ask implementers to take on
>> this complexity and overhead?
>>
>> My $0.02 is that MPI_Cancel is subtle and complicated, and we should be very
>> careful about where we allow it. I don't see the benefit to the programming
>> model outweighing the complexity and overhead in the MPI runtime for the
>> case of MPI_Rput. I also don't know that we were careful enough in
>> specifying the RMA memory model that a canceled request-based RMA operation
>> will still have well-defined behavior. My understanding is that MPI_Cancel
>> is required primarily for canceling receive requests to meet MPI's quiescent
>> shutdown requirement.
>>
>> ~Jim.
>>
>> On Thu, Aug 13, 2020 at 8:11 AM HOLMES Daniel via mpi-forum
>> <mpi-forum@lists.mpi-forum.org> wrote:
>> Hi all,
>>
>> To increase my own understanding of RMA, what is the difference (if any)
>> between a request-based RMA operation whose request is freed without being
>> completed, before the epoch is closed, and a “normal” RMA operation?
>>
>> MPI_LOCK()          ! or any other "open epoch at origin" procedure call
>> doUserWorkBefore()
>> MPI_RPUT(&req)
>> MPI_REQUEST_FREE(&req)
>> doUserWorkAfter()
>> MPI_UNLOCK()        ! or the matching "close epoch at origin" procedure call
>>
>> vs:
>>
>> MPI_LOCK()          ! or any other "open epoch at origin" procedure call
>> doUserWorkBefore()
>> MPI_PUT()
>> doUserWorkAfter()
>> MPI_UNLOCK()        ! or the matching "close epoch at origin" procedure call
>>
>> Is this a source-to-source translation that is always safe in either
>> direction?
>>
>> In RMA, in contrast to the rest of MPI, there are two opportunities for MPI
>> to “block” and do non-local work to complete an RMA operation: 1) during
>> MPI_WAIT for the request (if any - the user may not be given a request, may
>> choose to free the request without calling MPI_WAIT, or may call the
>> nonblocking MPI_TEST instead), and 2) during the close-epoch procedure,
>> which is always permitted to be sufficiently non-local to guarantee that the
>> RMA operation is complete and its freeing stage has been done. It seems that
>> a request-based RMA operation becomes identical to a “normal” RMA operation
>> if the user calls MPI_REQUEST_FREE on the request. This is like “freeing”
>> the request of a nonblocking point-to-point operation, but without the
>> guarantee of a later synchronisation procedure that can actually complete
>> the operation and actually do the freeing stage of the operation.
>>
>> In collectives, there is no “ensure all operations so far are now done”
>> procedure call because there is no concept of epoch for collectives.
>> In point-to-point, there is no such call because there is no concept of
>> epoch for point-to-point.
>> In file operations, there is no such call because there is no concept of
>> epoch for file operations. (There is MPI_FILE_SYNC, but it is optional, so
>> MPI cannot rely on it being called.)
>> In these cases, the only non-local procedure that is guaranteed to happen is
>> MPI_FINALIZE, hence all outstanding non-local work needed by the “freed”
>> operation might be delayed until that procedure is called.
>>
>> The issue with copying parameters is also moot because all of them are
>> passed by value (implicitly copied) or are data buffers covered by the
>> “conflicting accesses” RMA rules.
>>
>> Thus, it seems to me that RMA is a very special case - it could support
>> different semantics, but that does not provide a good basis for claiming
>> that the rest of the MPI Standard can support those different semantics -
>> unless we introduce an epoch concept into the rest of the MPI Standard.
>> This is not unreasonable: the notifications in GASPI, for example,
>> guarantee completion of not just the operation they are attached to but
>> *all* operations issued in the “queue” they represent since the last
>> notification. Their queue concept serves the purpose of an epoch. I’m sure
>> there are other examples in other APIs. It seems likely to me that the
>> proposal for MPI_PSYNC for partitioned communication operations is moving in
>> the direction of an epoch, although limited to remote completion of all the
>> partitions in a single operation, which incidentally guarantees that the
>> operation can be freed locally using a local procedure.
>>
>> Cheers,
>> Dan.
>> —
>> Dr Daniel Holmes PhD
>> Architect (HPC Research)
>> d.hol...@epcc.ed.ac.uk
>> Phone: +44 (0) 131 651 3465
>> Mobile: +44 (0) 7940 524 088
>> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
>> —
>> The University of Edinburgh is a charitable body, registered in Scotland,
>> with registration number SC005336.
>> —
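Rendered in concrete C, the two fragments Dan compares above might look roughly like this (win, target, buf, and n are assumed to be set up elsewhere; note that MPI-3.1 currently declares the MPI_Request_free call in the first variant erroneous, which is exactly the usage under discussion):

===============================
#include <mpi.h>

/* Variant 1: request-based put; request freed without being completed. */
void rput_then_free(MPI_Win win, int target, const double *buf, int n)
{
    MPI_Request req;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win); /* open epoch at origin */
    MPI_Rput(buf, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win, &req);
    MPI_Request_free(&req);      /* give up the request without MPI_Wait  */
    MPI_Win_unlock(target, win); /* close epoch: guarantees completion    */
}

/* Variant 2: plain put, completed by closing the epoch. */
void plain_put(MPI_Win win, int target, const double *buf, int n)
{
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win); /* open epoch at origin */
    MPI_Put(buf, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win);
    MPI_Win_unlock(target, win); /* close epoch: guarantees completion    */
}
===============================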
>>> On 13 Aug 2020, at 01:40, Skjellum, Anthony via mpi-forum
>>> <mpi-forum@lists.mpi-forum.org> wrote:
>>>
>>> FYI, one argument (also used to force us to add the restriction that MPI
>>> persistent collective initialization be blocking)... MPI_Request_free on
>>> an NBC poses a problem for operations that take array arguments (e.g.,
>>> Alltoallv/w): after the free on an active request, the application cannot
>>> know whether the vectors are still in use by MPI. We do *not* currently
>>> mandate that the MPI implementation copy such arrays, so they are
>>> effectively "held as unfreeable" by the MPI implementation till
>>> MPI_Finalize. The user cannot deallocate them in a correct program till
>>> after MPI_Finalize.
>>>
>>> Another effect of releasing an active NBC request, IMHO, is that you don't
>>> know when send buffers or receive buffers are free to be deallocated...
>>> since you don't know when the transfer is complete or when the buffers are
>>> no longer used by MPI (till after MPI_Finalize).
>>>
>>> Tony
>>>
>>> Anthony Skjellum, PhD
>>> Professor of Computer Science and Chair of Excellence
>>> Director, SimCenter
>>> University of Tennessee at Chattanooga (UTC)
>>> tony-skjel...@utc.edu [or skjel...@gmail.com]
>>> cell: 205-807-4968
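To make the Alltoallv hazard above concrete, here is a hypothetical fragment (again, the MPI_Request_free call is erroneous under MPI-3.1; the sketch only illustrates why the count/displacement arrays would become unfreeable if it were allowed):

===============================
#include <mpi.h>
#include <stdlib.h>

void nbc_free_hazard(MPI_Comm comm, int nprocs, const int *sbuf, int *rbuf)
{
    int *scounts = malloc(nprocs * sizeof(int));
    int *sdispls = malloc(nprocs * sizeof(int));
    int *rcounts = malloc(nprocs * sizeof(int));
    int *rdispls = malloc(nprocs * sizeof(int));
    /* ... fill counts and displacements ... */

    MPI_Request req;
    MPI_Ialltoallv(sbuf, scounts, sdispls, MPI_INT,
                   rbuf, rcounts, rdispls, MPI_INT, comm, &req);
    MPI_Request_free(&req);  /* erroneous in MPI-3.1; the usage at issue */

    /* With the request gone, there is no way to learn when the collective
     * completes, so none of these frees (nor any reuse of sbuf/rbuf) can be
     * shown safe before MPI_Finalize: */
    free(scounts); free(sdispls); free(rcounts); free(rdispls); /* unsafe */
}
===============================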
>>> From: mpi-forum <mpi-forum-boun...@lists.mpi-forum.org> on behalf of
>>> Jeff Hammond via mpi-forum <mpi-forum@lists.mpi-forum.org>
>>> Sent: Saturday, August 8, 2020 12:07 PM
>>> To: Main MPI Forum mailing list <mpi-forum@lists.mpi-forum.org>
>>> Cc: Jeff Hammond <jeff.scie...@gmail.com>
>>> Subject: Re: [Mpi-forum] MPI_Request_free restrictions
>>>
>>> We should fix the RMA chapter with an erratum. I care less about NBC but
>>> share your ignorance of why it was done that way.
>>>
>>> Sent from my iPhone
>>>
>>>> On Aug 8, 2020, at 6:51 AM, Balaji, Pavan via mpi-forum
>>>> <mpi-forum@lists.mpi-forum.org> wrote:
>>>>
>>>> Folks,
>>>>
>>>> Does someone remember why we disallowed users from calling
>>>> MPI_Request_free on nonblocking collective requests? I remember the
>>>> reasoning for not allowing cancel (i.e., the operation might have
>>>> completed on some processes, but not all), but not for Request_free.
>>>> AFAICT, allowing the users to free the request doesn’t make any difference
>>>> to the MPI library. The MPI library would simply maintain its own
>>>> refcount to the request and continue forward till the operation completes.
>>>> One of our users would like to free NBC requests so they don’t have to
>>>> wait for the operation to complete in some situations.
>>>>
>>>> Unfortunately, when I added the Rput/Rget operations in the RMA chapter, I
>>>> copy-pasted that text into RMA as well without thinking too hard about it.
>>>> My bad! Either the RMA committee missed it too, or they thought of a
>>>> reason that I can’t think of now.
>>>>
>>>> Can someone clarify or remind me what the reason was?
>>>>
>>>> Regards,
>>>>
>>>> — Pavan
>>>>
>>>> MPI-3.1 standard, page 197, lines 26-27:
>>>>
>>>> “It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request
>>>> associated with a nonblocking collective operation.”

_______________________________________________
mpi-forum mailing list
mpi-forum@lists.mpi-forum.org
https://lists.mpi-forum.org/mailman/listinfo/mpi-forum