Hi Dan, I believe that Pavan was referring to my conversation with him about MPI_Request_free. Here’s my situation: I’d like to use MPI_Ibarrier as a form of “memory fence” between some of the metadata reads and writes in HDF5. Here’s some [very] simplified pseudocode for what I’d like to do:
===============================
<open HDF5 file>    // sets up a communicator for internal HDF5 communication about this file

do {
    MPI_Ibarrier(<file’s communicator>, &request);

    <application stuff>

    // HDF5 operation:
    if (<operation is read or write>) {
        MPI_Wait(&request, MPI_STATUS_IGNORE);
        <perform read / write>
    }
    else {
        // operation is a file close
        MPI_Request_free(&request);
        MPI_File_close(…);
        MPI_Comm_free(<file’s communicator>);
    }
} while (<file is open>);
===============================

What I am really trying to avoid is calling MPI_Wait at file close, since it is semantically unnecessary and only increases the latency from the application’s perspective. If I can’t call MPI_Request_free on a nonblocking collective operation’s request (and it looks like I can’t, right now), I will have to put the request and the file’s communicator into a “cleanup” list that is polled periodically [on each rank] with MPI_Test and disposed of when the nonblocking barrier completes locally, along the lines of the sketch below.
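For concreteness, a minimal sketch of that fallback might look like the following. All of the names here (cleanup_entry_t, cleanup_list_add, cleanup_list_poll) are hypothetical, not actual HDF5 internals; it only shows the shape of the mechanism:

===============================
#include <mpi.h>
#include <stdlib.h>

typedef struct cleanup_entry {
    MPI_Request           request;   /* pending MPI_Ibarrier request       */
    MPI_Comm              comm;      /* file's communicator, freed on reap */
    struct cleanup_entry *next;
} cleanup_entry_t;

static cleanup_entry_t *cleanup_list = NULL;

/* Called at file close instead of MPI_Wait: park the request and comm. */
void cleanup_list_add(MPI_Request request, MPI_Comm comm)
{
    cleanup_entry_t *e = malloc(sizeof(*e));
    e->request   = request;
    e->comm      = comm;
    e->next      = cleanup_list;
    cleanup_list = e;
}

/* Polled periodically on each rank, e.g. from other HDF5 entry points. */
void cleanup_list_poll(void)
{
    cleanup_entry_t **p = &cleanup_list;
    while (*p != NULL) {
        int done = 0;
        MPI_Test(&(*p)->request, &done, MPI_STATUS_IGNORE);
        if (done) {                       /* barrier completed locally */
            cleanup_entry_t *e = *p;
            MPI_Comm_free(&e->comm);      /* now safe to free the comm */
            *p = e->next;
            free(e);
        } else {
            p = &(*p)->next;              /* still pending; check again later */
        }
    }
}
===============================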
So, I’d really like to be able to call MPI_Request_free on the nonblocking barrier’s request. Thoughts?

	Quincey

> On Aug 13, 2020, at 9:07 AM, HOLMES Daniel via mpi-forum
> <mpi-forum@lists.mpi-forum.org> wrote:
>
> Hi Jim,
>
> To be clear, I think that MPI_CANCEL is evil and should be removed from the
> MPI Standard entirely at the earliest convenience. I am certainly not arguing
> that it be permitted for more MPI operations. I thought the discussion was
> focused on MPI_REQUEST_FREE and whether or not it can/should be used on an
> active request.
>
> If a particular MPI implementation does not keep a reference to the request
> between MPI_RPUT and MPI_REQUEST_FREE, but needs that reference to process
> the completion event, then that MPI implementation would be required to keep
> a reference to the request from MPI_REQUEST_FREE until that important task
> had been done, perhaps until the close-epoch call. This requires no new
> memory because the user is giving up their reference to the request, so MPI
> can safely use the request it is passed in MPI_REQUEST_FREE without copying
> it. As you say, MPI takes over the responsibility for processing the
> completion event.
>
> Your question about why the implementation should be required to take on this
> complexity is a good one. That, I guess, is why freeing any active request is
> a bad idea. MPI is required to differentiate completion of individual
> operations (so it can implement MPI_WAIT), which means something must
> process completion at some point for each individual operation. In RMA, that
> responsibility can be discharged earlier than in other parts of the MPI
> interface, but the real question is “why should MPI offer to take on this
> responsibility in the first place?”
>
> Thanks, that helps (me at least).
>
> Cheers,
> Dan.
> —
> Dr Daniel Holmes PhD
> Architect (HPC Research)
> d.hol...@epcc.ed.ac.uk
> Phone: +44 (0) 131 651 3465
> Mobile: +44 (0) 7940 524 088
> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
> —
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
> —
>
>> On 13 Aug 2020, at 14:43, Jim Dinan <james.di...@gmail.com> wrote:
>>
>> The two cases you mentioned would have the same behavior at an application
>> level. However, there may be important differences in the implementation of
>> each operation. For example, an MPI_Put operation may be configured not to
>> generate a completion event, whereas an MPI_Rput would. The library may be
>> relying on the user to make a call on the request to process the event and
>> clean up resources. The implementation can take over this responsibility if
>> the user cancels the request, but why should we ask implementers to take on
>> this complexity and overhead?
>>
>> My $0.02 is that MPI_Cancel is subtle and complicated, and we should be very
>> careful about where we allow it. I don't see the benefit to the programming
>> model outweighing the complexity and overhead in the MPI runtime for the
>> case of MPI_Rput. I also don't know that we were careful enough in
>> specifying the RMA memory model that a canceled request-based RMA operation
>> will still have well-defined behavior. My understanding is that MPI_Cancel
>> is required primarily for canceling receive requests to meet MPI's quiescent
>> shutdown requirement.
>>
>> ~Jim.
>>
>> On Thu, Aug 13, 2020 at 8:11 AM HOLMES Daniel via mpi-forum
>> <mpi-forum@lists.mpi-forum.org> wrote:
>> Hi all,
>>
>> To increase my own understanding of RMA, what is the difference (if any)
>> between a request-based RMA operation whose request is freed without being
>> completed, before the epoch is closed, and a “normal” RMA operation?
>>
>> MPI_LOCK()          ! or any other "open epoch at origin" procedure call
>> doUserWorkBefore()
>> MPI_RPUT(&req)
>> MPI_REQUEST_FREE(&req)
>> doUserWorkAfter()
>> MPI_UNLOCK()        ! or the matching "close epoch at origin" procedure call
>>
>> vs:
>>
>> MPI_LOCK()          ! or any other "open epoch at origin" procedure call
>> doUserWorkBefore()
>> MPI_PUT()
>> doUserWorkAfter()
>> MPI_UNLOCK()        ! or the matching "close epoch at origin" procedure call
>>
>> Is this a source-to-source translation that is always safe in either
>> direction?
>>
>> In RMA, in contrast to the rest of MPI, there are two opportunities for MPI
>> to “block” and do non-local work to complete an RMA operation: 1) during
>> MPI_WAIT for the request (if any - the user may not be given a request, may
>> choose to free the request without calling MPI_WAIT, or may call the
>> nonblocking MPI_TEST instead), and 2) during the close-epoch procedure,
>> which is always permitted to be sufficiently non-local to guarantee that the
>> RMA operation is complete and its freeing stage has been done. It seems that
>> a request-based RMA operation becomes identical to a “normal” RMA operation
>> if the user calls MPI_REQUEST_FREE on the request. This is like “freeing”
>> the request of a nonblocking point-to-point operation, but without the
>> guarantee of a later synchronisation procedure that can actually complete
>> the operation and actually do the freeing stage of the operation.
>>
>> In collectives, there is no “ensure all operations so far are now done”
>> procedure call because there is no concept of epoch for collectives.
>> In point-to-point, there is no such call because there is no concept of
>> epoch for point-to-point.
>> In file operations, there is no such call because there is no concept of
>> epoch for file operations. (There is MPI_FILE_SYNC, but it is optional, so
>> MPI cannot rely on it being called.)
>> In these cases, the only non-local procedure that is guaranteed to happen is
>> MPI_FINALIZE, hence all outstanding non-local work needed by the “freed”
>> operation might be delayed until that procedure is called.
>>
>> The issue with copying parameters is also moot because all of them are
>> passed by value (implicitly copied) or are data buffers covered by the
>> “conflicting accesses” RMA rules.
>>
>> Thus, it seems to me that RMA is a very special case - it could support
>> different semantics, but that does not provide a good basis for claiming
>> that the rest of the MPI Standard can support those different semantics -
>> unless we introduce an epoch concept into the rest of the MPI Standard.
>> This is not unreasonable: the notifications in GASPI, for example,
>> guarantee completion of not just the operation they are attached to but
>> *all* operations issued in the “queue” they represent since the last
>> notification. Their queue concept serves the purpose of an epoch. I’m sure
>> there are other examples in other APIs. It seems likely to me that the
>> proposal for MPI_PSYNC for partitioned communication operations is moving in
>> the direction of an epoch, although limited to remote completion of all the
>> partitions in a single operation, which incidentally guarantees that the
>> operation can be freed locally using a local procedure.
>>
>> Cheers,
>> Dan.
>> —
>> Dr Daniel Holmes PhD
>> Architect (HPC Research)
>> d.hol...@epcc.ed.ac.uk
>> Phone: +44 (0) 131 651 3465
>> Mobile: +44 (0) 7940 524 088
>> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
>> —
>> The University of Edinburgh is a charitable body, registered in Scotland,
>> with registration number SC005336.
>> —
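Rendered in concrete C, the two fragments Dan compares above might look roughly like this (win, target, buf, and n are assumed to be set up elsewhere; note that MPI-3.1 currently declares the MPI_Request_free call in the first variant erroneous, which is exactly the usage under discussion):

===============================
#include <mpi.h>

/* Variant 1: request-based put; request freed without being completed. */
void rput_then_free(MPI_Win win, int target, const double *buf, int n)
{
    MPI_Request req;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win); /* open epoch at origin */
    MPI_Rput(buf, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win, &req);
    MPI_Request_free(&req);      /* give up the request without MPI_Wait  */
    MPI_Win_unlock(target, win); /* close epoch: guarantees completion    */
}

/* Variant 2: plain put, completed by closing the epoch. */
void plain_put(MPI_Win win, int target, const double *buf, int n)
{
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win); /* open epoch at origin */
    MPI_Put(buf, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win);
    MPI_Win_unlock(target, win); /* close epoch: guarantees completion    */
}
===============================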
>>> On 13 Aug 2020, at 01:40, Skjellum, Anthony via mpi-forum
>>> <mpi-forum@lists.mpi-forum.org> wrote:
>>>
>>> FYI, one argument (also used to force us to add the restriction that MPI
>>> persistent collective initialization be blocking)... MPI_Request_free on
>>> an NBC poses a problem for operations that take array arguments (e.g.,
>>> Alltoallv/w): after the free on an active request, the application cannot
>>> know whether the vectors are still in use by MPI. We do *not* currently
>>> mandate that the MPI implementation copy such arrays, so they are
>>> effectively "held as unfreeable" by the MPI implementation till
>>> MPI_Finalize. The user cannot deallocate them in a correct program till
>>> after MPI_Finalize.
>>>
>>> Another effect of releasing an active NBC request, IMHO, is that you don't
>>> know when send buffers or receive buffers are free to be deallocated...
>>> since you don't know when the transfer is complete or when the buffers are
>>> no longer used by MPI (till after MPI_Finalize).
>>>
>>> Tony
>>>
>>> Anthony Skjellum, PhD
>>> Professor of Computer Science and Chair of Excellence
>>> Director, SimCenter
>>> University of Tennessee at Chattanooga (UTC)
>>> tony-skjel...@utc.edu [or skjel...@gmail.com]
>>> cell: 205-807-4968
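To make the Alltoallv hazard above concrete, here is a hypothetical fragment (again, the MPI_Request_free call is erroneous under MPI-3.1; the sketch only illustrates why the count/displacement arrays would become unfreeable if it were allowed):

===============================
#include <mpi.h>
#include <stdlib.h>

void nbc_free_hazard(MPI_Comm comm, int nprocs, const int *sbuf, int *rbuf)
{
    int *scounts = malloc(nprocs * sizeof(int));
    int *sdispls = malloc(nprocs * sizeof(int));
    int *rcounts = malloc(nprocs * sizeof(int));
    int *rdispls = malloc(nprocs * sizeof(int));
    /* ... fill counts and displacements ... */

    MPI_Request req;
    MPI_Ialltoallv(sbuf, scounts, sdispls, MPI_INT,
                   rbuf, rcounts, rdispls, MPI_INT, comm, &req);
    MPI_Request_free(&req);  /* erroneous in MPI-3.1; the usage at issue */

    /* With the request gone, there is no way to learn when the collective
     * completes, so none of these frees (nor any reuse of sbuf/rbuf) can be
     * shown safe before MPI_Finalize: */
    free(scounts); free(sdispls); free(rcounts); free(rdispls); /* unsafe */
}
===============================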
>>> From: mpi-forum <mpi-forum-boun...@lists.mpi-forum.org> on behalf of
>>> Jeff Hammond via mpi-forum <mpi-forum@lists.mpi-forum.org>
>>> Sent: Saturday, August 8, 2020 12:07 PM
>>> To: Main MPI Forum mailing list <mpi-forum@lists.mpi-forum.org>
>>> Cc: Jeff Hammond <jeff.scie...@gmail.com>
>>> Subject: Re: [Mpi-forum] MPI_Request_free restrictions
>>>
>>> We should fix the RMA chapter with an erratum. I care less about NBC but
>>> share your ignorance of why it was done that way.
>>>
>>> Sent from my iPhone
>>>
>>>> On Aug 8, 2020, at 6:51 AM, Balaji, Pavan via mpi-forum
>>>> <mpi-forum@lists.mpi-forum.org> wrote:
>>>>
>>>> Folks,
>>>>
>>>> Does someone remember why we disallowed users from calling
>>>> MPI_Request_free on nonblocking collective requests? I remember the
>>>> reasoning for not allowing cancel (i.e., the operation might have
>>>> completed on some processes, but not all), but not for Request_free.
>>>> AFAICT, allowing the users to free the request doesn’t make any difference
>>>> to the MPI library. The MPI library would simply maintain its own
>>>> refcount to the request and continue forward till the operation completes.
>>>> One of our users would like to free NBC requests so they don’t have to
>>>> wait for the operation to complete in some situations.
>>>>
>>>> Unfortunately, when I added the Rput/Rget operations in the RMA chapter, I
>>>> copy-pasted that text into RMA as well without thinking too hard about it.
>>>> My bad! Either the RMA committee missed it too, or they thought of a
>>>> reason that I can’t think of now.
>>>>
>>>> Can someone clarify or remind me what the reason was?
>>>>
>>>> Regards,
>>>>
>>>> — Pavan
>>>>
>>>> MPI-3.1 standard, page 197, lines 26-27:
>>>>
>>>> “It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request
>>>> associated with a nonblocking collective operation.”

_______________________________________________
mpi-forum mailing list
mpi-forum@lists.mpi-forum.org
https://lists.mpi-forum.org/mailman/listinfo/mpi-forum