Re: [Mpi-forum] MPI_Request_free restrictions

HOLMES Daniel via mpi-forum Sat, 15 Aug 2020 03:46:57 -0700

Hi Jim,

Consider a simple test that does an MPI_Isend that has no matching recv, frees 
the request, and then calls MPI_Finalize. Does the above text say this should 
work? Or not?


IMHO, this program is erroneous because the “receiver” process does not comply 
with the requirements of MPI_FINALIZE, i.e., it must initiate all MPI calls 
needed to complete its involvement in MPI communications. Here that means it 
must initiate a matching receive operation (or the sender must cancel its send).

The actual behaviour is undefined - MPI might raise an error (if it notices), 
it might hang in MPI_FINALIZE at the sender process (e.g. because it has a 
large send buffer that it is waiting for a receiver to drain before it releases 
it), or it may seem to complete successfully (e.g. if the message is small, was 
sent eagerly, and the “receiver” doesn’t look at its unexpected message queue 
because it has no reason to do so). This ambiguity doesn’t matter because the 
program is erroneous - it can do anything - be happy if it doesn’t set the data 
centre on fire.
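A toy C model of the sender side may make the size dependence concrete. It is not MPI code: `EAGER_LIMIT_BYTES` and `sender_finalize` are hypothetical, the threshold value is invented for illustration, and real libraries pick their own protocol switch points. It models only two of the outcomes listed above (the library raising an error is left out), and it cannot make the program correct. Without a matching receive the program stays erroneous, whatever the model returns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protocol switch point, not an MPI constant; real libraries
 * choose their own threshold. */
#define EAGER_LIMIT_BYTES 8192u

typedef enum {
    FINALIZE_RETURNS, /* nothing left for the sender to wait for */
    FINALIZE_HANGS    /* sender waits for a receive that never comes */
} finalize_outcome;

/* Toy model of the sender's MPI_FINALIZE after MPI_Isend + MPI_Request_free.
 * FINALIZE_RETURNS does not mean the program is correct: with no matching
 * receive it is erroneous either way. */
finalize_outcome sender_finalize(size_t msg_bytes, bool receive_posted)
{
    if (msg_bytes <= EAGER_LIMIT_BYTES) {
        /* Eager: the payload was already shipped (or buffered) at send time,
         * so the sender's side is done even if the receiver never looks. */
        return FINALIZE_RETURNS;
    }
    /* Rendezvous: the send buffer can only be released after a matching
     * receive pulls the data. */
    return receive_posted ? FINALIZE_RETURNS : FINALIZE_HANGS;
}
```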

Cheers,
Dan.
—
Dr Daniel Holmes PhD
Architect (HPC Research)
d.hol...@epcc.ed.ac.uk
Phone: +44 (0) 131 651 3465
Mobile: +44 (0) 7940 524 088
Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
—
The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336.
—

On 14 Aug 2020, at 18:28, Jim Dinan <james.di...@gmail.com> wrote:

Sorry, we seem to have lost the mailing list for the last couple messages below 
(my fault).

The text on MPI_FINALIZE does not mandate “no pending communication”; it 
requires “all MPI calls needed to complete its involvement …”:
“Before an MPI process invokes MPI_FINALIZE, the process must perform all MPI 
calls needed to complete its involvement in MPI communications associated with 
the World Model. It must locally complete all MPI operations that it initiated 
and must execute matching calls needed to complete MPI communications initiated 
by other processes. For example, if the process executed a nonblocking send, it 
must eventually call MPI_WAIT, MPI_TEST, MPI_REQUEST_FREE, or any derived 
function” (§10.2.2 in MPI-4.0)

Consider a simple test that does an MPI_Isend that has no matching recv, frees 
the request, and then calls MPI_Finalize.

Does the above text say this should work? Or not?

 ~Jim.

On Fri, Aug 14, 2020 at 9:28 AM HOLMES Daniel <d.hol...@epcc.ed.ac.uk> wrote:
Hi Jim,

If the user releases their reference, the MPI library will need to add this 
handle to some internal data structure. IIRC, never requiring MPI to do this 
was a design guideline for MPI 3.0.

This is, I guess, the design choice that supports the current prohibition in 
the RMA chapter, i.e. calling MPI_REQUEST_FREE for a request-based RMA 
operation is erroneous. It’s a small overhead, but there is no trade-off 
(AFAIK) that could mitigate/outweigh it.

Freeing an active request seems like it would leak application memory. For 
example, if you free an active send/recv request, how can the user safely 
access the send/recv buffer?

This is the reason that freeing an active point-to-point request is discouraged 
in the MPI Standard (and should, IMHO, be prohibited).
“It is preferable, in general, to free requests when they are inactive.” §3.9

Arguments like “but I can discover remote completion of the operation” do not 
provide a guarantee of local completion and/or freeing of local resources. That 
issue is mentioned in the MPI Standard to justify the discouragement, but it 
could equally well justify a strict prohibition.
“Active receive requests should not be freed. Otherwise, it will not be 
possible to check that the receive has completed.” §3.9

The MPI Forum is unlikely to vote for upgrading the discouragement to a 
prohibition for point-to-point (because back-compat, sigh).

Is it effectively leaked (i.e. never returned back to the user by the MPI 
library)?

It is effectively leaked until MPI_FINALIZE returns.

And how will the user meet the no-pending-communication requirement of 
MPI_Finalize?

The text on MPI_FINALIZE does not mandate “no pending communication”; it 
requires “all MPI calls needed to complete its involvement …”:
“Before an MPI process invokes MPI_FINALIZE, the process must perform all MPI 
calls needed to complete its involvement in MPI communications associated with 
the World Model. It must locally complete all MPI operations that it initiated 
and must execute matching calls needed to complete MPI communications initiated 
by other processes. For example, if the process executed a nonblocking send, it 
must eventually call MPI_WAIT, MPI_TEST, MPI_REQUEST_FREE, or any derived 
function” (§10.2.2 in MPI-4.0)

The “execute matching calls needed to complete MPI communications initiated by 
other processes” bit is easy - just initiate (meaning MPI_Isend/MPI_Irecv or 
MPI_START) the matching point-to-point MPI procedure at the other MPI process. 
The progress rule in §3.5 guarantees that “If a pair of matching send and 
receives have been initiated then at least one of these two operations will 
complete, independently of other actions in the system” and “[each] will 
complete, unless the [other] is satisfied by another message.” So, in a correct 
MPI program, where all sends have a matching receive and vice versa, all those 
point-to-point communication operations will complete (eventually, possibly 
during MPI_FINALIZE).
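The matching argument can be sketched with the two receive-side queues that implementations commonly keep: posted receives and unexpected messages (the queue mentioned at the top of this thread). This is a toy C model, assuming matching on source and tag only; real MPI also matches on the communicator and supports wildcards, and every name here is hypothetical. Whichever side is initiated first, the pair matches as soon as both are initiated, while a message that never meets a receive simply stays in the unexpected queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 16 /* toy capacity; real queues grow dynamically */

/* Hypothetical envelope: real MPI also matches on communicator and wildcards. */
typedef struct {
    int source;
    int tag;
} envelope;

typedef struct {
    envelope items[QCAP];
    size_t len;
} queue;

typedef struct {
    queue posted;     /* receives initiated but not yet matched */
    queue unexpected; /* messages that arrived before any matching receive */
    size_t matched;   /* completed send/receive pairs */
} receiver;

static bool same_envelope(envelope a, envelope b)
{
    return a.source == b.source && a.tag == b.tag;
}

/* Remove the oldest entry matching e; oldest-first mirrors MPI's
 * non-overtaking order for messages with the same envelope. */
static bool take_oldest(queue *q, envelope e)
{
    for (size_t i = 0; i < q->len; i++) {
        if (same_envelope(q->items[i], e)) {
            for (size_t j = i + 1; j < q->len; j++)
                q->items[j - 1] = q->items[j];
            q->len--;
            return true;
        }
    }
    return false;
}

static void append(queue *q, envelope e)
{
    assert(q->len < QCAP); /* toy model: no overflow handling */
    q->items[q->len++] = e;
}

/* A message (eager data or a rendezvous request) arrives from the network. */
void on_arrival(receiver *r, envelope e)
{
    if (take_oldest(&r->posted, e))
        r->matched++;
    else
        append(&r->unexpected, e);
}

/* The application initiates a receive, e.g. with MPI_Irecv. */
void post_receive(receiver *r, envelope e)
{
    if (take_oldest(&r->unexpected, e))
        r->matched++;
    else
        append(&r->posted, e);
}
```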

The “locally complete” bit is what you’re really asking about. Of course, 
strictly, MPI_REQUEST_FREE does not “locally complete” and so it should not be 
relevant in this pre-finalise instruction; it is listed here precisely because 
of the historical exception permitting freeing of active point-to-point 
requests. Thus, “MPI_ISEND, MPI_REQUEST_FREE, MPI_FINALIZE” is an explicitly 
allowed exception, even though it would otherwise breach the rule.

Cheers,
Dan.

On 13 Aug 2020, at 20:17, Jim Dinan <james.di...@gmail.com> wrote:

Sorry, I got my wires crossed there. Apply what I wrote to MPI_Request_free on 
an active request.

Assume that the MPI library allocates space on the heap for the internal 
request object and returns a handle (e.g. pointer) through the MPI_Request 
object. The user is required to hang onto this handle and wait/test on it in 
the future, so MPI doesn't need to hold a reference. If the user releases their 
reference, the MPI library will need to add this handle to some internal data 
structure. IIRC, never requiring MPI to do this was a design guideline for MPI 3.0.

But also, freeing an active request seems like it would leak application 
memory. For example, if you free an active send/recv request, how can the user 
safely access the send/recv buffer? Is it effectively leaked (i.e. never 
returned back to the user by the MPI library)? And how will the user meet the 
no-pending-communication requirement of MPI_Finalize?

 ~Jim.

On Thu, Aug 13, 2020 at 10:07 AM HOLMES Daniel <d.hol...@epcc.ed.ac.uk> wrote:
Hi Jim,

To be clear, I think that MPI_CANCEL is evil and should be removed from the MPI 
Standard entirely at the earliest convenience.

I am certainly not arguing that it be permitted for more MPI operations.

I thought the discussion was focused on MPI_REQUEST_FREE and whether or not it 
can/should be used on an active request.

If a particular MPI implementation does not keep a reference to the request 
between MPI_RPUT and MPI_REQUEST_FREE, but needs that reference to process the 
completion event, then that MPI implementation would be required to keep a 
reference to the request from MPI_REQUEST_FREE until that important task had 
been done, perhaps until the close epoch call. This requires no new memory 
because the user is giving up their reference to the request, so MPI can safely 
use the request it is passed in MPI_REQUEST_FREE without copying it. As you 
say, MPI takes over the responsibility for processing the completion event.
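A toy C sketch of the bookkeeping described here, assuming a hypothetical implementation that parks freed RMA requests on a per-window list (none of `window`, `toy_lock`, `toy_rput`, `toy_rma_free` or `toy_unlock` is MPI API). It shows both halves of the argument: freeing allocates nothing new because the library keeps the very object it is handed, yet the library still needs somewhere to hold that pointer until the epoch-closing call processes the completion. That list is the kind of internal data structure Jim's design guideline tries to avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FREED 32 /* toy limit; a real library would use a growable list */

/* Hypothetical internal object behind the user's MPI_Request handle. */
typedef struct {
    int target_rank;
} rma_request;

/* Hypothetical per-window state kept by the library. */
typedef struct {
    bool epoch_open;
    rma_request *freed[MAX_FREED]; /* freed by the user, completion not yet processed */
    size_t nfreed;
    size_t events_processed;
} window;

void toy_lock(window *w) { w->epoch_open = true; }

/* Start a request-based put; the returned pointer plays the role of the handle. */
rma_request *toy_rput(window *w, int target_rank)
{
    assert(w->epoch_open);
    rma_request *r = malloc(sizeof *r);
    if (r != NULL)
        r->target_rank = target_rank;
    return r;
}

/* The user gives up its handle while the operation is active. No copy is
 * made: the library parks the very object it was handed. */
void toy_rma_free(window *w, rma_request **handle)
{
    assert(*handle != NULL && w->nfreed < MAX_FREED);
    w->freed[w->nfreed++] = *handle;
    *handle = NULL; /* mirrors MPI_REQUEST_NULL */
}

/* Closing the epoch must complete every operation anyway, so it is a natural
 * point to process the parked completions and release the objects. */
size_t toy_unlock(window *w)
{
    size_t drained = w->nfreed;
    for (size_t i = 0; i < w->nfreed; i++) {
        w->events_processed++; /* stand-in for handling the completion event */
        free(w->freed[i]);
    }
    w->nfreed = 0;
    w->epoch_open = false;
    return drained;
}
```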

Your question about why the implementation should be required to take on this 
complexity is a good one. That, I guess, is why freeing any active request is a 
bad idea. MPI is required to differentiate completion of individual operations 
(so it can implement MPI_WAIT), but that means something must process completion 
at some point for each individual operation. In RMA, that responsibility can be 
discharged earlier than in other parts of the MPI interface, but the real 
question is “why should MPI offer to take on this responsibility in the first 
place?”

Thanks, that helps (me at least).

Cheers,
Dan.

On 13 Aug 2020, at 14:43, Jim Dinan <james.di...@gmail.com> wrote:

The two cases you mentioned would have the same behavior at an application 
level. However, there may be important differences in the implementation of 
each operation. For example, an MPI_Put operation may be configured to not 
generate a completion event, whereas an MPI_Rput would. The library may be 
relying on the user to make a call on the request to process the event and 
clean up resources. The implementation can take over this responsibility if the 
user cancels the request, but why should we ask implementers to take on this 
complexity and overhead?

My $0.02 is that MPI_Cancel is subtle and complicated, and we should be very 
careful about where we allow it. I don't see the benefit to the programming 
model outweighing the complexity and overhead in the MPI runtime for the case 
of MPI_Rput. I also don't know that we were careful enough in specifying the 
RMA memory model that a canceled request-based RMA operation will still have 
well-defined behavior. My understanding is that MPI_Cancel is required 
primarily for canceling receive requests to meet MPI's quiescent shutdown 
requirement.

 ~Jim.

On Thu, Aug 13, 2020 at 8:11 AM HOLMES Daniel via mpi-forum <mpi-forum@lists.mpi-forum.org> wrote:
Hi all,

To increase my own understanding of RMA, what is the difference (if any) 
between a request-based RMA operation where the request is freed without being 
completed and before the epoch is closed and a “normal” RMA operation?

MPI_LOCK() ! or any other "open epoch at origin" procedure call
doUserWorkBefore()
MPI_RPUT(&req)
MPI_REQUEST_FREE(&req)
doUserWorkAfter()
MPI_UNLOCK() ! or the matching "close epoch at origin" procedure call

vs:

MPI_LOCK() ! or any other "open epoch at origin" procedure call
doUserWorkBefore()
MPI_PUT()
doUserWorkAfter()
MPI_UNLOCK() ! or the matching "close epoch at origin" procedure call

Is this a source-to-source translation that is always safe in either direction?

In RMA, in contrast to the rest of MPI, there are two opportunities for MPI to 
“block” and do non-local work to complete an RMA operation: 1) during MPI_WAIT 
for the request, if there is one (the user may not be given a request, may 
choose to free the request without calling MPI_WAIT, or may call the 
nonblocking MPI_TEST instead), and 2) during the close epoch procedure, which is 
always permitted to be sufficiently non-local to guarantee that the RMA 
operation is complete and its freeing stage has been done. It seems that a 
request-based RMA operation becomes identical to a “normal” RMA operation if the 
user calls MPI_REQUEST_FREE on the request. This is like “freeing” the request 
from a nonblocking point-to-point operation, except that point-to-point offers 
no guaranteed later synchronisation procedure that can actually complete the 
operation and actually do its freeing stage.

In collectives, there is no “ensure all operations so far are now done” 
procedure call because there is no concept of epoch for collectives.
In point-to-point, there is no “ensure all operations so far are now done” 
procedure call because there is no concept of epoch for point-to-point.
In file operations, there is no “ensure all operations so far are now done” 
procedure call because there is no concept of epoch for file operations. (There 
is MPI_FILE_SYNC but it is optional so MPI cannot rely on it being called.)
In these cases, the only non-local procedure that is guaranteed to happen is 
MPI_FINALIZE, hence all outstanding non-local work needed by the “freed” 
operation might be delayed until that procedure is called.

The issue with copying parameters is also moot because all of them are 
passed-by-value (implicitly copied) or are data-buffers and covered by 
“conflicting accesses” RMA rules.

Thus, it seems to me that RMA is a very special case - it could support 
different semantics, but that does not provide a good basis for claiming that 
the rest of the MPI Standard can support those different semantics - unless we 
introduce an epoch concept into the rest of the MPI Standard. This is not 
unreasonable: the notifications in GASPI, for example, guarantee completion of 
not just the operation they are attached to but *all* operations issued in the 
“queue” they represent since the last notification. Their queue concept serves 
the purpose of an epoch. I’m sure there are other examples in other APIs. It 
seems likely to me that the proposal for MPI_PSYNC for partitioned 
communication operations is moving in the direction of an epoch, although 
limited to remote completion of all the partitions in a single operation, which 
accidentally guarantees that the operation can be freed locally using a local 
procedure.

Cheers,
Dan.

On 13 Aug 2020, at 01:40, Skjellum, Anthony via mpi-forum <mpi-forum@lists.mpi-forum.org> wrote:

FYI, one argument (also used to force us to require that MPI persistent 
collective initialization be blocking)... MPI_Request_free on an NBC request 
poses a problem when the operation takes array arguments (e.g., Alltoallv/w)... 
After an active request is freed, the application cannot know whether MPI is 
still using those vectors. We do *not* currently mandate that the MPI 
implementation copy such arrays, so they are effectively "held as unfreeable" 
by the MPI implementation till MPI_Finalize. The user cannot deallocate them in 
a correct program till after MPI_Finalize.
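Tony's point about the Alltoallv/w argument arrays comes down to whether the library borrows or copies them. Below is a toy C sketch with hypothetical names (`nbc_desc`, `desc_borrow`, `desc_copy`, `desc_release`), not taken from any MPI implementation, and modelling only the send-count array. With borrowing, any change the user makes to the array while the operation is in flight is visible to the library; with copying it is not, at the price of an allocation per call. Since the standard does not require a copy, a portable program must assume borrowing, and once the request has been freed it has no way to learn when the borrow ends.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical internal descriptor for an MPI_Ialltoallv-like operation. */
typedef struct {
    const int *sendcounts; /* either the user's array or a private copy */
    int *owned;            /* non-NULL only when the library made a copy */
    int nprocs;
} nbc_desc;

/* Borrowing: zero cost, but the user's array must stay valid and unchanged
 * until the library is done with it. */
void desc_borrow(nbc_desc *d, const int *sendcounts, int nprocs)
{
    d->sendcounts = sendcounts;
    d->owned = NULL;
    d->nprocs = nprocs;
}

/* Copying: an extra allocation and memcpy per call, but the user may reuse
 * or free the array as soon as the initiation call returns. */
int desc_copy(nbc_desc *d, const int *sendcounts, int nprocs)
{
    d->nprocs = nprocs;
    d->owned = malloc((size_t)nprocs * sizeof *d->owned);
    d->sendcounts = d->owned;
    if (d->owned == NULL)
        return -1;
    memcpy(d->owned, sendcounts, (size_t)nprocs * sizeof *d->owned);
    return 0;
}

/* Called when the library has finished with the operation's arguments. */
void desc_release(nbc_desc *d)
{
    free(d->owned); /* free(NULL) is a no-op for borrowed descriptors */
    d->owned = NULL;
    d->sendcounts = NULL;
}
```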

Another effect for NBC of releasing an active request, IMHO, is that you don't 
know when send buffers or receive buffers are free to be deallocated, since you 
don't know when the transfer is complete OR when the buffers are no longer used 
by MPI (till after MPI_Finalize).

Tony




Anthony Skjellum, PhD
Professor of Computer Science and Chair of Excellence
Director, SimCenter
University of Tennessee at Chattanooga (UTC)
tony-skjel...@utc.edu  [or skjel...@gmail.com]
cell: 205-807-4968

________________________________
From: mpi-forum <mpi-forum-boun...@lists.mpi-forum.org> on behalf of Jeff Hammond via mpi-forum <mpi-forum@lists.mpi-forum.org>
Sent: Saturday, August 8, 2020 12:07 PM
To: Main MPI Forum mailing list <mpi-forum@lists.mpi-forum.org>
Cc: Jeff Hammond <jeff.scie...@gmail.com>
Subject: Re: [Mpi-forum] MPI_Request_free restrictions

We should fix the RMA chapter with an erratum. I care less about NBC but share 
your ignorance of why it was done that way.


On Aug 8, 2020, at 6:51 AM, Balaji, Pavan via mpi-forum <mpi-forum@lists.mpi-forum.org> wrote:

 Folks,

Does someone remember why we disallowed users from calling MPI_Request_free on 
nonblocking collective requests?  I remember the reasoning for not allowing 
cancel (i.e., the operation might have completed on some processes, but not 
all), but not for Request_free.  AFAICT, allowing the users to free the request 
doesn’t make any difference to the MPI library.  The MPI library would simply 
maintain its own refcount to the request and continue forward till the 
operation completes.  One of our users would like to free NBC requests so they 
don’t have to wait for the operation to complete in some situations.
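Pavan's refcount idea can be sketched in a few lines of C, assuming a hypothetical request object with one reference for the user's handle and one for the library's progress engine (the names `request_create`, `toy_request_free` and `toy_progress_complete` are illustrative, not MPI API). Freeing drops only the user's reference, and the object lives until the library processes completion. Note that the progress engine must already hold a pointer it can reach without any user call, which is exactly the cost debated in the replies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical internal request object, reachable both through the user's
 * handle and from the library's progress engine. */
typedef struct {
    int refcount; /* one per holder */
    bool complete;
} request;

static size_t live_requests = 0; /* toy leak counter */

/* Start an operation: the user and the library each hold a reference. */
request *request_create(void)
{
    request *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->refcount = 2;
    r->complete = false;
    live_requests++;
    return r;
}

static void request_release(request *r)
{
    assert(r->refcount > 0);
    if (--r->refcount == 0) {
        free(r);
        live_requests--;
    }
}

/* Toy MPI_Request_free on an active request: drop only the user's reference. */
void toy_request_free(request **handle)
{
    request_release(*handle);
    *handle = NULL; /* mirrors MPI_REQUEST_NULL */
}

/* Progress engine: completion is processed, then the library's reference goes. */
void toy_progress_complete(request *r)
{
    r->complete = true;
    request_release(r);
}
```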

Unfortunately, when I added the Rput/Rget operations in the RMA chapter, I 
copy-pasted that text into RMA as well without thinking too hard about it.  My 
bad!  Either the RMA committee missed it too, or they thought of a reason that 
I can’t think of now.

Can someone clarify or remind me what the reason was?

Regards,

  — Pavan

MPI-3.1 standard, page 197, lines 26-27:

“It is erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for a request 
associated with a nonblocking collective operation.”

_______________________________________________
mpi-forum mailing list
mpi-forum@lists.mpi-forum.org
https://lists.mpi-forum.org/mailman/listinfo/mpi-forum