On Fri, Apr 15, 2011 at 12:02, Tobias Burnus <bur...@net-b.de> wrote:
> Dear all,
>
> I have question how one can and should tell the middle end about
> asynchonous/single-sided memory access; the goal is to produce fast but
> race-free code. All the following is about Fortran 2003 (asynchronous) and
> Fortran 2008 (coarrays), but the problem itself should occur with all(?)
> supported languages. It definitely occurs when using C with MPI or using C
> with the asynchronous I/O functions, though I do not know in how far one
> currently relys on luck.
>
> There are two issues, which need to be solved by informing the middle end -
> either by variable attributes or by inserted function calls:
> - Prohibiting some code movements
> - Making assumptions about the memory content
>
> a) ASYNCHRONOUS attribute and asynchronous I/O
>
> Fortran allows asynchronous I/O, which means for the programmer that between
> initiating the asynchronous reading/writing and the finishing read/write,
> the variable may not be accessed (for READ) or not be changed (for WRITE).
> The compiler needs to make sure that it does not move code such that this
> constraint is violated. All variables involved in asynchronous operations
> are marked as ASYNCHRONOUS.
>
> Thus, for asynchronous operations, code movements involving opaque function
> calls should not happen - but contrary to VOLATILE, there is no need to take
> the value all time from the memory if it is still in the register.
>
> Example:
>
>   integer, ASYNCHRONOUS :: async_int
>
>   WRITE (unit, ASYNCHRONOUS='yes') async_int
>   ! ...
>   WAIT (unit)
>   a = async_int
>   do i = 1, 10
>     b(i) = async_int + 1
>   end do
>
> Here, "a = async_int" may not be moved before the WAIT line. However,
> contrary to VOLATILE,  one can move the "async_int + 1" before the loop and
> use the value from the registry in the loop. Note additionally that the
> initiation of an asynchronous operation (WRITE statement above) is known at
> compile time; however, it is not known when it ends - the
> WAIT can be in a different translation unit. See also PR 25829.
>
> The Fortran 2008 standard is not very explicit about the ASYNCHRONOUS
> attribute itself; it simply states that it is for asynchronous I/O.
> (However, it describes then how async I/O works,
> including WAIT, INQUIRE, and what a programmer may do until the async I/O is
> finished.) The closed to an ASYNCHRONOUS definition is the non-normative
> note 5.4 of Fortran 2008:
>
> "The ASYNCHRONOUS attribute specifies the variables that might be associated
> with a pending input/output storage sequence (the actual memory locations on
> which asynchronous input/output is being performed) while the scoping unit
> is in execution. This information could be used by the compiler to disable
> certain code motion optimizations."
>
> Seemingly intended, but not that clear in the F2003/F2008 standard, is to
> allow for asynchronous user operations; this will presumbly refined in TR
> 29113 which is currently being drafted - and/or in an interpretation
> request. The main requestee for this feature is the MPI Forum, which works
> on MPI3. In any case the following should work analogously and "buf" should
> not be moved before the "MPI_Wait" line:
>
>  CALL MPI_Irecv(buf, rq)
>  CALL MPI_Wait(rq)
>  xnew=buf
>
> Hereby, "buf" and (maybe?) the first dummy argument of MPI_Irecv have the
> ASYNCHRONOUS attribute.
>
> My question is now: How to properly tell this the middle end?
>
> VOLATILE seems to be wrong as it prevents way too many optimizations and I
> think it does not completely prevent code moving. Using a call to some
> built-in function does not work as in principle the end of an asynchronous
> operation is not known. It could end with a WAIT - possibly also wrapped in
> a function, which is in a different translation unit - or also with an
> INQUIRE(..., PENDING=aio_pending) if "aio_pending" gets assigned a .false.
>
> (Frankly, I am not 100% sure about the exact semantics of ASYNCHRONOUS; I
> think might be implemented by preventing all code movements which involve
> swapping an ASYNCHRONOUS variable with a function call, which is not pure.
> Otherwise, in terms of the variable value, it acts like a normal variable,
> i.e. if one does: "a = 7" and does not set "a" afterwards (assignment or via
> function calls), it remains 7. The changing of the variable is explicit -
> even if it only becomes effective with some delay.)

A compiler memory barrier in GCC is

asm volatile("" ::: "memory");

That being said, does the middle end really move loads and stores past
function calls? If not, a call is effectively also a compiler memory
barrier, no?

> B) COARRAYS
>
> The memory model of coarrays is that all memory is private to the image -
> except for coarrays. Coarrays exists on all images. For "integer ::
> coarray(:)[*]", local accesses are "coarray = ..." or "coarray(4) = ..."
> while remote accesses are "coarray(:)[7] = ..." or "a = coarray(3)[2]",
> where the data is set on image 7 or pulled from image 2.
>
> Let's start directly with an example:
>
>   module m
>     integer, save :: caf_int[*]  ! Global variable
>   end module m
>
>   subroutine foo()
>     use m
>     caf_int = 7  ! Set local variable to 7 (effectively: on image 1 only)
>     SYNC ALL ! Memory barrier/fence
>     SYNC ALL
>     ! caf_int should now be 8, cf. below; thus the following if shall
>     ! neither optimized way not be executed at run time.
>     if (caf_int == 7) call abort()
>   end subroutine foo
>
>   subroutine bar()
>     use m
>     SYNC ALL
>     caf_int[1] = 8 ! Set variable on image 1 to 8
>     SYNC ALL
>   end subroutine bar
>
>   program caf_example
>     if (this_image() == 1) CALL foo()
>     if (this_image() == 2) CALL bar()
>   end program caf_example
>
> Notes:
> - The coarray "caf_int" will be registered in the communication library at
> startup of the main program.
> - For image 1 one always accesses "caf_int" in local memory. The variable
> also does not alias with anything - except that the value might change via
> single-sided communication.
>
> Thus: SYNC ALL acts as memory fence - for coarrays only. In principle, all
> other variables might be moved across the fence. Besides preventing code
> moves, the value of the variable cannot be assumed to be the same as before
> the fence. I think a simple call to "__sync_synchronize()" (alias
> BUILT_IN_SYNCHRONIZE) should take care of this, but I want to
> confirm that it indeed does so. I assume I can still keep all coarrays as
> restricted pointers - even though there is single-sided communication.
>
> Q1: Is __sync_synchronize() sufficient?

I don't think this is correct. __sync_synchronize() just issues a
hardware memory fence instruction. That is, it prevents loads and
stores from moving past the fence *on the processor that executes the
fence instruction*. There is no synchronization with other
processors. SYNC ALL, OTOH is a full barrier; the implementation must
be something like a counter that every co-image increments atomically,
and then waits (blocking or spinning) until the counter equals the
number of co-images.

> Q2: Can this be optimized in some way?

Probably not. For general issues with the shared-memory model, perhaps
shared memory Co-arrays can piggyback on the work being done for the
C++0x memory model, see

http://gcc.gnu.org/wiki/Atomic/GCCMM

Hans Boehm maintains a collection of links to various papers on the topic at

http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/


-- 
Janne Blomqvist

Reply via email to