Hi Andre,
I've tried to work out a (rather old-style) program that structurally resembles your testcase, but using plain MPI, with mpi_put for 1-sided communication as you suggested, and avoiding associate to avoid potential issues with buffers and asynchronous communication (see another recent PR), or aliasing. It works with any number of processes that is a squared integer. One can play with it and see that it works only if one uses the right communicators etc. See attached.
Can you see how that maps to the ways you want to use teams? If it is possible to have a testcase that works with at least one other compiler, that would be a real benefit.
Cheers,
Harald
Gesendet: Montag, 28. Juli 2025 um 00:05
Von: "Andre Vehreschild" <ve...@gmx.de>
An: "Harald Anlauf" <anl...@gmx.de>, morin-mik...@orange.fr
CC: fortran@gcc.gnu.org
Betreff: Re: Aw: Re: Add: [Bug fortran/121043] [16 Regression] Tests of OpenCoarray fail to pass, works on 15.1.1 20250712
Hi Harald,
Am 27. Juli 2025 22:25:40 schrieb Harald Anlauf <anl...@gmx.de>:
<Snipp>
Harald, are you still unconvinced? Do we need to discuss the behavior ofthe testcase test_teams_1? or something else?I will not make any specific suggestions about the actual implementationof coarrays in gfortran.Let me phrase the following requirements:(a) Assuming that the following code is standard-conforming:program pimplicit noneinteger :: img, datainteger :: cnt[*] = 0integer, allocatable :: res(:)[:]img = this_image()data = "" * img + 10 ! Something different on each imageallocate(res(num_images())[*], source=-1)select case (img)case (1)res(img )[1] = datacase (2)res(this_image ())[1] = datacase (3)res(get_val_pure ())[1] = datacase defaultres(get_val_impure())[1] = dataend selectsync allif (this_image() == 1) print *, rescontainspure integer function get_val_pure() result (get_val)get_val = imgend functionimpure integer function get_val_impure() result (get_val)get_val = imgcnt = cnt + 1 ! Count invocations of get_valend functionend programWould it work after the fix? (It does work with ifx and NAG).
Yes, I see no issue why it should not work after the fix. But please note, that my latest fix is to evaluate *every* function on the calling site! There is no check for pure or elemental or any such thing anymore.
I have serious doubts about the pure/elemental checks in the patch.
I got rid of them. Makes the patch far more easy!
Regarding the comparison between teams and coarrays and MPI concepts, Ithink it's difficult to map between them because fortran doesn't defineany communication, it just makes the coarrays world-accessible betweenimages without any more details. With MPI on the contrary communicationis explicit. So there are probably many MPI concepts that just don'texist with coarrays.(b) I would assume that coarrays can be fully implemented using MPI as theunderlying layer for the communication also across distributed nodes. Isthis true, or am I missing anything?
No, that is true. One can use MPI_get and MPI_put, the one-sided communication methods for res = caf[...] and caf[...] = something, respectively.
Or you can use MPI_receive and MPI_send, but then the other image has to have some way of knowing what to send and where to receive to. This is what is implemented in OpenCoarrays now. This is what the accessors are for.
MPI has a concept of communicators, and if I understand Andre right,caf has "accessors".
Ah, here is where my explanation was flawed. MPI's comminucators correspond to Fortran's teams. The caf accessors are used in the execution the coindexed expressions. Caf accessors know nothing about teams, they execute the assignment or retrieval of data on the remote image. In your example above there would be a caf accessor for each res(...) = data _expression_. Each one doing something like:
res(add_data.img) = data
where both add_data and data are transferred to the remote image.
In your example above teams do not play a role. Everything is running on the initial_team which corresponds to MPI_COMM_WORLD.
If they correspond in some way, can someonewrite the equivalent of test_teams_1 as a valid MPI program? It is notobvious to me (this may be my fault).
Well, that will take me some time, but it will be essentially something like this (pseudo code; doing this on a mobile; it keeps correcting my spelling):
Program test_teams_mpi
Integer, allocatable :: caf(3,3)
Type(MPI_Comm) :: row_comm, col_comm
! Here MPI_Win is needed and it will clutter the code even more. So I just left it out. In your mind just associate a MPI_Win to caf, row and cell, please.
Call MPI_Alloc_mem_shared(caf, sizeof(caf))
Caf = reshape(3,3, (/(I, I = 1, 9)/))
Call MPI_create_communicator(row_comm, mod(this_image(), 3) +1, MPI_COMM_WOLRD)
Associate(me => mod(this_image(), 3) +1)
Associate (row => caf (me, :), team_no => int(this_image() / 3) +1)
Call MPI_create_commincator(col_comm, me, row_comm) ! Well, not really needed, just for demonstration.
Associate (cell => row(me), team_no => me)
Cell = team_no
If (me /= 1) then
Call MPI_Put(row(me), cell, 1, row_comm)
Endif
End associate
If (me /= 1) then
Call MPI_put(caf(team_no, :), row, 1, MPI_CAF_COMM_WORLD)
End if
End associate
End associate
End program
Does this give you an idea, Harald? If not, I will debug the program and give you the working example, but as said it will take me some time. The above is just written down w/o ever having seen a compiler.
I am happy to leave the discussion to you, so if you, Andre and Mikael,come to a conclusion that resolves the above, just do it.
Well, that is very kind of you, but Mikael is funded from the same funding, that I was paid of. We agreed that while we are paid from the same source, we will not Okay each others work to prevent any smells.
So, I am positive that you now got what I am trying to do and I am convinced that when you look at the latest version of my patch, attached to this mail:
You'll see, that you convinced me of my wrongs a long time ago. Please take a look, because that patch is correcting what you criticized from the beginning.
Regards,
Andre
program p use mpi implicit none integer :: comm_world, comm_col, win_world, win_col integer :: rank_world, size_world, rank_col, size_col integer :: i, ierr, icol, jcol, n integer(MPI_ADDRESS_KIND) :: disp, winsize integer, allocatable, target :: data(:), col_data(:) integer, pointer, contiguous :: parr(:,:)
! initialize call mpi_init (ierr) if (ierr /= MPI_SUCCESS) error stop 1 comm_world = MPI_COMM_WORLD call mpi_comm_size (comm_world, size_world, ierr) call mpi_comm_rank (comm_world, rank_world, ierr) if (rank_world == 0) write(0,*) "Total number of processes:", size_world ! partition into n*n processes (n columns) n = nint (sqrt (real (size_world))) if (n*n /= size_world) error stop "num pes must be a squared integer" icol = rank_world / n jcol = mod (rank_world, n) ! create communicator for n groups of n processes (columns) call mpi_comm_split (comm_world, icol, jcol, comm_col, ierr) call mpi_comm_size (comm_col, size_col, ierr) call mpi_comm_rank (comm_col, rank_col, ierr) if (size_col /= n) error stop 2 if (rank_col /= jcol) error stop 3 ! prepare the "real work" allocate (data(size_world), source=-1) parr(0:n-1,0:n-1) => data ! Let each process does its work... do i = 0, size_world-1 call mpi_barrier (comm_world, ierr) ! for ordered output if (rank_world == i) write(0,*) "rank", i,': rank_col=', jcol end do parr(rank_col,icol) = (rank_col+1) + 100*(icol+1) ! the "result" ! ...and gather the results: ! (1) column-wise using an auxiliary array for each column ! create MPI window for 1-sided communication if (rank_col == 0) then allocate (col_data(0:n-1), source=-42) winsize = n * sizeof(1) else allocate (col_data(0)) winsize = 0 end if call mpi_barrier (comm_world, ierr) call mpi_win_create (col_data, winsize, sizeof(1), & MPI_INFO_NULL, comm_col, win_col, ierr) call mpi_win_fence (0, win_col, ierr) disp = rank_col call mpi_put (parr(rank_col, icol), 1, MPI_INTEGER, & 0, disp, 1, MPI_INTEGER, win_col, ierr) call mpi_win_fence (0, win_col, ierr) call mpi_win_free (win_col, ierr) ! (2) gather columns: create MPI window for 1-sided communication if (rank_world == 0) then winsize = size(parr) * sizeof(1) else winsize = 0 end if call mpi_win_create (parr, winsize, sizeof(1), & MPI_INFO_NULL, comm_world, win_world, ierr) call mpi_win_fence (0, win_world, ierr) if (rank_col == 0) then disp = icol*n call mpi_put (col_data, size(col_data), MPI_INTEGER, & 0, disp, size(col_data), MPI_INTEGER, win_world, ierr) end if call mpi_win_fence (0, win_world, ierr) call mpi_win_free (win_world, ierr) deallocate (col_data) ! show results on processor 0: if (rank_world == 0) then write(0,*) "data=", data end if call mpi_finalize (ierr) end program p