Hi Andre,
 
I've tried to work out a (rather old-style) program that structurally resembles your testcase, but using plain MPI, with mpi_put for one-sided communication as you suggested, and avoiding ASSOCIATE to sidestep potential issues with buffers and asynchronous communication (see another recent PR) or aliasing.  It works with any number of processes that is a squared integer.  One can play with it and see that it works only if one uses the right communicators etc.  See attached.
 
Can you see how that maps to the ways you want to use teams?  If it is possible to have a testcase that works with at least one other compiler, that would be a real benefit.
 
Cheers,
Harald
 
Sent: Monday, 28 July 2025 at 00:05
From: "Andre Vehreschild" <ve...@gmx.de>
To: "Harald Anlauf" <anl...@gmx.de>, morin-mik...@orange.fr
CC: fortran@gcc.gnu.org
Subject: Re: Aw: Re: Add: [Bug fortran/121043] [16 Regression] Tests of OpenCoarray fail to pass, works on 15.1.1 20250712
Hi Harald,
 
On 27 July 2025 at 22:25:40, Harald Anlauf <anl...@gmx.de> wrote:
<Snipp>
 
 
Harald, are you still unconvinced?  Do we need to discuss the behavior of
the testcase test_teams_1, or something else?
 
I will not make any specific suggestions about the actual implementation
of coarrays in gfortran.
 
Let me phrase the following requirements:
 
(a) Assuming that the following code is standard-conforming:
 
program p
  implicit none
  integer :: img, data
  integer :: cnt[*] = 0
  integer, allocatable :: res(:)[:]
  img = this_image()
  data = 2 * img + 10  ! Something different on each image
  allocate(res(num_images())[*], source=-1)
  select case (img)
  case (1)
     res(img             )[1] = data
  case (2)
     res(this_image    ())[1] = data
  case (3)
     res(get_val_pure  ())[1] = data
  case default
     res(get_val_impure())[1] = data
  end select
  sync all
  if (this_image() == 1) print *, res
contains
  pure integer function get_val_pure() result (get_val)
    get_val = img
  end function
  impure integer function get_val_impure() result (get_val)
    get_val = img
    cnt = cnt + 1 ! Count invocations of get_val
  end function
end program
 
Would it work after the fix?  (It does work with ifx and NAG).
 
Yes, I see no reason why it should not work after the fix. But please note that my latest fix evaluates *every* function at the call site! There is no check for pure or elemental or any such thing anymore.
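To make that concrete with your example (just a sketch of the intended semantics, not of gfortran's internal representation): the default branch now behaves as if it were written

  block
    integer :: idx
    idx = get_val_impure()   ! evaluated on the calling image; cnt is incremented there
    res(idx)[1] = data       ! only the already evaluated index enters the remote access
  end block

i.e. the side effect on cnt happens on the image that executes the assignment, not on image 1.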
 
 
I have serious doubts about the pure/elemental checks in the patch.
 
I got rid of them. That makes the patch far simpler!
 
 
Regarding the comparison between teams and coarrays and MPI concepts, I
think it's difficult to map between them, because Fortran doesn't define
any communication; it just makes the coarrays accessible across all
images without any further details.  With MPI, on the contrary, communication
is explicit.  So there are probably many MPI concepts that just don't
exist with coarrays.
 
(b) I would assume that coarrays can be fully implemented using MPI as the
underlying layer for the communication also across distributed nodes. Is
this true, or am I missing anything?
 
No, you are not missing anything; that is true. One can use MPI_Get and MPI_Put, the one-sided communication methods, for res = caf[...] and caf[...] = something, respectively.
 
Or you can use MPI_Recv and MPI_Send, but then the other image has to have some way of knowing what to send and where to store what it receives. This is what is implemented in OpenCoarrays now, and this is what the accessors are for.
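Just to make the one-sided variant concrete (a minimal sketch, not the OpenCoarrays code; it assumes "use mpi", an existing window win that exposes the coarray on every image, and an image index k, i.e. MPI rank k-1, plus the integer value x to be stored):

  integer(MPI_ADDRESS_KIND) :: disp   ! the target displacement must have this kind

  disp = 0                            ! offset of caf within the window
  call MPI_Win_lock (MPI_LOCK_EXCLUSIVE, k-1, 0, win, ierr)
  call MPI_Put (x, 1, MPI_INTEGER, k-1, disp, 1, MPI_INTEGER, win, ierr)   ! caf[k] = x
  call MPI_Win_unlock (k-1, win, ierr)

The mirror image with MPI_Get gives you x = caf[k].  (Lock/unlock is just one possible synchronization choice; fences work as well.)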
 
 
MPI has a concept of communicators, and if I understand Andre right,
caf has "accessors".
 
Ah, here is where my explanation was flawed. MPI's communicators correspond to Fortran's teams. The caf accessors are used in the execution of coindexed expressions. Caf accessors know nothing about teams; they execute the assignment or retrieval of data on the remote image. In your example above there would be a caf accessor for each res(...)[1] = data expression. Each one does something like:
 
res(add_data%img) = data
 
where both add_data and data are transferred to the remote image.
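Spelled out (made-up names, not the real OpenCoarrays interface), you can think of such an accessor as a small procedure that the compiler generates and that the runtime executes on the remote image:

  module caf_accessor_sketch
    implicit none
    type :: t_add_data
      integer :: img                      ! the index, already evaluated by the caller
    end type
  contains
    subroutine res_assign_accessor (res, add_data, data)
      integer,          intent(inout) :: res(:)     ! the remote image's res
      type(t_add_data), intent(in)    :: add_data   ! shipped from the calling image
      integer,          intent(in)    :: data       ! shipped from the calling image
      res(add_data%img) = data                      ! executed on the remote image
    end subroutine
  end module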
 
In your example above teams do not play a role. Everything is running in the initial team, which corresponds to MPI_COMM_WORLD.
 
 If they correspond in some way, can someone
write the equivalent of test_teams_1 as a valid MPI program?  It is not
obvious to me (this may be my fault).
 
Well, that will take me some time, but it will be essentially something like this (pseudo code; doing this on a mobile; it keeps correcting my spelling):
 
program test_teams_mpi

  integer, allocatable :: caf(:,:)
  integer :: i

  type(MPI_Comm) :: row_comm, col_comm

  ! Here an MPI_Win is needed as well, but it would clutter the code even
  ! more, so I just left it out.  In your mind, please associate an MPI_Win
  ! with caf, row and cell.
  call MPI_Alloc_mem_shared (caf, sizeof(caf))   ! pseudo: allocate caf in MPI-managed memory

  caf = reshape((/(i, i = 1, 9)/), (/3, 3/))

  ! pseudo: in real MPI this is a communicator split
  call MPI_create_communicator (row_comm, mod(this_image(), 3) + 1, MPI_COMM_WORLD)
  associate (me => mod(this_image(), 3) + 1)
    associate (row => caf(me, :), team_no => int(this_image() / 3) + 1)
      ! not really needed, just for demonstration:
      call MPI_create_communicator (col_comm, me, row_comm)
      associate (cell => row(me), team_no => me)
        cell = team_no
        if (me /= 1) then
          call MPI_Put (row(me), cell, 1, row_comm)             ! pseudo one-sided put within the row
        end if
      end associate
      if (me /= 1) then
        call MPI_Put (caf(team_no, :), row, 1, MPI_COMM_WORLD)  ! pseudo one-sided put to the "world"
      end if
    end associate
  end associate

end program
 
Does this give you an idea, Harald? If not, I will debug the program and give you a working example, but as said, it will take me some time. The above was just written down without ever having seen a compiler.
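(Just to anchor the pseudo names in real MPI, as a rough sketch in the old "use mpi" style with integer handles; row_no, pos_in_row, row_comm and ierr are assumed to be declared integers: MPI_create_communicator corresponds to MPI_Comm_split, where the colour argument plays the role of the team number, e.g.

  ! processes passing the same colour (here: the row number) end up in the
  ! same new communicator; the key orders them within it:
  call MPI_Comm_split (MPI_COMM_WORLD, row_no, pos_in_row, row_comm, ierr)

and the window that I left out above would be set up with MPI_Win_create and synchronized with MPI_Win_fence.)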
 
 
I am happy to leave the discussion to you, so if you, Andre and Mikael,
come to a conclusion that resolves the above, just do it.
 
Well, that is very kind of you, but Mikael is paid from the same funding that I am. We agreed that, as long as we are paid from the same source, we will not okay each other's work, to avoid any appearance of bias.
 
So, I am positive that you now understand what I am trying to do, and I am convinced that when you look at the latest version of my patch, attached to this mail:
 
 
you'll see that you convinced me of my errors a long time ago. Please take a look, because that patch corrects what you criticized from the beginning.
 
Regards,
Andre
 
program p
  use mpi
  implicit none
  integer :: comm_world, comm_col, win_world, win_col
  integer :: rank_world, size_world, rank_col, size_col
  integer :: i, ierr, icol, jcol, n
  integer(MPI_ADDRESS_KIND) :: disp, winsize
  integer, allocatable, target :: data(:), col_data(:)
  integer, pointer, contiguous :: parr(:,:)

  ! initialize
  call mpi_init (ierr)
  if (ierr /= MPI_SUCCESS) error stop 1
  comm_world = MPI_COMM_WORLD
  call mpi_comm_size (comm_world, size_world, ierr)
  call mpi_comm_rank (comm_world, rank_world, ierr)
  if (rank_world == 0) write(0,*) "Total number of processes:", size_world

  ! partition into n*n processes (n columns)
  n = nint (sqrt (real (size_world)))
  if (n*n /= size_world) error stop "num pes must be a squared integer"
  icol =      rank_world / n
  jcol = mod (rank_world, n)

  ! create communicator for n groups of n processes (columns)
  call mpi_comm_split (comm_world, icol, jcol, comm_col, ierr)
  call mpi_comm_size (comm_col, size_col, ierr)
  call mpi_comm_rank (comm_col, rank_col, ierr)
  if (size_col /= n)    error stop 2
  if (rank_col /= jcol) error stop 3

  ! prepare the "real work"
  allocate (data(size_world), source=-1)
  parr(0:n-1,0:n-1) => data

  ! Let each process do its work...
  do i = 0, size_world-1
     call mpi_barrier (comm_world, ierr)  ! for ordered output
     if (rank_world == i) write(0,*) "rank", i,': rank_col=', jcol
  end do
  parr(rank_col,icol) = (rank_col+1) + 100*(icol+1)  ! the "result"

  ! ...and gather the results:
  ! (1) column-wise using an auxiliary array for each column
  !     create MPI window for 1-sided communication
  if (rank_col == 0) then
     allocate (col_data(0:n-1), source=-42)
     winsize = n * sizeof(1)
  else
     allocate (col_data(0))
     winsize = 0
  end if
  call mpi_barrier (comm_world, ierr)
  call mpi_win_create (col_data, winsize, sizeof(1), &
                       MPI_INFO_NULL, comm_col, win_col, ierr)
  call mpi_win_fence (0, win_col, ierr)
  disp = rank_col
  call mpi_put (parr(rank_col, icol), 1, MPI_INTEGER, &
                0, disp, 1, MPI_INTEGER, win_col, ierr)
  call mpi_win_fence (0, win_col, ierr)
  call mpi_win_free (win_col, ierr)

  ! (2) gather columns: create MPI window for 1-sided communication
  if (rank_world == 0) then
     winsize = size(parr) * sizeof(1)
  else
     winsize = 0
  end if
  call mpi_win_create (parr, winsize, sizeof(1), &
                       MPI_INFO_NULL, comm_world, win_world, ierr)
  call mpi_win_fence (0, win_world, ierr)
  if (rank_col == 0) then
     disp = icol*n
     call mpi_put (col_data, size(col_data), MPI_INTEGER, &
                   0, disp, size(col_data), MPI_INTEGER, win_world, ierr)
  end if
  call mpi_win_fence (0, win_world, ierr)
  call mpi_win_free (win_world, ierr)
  deallocate (col_data)

  ! show results on processor 0:
  if (rank_world == 0) then
     write(0,*) "data=", data
  end if
  call mpi_finalize (ierr)
end program p
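To try it (assuming an MPI installation that provides the usual compiler wrapper and launcher; the file name is arbitrary):

  mpif90 caf_mpi_demo.f90 -o caf_mpi_demo
  mpirun -np 4 ./caf_mpi_demo

The number of processes must be a squared integer (4, 9, 16, ...), otherwise the program stops with an error.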
