https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95397

--- Comment #7 from Kirill Chilikin <chilikin.k at gmail dot com> ---
Using the following slightly shortened test:

PROGRAM TEST
  USE, INTRINSIC :: ISO_FORTRAN_ENV
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 32
  REAL(REAL64), DIMENSION(N) :: A, B
  INTEGER I1, I2
  !$ACC PARALLEL COPYOUT(A)
  !$ACC LOOP WORKER PRIVATE(B)
  DO I2 = 1, 1
    !$ACC LOOP VECTOR
    DO I1 = 1, N
      B(I1) = I1
    ENDDO
    !$ACC LOOP VECTOR
    DO I1 = 1, N
      A(I1) = B(I1)
    ENDDO
  ENDDO
  !$ACC END PARALLEL
  PRINT *, A
END PROGRAM

and the current development version, without offloading, the tree dump
after the "ompexp" pass, for the part related to the exit condition of
the second loop, looks like

  .offset.13 = .GOACC_LOOP (OFFSET, 1, 32, 1, -1, 0, .chunk_no.12);
  .bound.14 = .GOACC_LOOP (BOUND, 1, 32, 1, -1, 0, .offset.13);
...
  .offset.13 = .offset.13 + .step.10;
  if (.offset.13 < .bound.14)
    goto <bb 21>; [INV]
  else
    goto <bb 22>; [INV]

On the device (gfortran -fopenacc -g -o test3 test3.f90 -fdump-tree-all
-save-temps -foffload=nvptx-none="-march=sm_35 -fdump-tree-all"), this code
finally gets converted into

  _133 = .GOACC_DIM_POS (2);
  _68 = _133;
  _69 = 32;
...
  # _6 = PHI <_68(13), _79(14)>
  _79 = _6 + _67;
  if (_79 < _69)
    goto <bb 14>; [INV]
  else
    goto <bb 15>; [INV]

The assembly code is

                mov.u32 %r45, %ntid.x;
                mov.u32 %r46, 32;
                mov.u32 %r27, %tid.x;
...
                add.u32 %r27, %r27, %r45;
                setp.lt.s32     %r76, %r27, %r46;
        @%r76   bra     $L7;

The condition is therefore (%tid.x + %ntid.x < 32). Since in this case
%ntid.x == 32 and generally 0 <= %tid.x < %ntid.x, the condition is always
false, and the loop exits after one iteration. This is exactly what the test
shows: only the first output value is correct:

$ ./test3
   1.0000000000000000        0.0000000000000000     (30 more zeros)
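The arithmetic behind this conclusion can be checked with a small host-side Python sketch. It models only the exit test read off the PTX above: the offset starts at %tid.x, is incremented by %ntid.x, and is compared against 32 at the bottom of the loop. The function name and the plain-integer parameters are hypothetical, the elided "..." part is assumed to be the loop body, and nothing here runs on a device.

```python
def body_executions(tid, ntid, bound):
    """Count loop-body executions for one thread of the bottom-tested loop."""
    offset = tid                  # mov.u32 %r27, %tid.x
    count = 0
    while True:
        count += 1                # loop body (the elided part, assumed)
        offset += ntid            # add.u32 %r27, %r27, %r45
        if not offset < bound:    # setp.lt.s32 %r76, %r27, %r46 / bra $L7
            break
    return count


NTID = 32     # %ntid.x in this case, as stated in the text
BOUND = 32    # mov.u32 %r46, 32

# Every thread index 0 <= tid < ntid leaves the loop after a single pass.
counts = {body_executions(tid, NTID, BOUND) for tid in range(NTID)}
print(counts)
```

In this model the set contains only the value 1, so no thread index can make the branch back to $L7 be taken. For contrast, the same model with a step of 1 and a starting offset of 0 runs the body 32 times.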

For the original test, the problem looks similar. With the command

gfortran -fopenacc -g -o openacc-github-302-2 openacc-github-302-2_.f90
-fdump-tree-all -save-temps -foffload=nvptx-none="-march=sm_35 -fdump-tree-all"

the resulting "optimized" tree dump for the subroutine add_ps_routine contains

  <bb 6> :
  .offset.9_28 = .offset.9_5 + .step.6_20;
  if (.offset.9_28 < .bound.10_22)
    goto <bb 4>; [INV]
  else
    goto <bb 7>; [INV]

where .step.6_20 is computed as

  _37 = .GOACC_DIM_SIZE (2);
  .step.6_20 = _37;

and, in assembly, equals %ntid.x. In this example, the arguments are scalars,
but the actual addition happens only at the fifth iteration out of 10:

    do i = 1, n
       if (i .eq. 5) then
          c = a + b
       end if
    end do

But the loop exits after the first iteration due to the incorrect condition,
and the addition never happens.
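A rough host-side Python sketch of this effect follows. It is not taken from the dumps: the function name is hypothetical, the offset is assumed to start at 0 and to map to the Fortran index as i = offset + 1, the bound is assumed to be n = 10, and the step value 32 is an assumed value for %ntid.x (the dump only shows that the step equals .GOACC_DIM_SIZE (2)).

```python
def run_loop(n, step, a, b, start=0):
    """Model the bottom-tested loop; return c, or None if c = a + b never ran."""
    c = None
    offset = start               # assumed initial offset
    while True:
        i = offset + 1           # assumed mapping to the 1-based Fortran index
        if i == 5:
            c = a + b            # the only iteration that does real work
        offset += step           # .offset = .offset + .step
        if not offset < n:       # if (.offset < .bound) goto loop body
            break
    return c


# Step assumed equal to a vector length of 32: the loop ends after i == 1.
print(run_loop(10, 32, 1.0, 2.0))   # prints None
# Step of 1, shown only for contrast: i == 5 is reached.
print(run_loop(10, 1, 1.0, 2.0))    # prints 3.0
```

Under these assumptions, any step of 10 or more ends the loop after the first pass, so the i == 5 branch is never reached.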
