https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95397
--- Comment #7 from Kirill Chilikin <chilikin.k at gmail dot com> ---
Using the following slightly shortened test:
PROGRAM TEST
  USE, INTRINSIC :: ISO_FORTRAN_ENV
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 32
  REAL(REAL64), DIMENSION(N) :: A, B
  INTEGER I1, I2
  !$ACC PARALLEL COPYOUT(A)
  !$ACC LOOP WORKER PRIVATE(B)
  DO I2 = 1, 1
    !$ACC LOOP VECTOR
    DO I1 = 1, N
      B(I1) = I1
    ENDDO
    !$ACC LOOP VECTOR
    DO I1 = 1, N
      A(I1) = B(I1)
    ENDDO
  ENDDO
  !$ACC END PARALLEL
  PRINT *, A
END PROGRAM
and the current development version, the tree dump without offloading,
taken after the "ompexp" pass, looks like this for the part related to
the exit condition of the second loop:
.offset.13 = .GOACC_LOOP (OFFSET, 1, 32, 1, -1, 0, .chunk_no.12);
.bound.14 = .GOACC_LOOP (BOUND, 1, 32, 1, -1, 0, .offset.13);
...
.offset.13 = .offset.13 + .step.10;
if (.offset.13 < .bound.14)
goto <bb 21>; [INV]
else
goto <bb 22>; [INV]
On the device (gfortran -fopenacc -g -o test3 test3.f90 -fdump-tree-all
-save-temps -foffload=nvptx-none="-march=sm_35 -fdump-tree-all"), this code
finally gets converted into
_133 = .GOACC_DIM_POS (2);
_68 = _133;
_69 = 32;
...
# _6 = PHI <_68(13), _79(14)>
_79 = _6 + _67;
if (_79 < _69)
goto <bb 14>; [INV]
else
goto <bb 15>; [INV]
The assembly code is
mov.u32 %r45, %ntid.x;
mov.u32 %r46, 32;
mov.u32 %r27, %tid.x;
...
add.u32 %r27, %r27, %r45;
setp.lt.s32 %r76, %r27, %r46;
@%r76 bra $L7;
The condition is therefore (%tid.x + %ntid.x < 32). Since in this case
%ntid.x == 32 and generally 0 <= %tid.x < %ntid.x, the condition is always
false, and the loop exits after one iteration. This is exactly what the test
shows: only the first output value is correct:
$ ./test3
1.0000000000000000 0.0000000000000000 (30 more zeros)
For the original test, the problem looks similar. With
gfortran -fopenacc -g -o openacc-github-302-2 openacc-github-302-2_.f90
-fdump-tree-all -save-temps -foffload=nvptx-none="-march=sm_35 -fdump-tree-all"
the resulting "optimized" tree for the subroutine add_ps_routine contains
<bb 6> :
.offset.9_28 = .offset.9_5 + .step.6_20;
if (.offset.9_28 < .bound.10_22)
goto <bb 4>; [INV]
else
goto <bb 7>; [INV]
where .step.6_20 is defined as
_37 = .GOACC_DIM_SIZE (2);
.step.6_20 = _37;
and, in assembly, equals %ntid.x. In this example, the arguments are scalars,
but the actual addition happens only at the fifth iteration out of 10:
do i = 1, n
  if (i .eq. 5) then
    c = a + b
  end if
end do
But the loop exits after the first iteration because of the incorrect
condition, so the addition never happens.