https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122798

--- Comment #3 from Josef Melcr <jmelcr at gcc dot gnu.org> ---
Hi,
I reduced simd7.f90 to:

subroutine foo (d)
  integer :: i, d(:)
  d = 9;
!$omp parallel do
  do i = 0, 0
    d = d + 3;
  end do
 end subroutine

  interface
    subroutine foo (d)
      integer :: d(:)
    end subroutine
  end interface
  integer :: d(7:9)
  call foo (d)
end

The array's stride gets propagated, and the kernel (the outlined OpenMP function foo_._omp_fn.0) gets cloned as foo_._omp_fn.0.constprop.0. The segfault happens when the clone tries to read the upper bound of the array:

  <bb 3> [local count: 25984552]:
  # q.12_9 = PHI <q.12_7(2), q.12_26(15)>
  # tt.13_11 = PHI <tt.13_8(2), 0(15)>
  _10 = _6 * q.12_9;
  _12 = _10 + tt.13_11;
  _13 = q.12_9 + _12;
  if (_12 >= _13)
    goto <bb 14>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 4> [local count: 12992276]:
  ubound.0_4 = *.omp_data_i_1(D).ubound.0;    <=== SEGFAULT
  if (ubound.0_4 <= 0)
    goto <bb 14>; [11.00%]
  else
    goto <bb 5>; [89.00%]
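For context, bb 3 above is the static-schedule split of the iteration space across threads, and bb 4 is the first read through the data pointer. Below is a minimal C sketch of that split, written by hand from the dump rather than taken from GCC's sources. The function and parameter names are hypothetical, q and tt mirror q.12 and tt.13, and _6 is presumably the thread number returned by omp_get_thread_num.

```c
#include <assert.h>

/* Hand-written sketch (not GCC source) of the split computed in bb 3:
   n iterations are divided among nthreads threads; every thread gets q
   iterations and the first tt threads get one extra. */
static void static_range(long n, long nthreads, long tid,
                         long *start, long *end)
{
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    long q = n / nthreads;   /* base chunk size, like q.12 */
    long tt = n % nthreads;  /* leftover iterations, like tt.13 */
    if (tid < tt) {          /* presumably the path through bb 15: tt = 0, q + 1 */
        tt = 0;
        q++;
    }
    *start = tid * q + tt;   /* _12 = _6 * q + tt */
    *end = *start + q;       /* _13 = q + _12; the range is empty if start >= end */
}
```

With n = 1 (the single iteration of `do i = 0, 0`) and, say, four threads, only thread 0 gets the non-empty range [0, 1). The other threads get start == end, take the branch to bb 14, and never reach the faulting load in bb 4.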

The GIMPLE dumps of the optimized and original kernel are identical up to and including the problematic load. Examining the assembly reveals the cause:

Optimized:
080493e0 <foo_._omp_fn.0.constprop.0>:
........
 80493ed:       e8 3e fc ff ff          call   8049030 <omp_get_num_threads@plt>
<======= MISSING ARGUMENT POINTER LOAD ==========>
 80493f2:       89 c3                   mov    %eax,%ebx
 80493f4:       e8 47 fc ff ff          call   8049040 <omp_get_thread_num@plt>
 80493f9:       89 c1                   mov    %eax,%ecx
.........
 804941f:       39 f9                   cmp    %edi,%ecx
 8049421:       7d 7d                   jge    80494a0 <foo_._omp_fn.0.constprop.0+0xc0>
 8049423:       8b 56 04                mov    0x4(%esi),%edx   <====== SEGFAULT
 8049426:       85 d2                   test   %edx,%edx
 8049428:       7e 76                   jle    80494a0 <foo_._omp_fn.0.constprop.0+0xc0>
 804942a:       8b 5e 0c                mov    0xc(%esi),%ebx

Unoptimized:
.........
 804920b:       e8 20 fe ff ff          call   8049030 <omp_get_num_threads@plt>
 8049210:       8b 74 24 30             mov    0x30(%esp),%esi   <======== ARGUMENT POINTER LOAD
 8049214:       89 c7                   mov    %eax,%edi
 8049216:       e8 25 fe ff ff          call   8049040 <omp_get_thread_num@plt>
 804921b:       89 c1                   mov    %eax,%ecx
.........
 8049244:       39 da                   cmp    %ebx,%edx
 8049246:       0f 8d da 00 00 00       jge    8049326 <foo_._omp_fn.0+0x126>
 804924c:       8b 46 04                mov    0x4(%esi),%eax   <========= NO SEGFAULT
 804924f:       85 c0                   test   %eax,%eax
 8049251:       0f 8e cf 00 00 00       jle    8049326 <foo_._omp_fn.0+0x126>
 8049257:       8b 6e 0c                mov    0xc(%esi),%ebp

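The two functions do the same work, except that the unoptimized version loads the argument pointer from the stack into %esi (mov 0x30(%esp),%esi). The optimized clone lacks this load, so %esi never receives the pointer, and the subsequent load from the struct (mov 0x4(%esi),%edx, the ubound.0 read marked in the GIMPLE dump) accesses invalid memory. It appears that the optimization changes the clone's calling convention, so that it no longer picks up its argument from the stack. I will investigate further.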
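The failure mode can be made concrete. The outlined body's only argument is a pointer to a record of shared data, and every field access, including the ubound.0 load, goes through that pointer. Below is a minimal C sketch of what a specialized clone still depends on. The record layout and all names (`struct omp_data`, `body`, `body_constprop`) are hypothetical and do not match what GCC actually generates. The stride of 1 is likewise an assumption, chosen because d(7:9) is contiguous.

```c
#include <assert.h>

/* Hypothetical stand-in for the data record passed to the outlined
   parallel body; GCC's real record and field names differ. */
struct omp_data {
    int *base;    /* array data */
    long stride;  /* the value a constprop clone can fold in */
    long ubound;  /* upper bound; with a lower bound of 1 it equals the element count */
};

/* Generic outlined body: every field is read through the pointer. */
static void body(struct omp_data *data)
{
    assert(data);
    for (long j = 0; j < data->ubound; j++)
        data->base[j * data->stride] += 3;
}

/* Hand-written analogue of a .constprop clone with stride fixed to 1.
   base and ubound still come from *data, so the clone still needs the
   pointer argument exactly where the caller passes it. */
static void body_constprop(struct omp_data *data)
{
    for (long j = 0; j < data->ubound; j++)
        data->base[j] += 3;
}
```

Even with the stride folded in, `body_constprop` still dereferences `data`. A clone that never loads the pointer from the caller's stack slot (0x30(%esp) in the unoptimized code) dereferences whatever happens to be in the register instead, which is consistent with the fault at 0x4(%esi).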
