https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122798
--- Comment #3 from Josef Melcr <jmelcr at gcc dot gnu.org> ---
Hi,
I reduced simd7.f90 to:
subroutine foo (d)
  integer :: i, d(:)
  d = 9;
  !$omp parallel do
  do i = 0, 0
    d = d + 3;
  end do
end subroutine
interface
  subroutine foo (d)
    integer :: d(:)
  end subroutine
end interface
integer :: d(7:9)
call foo (d)
end
The array's stride is propagated and the outlined kernel is cloned (hence the
.constprop.0 suffix in the clone's name). The segfault happens when the clone
loads the array's upper bound (ubound.0) through the .omp_data_i pointer:
  <bb 3> [local count: 25984552]:
  # q.12_9 = PHI <q.12_7(2), q.12_26(15)>
  # tt.13_11 = PHI <tt.13_8(2), 0(15)>
  _10 = _6 * q.12_9;
  _12 = _10 + tt.13_11;
  _13 = q.12_9 + _12;
  if (_12 >= _13)
    goto <bb 14>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 4> [local count: 12992276]:
  ubound.0_4 = *.omp_data_i_1(D).ubound.0;    <=== SEGFAULT
  if (ubound.0_4 <= 0)
    goto <bb 14>; [11.00%]
  else
    goto <bb 5>; [89.00%]
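The arithmetic in <bb 3> appears to follow the static, no-chunk iteration split that GCC emits for a parallel do without a schedule clause. The Python model below is only a sketch. It assumes that _6 holds the thread number, that q.12_7 and tt.13_8 are n / nthreads and n % nthreads, and that the path through <bb 15> supplies q + 1 and 0. The name static_nochunk is hypothetical. In this reduced test the loop has a single iteration, so under these assumptions only one thread gets a non-empty range and reaches the faulting load in <bb 4>. Every other thread branches to <bb 14>.

```python
def static_nochunk(n: int, nthreads: int, tid: int) -> tuple[int, int]:
    """Hypothetical model: half-open iteration range [s0, e0) for thread tid."""
    q, tt = divmod(n, nthreads)  # assumed to be q.12_7 and tt.13_8
    if tid < tt:                 # assumed path through <bb 15>: PHIs pick q + 1 and 0
        q, tt = q + 1, 0
    s0 = q * tid + tt            # _10 = _6 * q.12_9; _12 = _10 + tt.13_11
    e0 = s0 + q                  # _13 = q.12_9 + _12
    return s0, e0


# "do i = 0, 0" has one iteration; assume 4 threads for illustration.
for tid in range(4):
    s0, e0 = static_nochunk(1, 4, tid)
    if s0 >= e0:
        print(f"thread {tid}: empty range, branches to <bb 14>")
    else:
        print(f"thread {tid}: range [{s0}, {e0}), reaches the ubound.0 load in <bb 4>")
```

With more iterations (for example n = 10 on 4 threads), the same formula hands out ranges of sizes 3, 3, 2, 2. The first n % nthreads threads each receive one extra iteration.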
The GIMPLE dumps of the optimized and original kernels are identical up to and
including the problematic load. Examining the assembly reveals the cause:
Optimized:
080493e0 <foo_._omp_fn.0.constprop.0>:
........
 80493ed: e8 3e fc ff ff        call   8049030 <omp_get_num_threads@plt>
          <======= MISSING ARGUMENT POINTER LOAD ==========>
 80493f2: 89 c3                 mov    %eax,%ebx
 80493f4: e8 47 fc ff ff        call   8049040 <omp_get_thread_num@plt>
 80493f9: 89 c1                 mov    %eax,%ecx
.........
 804941f: 39 f9                 cmp    %edi,%ecx
 8049421: 7d 7d                 jge    80494a0 <foo_._omp_fn.0.constprop.0+0xc0>
 8049423: 8b 56 04              mov    0x4(%esi),%edx    <====== SEGFAULT
 8049426: 85 d2                 test   %edx,%edx
 8049428: 7e 76                 jle    80494a0 <foo_._omp_fn.0.constprop.0+0xc0>
 804942a: 8b 5e 0c              mov    0xc(%esi),%ebx
Unoptimized:
.........
 804920b: e8 20 fe ff ff        call   8049030 <omp_get_num_threads@plt>
 8049210: 8b 74 24 30           mov    0x30(%esp),%esi   <======== ARGUMENT POINTER LOAD
 8049214: 89 c7                 mov    %eax,%edi
 8049216: e8 25 fe ff ff        call   8049040 <omp_get_thread_num@plt>
 804921b: 89 c1                 mov    %eax,%ecx
.........
 8049244: 39 da                 cmp    %ebx,%edx
 8049246: 0f 8d da 00 00 00     jge    8049326 <foo_._omp_fn.0+0x126>
 804924c: 8b 46 04              mov    0x4(%esi),%eax    <========= NO SEGFAULT
 804924f: 85 c0                 test   %eax,%eax
 8049251: 0f 8e cf 00 00 00     jle    8049326 <foo_._omp_fn.0+0x126>
 8049257: 8b 6e 0c              mov    0xc(%esi),%ebp
Both functions do the same work, except that the unoptimized version loads the
argument pointer (.omp_data_i) from the stack into %esi, at 0x30(%esp). The
optimized clone lacks this load. As a result, %esi does not hold the argument
pointer when the clone reads the struct at 0x4(%esi), and that access faults.
It appears that the optimization changes the function's calling convention. I
will investigate further.