| Field | Value |
|---|---|
| Issue | 153374 |
| Summary | [flang] [openmp] performance issue due to code generation for private variables |
| Labels | |
| Assignees | |
| Reporter | shivaramaarao |

Consider the following program:
```
program parallel_do_example
  implicit none
  integer :: i, n, k
  real, dimension(4096) :: a, b, c, x
  n = 4096
  ! Initialize arrays
  do i = 1, n
    a(i) = real(i)
    b(i) = 2.0 * real(i)
    x(i) = 0.0
  end do
  !$OMP PARALLEL DO PRIVATE(x)
  do i = 1, n
    do k = 1, n
      x(k) = x(k) + a(k) + b(k)
    enddo
    c(i) = a(i) + b(i) * x(i)
  end do
  !$OMP END PARALLEL DO
  ! Print a few results to verify
  print *, 'c(1) = ', c(1)
  print *, 'c(50) = ', c(50)
  print *, 'c(100) = ', c(100)
end program parallel_do_example
```
Compiled with `flang -O3 -march=znver5 -fopenmp -S mytest.f90`.
The generated assembly shows that the memory for the private copy of the `x` array is allocated with `malloc` (16384 bytes, i.e. 4096 four-byte reals) and freed at the end of the function:
```
pushq %rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $232, %rsp
movq (%rdx), %r15
movl $16384, %edi
movl (%r15), %ebp
callq malloc@PLT
```
```
vzeroupper
callq __kmpc_for_static_fini@PLT
.LBB1_30:
movq %rbx, %rdi
callq free@PLT
```
This causes a significant performance degradation compared to classic flang and the ifx compiler. Code of this kind appears in the 350.md benchmark of SPEC OMP2012, where a private array of size 3 is allocated and freed in the same way.
A possible solution is to allocate such private variables on the stack instead of calling `malloc` and `free`. That should improve the benchmark's performance.
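
For the small-array case mentioned above, a reduced test along the following lines may be easier to inspect than the 4096-element example. This is only a sketch: the program name, the array `v`, and the loop body are hypothetical and not taken from the 350.md sources, and it has not been verified here that flang emits `malloc`/`free` for this exact code.

```fortran
! Hypothetical reduced test case: names and sizes are illustrative,
! not copied from the 350.md benchmark.
program small_private_array
  implicit none
  integer, parameter :: n = 4096
  integer :: i, k
  real :: a(n), c(n)
  real :: v(3)          ! small work array, privatized per thread

  do i = 1, n
    a(i) = real(i)
  end do

  !$omp parallel do private(v, k)
  do i = 1, n
    ! v is fully written before it is read, so the private copy
    ! does not depend on the value of the original variable.
    do k = 1, 3
      v(k) = a(i) * real(k)
    end do
    c(i) = v(1) + v(2) + v(3)
  end do
  !$omp end parallel do

  ! Print a few results to verify
  print *, 'c(1)   = ', c(1)
  print *, 'c(100) = ', c(100)
end program small_private_array
```

Compiling this with the same flags as above and searching the assembly for `malloc` and `free` calls in the outlined parallel function should show whether the 12-byte private array is still heap-allocated.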