Tobias Burnus wrote:
I also had a glance at the patch, and it looks reasonable; in particular, I failed to generate a failing test case.

Actually, the test case is *not* OK.

If one compiles the original test case of the PR (or your workshare2.f90) with "-O" and looks at "-fdump-tree-original", one finds:

    #pragma omp parallel default(shared)
      {
        {
          real(kind=4) __var_1;
          {
            #pragma omp single
              {
                __var_1 = __builtin_cosf (b[0])
              }
...
                #pragma omp for schedule(static) nowait
                for (S.1 = 1; S.1 <= 5; S.1 = S.1 + 1)
                  {
                    a[S.1 + -1] = a[S.1 + -1] * D.1730 + a[S.1 + -1] * D.1731;

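Thus, __var_1 is a thread-local variable, since it is declared inside the parallel region. However, COS() is not executed in all threads but only in one, due to the omp single: "The single construct specifies that the associated structured block is executed by only one of the threads in the team" (2.5.3 single Construct, OpenMP 3.1). As far as I can see, that means only the executing thread's copy of __var_1 is ever assigned.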

Jakub remarks that omp single is what we expand omp workshare to if the workshare body is not simple enough for us.

 * * *

With the test case below, the dump looks OK, but the FE optimization does not combine the two cos() calls; I have no idea why. The dump looks like this:

  #pragma omp parallel default(shared)
    {
                D.1743 = __builtin_cosf (b[0]);
                D.1745 = __builtin_cosf (b[0]);
...
                  #pragma omp for schedule(static) nowait
                  for (S.2 = 1; S.2 <= 10; S.2 = S.2 + 1)
                    a[S.2 + D.1750] = a[S.2 + D.1748] * D.1743 + a[S.2 + D.1749] * D.1745;

Tobias

PS: The test case is:

program workshare
  implicit none
  real, parameter :: eps = 3e-7
  integer :: j
  real :: A(10,5), B(5)
  B(1) = 3.344
  call random_number(a)
  !$omp parallel default(shared)
  !$omp workshare
  forall (j=1:5)
    A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1))
  end forall
  !$omp end workshare
  !$omp end parallel
  print *, A
end program workshare

subroutine parallel_workshare
  implicit none
  real, parameter :: eps = 3e-7
  integer :: j
  real :: A(10,5), B(5)
  B(1) = 3.344
  call random_number(a)
  !$omp parallel workshare default(shared)
  forall (j=1:5)
    A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1))
  end forall
  !$omp end parallel workshare
  print *, A
end subroutine parallel_workshare
