https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83064

--- Comment #7 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> I looked at the IL from the Fortran FE and it clearly uses a single memory
> area for tmp for each outer loop iteration. That is, the memory is allocated
> by the caller. 

I confirm that using

        pik = compute( low(i), high(i) )
        pi(i) = sum(pik)

gives the right result.

Does it mean that the 'sum' in 'sum(compute( low(i), high(i) ))' is not part
of the parallelization?

> > Do you understand why the code is not parallelized with
> > -ftree-parallelize-loops=4?

> Because the outer loop has four iterations and we statically require
> at least two per thread for outer loops. 

Why is that so, and is it documented?
