On Sun, May 13, 2012 at 6:02 PM, Razya Ladelsky <ra...@il.ibm.com> wrote:
> Hi,
>
> This patch changes the minimum number of iterations of outer loops for the
> runtime check which tests whether it is worthwhile to parallelize the loop
> or not.
> The current minimum number of iterations for all loops is MIN_PER_THREAD *
> number of threads, where MIN_PER_THREAD is arbitrarily set to 100.
> This prevents some of the promising loops of SPEC2006 from getting
> parallelized.
> I changed the minimum bound for outer loops, under the assumption that
> even if there are not enough iterations, an outer loop contains more
> loops and therefore has enough work to be worth parallelizing.
> This indeed allowed for a lot more loops to get parallelized, resulting in
> substantial performance improvements for SPEC2006 benchmarks, measured on
> a 6-core Power7 with 4-way SMT per core.
> I compared the trunk with O3 + autopar (parallelizing with 6 threads) vs.
> the trunk with O3 minus vectorization.
> None of the benchmarks shows any significant degradation.
>
> The speedup shown for libquantum with autopar was already obtained with
> previous versions of autopar and is unrelated to this patch, but it is
> not degraded by it either.
>
> These are the speedups I collected:
>
> 462.libquantum  2.5 X
> 410.bwaves      3.3 X
> 436.cactusADM   4.5 X
> 459.GemsFDTD    1.27 X
> 481.wrf         1.25 X
>
>
> Bootstrap and testsuite (with -ftree-parallelize-loops=4) pass
> successfully.
> spec-2006 showed no regressions.
>
>
> OK for trunk?
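
For readers following along, the runtime check described above can be sketched roughly as below. This is only an illustration, not the code in tree-parloops.c: the helper name many_iterations_p, the outer_min_per_thread parameter and the use of >= are assumptions, and the patch description does not state the new bound for outer loops, so it is left as a parameter here.

```c
#include <stdbool.h>
#include <stdint.h>

/* From the patch description: minimum iterations per thread for all
   loops before the patch.  */
#define MIN_PER_THREAD 100

/* Hypothetical helper (not a GCC function): return true if a loop with
   NIT iterations is considered worth parallelizing over N_THREADS.
   OUTER_MIN_PER_THREAD is an assumed knob standing in for the relaxed
   bound the patch applies to outer loops.  */
static bool
many_iterations_p (uint64_t nit, unsigned n_threads, bool outer_loop_p,
                   uint64_t outer_min_per_thread)
{
  uint64_t per_thread = outer_loop_p ? outer_min_per_thread : MIN_PER_THREAD;
  return nit >= per_thread * n_threads;
}
```

With 6 threads, an innermost loop would need at least 600 iterations under this sketch, while an outer loop only needs 6 * outer_min_per_thread.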

Can you add a comment that we should compute a better number-of-iterations
value here?  That is, if we have

  for (i = 0; i < n; ++i)
    for (j = 0; j < m; ++j)
      ...

we should compute nit = n * m, not nit = n.  Also may_be_zero handling
would need to be adjusted so we compute nit = (n-maybe-zero ? 0 : n) *
(m-maybe-zero ? 0 : m).  Thus, generally do a better job of computing
the work done per thread.
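
To make the suggestion concrete, here is a minimal sketch of the combined count for a two-level nest. The struct loop_niter type and both helper names are hypothetical stand-ins for illustration, not GCC data structures, and overflow of the product is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one loop's iteration count: NITER is
   only meaningful when MAY_BE_ZERO is false at run time.  */
struct loop_niter
{
  uint64_t niter;
  bool may_be_zero;
};

/* Iterations actually executed: 0 if the loop may not run at all.  */
static uint64_t
effective_niter (struct loop_niter l)
{
  return l.may_be_zero ? 0 : l.niter;
}

/* Work estimate for the nest: nit = n * m rather than nit = n.
   Overflow is not handled in this sketch.  */
static uint64_t
nest_niter (struct loop_niter outer, struct loop_niter inner)
{
  return effective_niter (outer) * effective_niter (inner);
}
```

For example, n = 4 and m = 250 gives nit = 1000 for the nest, where looking at the outer loop alone would give only 4.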

The patch is ok with a suitable comment.

Thanks,
Richard.

> Thanks,
> razya
>
> 2012-05-08  Razya Ladelsky  <ra...@il.ibm.com>
>
>                 * tree-parloops.c (gen_parallel_loop): Change
> many_iterations_cond for outer loops.
>
