Hi Rhys,
While that is true in theory, it is not applicable in practice, since
there can be no "return" within a parallel region. We need one parallel
region for each loop in this case.
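To illustrate the constraint, here is a minimal sketch. It is not the actual rkf45.c code: the names rhs_fn, decay_rhs, failing_rhs and midpoint_step are hypothetical, and an explicit midpoint step stands in for the longer RKF45 sequence. The point is only that the early "return" after each derivative evaluation has to sit between parallel regions.

```c
#include <assert.h>

/* Hypothetical derivative callback: fills dydt, returns 0 on success. */
typedef int (*rhs_fn)(int n, const double *y, double *dydt);

/* Example right-hand side dy/dt = -y (illustration only). */
int decay_rhs(int n, const double *y, double *dydt)
{
    for (int i = 0; i < n; i++)
        dydt[i] = -y[i];
    return 0;
}

/* Example right-hand side that always reports failure. */
int failing_rhs(int n, const double *y, double *dydt)
{
    (void)n; (void)y; (void)dydt;
    return 1;
}

/* Explicit midpoint step, a simpler stand-in for an RKF45 step. */
int midpoint_step(int n, double h, rhs_fn f, const double *y,
                  double *ytmp, double *k, double *yout)
{
    assert(n >= 0);

    int s = f(n, y, k);
    if (s != 0)
        return s; /* fine: we are outside any parallel region */

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        ytmp[i] = y[i] + 0.5 * h * k[i];

    s = f(n, ytmp, k);
    if (s != 0)
        return s; /* a return like this would be illegal inside a parallel region */

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        yout[i] = y[i] + h * k[i];

    return 0;
}
```

Built without OpenMP support, the pragmas are ignored and the function simply runs serially.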
Maxime
On 2012-12-13 11:44, Rhys Ulerich wrote:
This feels like you're getting a small
memory/cache bandwidth increase for the rkf45_apply level-1-BLAS-like
operations by using multiple cores but the cores are otherwise not
being used effectively. I say this because a state vector 1e6 doubles
long (about 8 MB) will not generally fit in cache. Adding more cores increases the
amount of cache available.
Hmm... I tentatively take this back on re-thinking how you've added
the #pragma omp lines to the rkf45.c file you attached elsewhere in
this thread. Try using a single
#pragma omp parallel
and then individual lines like
#pragma omp for
at each for loop. Using
#pragma omp parallel for
repeatedly as you've done can introduce excess overhead, depending on
your compiler, because each parallel region may pay its own thread
startup and join cost.
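A minimal sketch of the pattern, assuming two independent level-1-BLAS-like updates. The function two_updates and its arguments are hypothetical and not taken from rkf45.c.

```c
#include <assert.h>

/* Hypothetical pair of vector updates sharing one thread team. */
void two_updates(int n, double h, const double *k1, const double *y,
                 double *ytmp, double *yerr)
{
    assert(n >= 0);

    /* The parallel region is entered once and reused by both loops. */
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            ytmp[i] = y[i] + 0.5 * h * k1[i];

        /* The loop above ends with an implicit barrier. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            yerr[i] = h * k1[i];
    }
}
```

Whether this is measurably faster than two separate "parallel for" loops depends on the compiler and runtime, so it is worth timing both.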
- Rhys