> This feels like you're getting a small
> memory/cache bandwidth increase for the rkf45_apply level-1-BLAS-like
> operations by using multiple cores but the cores are otherwise not
> being used effectively.  I say this because a state vector 1e6 doubles
> long will not generally fit in cache.  Adding more cores increases the
> amount of cache available.

Hmm... I tentatively take this back on re-thinking how you've added
the #pragma omp lines to the rkf45.c file you attached elsewhere in
this thread.  Try using a single
    #pragma omp parallel
and then individual lines like
    #pragma omp for
at each for loop.  Using
    #pragma omp parallel for
repeatedly as you've done can, depending on your compiler and OpenMP
runtime, pay the cost of starting and synchronizing a thread team at
every loop instead of once.
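
To make the suggestion concrete, here is a minimal sketch of the pattern.
It is not taken from your rkf45.c: the function name two_stage_update, its
arguments, and the 0.25 coefficient are made up for illustration and only
stand in for the kind of vector updates a Runge-Kutta step performs.

```c
#include <stddef.h>

/* Hypothetical stand-in for two vector updates inside an RK step.
   Names and coefficients are illustrative, not from rkf45.c. */
void two_stage_update(size_t n, double h,
                      const double *y, const double *k1, const double *k2,
                      double *ytmp, double *yerr)
{
    /* The parallel region is entered once; both loops share its team. */
    #pragma omp parallel
    {
        /* Work-sharing loop: iterations are divided among the team. */
        #pragma omp for
        for (long i = 0; i < (long) n; i++)
            ytmp[i] = y[i] + h * 0.25 * k1[i];

        /* Second work-sharing loop, reusing the same team of threads. */
        #pragma omp for
        for (long i = 0; i < (long) n; i++)
            yerr[i] = h * (k1[i] - k2[i]);
    }
}
```

Each "omp for" ends with an implicit barrier.  When consecutive loops are
independent, as these two are, adding nowait to the first one should
remove that barrier as well.  Built without OpenMP enabled, the pragmas
are ignored and the function simply runs serially, which makes it easy to
check that the results match.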

- Rhys
