On 04/14/2011 08:39 PM, mark florisson wrote:
On 14 April 2011 20:29, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
On 04/13/2011 11:13 PM, mark florisson wrote:
Although there is omp_get_max_threads():
"The omp_get_max_threads routine returns an upper bound on the number
of threads that could be used to form a new team if a parallel region
without a num_threads clause were encountered after execution returns
from this routine."
So we could have threadsavailable() evaluate to that if encountered
outside a parallel region. Inside, it would evaluate to
omp_get_num_threads(). At worst, people would over-allocate a bit.
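A minimal sketch of that rule in C, in case it helps the discussion. The
name threads_available() is hypothetical (it is not an existing Cython or
OpenMP function), and the serial fallbacks are only there so the sketch
also builds without OpenMP support.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_in_parallel(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_max_threads(void) { return 1; }
#endif

/* Hypothetical helper, named after the proposed threadsavailable(). */
static int threads_available(void)
{
    /* omp_in_parallel() is nonzero only inside an active parallel region. */
    if (omp_in_parallel())
        return omp_get_num_threads();  /* size of the current team */
    return omp_get_max_threads();      /* upper bound for the next team */
}
```

Called from serial code this yields the upper bound; called from inside an
active parallel region it yields the actual team size.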
Well, over-allocating could well mean 1 GB, which could well mean getting an
unnecessary MemoryError (or, as in my case, if I'm not careful to set
ulimit, getting a SIGKILL from the cluster patrol process two minutes
after the fact...)
The upper bound is not "however many threads you think you can start",
but rather "how many threads are considered useful for your machine".
So if you use omp_set_num_threads(), it will return the value you set
there. Otherwise, if you have e.g. a quadcore, it will return 4. The
spec says:
"Note – The return value of the omp_get_max_threads routine can be
used to dynamically allocate sufficient storage for all threads in the
team formed at the subsequent active parallel region."
So this sounds like a viable option.
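As a rough illustration of the allocation pattern the spec note describes,
here is a sketch in C. The function name sum_with_per_thread_slots() and the
summing workload are made up for the example; only the omp_* calls come from
OpenMP.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical example: sum 0..n-1 with one accumulator slot per thread.
 * Returns -1 if the allocation fails. */
static long sum_with_per_thread_slots(long n)
{
    /* Size the storage before the region, using the upper bound. */
    int nslots = omp_get_max_threads();
    long *partial = calloc((size_t)nslots, sizeof *partial);
    long total = 0;
    long i;
    int t;

    if (partial == NULL)
        return -1;

    /* No num_threads clause, so the team cannot exceed nslots. */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        partial[omp_get_thread_num()] += i;

    /* Combine the per-thread partial sums serially. */
    for (t = 0; t < nslots; t++)
        total += partial[t];

    free(partial);
    return total;
}
```

Slots for threads that never ran stay zero, so over-allocation costs memory
here but not correctness.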
What would happen here: We have 8 cores. Some code has an OpenMP
parallel section with maxthreads=2, and inside the section another
function is called.
That called function uses threadsavailable(), and has a parallel block
that wants as many threads as it can get.
I don't know the details as well as you do, but my uninformed guess is
that in this case a race is quite possible: omp_get_max_threads would
return 7 in each case, then the first one to reach the parallel block
would get the 7 threads. The remaining thread has then allocated
storage for 7 threads but only has 1 thread running.
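One way to see what a given runtime actually does in this situation is to
measure it. The sketch below is only a probe, with hypothetical names
(inner_work, largest_overallocation, struct team_report); the result depends
on the OpenMP implementation and on whether nested parallelism is enabled,
so no particular output is claimed.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_num_threads(void) { return 1; }
#endif

struct team_report { int bound; int actual; };

/* Hypothetical callee: records the bound it would allocate for and the
 * team size it really gets in its own parallel block. */
static struct team_report inner_work(void)
{
    struct team_report r;
    r.bound = omp_get_max_threads();
    r.actual = 0;
    #pragma omp parallel
    {
        /* One thread of the inner team records the team size. */
        #pragma omp single
        r.actual = omp_get_num_threads();
    }
    return r;
}

/* Hypothetical caller: an outer region limited to two threads, each of
 * which calls inner_work(). Returns the largest (bound - actual) seen. */
static int largest_overallocation(void)
{
    int worst = 0;
    #pragma omp parallel num_threads(2)
    {
        struct team_report r = inner_work();
        #pragma omp critical
        {
            if (r.bound - r.actual > worst)
                worst = r.bound - r.actual;
        }
    }
    return worst;
}
```

A nonzero result means at least one caller would have allocated for more
threads than it ended up running.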
BTW, I'm not sure what the difference is between the original idea and
omp_get_max_threads -- in the absence of such races as above, my
original idea with entering a parallel section (with the same scheduling
parameters) just to see how many threads we got, would work as well?
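For comparison, the probing idea could look roughly like this in C. The
names probe_team_size() and run_with_probed_storage() are hypothetical, and
passing the probed size back through num_threads() is my assumption about how
one would keep the real region from outgrowing the storage; the same race
caveat applies between the probe and the real region.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Enter a parallel region only to see how many threads we get. */
static int probe_team_size(void)
{
    int n = 1;
    #pragma omp parallel
    {
        #pragma omp single
        n = omp_get_num_threads();
    }
    return n;
}

/* Allocate one slot per probed thread, then run the real region with at
 * most that many threads. Returns the number of threads that actually
 * ran, or -1 if the allocation fails. */
static int run_with_probed_storage(void)
{
    int n = probe_team_size();
    int *seen = calloc((size_t)n, sizeof *seen);
    int used = 0;
    int t;

    if (seen == NULL)
        return -1;

    /* num_threads(n) requests at most n threads, so indices stay < n. */
    #pragma omp parallel num_threads(n)
    {
        seen[omp_get_thread_num()] = 1;
    }

    for (t = 0; t < n; t++)
        used += seen[t];

    free(seen);
    return used;
}
```

The second region may still get fewer threads than the probe did, in which
case some slots go unused, but it should not get more.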
DS
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel