On 13 April 2011 21:57, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> On 04/13/2011 09:31 PM, mark florisson wrote:
>>
>> On 5 April 2011 22:29, Dag Sverre Seljebotn<d.s.seljeb...@astro.uio.no>
>>  wrote:
>>>
>>> I've done a pretty major revision to the prange CEP, bringing in a lot of
>>> the feedback.
>>>
>>> Thread-private variables are now split in two cases:
>>>
>>>  i) The safe cases, which really require very little technical knowledge
>>> ->
>>> automatically inferred
>>>
>>>  ii) As an advanced feature, unsafe cases that requires some knowledge of
>>> threading ->  must be explicitly declared
>>>
>>> I think this split simplifies things a great deal.
>>>
>>> I'm rather excited over this now; this could turn out to be a really
>>> user-friendly and safe feature that would not only allow us to support
>>> OpenMP-like threading, but be more convenient to use in a range of common
>>> cases.
>>>
>>> http://wiki.cython.org/enhancements/prange
>>>
>>> Dag Sverre
>>>
>>
>> If we want to support cython.parallel.threadsavailable outside of
>> parallel regions (which does not depend on the schedule used for
>> worksharing constructs!), then we have to disable dynamic scheduling.
>> For instance, if OpenMP sees some OpenMP threads are already busy,
>> then with dynamic scheduling it dynamically establishes how many
>> threads to use for any parallel region.
>> So basically, if you put omp_get_num_threads() in a parallel region,
>> you have a race when you depend on that result in a subsequent
>> parallel region, because the number of busy OpenMP threads may have
>> changed.
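
For illustration, roughly the race being described, in plain C (only a
sketch, not from the CEP; with dynamic adjustment enabled the second
region may be handed a different team size than the first):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int nthreads = 0;

    omp_set_dynamic(1);     /* let the runtime adjust team sizes */

    #pragma omp parallel
    {
        #pragma omp single
        nthreads = omp_get_num_threads();   /* team size of this region */
    }

    /* Nothing guarantees the next region gets the same team size: if other
       OpenMP threads have become busy in the meantime, the runtime may hand
       out fewer threads, so sizing buffers or work by 'nthreads' here races
       against that decision. */
    #pragma omp parallel
    {
        #pragma omp single
        printf("first region saw %d threads, this one has %d\n",
               nthreads, omp_get_num_threads());
    }
    return 0;
}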
>
> Ah, I don't know why I thought there wouldn't be a race condition. I wonder
> if the whole threadsavailable() idea should just be ditched and we
> should think of something else. It's not a very common use case. Starting to
> disable some forms of scheduling just to, essentially, shoehorn in one
> particular syntax doesn't seem like the way to go.
>
> Perhaps this calls for support for the critical(?) block then, after all.
> I'm at least +1 on dropping threadsavailable() and instead requiring that you
> call numthreads() in a critical block:
>
> with parallel:
>    with critical:
>        # call numthreads() and allocate global buffer
>        # calling threadid() not allowed, if we can manage that
>    # get buffer slice for each thread

In that case I think you'd want single + a barrier. 'critical' means
that all threads execute the section, but only one at a time. I think
you usually want to allocate either a shared worksharing buffer or a
private thread-local buffer. In the former case you can allocate your
buffer outside any parallel section, in the latter case within the
parallel section; in the latter case the buffer will simply not be
available outside of the parallel section.
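
To make the critical/single distinction concrete in plain OpenMP C (just
an illustrative sketch, not proposed Cython syntax):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int critical_runs = 0, single_runs = 0;

    #pragma omp parallel num_threads(4)
    {
        /* every thread in the team executes this, one at a time */
        #pragma omp critical
        critical_runs += 1;

        /* exactly one thread executes this; the others wait at the
           implicit barrier at the end of the single construct */
        #pragma omp single
        single_runs += 1;
    }

    /* with a team of 4: critical ran 4 times, single ran once */
    printf("critical ran %d times, single ran %d time(s)\n",
           critical_runs, single_runs);
    return 0;
}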

We can still support any write-back to shared variables that are
explicitly declared later on (supposing we'd also support single and
barriers). The code would then read as follows:

cdef shared(char *) buf     # explicitly shared between threads
cdef char *localbuf         # thread-private

with nogil, parallel:
    with single:
        # one thread allocates the shared buffer
        buf = <char *> malloc(n * numthreads())

    barrier()    # make sure buf is set and visible before anyone uses it

    localbuf = buf + n * threadid()
    <actual code here that uses localbuf (or buf if you don't assign to it)>

# localbuf undefined here
# buf is well-defined here
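
For reference, the intended semantics correspond roughly to the following
plain C with OpenMP (only a sketch, not generated code; work() and
example() are hypothetical placeholders):

#include <stdlib.h>
#include <omp.h>

void work(char *localbuf, size_t n);    /* placeholder for the loop body */

void example(size_t n)
{
    char *buf = NULL;                   /* shared between threads */

    #pragma omp parallel
    {
        char *localbuf;                 /* private to each thread */

        #pragma omp single
        buf = malloc(n * omp_get_num_threads());
        /* implicit barrier at the end of single: every thread now
           sees the allocated buf */

        localbuf = buf + n * omp_get_thread_num();
        work(localbuf, n);
    }

    /* buf is still well-defined here; the per-thread localbuf is not */
    free(buf);
}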

However, I don't believe it's very common to want to use private
buffers after the loop. If a buffer is sized in terms of your loop,
you want it shared, but I can't imagine a case where you'd want to
examine buffers that were allocated specifically for each thread after
the parallel section. So I'm +1 on dropping threadsavailable() outside
parallel sections, but currently -1 on supporting this case, because
we can solve it later on with support for explicitly declared
variables + single + barriers.

>> So basically, to make threadsavailable() work outside parallel
>> regions, we'd have to disable dynamic scheduling (omp_set_dynamic(0)).
>> Of course, when OpenMP cannot provide the number of threads requested
>> (because they are bounded by a configurable thread limit, and by the OS
>> of course), the behaviour will be implementation-defined. So then we
>> could just put a warning in the docs for that, and users can check for
>> this in the parallel region using threadsavailable() if it's really
>> important.
>
> Do you have any experience with what actually happens with, say, GNU OpenMP?
> I blindly assumed from the specs that it was an error condition ("flag an
> error any way you like"), but I guess that may be wrong.
>
> Just curious, I think we can just fall back to OpenMP behaviour; unless it
> terminates the interpreter in an error condition, in which case we should
> look into how expensive it is to check for the condition up front...

With libgomp you simply get however many threads are available, up
to the number requested. So this code

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("The thread limit is: %d\n", omp_get_thread_limit());
    #pragma omp parallel num_threads(4)
    {
        #pragma omp single
        printf("We have %d threads in the thread team\n",
               omp_get_num_threads());
    }
    return 0;
}

requests 4 threads, but it gets only 2:

[0] [22:28] ~/code/openmp  ➤ OMP_THREAD_LIMIT=2 ./testomp
The thread limit is: 2
We have 2 threads in the thread team
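
For completeness, disabling dynamic adjustment as discussed above would
look roughly like this (again only a sketch; the team size is still capped
by the thread limit and by what the OS provides):

#include <stdio.h>
#include <omp.h>

int main(void) {
    /* Turn off dynamic adjustment so subsequent parallel regions get a
       predictable team size (subject to the thread limit / OS). */
    omp_set_dynamic(0);

    printf("Expecting up to %d threads per parallel region\n",
           omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp single
        printf("Got %d threads\n", omp_get_num_threads());
    }
    return 0;
}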

>
> Dag Sverre
>