On 13 April 2011 22:53, mark florisson <markflorisso...@gmail.com> wrote:
> On 13 April 2011 21:57, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>> On 04/13/2011 09:31 PM, mark florisson wrote:
>>> On 5 April 2011 22:29, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>>>>
>>>> I've done a pretty major revision to the prange CEP, bringing in a
>>>> lot of the feedback.
>>>>
>>>> Thread-private variables are now split into two cases:
>>>>
>>>> i) The safe cases, which really require very little technical
>>>> knowledge -> automatically inferred
>>>>
>>>> ii) As an advanced feature, unsafe cases that require some knowledge
>>>> of threading -> must be explicitly declared
>>>>
>>>> I think this split simplifies things a great deal.
>>>>
>>>> I'm rather excited about this now; this could turn out to be a really
>>>> user-friendly and safe feature that would not only allow us to
>>>> support OpenMP-like threading, but be more convenient to use in a
>>>> range of common cases.
>>>>
>>>> http://wiki.cython.org/enhancements/prange
>>>>
>>>> Dag Sverre
>>>
>>> If we want to support cython.parallel.threadsavailable outside of
>>> parallel regions (which does not depend on the schedule used for
>>> worksharing constructs!), then we have to disable dynamic scheduling.
>>> For instance, if OpenMP sees that some OpenMP threads are already
>>> busy, then with dynamic scheduling it dynamically establishes how
>>> many threads to use for any parallel region.
>>> So basically, if you put omp_get_num_threads() in a parallel region,
>>> you have a race when you depend on that result in a subsequent
>>> parallel region, because the number of busy OpenMP threads may have
>>> changed.
>>
>> Ah, I don't know why I thought there wouldn't be a race condition. I
>> wonder if the whole threadsavailable() idea should just be ditched and
>> we should think of something else. It's not a very common use case.
>> Starting to disable some forms of scheduling just to, essentially,
>> shoehorn in one particular syntax doesn't seem like the way to go.
>>
>> Perhaps this calls for support for the critical(?) block then, after
>> all. I'm at least +1 on dropping threadsavailable() and instead
>> requiring that you call numthreads() in a critical block:
>>
>> with parallel:
>>     with critical:
>>         # call numthreads() and allocate global buffer
>>         # calling threadid() not allowed, if we can manage that
>>     # get buffer slice for each thread
>
> In that case I think you'd want single + a barrier. 'critical' means
> that all threads execute the section, but exclusively. I think you
> usually want to allocate either a shared worksharing buffer or a
> private thread-local buffer. In the former case you can allocate your
> buffer outside any parallel section; in the latter case, within the
> parallel section. In the latter case the buffer will just not be
> available outside of the parallel section.
>
> We can still support any write-back to shared variables that are
> explicitly declared later on (supposing we'd also support single and
> barriers).
> Then the code would read as follows:
>
> cdef shared(void *) buf
> cdef void *localbuf
>
> with nogil, parallel:
>     with single:
>         buf = malloc(n * numthreads())
>
>     barrier()
>
>     localbuf = buf + n * threadid()
>     <actual code here that uses localbuf (or buf if you don't assign to it)>
>
> # localbuf undefined here
> # buf is well-defined here
>
> However, I don't believe it's very common to want to use private
> buffers after the loop. If you have a buffer in terms of your loop
> size, you want it shared, but I can't imagine a case where you'd want
> to examine buffers that were allocated specifically for each thread
> after the parallel section. So I'm +1 on dropping threadsavailable
> outside parallel sections, but currently -1 on supporting this case,
> because we can solve it later on with support for explicitly declared
> variables + single + barriers.
>
>>> So basically, to make threadsavailable() work outside parallel
>>> regions, we'd have to disable dynamic scheduling
>>> (omp_set_dynamic(0)). Of course, when OpenMP cannot provide the
>>> number of threads desired (because it is bounded by a configurable
>>> thread limit, and by the OS of course), the behaviour will be
>>> implementation defined. So then we could just put a warning in the
>>> docs for that, and users can check for this in the parallel region
>>> using threadsavailable() if it's really important.
>>
>> Do you have any experience with what actually happens with, say, GNU
>> OpenMP? I blindly assumed from the specs that it was an error
>> condition ("flag an error any way you like"), but I guess that may be
>> wrong.
>>
>> Just curious, I think we can just fall back to OpenMP behaviour;
>> unless it terminates the interpreter in an error condition, in which
>> case we should look into how expensive it is to check for the
>> condition up front...
>
> With libgomp you just get the maximum number of available threads, up
> to the number requested. So this code
>
> #include <stdio.h>
> #include <omp.h>
>
> int main(void) {
>     printf("The thread limit is: %d\n", omp_get_thread_limit());
>     #pragma omp parallel num_threads(4)
>     {
>         #pragma omp single
>         printf("We have %d threads in the thread team\n",
>                omp_get_num_threads());
>     }
>     return 0;
> }
>
> requests 4 threads, but gets only 2:
>
> [0] [22:28] ~/code/openmp ➤ OMP_THREAD_LIMIT=2 ./testomp
> The thread limit is: 2
> We have 2 threads in the thread team
>
>>
>> Dag Sverre
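To make the race discussed above concrete: with dynamic adjustment
enabled, nothing pins the team size between one parallel region and the
next, so a count taken in the first region may not hold in the second.
A minimal C sketch of that (illustration only; whether the two counts
actually differ depends on the implementation and on system load):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int first = 0, second = 0;

    omp_set_dynamic(1);   /* allow the runtime to adjust team sizes */

    #pragma omp parallel
    {
        #pragma omp single
        first = omp_get_num_threads();
    }

    /* Nothing guarantees the runtime hands out the same team size
     * again; with dynamic adjustment the two counts may differ. */
    #pragma omp parallel
    {
        #pragma omp single
        second = omp_get_num_threads();
    }

    printf("first region: %d, second region: %d\n", first, second);
    return 0;
}

Any buffer sized from 'first' and indexed in the second region is then
potentially wrong, which is exactly the problem with exposing
threadsavailable() outside parallel regions.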
Although there is omp_get_max_threads(): "The omp_get_max_threads
routine returns an upper bound on the number of threads that could be
used to form a new team if a parallel region without a num_threads
clause were encountered after execution returns from this routine." So
we could have threadsavailable() evaluate to that if encountered
outside a parallel region. Inside, it would evaluate to
omp_get_num_threads(). At worst, people would over-allocate a bit.
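For illustration, that behaviour could be lowered to something like the
following (a sketch only, not actual Cython-generated code; the function
name just mirrors the proposed Cython intrinsic):

#include <omp.h>

/* Hypothetical lowering of threadsavailable(): the exact team size
 * inside a parallel region, an upper bound outside one. */
static int threadsavailable(void) {
    if (omp_in_parallel())
        return omp_get_num_threads();  /* exact team size */
    return omp_get_max_threads();      /* upper bound; may over-report */
}

Outside a parallel region the value is only an upper bound, so code that
sizes a buffer with it must be prepared for the actual team to be
smaller, which is the over-allocation mentioned above.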