Re: [Cython] prange CEP updated

Dag Sverre Seljebotn Thu, 21 Apr 2011 11:13:35 -0700

On 04/21/2011 10:37 AM, Robert Bradshaw wrote:

On Mon, Apr 18, 2011 at 7:51 AM, mark florisson
<[email protected]>  wrote:

On 18 April 2011 16:41, Dag Sverre Seljebotn<[email protected]>  wrote:

Excellent! Sounds great! (as I won't have my laptop for some days I can't
have a look yet but I will later)


You're right about (the current) buffers and the gil. A testcase explicitly
for them would be good.

Firstprivate etc: I think it'd be nice myself, but it is probably better to
take a break from it at this point so that we can think more about that and
not do anything rash; perhaps open up a specific thread on them and ask for
more general input. Perhaps you want to take a break or task-switch to
something else (fused types?) until I can get around to review and merge
what you have so far? You'll know best what works for you though. If you
decide to implement explicit threadprivate variables because you've got the
flow I certainly won't object myself.

  Ok, cool, I'll move on :) I already included a test with a prange and
a numpy buffer with indexing.


Wow, you're just plowing away at this. Very cool.

+1 to disallowing nested prange, that seems to get really messy with
little benefit.

In terms of the CEP, I'm still unconvinced that firstprivate is not
safe to infer, but let's leave the initial values undefined rather than
specifying them to be NaNs (we can do that as an implementation if you
want), which will give us flexibility to change later once we've had a
chance to play around with it.

I don't see any technical issues with inferring firstprivate, the question is whether we want to. I suggest not inferring it in order to make this safer: One should be able to just try to change a loop from "range" to "prange", and either a) have things fail very hard, or b) just work correctly and be able to trust the results.

Note that when I suggest using NaN, it is as initial values for EACH ITERATION, not per-thread initialization. It is not about "firstprivate" or not, but about disabling thread-private variables entirely in favor of "per-iteration" variables.

I believe that by talking about "readonly" and "per-iteration" variables, rather than "thread-shared" and "thread-private" variables, this can be used much more safely and with virtually no knowledge of the details of threading. Again, what's in my mind are scientific programmers with (too) little training.

In the end it's a matter of taste and what is most convenient to more users. But I believe the case of needing real thread-private variables that preserve per-thread values across iterations (and thus also can possibly benefit from firstprivate) is used seldom enough that an explicit declaration is OK, in particular when it buys us so much in safety in the common case.


To be very precise,

cdef double x, z
for i in prange(n):
    x = f(x)
    z = f(i)
    ...

goes to

cdef double x, z
for i in prange(n):
    x = z = nan
    x = f(x)
    z = f(i)
    ...

and we leave it to the C compiler to (trivially) optimize away "z = nan". And, yes, it is a stopgap solution until we've got control flow analysis so that we can outright disallow such uses of x (without threadprivate declaration, which also gives firstprivate behaviour).
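
To make the effect concrete, here is a rough emulation of that transformation in plain Python. It is only a sketch: a serial loop stands in for prange, and f() and run_with_nan_reset() are hypothetical names made up for the illustration, not anything in Cython.

```python
import math

def f(v):
    # Hypothetical stand-in for the f() used in the example above.
    return 2.0 * v + 1.0

def run_with_nan_reset(n, x0=0.0):
    """Serial emulation of the proposed per-iteration NaN reset."""
    x = x0
    z = 0.0
    for i in range(n):
        x = z = math.nan  # the assignment the compiler would insert
        x = f(x)          # reads x before assigning it: result is poisoned
        z = f(i)          # assigned before any read: unaffected by the reset
    return x, z

x, z = run_with_nan_reset(4)
```

Here x comes out as NaN because it carries a dependency between iterations, while z is an ordinary value, so the misuse shows up loudly in the result instead of silently depending on thread scheduling.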


The "cdef threadlocal(int) foo" declaration syntax feels odd to me...
We also probably want some way of explicitly marking a variable as
shared and still be able to assign to/flush/sync it. Perhaps the
parallel context could be used for these declarations, i.e.

     with parallel(threadlocal=a, shared=(b,c)):
         ...

which would be considered an "expert" usecase.

I'm not set on the syntax for threadlocal variables; although your proposal feels funny/very unpythonic, almost like a C macro. For some inspiration, here's the Python solution (with no obvious place to put the type):


import threading
mydata = threading.local()
mydata.myvar = ... # value is threadprivate
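
A slightly fuller, runnable version of the same idea, as a sketch only: worker(), results and main_sees_myvar are names invented for the illustration. Each thread sets its own myvar, and the main thread never sees one.

```python
import threading

mydata = threading.local()
results = {}  # worker name -> value that worker read back

def worker(name, value):
    mydata.myvar = value          # visible only inside this thread
    results[name] = mydata.myvar  # each thread reads back its own value

threads = [
    threading.Thread(target=worker, args=("t%d" % k, k)) for k in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never assigned myvar, so the attribute does not exist here.
main_sees_myvar = hasattr(mydata, "myvar")
```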

For all the discussion of threadsavailable/threadid, the most common
usecase I see is for allocating a large shared buffer and partitioning
it. This seems better handled by allocating separate thread-local
buffers, no? I still like the context idea, but everything in a
parallel block before and after the loop(s) also seems like a natural
place to put any setup/teardown code (though the context has the
advantage that __exit__ is always called, even if exceptions are
raised, which makes cleanup a lot easier to handle).

I'd *really* like to have try/finally available in cython.parallel blocks for this, although I realize that may have to wait for a while. A big part of our discussions at the workshop were about how to handle exceptions; I guess there'll be a "phase 2" of this where break/continue/raise are dealt with.
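
What I have in mind is the pattern below, written with plain Python threads since cython.parallel can't express it yet. It is a sketch under assumptions: worker(), scratch and cleaned_up are hypothetical names, and the ValueError just simulates a failure inside the parallel section.

```python
import threading

cleaned_up = []          # names of workers whose teardown ran
lock = threading.Lock()  # protects cleaned_up

def worker(name, fail):
    scratch = bytearray(1024)  # setup: a per-thread scratch buffer
    try:
        if fail:
            raise ValueError("simulated failure in the loop body")
        scratch[0] = 1         # the "real" work
    except ValueError:
        pass                   # swallowed here so the thread exits quietly
    finally:
        # Teardown runs whether or not the body raised.
        with lock:
            cleaned_up.append(name)

threads = [
    threading.Thread(target=worker, args=("ok", False)),
    threading.Thread(target=worker, args=("bad", True)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both workers end up in cleaned_up, including the one that raised, which is the guarantee I'd want for per-thread buffers.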


Dag Sverre
_______________________________________________
cython-devel mailing list
[email protected]
http://mail.python.org/mailman/listinfo/cython-devel
