Re: [Cython] prange CEP updated

Dag Sverre Seljebotn Mon, 18 Apr 2011 07:48:47 -0700

Excellent! Sounds great! (as I won't have my laptop for some days I can't have 
a look yet but I will later)


You're right about (the current) buffers and the GIL. A testcase explicitly for 
them would be good.

Firstprivate etc: I think it'd be nice myself, but it is probably better to 
take a break from it at this point so that we can think more about that and not 
do anything rash; perhaps open up a specific thread on them and ask for more 
general input. Perhaps you want to take a break or task-switch to something 
else (fused types?) until I can get around to reviewing and merging what you 
have so far? You'll know best what works for you though. If you decide to 
implement explicit threadprivate variables because you've got the flow I 
certainly won't object myself.


-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

mark florisson <markflorisso...@gmail.com> wrote:

On 18 April 2011 13:06, mark florisson <markflorisso...@gmail.com> wrote:
> On 16 April 2011 18:42, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
>> (Moving discussion from http://markflorisson.wordpress.com/, where Mark
>> said:)
>
> Ok, sure, it was just an issue I was wondering about at that moment,
> but it's a tricky issue, so thanks.
>
>> """
>> Started a new branch https://github.com/markflorisson88/cython/tree/openmp .
>>
>> Now the question is whether sharing attributes should be propagated
>> outwards. e.g. if you do
>>
>> for i in prange(m):
>>    for j in prange(n):
>>        sum += i * j
>>
>> then ‘sum’ is a reduction for the inner parallel loop, but not for the outer
>> one. So the user would currently have to rewrite this to
>>
>> for i in prange(m):
>>    for j in prange(n):
>>        sum += i * j
>>    sum += 0
>>
>> which seems a bit silly. Of course, we could just disable nested
>> parallelism, or tell the users to use a prange and a ‘for from’ in such
>> cases.
>> """
>>
>> Dag: Interesting. The first one is definitely the behaviour we want, as long
>> as it doesn't cause unintended consequences.
>>
>> I don't really think it will -- the important thing is that the order
>> of loop iteration evaluation must be unimportant. And that is still true
>> (for the outer loop, as well as for the inner) in your first example.
>>
>> Question: When you have nested pranges, what will happen is that two nested
>> OpenMP parallel blocks are used, right? And do you know if there is complete
>> freedom/"reentrancy" in that variables that are thread-private in an outer
>> parallel block can be shared in an inner one, and vice versa?
>
> An implementation may or may not support it, and if it is supported
> the behaviour can be configured through omp_set_nested(). So we should
> consider the case where it is supported and enabled.
>
> If you have a lastprivate or reduction, then after the loop these are
> (reduced and) assigned to the original variable. So if that happens
> inside a parallel construct which does not declare the variable
> private to the construct, you actually have a race. So e.g. the nested
> prange currently races in the outer parallel range.
>
>> If so I'd think that this algorithm should work and feel natural:
>>
>>  - In each prange, for the purposes of variable private/shared/reduction
>>    inference, consider all internal "prange" just as if they had been
>>    "range"; no special treatment.
>>
>>  - Recurse to children pranges.
>
> Right, that is most natural. Algorithmically, reductions and
> lastprivates (as those can have races if placed in inner parallel
> constructs) propagate outwards towards the outermost parallel block,
> or up to the first parallel with block, or up to the first construct
> that already determined the sharing attribute.
>
> e.g.
>
> with parallel:
>     with parallel:
>         for i in prange(n):
>             for j in prange(n):
>                 sum += i * j
>     # sum is well-defined here
> # sum is undefined here
>
> Here 'sum' is a reduction for the two innermost loops. 'sum' is not
> private for the inner parallel with block, as a prange in a parallel
> with block is a worksharing loop that binds to that parallel with
> block. However, the outermost parallel with block declares sum (and i
> and j) private, so after that block all those variables become
> undefined.
>
> However, in the outermost parallel with block, sum will have to be
> initialized to 0 before anything else, or be declared firstprivate,
> otherwise 'sum' is undefined to begin with. Do you think declaring it
> firstprivate would be the way to go, or should we make it private and
> issue a warning or perhaps even an error?
>
>> DS
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel@python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
>>
>

Everything seems to be working, although now the user has to be careful
with nested parallel blocks as variables can be private there (and not
firstprivate), i.e., the user has to do initialization at the right
place (e.g. in the outermost parallel block that determines it private).
I'm thinking of adding a warning, as the C compiler does.

Two issues remain:

1) explicit declarations of firstprivates

Do we still want those?

2) buffer auxiliary vars

When unpacking numpy buffers and using typed numpy arrays, can
reassignment or updates of a buffer-related variable ever occur in nogil
code sections? I'm thinking this is not possible and therefore all
buffer variables may be shared in parallel (for) sections?
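
The inference rule discussed in the quoted thread (for each prange, treat
every nested prange as a plain range, then recurse into the children) can be
sketched in plain Python. This is only an illustrative model, not the code
in the openmp branch: the Loop class, the writes() and infer() helpers, and
the simplified classification (a name that is only updated in place is a
"reduction", any assigned name or loop index is "lastprivate") are
assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Loop:
    """One loop in a nest; parallel=True stands for prange, False for range.

    Hypothetical model for illustration only, not a Cython compiler class.
    """
    index: str                                    # loop variable, e.g. "i"
    parallel: bool = False
    inplace: set = field(default_factory=set)     # names updated via `x += ...`
    assigned: set = field(default_factory=set)    # names bound via `x = ...`
    children: list = field(default_factory=list)  # directly nested loops


def writes(loop):
    """Collect writes in the body of `loop`, descending into every nested loop.

    Nested pranges get no special treatment: they are walked like plain ranges.
    """
    inplace, assigned = set(loop.inplace), set(loop.assigned)
    for child in loop.children:
        child_inplace, child_assigned = writes(child)
        inplace |= child_inplace
        assigned |= child_assigned | {child.index}  # a nested index is assigned
    return inplace, assigned


def infer(loop, path=()):
    """Return {loop path: {variable: attribute}} for every prange in the nest."""
    path = path + (loop.index,)
    result = {}
    if loop.parallel:
        inplace, assigned = writes(loop)
        attrs = {name: "lastprivate" for name in assigned | {loop.index}}
        # Assumption: a name that is only ever updated in place is a reduction.
        attrs.update({name: "reduction" for name in inplace - assigned})
        result[path] = attrs
    for child in loop.children:  # recurse to children pranges
        result.update(infer(child, path))
    return result


# The nest from the thread: for i in prange(m): for j in prange(n): sum += i * j
inner = Loop("j", parallel=True, inplace={"sum"})
outer = Loop("i", parallel=True, children=[inner])
for loop_path, attrs in infer(outer).items():
    print("/".join(loop_path), sorted(attrs.items()))
```

In this model 'sum' comes out as a reduction for both the outer and the
inner prange, which is the behaviour the first example in the thread asks
for, without the extra "sum += 0" in the outer loop body.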

