On 11 March 2011 08:56, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> On 03/11/2011 08:20 AM, Stefan Behnel wrote:
>>
>> Robert Bradshaw, 11.03.2011 01:46:
>>>
>>> On Tue, Mar 8, 2011 at 11:16 AM, Francesc Alted <fal...@pytables.org>
>>> wrote:
>>>>
>>>> On Tuesday 08 March 2011 18:50:15, Stefan Behnel wrote:
>>>>>
>>>>> mark florisson, 08.03.2011 18:00:
>>>>>>
>>>>>> What I meant was that the wrapper returned by the decorator would
>>>>>> have to call the closure for every iteration, which introduces
>>>>>> function call overhead.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>> I guess we just have to establish what we want to do: do we want
>>>>>> to support code with Python objects (and exceptions etc.), or
>>>>>> just C code written in Cython?
>>>>>
>>>>> I like the approach that Sturla mentioned: using closures to
>>>>> implement worker threads. I think that's very pythonic. You could do
>>>>> something like this, for example:
>>>>>
>>>>>     def worker():
>>>>>         for item in queue:
>>>>>             with nogil:
>>>>>                 do_stuff(item)
>>>>>
>>>>>     queue.extend(work_items)
>>>>>     start_threads(worker, count)
>>>>>
>>>>> Note that the queue is only needed to tell the thread what to work
>>>>> on. A lot of things can be shared over the closure, so the queue may
>>>>> not even be required in many cases.
>>>>
>>>> I like this approach too. I suppose that you will need to annotate
>>>> the items so that they are not Python objects, no? Something like:
>>>>
>>>>     def worker():
>>>>         cdef int item  # tell that item is not a Python object!
>>>>         for item in queue:
>>>>             with nogil:
>>>>                 do_stuff(item)
>>>>
>>>>     queue.extend(work_items)
>>>>     start_threads(worker, count)
>>>
>>> On a slightly higher level, are we just trying to use OpenMP from
>>> Cython, or are we trying to build it into the language? If the former,
>>> it may make sense to stick closer than one might otherwise be tempted
>>> in terms of API to the underlying C to leverage the existing
>>> documentation.
>>> A library with a more Pythonic interface could perhaps be written on
>>> top of that. Alternatively, if we're building it into Cython itself,
>>> it might be worth modeling it after the multiprocessing module (though
>>> I understand it would be implemented with threads), which I think is a
>>> decent enough model for managing embarrassingly parallel operations.
>>
>> +1
>>
>>> The above code is similar to that, though I'd prefer the for loop
>>> implicit rather than as part of the worker method (or at least as an
>>> argument).
>>
>> It provides a simple way to write per-thread initialisation code,
>> though. And it's likely easier to make looping fast than to speed up
>> the call into a closure. However, eventually, both ways will need to
>> be supported anyway.
>>
>>> If we went this route, what are the advantages of using OpenMP over,
>>> say, pthreads in the background? (And could the latter be done with
>>> just a library + some fancy GIL specifications?)
>>
>> In the above example, basically everything is explicit and nothing
>> more than a simplified threading setup is needed. Even the
>> implementation of "start_threads()" could be done in a couple of lines
>> of Python code, including the collection of results and errors. If
>> someone thinks we need more than that, I'd like to see a couple of
>> concrete use cases and code examples first.
>>
>>> One thing that's nice about OpenMP as implemented in C is that the
>>> serial code looks almost exactly like the parallel code; the code at
>>> http://wiki.cython.org/enhancements/openmp has this property too.
>>
>> Writing it with a closure isn't really that much different. You can
>> put the inner function right where it would normally get executed and
>> add a bit of calling/load distributing code below it. Not that bad IMO.
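For what it's worth, Stefan's "couple of lines of Python" claim seems about right. Here is a rough pure-Python sketch of what a `start_threads()` helper could look like; the name comes from his example, but the result/error collection shown is just one possible design, not an agreed API:

```python
import threading
from collections import deque

def start_threads(worker, count):
    # Hypothetical helper from the example above: run `worker` in `count`
    # threads and collect each thread's return value and any exception.
    results, errors = [], []

    def run():
        try:
            results.append(worker())
        except BaseException as exc:
            errors.append(exc)

    threads = [threading.Thread(target=run) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors

# Demo: a shared work queue consumed by a closure, as in the example.
queue = deque()

def worker():
    total = 0
    while True:
        try:
            item = queue.popleft()  # deque.popleft() is atomic, so this
        except IndexError:          # is safe without an explicit lock
            return total
        total += item

queue.extend(range(10))
results, errors = start_threads(worker, 4)
```

In real Cython the body of `worker` would release the GIL around the typed number crunching; the helper itself would not need to change.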
>>
>> It may be worth providing some ready-to-use decorators to do the load
>> balancing, but I don't really like the idea of having a decorator
>> magically invoke the function in-place that it decorates.
>>
>>> Also, I like the idea of being able to hold the GIL by the invoking
>>> thread and having the "sharing" threads do the appropriate locking
>>> among themselves when needed if possible, e.g. for exception raising.
>>
>> I like the explicit "with nogil" block in my example above. It makes
>> it easy to use normal Python setup code, to synchronise based on the
>> GIL if desired (e.g. to use a normal Python queue for communication),
>> and it's simple enough not to get in the way.
>
> I'm supporting Robert here. Basically, I'm +1 to anything that can make
> me pretend the GIL doesn't exist, even if it comes with a 2x performance
> hit: because that will make me write parallel code (which I can't be
> bothered to do in Cython currently), and I have 4 cores on the laptop I
> use for debugging, so I'd still get a 2x speedup.
>
> Perhaps the long-term solution is an "autogil" mode, where Cython
> automatically releases the GIL on blocks where it can (such as a typed
> for-loop) and acquires it back when needed (an exception-raising
> if-block within said for-loop). And when doing multi-threading,
> GIL-requiring calls are dispatched to a master GIL-holding thread
> (which would not be a worker thread, i.e. on 4 cores you'd have 4
> workers + 1 GIL-holding support thread). So the advice for speeding up
> code is simply "make sure your code is all typed", just like before,
> but people can follow that advice without even having to learn about
> the GIL.
>
> It's all about a) lowering the learning curve for trivial purposes, and
> b) allowing temporary debug print statements that use the GIL to be
> inserted without having to rework the code.
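Dag's "master GIL-holding thread" idea can be modelled in plain Python today: workers never perform Python-level calls themselves, they enqueue callables that one dedicated service thread executes. All names below are made up for illustration; in Cython the workers would be running typed, GIL-free code:

```python
import queue
import threading

# A rough model of dispatching GIL-requiring calls (e.g. debug prints)
# from worker threads to a single dedicated service thread.
_gil_calls = queue.Queue()
_STOP = object()

def gil_service_thread():
    # The one thread that performs GIL-requiring work on behalf of workers.
    while True:
        call = _gil_calls.get()
        if call is _STOP:
            break
        call()

def dispatch_to_gil_thread(func, *args):
    # Called from worker threads: defer the call instead of running it.
    _gil_calls.put(lambda: func(*args))

log = []
service = threading.Thread(target=gil_service_thread)
service.start()

def worker(n):
    # Stands in for typed, GIL-free number crunching plus one debug call.
    total = sum(i * i for i in range(n))
    dispatch_to_gil_thread(log.append, (n, total))

workers = [threading.Thread(target=worker, args=(n,)) for n in (10, 100)]
for t in workers:
    t.start()
for t in workers:
    t.join()
_gil_calls.put(_STOP)
service.join()
```

The point of the extra thread is that the workers themselves never block on the GIL; only the service thread does, which is what makes the "4 workers + 1 support thread" layout attractive.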
Have we ever thought about supporting 'with gil' as an actual statement,
instead of just as part of a function declaration or definition? Then
inside a 'with nogil:' block you could just say: 'with gil: print myvar'.
On the other hand, we could also convert usages of 'print' (that is, of
simple 'print a, b, c'-style printing) in 'nogil' blocks or functions to
C printf statements, where possible. Would that be a wanted feature?

> As for the discussion we had on using the GIL for locking, I think that
> should be made explicit, even if it is a no-op currently. I once wrote
> code relying on the GIL, and really missed something like
> "cython.gil.lock()" to put in there just for better code readability
> (yes, I used comments, but...).
>
>> I think it simplifies things a lot when code can rely on the GIL being
>> held when entering the thread function. Threading is complicated
>> enough to keep it as explicit as possible.
>
> That's exactly the thing about OpenMP: it tends to hide the complexity
> of threading and allow you to get on with your life. When you say this,
> it sounds a bit like "people who don't want to learn the technical
> inner details of Python should just use another language than Cython".
>
> If I write code in Fortran it may get parallelized, whereas I almost
> never write parallel code in Cython (well, MPI, but not shared-memory):
> all the "is-the-GIL-held-or-not" is just too much to keep in my head.
>
> Dag Sverre
> _______________________________________________
> cython-devel mailing list
> cython-devel@python.org
> http://mail.python.org/mailman/listinfo/cython-devel
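To make the proposal concrete, the statement form would read something like this (hypothetical Cython syntax, not implemented at the time of writing; `do_stuff` and `something_went_wrong` are placeholders):

    cdef int i

    with nogil:
        for i in range(n):
            do_stuff(i)
            if something_went_wrong:
                with gil:       # proposed: temporarily re-acquire the GIL
                    print i     # ...e.g. for a debug print, or to raise

This would cover Dag's debug-print use case without the function-level 'with gil' declaration, at the cost of a GIL acquire/release per execution of the inner block.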