Robert Bradshaw, 11.03.2011 01:46:
On Tue, Mar 8, 2011 at 11:16 AM, Francesc Alted <fal...@pytables.org> wrote:
On Tuesday 08 March 2011 18:50:15, Stefan Behnel wrote:
mark florisson, 08.03.2011 18:00:
What I meant was that the
wrapper returned by the decorator would have to call the closure
for every iteration, which introduces function call overhead.

[...]

I guess we just have to establish what we want to do: do we
want to support code with Python objects (and exceptions, etc.), or
just C code written in Cython?

I like the approach that Sturla mentioned: using closures to
implement worker threads. I think that's very Pythonic. You could do
something like this, for example:

      def worker():
          for item in queue:
              with nogil:
                  do_stuff(item)

      queue.extend(work_items)
      start_threads(worker, count)

Note that the queue is only needed to tell the thread what to work
on. A lot of things can be shared over the closure. So the queue may
not even be required in many cases.
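To illustrate how much can be shared over the closure, here is a minimal pure-Python sketch (plain `threading`, not Cython, and with no `nogil` section) in which the queue is dropped entirely: each thread derives its share of the work from a thread index, and the input and output lists come from the enclosing scope. The names `parallel_square` and `nthreads` are hypothetical.

```python
import threading


def parallel_square(data, nthreads=4):
    """Square every element of data, splitting the work across threads."""
    results = [None] * len(data)  # shared output, captured by the closure

    def worker(thread_index):
        # Strided partitioning: thread i handles items i, i + n, i + 2n, ...
        # No queue is needed because each thread computes its own indices,
        # and each thread writes only to its own slots in results.
        for i in range(thread_index, len(data), nthreads):
            results[i] = data[i] * data[i]

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(parallel_square([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```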

I like this approach too. I suppose that you will need to annotate the
items so that they are not Python objects, no? Something like:

     def worker():
         cdef int item  # tell Cython that item is not a Python object!
         for item in queue:
             with nogil:
                 do_stuff(item)

     queue.extend(work_items)
     start_threads(worker, count)

On a slightly higher level, are we just trying to use OpenMP from
Cython, or are we trying to build it into the language? If the former,
it may make sense to keep the API closer to the underlying C than one
might otherwise be tempted to, so that we can leverage the existing
documentation. A library with a more Pythonic interface could perhaps
be written on top of that. Alternatively, if we're building it into
Cython itself, I think it might be worth modeling it after the
multiprocessing module (though I understand it would be implemented
with threads), which is a decent enough model for managing
embarrassingly parallel operations.

+1
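For a feel of what a multiprocessing-style interface backed by threads could look like, Python's standard library already ships one: `multiprocessing.pool.ThreadPool`, which has the same `Pool` API but uses threads instead of processes. This is only a pure-Python analogy; `work` and `run` are placeholder names, not a proposed Cython API.

```python
from multiprocessing.pool import ThreadPool


def work(item):
    # Placeholder for the per-item computation.
    return item * 2


def run(items, nthreads=4):
    # Same Pool interface as multiprocessing, but backed by threads,
    # so no pickling or separate processes are involved.
    with ThreadPool(nthreads) as pool:
        return pool.map(work, items)


print(run(range(5)))  # [0, 2, 4, 6, 8]
```

Note that the loop is implicit here: `map` hands the items to the workers, and the caller never writes a `for` loop itself.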


The above code is similar to that,
though I'd prefer the for loop to be implicit rather than part of the
worker method (or at least passed in as an argument).

Having the loop inside the worker provides a simple way to write per-thread initialisation code, though. And it's likely easier to make looping fast than to speed up the call into a closure. Eventually, though, both ways will probably need to be supported anyway.
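A minimal pure-Python sketch of what keeping the loop inside the worker buys: setup placed before the loop runs once per thread rather than once per item. It uses `queue.Queue`, which, unlike a plain list, is safe to consume from several threads. The function name and the per-thread running total are hypothetical stand-ins for real initialisation work.

```python
import queue
import threading


def sum_of_squares(items, nthreads=4):
    work = queue.Queue()
    for item in items:
        work.put(item)  # the queue is filled before any thread starts

    partials = []  # one partial sum per thread
    partials_lock = threading.Lock()

    def worker():
        # Per-thread initialisation: runs once, before the loop starts.
        local_total = 0
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                break  # queue drained; this thread is done
            local_total += item * item
        with partials_lock:
            partials.append(local_total)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)


print(sum_of_squares(range(10)))  # 285
```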


If we went this route,
what are the advantages of using OpenMP over, say, pthreads in the
background? (And could the latter be done with just a library + some
fancy GIL specifications?)

In the above example, basically everything is explicit and nothing more than a simplified threading setup is needed. Even the implementation of "start_threads()" could be done in a couple of lines of Python code, including the collection of results and errors. If someone thinks we need more than that, I'd like to see a couple of concrete use cases and code examples first.
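Taking that claim at face value, here is one way a `start_threads()` helper might look in plain Python, including collection of each thread's return value and any exception it raised. This is a hypothetical sketch: the call shape `start_threads(worker, count)` is taken from the example above, but the implementation is not an existing Cython API, and re-raising the first error in the calling thread is just one possible policy.

```python
import threading


def start_threads(worker, count):
    """Run worker() in `count` threads; return results, re-raise the first error."""
    results = [None] * count
    errors = [None] * count

    def run(index):
        try:
            results[index] = worker()
        except Exception as exc:  # record it; the calling thread decides what to do
            errors[index] = exc

    threads = [threading.Thread(target=run, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every worker has finished

    for exc in errors:
        if exc is not None:
            raise exc
    return results


print(start_threads(lambda: 42, 3))  # [42, 42, 42]
```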


One thing that's nice about OpenMP as
implemented in C is that the serial code looks almost exactly like the
parallel code; the code at http://wiki.cython.org/enhancements/openmp
has this property too.

Writing it with a closure isn't really that much different. You can put the inner function right where it would normally get executed and add a bit of calling/load distributing code below it. Not that bad IMO.

It may be worth providing some ready-to-use decorators to do the load balancing, but I don't really like the idea of having a decorator magically invoke the function it decorates in place.
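One hypothetical shape for such a ready-to-use decorator that avoids the in-place magic: the decorator only wraps a per-item function, and nothing runs until the wrapped function is called explicitly with the items. The name `balanced` and its `nthreads` parameter are made up for illustration; the load balancing is simply delegated to `concurrent.futures.ThreadPoolExecutor.map`, whose internal work queue hands items to whichever thread is free.

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def balanced(nthreads=4):
    """Decorator factory: turn a per-item function into a parallel map over items."""
    def decorate(func):
        @functools.wraps(func)
        def run_all(items):
            # Nothing happens at decoration time; work starts only here.
            with ThreadPoolExecutor(max_workers=nthreads) as pool:
                return list(pool.map(func, items))  # results keep input order
        return run_all
    return decorate


@balanced(nthreads=2)
def double(x):
    return 2 * x


print(double([1, 2, 3]))  # [2, 4, 6]
```

The trade-off is that the decorated name now takes an iterable of items rather than a single item, which keeps the invocation explicit at the call site.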


Also, I like the idea of having the invoking thread hold the GIL, if
possible, while the "sharing" threads do the appropriate locking among
themselves when needed, e.g. for raising exceptions.

I like the explicit "with nogil" block in my example above. It makes it easy to use normal Python setup code, to synchronise based on the GIL if desired (e.g. to use a normal Python queue for communication), and it's simple enough not to get in the way.

I think it simplifies things a lot when code can rely on the GIL being held when entering the thread function. Threading is complicated enough that it should be kept as explicit as possible.

Stefan
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
