Re: [Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Bruce Southey Thu, 16 Jun 2011 14:34:20 -0700

On 06/16/2011 02:05 PM, Brandt Belson wrote:

Hi all,
Thanks for the replies. As mentioned, I'm parallelizing so that I can take many inner products simultaneously (which I agree is embarrassingly parallel). The library I'm writing asks the user to supply a function that takes two objects and returns their inner product. After all the discussion, though, it seems this is too simplistic an approach. Instead, I plan to write this part of the library as if the inner product function supplied by the user uses all available cores (with numpy and/or numexpr built with MKL or LAPACK).
As far as using Fortran or C and OpenMP, this probably isn't worth the time it would take, both for me and the user.
I've tried increasing the array sizes and found the same trends, so the slowdown isn't only because the arrays are too small to see the benefit of multiprocessing. I wrote the code to be easy for anyone to experiment with, so feel free to play around with what is included in the profiling, the sizes of arrays, functions used, etc.
I also tried using handythread.foreach with arraySize = (3000,1000), and found the following:
No shared memory, numpy array multiplication took 1.57585811615 seconds
Shared memory, numpy array multiplication took 1.25499510765 seconds
This is definitely an improvement over multiprocessing, but without knowing any better, I was hoping to see a roughly 8x speedup on my 8-core workstation.
Based on what Chris sent, it seems there is some large overhead caused by multiprocessing pickling numpy arrays. To test what Robin mentioned
> If you are on Linux or Mac then fork works nicely so you have read
> only shared memory you just have to put it in a module before the fork
> (so before pool = Pool() ) and then all the subprocesses can access it
> without any pickling required. ie
> myutil.data = listofdata
> p = multiprocessing.Pool(8)
> def mymapfunc(i):
>   return mydatafunc(myutil.data[i])
>
> p.map(mymapfunc, range(len(myutil.data)))
I tried creating the arrayList in the myutil module and using multiprocessing to find the inner products of myutil.arrayList; however, this was still slower than not using multiprocessing, so I believe there is still some large overhead. Here are the results:
No shared memory, numpy array multiplication took 1.55906510353 seconds
Shared memory, numpy array multiplication took 9.82426381111 seconds
Shared memory, myutil.arrayList numpy array multiplication took 8.77094507217 seconds
I'm attaching this code.
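For readers without the attachment, Robin's suggestion can be sketched roughly as follows. This is not the attached code; it is a minimal illustration that assumes a platform where the "fork" start method is available (Linux, and with caveats macOS). The names `_shared_arrays`, `_inner_product_pair`, and `all_inner_products` are hypothetical, and `np.vdot` stands in for whatever inner product the user supplies. The key points are that the data is stored at module level before the pool is created, that the worker function is defined before the pool is created, and that only small index tuples and scalar results cross the process boundary.

```python
import multiprocessing as mp

import numpy as np

# Module-level storage (hypothetical name). It is filled in the parent
# process *before* the pool is created, so forked workers inherit it
# and the arrays themselves are never pickled.
_shared_arrays = []


def _inner_product_pair(index_pair):
    """Worker: look up both arrays by index in the inherited list."""
    i, j = index_pair
    # np.vdot flattens its arguments and returns a scalar.
    return float(np.vdot(_shared_arrays[i], _shared_arrays[j]))


def all_inner_products(arrays, processes=2):
    """Return {(i, j): inner product of arrays[i] and arrays[j]} for i <= j."""
    global _shared_arrays
    _shared_arrays = list(arrays)
    pairs = [(i, j) for i in range(len(arrays)) for j in range(i, len(arrays))]
    # Assumption: the "fork" start method exists (it does not on Windows).
    ctx = mp.get_context("fork")
    with ctx.Pool(processes) as pool:
        values = pool.map(_inner_product_pair, pairs)
    return dict(zip(pairs, values))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = [rng.random((300, 100)) for _ in range(4)]
    products = all_inner_products(data)
    print(products[(0, 1)])
```

Whether this beats the serial loop depends on how expensive each inner product is relative to the cost of dispatching a task; for a cheap operation the dispatch overhead can still dominate.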
I'm going to work around this numpy/multiprocessing behavior with numpy/numexpr built with MKL or LAPACK. It would be good to know exactly what's causing this though. It would be nice if there was a way to get the ideal speedup via multiprocessing, regardless of the internal workings of the single-threaded inner product function, as this was the behavior I expected. I imagine other people might come across similar situations, but again I'm going to try to get around this by letting MKL or LAPACK make use of all available cores.
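One way to probe the pickling hypothesis is to time serialization directly and compare it with the cost of the inner product itself. The sketch below is only an illustration: `time_call`, `pickle_roundtrip`, and `compare_costs` are hypothetical helpers, `np.vdot` is an assumed stand-in for the inner product, and a pickle round trip only approximates what multiprocessing does when it ships an array to a worker. The default shape is the arraySize quoted earlier.

```python
import pickle
import time

import numpy as np


def time_call(func, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def pickle_roundtrip(array):
    """Serialize and deserialize an array, as sending it to a worker would."""
    return pickle.loads(pickle.dumps(array, protocol=pickle.HIGHEST_PROTOCOL))


def compare_costs(shape=(3000, 1000)):
    """Return (seconds for one inner product, seconds for one pickle round trip)."""
    rng = np.random.default_rng(0)
    a = rng.random(shape)
    b = rng.random(shape)
    compute = time_call(lambda: np.vdot(a, b))
    transfer = time_call(lambda: pickle_roundtrip(a))
    return compute, transfer


if __name__ == "__main__":
    compute, transfer = compare_costs()
    print("inner product: %.6f s, pickle round trip: %.6f s" % (compute, transfer))
```

If the round trip takes as long as or longer than the inner product, then sending arrays to workers cannot pay off for this operation, whatever the number of cores.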
Thanks again,
Brandt


_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion

I think this is not being benchmarked correctly, because there should be a noticeable difference when different numbers of threads are selected.
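A rough way to check this is to time the same matrix product under different thread limits. The sketch below is an assumption-laden illustration, not a definitive benchmark: it sets the commonly used environment variables OMP_NUM_THREADS, MKL_NUM_THREADS, and OPENBLAS_NUM_THREADS before numpy is imported in a child interpreter, and whether any of them is honored depends on the BLAS library numpy was built against. `CHILD` and `time_with_threads` are hypothetical names.

```python
import os
import subprocess
import sys

# Child script: builds two n-by-n arrays, times one matrix product,
# and prints the elapsed seconds.
CHILD = """
import time
import numpy as np
rng = np.random.default_rng(0)
a = rng.random(({n}, {n}))
b = rng.random(({n}, {n}))
start = time.perf_counter()
a.dot(b)
print(time.perf_counter() - start)
"""


def time_with_threads(num_threads, n=1000):
    """Time one n-by-n matrix product in a fresh interpreter with a thread limit."""
    env = dict(os.environ)
    # The limits must be set before numpy (and its BLAS) is loaded,
    # which is why the timing runs in a separate process.
    for name in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[name] = str(num_threads)
    result = subprocess.run(
        [sys.executable, "-c", CHILD.format(n=n)],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return float(result.stdout.strip())


if __name__ == "__main__":
    for threads in (1, 2, 4, 8):
        print(threads, "thread(s):", time_with_threads(threads), "seconds")
```

If the timings do not change with the thread count, the installed BLAS is probably single-threaded or ignores these variables.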


But really you should read these sources:
http://www.scipy.org/ParallelProgramming
http://stackoverflow.com/questions/5260068/multithreaded-blas-in-python-numpy

Also, numpy has extra things going on, like checks and copies, that probably make using np.inner() slower. Thus, your 'numpy_inner_product' is probably as efficient as you can get without extreme measures like Cython.


Bruce
