Hi,

I have found a very strange bug that I cannot understand.  I would like to do 
something like this:

Given 4 pairs of numpy arrays (x1, y1), (x2, y2), (x3, y3), (x4, y4), I would 
like to compute all 16 inner products ip[i, j] = np.dot(yi.T, xj), for 
i = 1,...,4 and j = 1,...,4.  (Something like a correlation matrix.)
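
In serial, the computation I have in mind amounts to roughly this (a 
simplified sketch, with random data standing in for my actual snapshots):

    import numpy as np

    n = 1000000   # snapshot length (placeholder size)
    xs = [np.random.rand(n) for _ in range(4)]   # stand-ins for x1, ..., x4
    ys = [np.random.rand(n) for _ in range(4)]   # stand-ins for y1, ..., y4

    # 4x4 matrix of inner products: ip[i, j] = np.dot(yi.T, xj)
    ip = np.empty((4, 4))
    for i in range(4):
        for j in range(4):
            ip[i, j] = np.dot(ys[i].T, xs[j])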

What I did was use shared memory and the multiprocessing module (with a Pool) 
to load the data in parallel. Each processor loads one pair of snapshots, so I 
can do 4 simultaneous loads on my four-core machine, and it takes two rounds 
of loads to get all the data into memory.
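
The loading step looks roughly like this (a simplified sketch: random data 
stands in for reading the snapshots from disk, and it assumes the workers 
inherit the shared buffers via fork, as they do on Linux):

    import numpy as np
    from multiprocessing import Pool, RawArray

    n = 1000000          # snapshot length (placeholder)
    npairs = 4

    # One shared, lock-free buffer per snapshot, so workers can fill them in place.
    x_bufs = [RawArray('d', n) for _ in range(npairs)]
    y_bufs = [RawArray('d', n) for _ in range(npairs)]

    def load_pair(i):
        # The real code reads snapshot pair i from disk; random data is a stand-in.
        # The float64 views share memory with the RawArrays (no copy).
        x = np.frombuffer(x_bufs[i])
        y = np.frombuffer(y_bufs[i])
        x[:] = np.random.rand(n)
        y[:] = np.random.rand(n)

    if __name__ == '__main__':
        pool = Pool(processes=4)
        pool.map(load_pair, range(npairs))   # 4 loads run simultaneously
        pool.close()
        pool.join()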

Then I tried to do the inner products in parallel as well, asking each 
processor to do 4 of the 16 total inner products.  As it turned out, this was 
slower in parallel than in serial!
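
The parallel products step looks roughly like this (it reuses the shared 
buffers and load_pair from the sketch above; the main block here replaces the 
previous one):

    def row_of_products(i):
        # Worker i computes the 4 inner products involving y_i,
        # i.e. one row of the 4x4 matrix (4 of the 16 products per process).
        y = np.frombuffer(y_bufs[i])
        return [np.dot(y, np.frombuffer(x_bufs[j])) for j in range(npairs)]

    if __name__ == '__main__':
        pool = Pool(processes=4)
        pool.map(load_pair, range(npairs))                 # loading pass from above
        rows = pool.map(row_of_products, range(npairs))    # parallel products
        ip = np.array(rows)                                # ip[i, j] = yi . xj
        pool.close()
        pool.join()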

This only occurs when I use numpy functions.  If I instead replace the 
inner-product task with printing to stdout or something of that sort, I get 
the 4x speedup that I expect.  Any ideas?

Jonathan Tu