Hi, I have found a very strange bug that I cannot understand. I would like to do something like this:
Given 4 pairs of NumPy arrays (x1, y1), (x2, y2), (x3, y3), (x4, y4), I would like to compute each corresponding inner product ip = np.dot(yi.T, xj), for i = 1, ..., 4 and j = 1, ..., 4. (Something like a correlation matrix.)

What I did was use shared memory and the multiprocessing module (with Pool) to load the data in parallel. Each process loads one pair of snapshots, so I can do 4 simultaneous loads on my four-core machine, and it takes two rounds of loads to get all the data into memory. Then I tried to do the inner products in parallel as well, asking each process to compute 4 of the 16 total inner products. As it turned out, this was slower in parallel than in serial! This only occurs when I use NumPy functions; if instead I replace the inner-product task with printing to stdout or something of that sort, I get the 4x speedup that I expect.

Any ideas?

Jonathan Tu

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
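[For reference, a minimal serial sketch of the computation described above. The array sizes and random data are placeholders, not the original snapshots; the point is just the 4x4 matrix of inner products ip[i, j] = np.dot(yi.T, xj), plus an equivalent single matrix product.]

```python
import numpy as np

# Stand-in data: 4 (x_i, y_i) pairs of length-n vectors.
# (The real snapshots would be loaded from disk instead.)
rng = np.random.default_rng(0)
n = 1000
xs = [rng.standard_normal(n) for _ in range(4)]
ys = [rng.standard_normal(n) for _ in range(4)]

# The 16 inner products, one np.dot call each.
ip = np.empty((4, 4))
for i in range(4):
    for j in range(4):
        ip[i, j] = np.dot(ys[i].T, xs[j])

# Equivalent vectorized form: stack the vectors into (4, n) matrices
# and compute all 16 inner products with a single matrix product.
Y = np.stack(ys)   # shape (4, n)
X = np.stack(xs)   # shape (4, n)
ip_vec = Y @ X.T   # shape (4, 4); ip_vec[i, j] == np.dot(ys[i], xs[j])

assert np.allclose(ip, ip_vec)
```

Note that the single matrix product delegates all 16 products to one BLAS call, which is often faster than looping, with or without multiprocessing.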
