Hi, I'm using masked arrays to compute large-scale standard deviations, products, Gaussian weights, and weighted averages. At first I thought masked arrays would be a great way to sidestep looping (which they are), but they are still slower than I expected. Here's a snippet of the code I'm using them in.
    import numpy as np
    import numpy.ma as ma
    # ann is the kd-tree nearest-neighbour module used here (import not shown).

    # Computing nearest-neighbour distances. The output will be about
    # 270,000 rows long for the index and 270,000x50 for the dist array.
    tree = ann.kd_tree(np.column_stack([l, b]))
    index, dist = tree.search(np.column_stack([l, b]), k=nth)

    # Clip bad values by replacing them with acceptable values
    # (same as setting av <= -10 to -10 and av >= 50 to 50).
    np.clip(av, -10, 50, out=av)

    # Distance clipping and creating the mask. In numpy.ma, True marks an
    # entry as invalid, so mask the neighbours beyond the distance threshold.
    dist_arcsec = np.sqrt(dist) * 3600
    mask = dist_arcsec > d_thresh

    # Creating the masked arrays
    av_good = ma.array(av[index], mask=mask)
    dist_good = ma.array(dist_arcsec, mask=mask)

    # This is why I'm using masked arrays: if these were ndarrays with
    # NaNs, the output would be NaN.
    Std = np.array(np.std(av_good, axis=1))
    Var = Std * Std
    Rho = np.zeros((len(av), nth))
    Rho2 = np.zeros((len(av), nth))
    dist_std = np.std(dist_good, axis=1)
    for j in range(nth):
        Rho[:, j] = dist_std
        Rho2[:, j] = Var

    # This part takes about 20 seconds for a 270,000x50 masked array.
    # Using ndarrays of the same size takes about 2 seconds.
    spatial_weight = 1.0 / (Rho * np.sqrt(2 * np.pi)) * np.exp(-dist_good / (2 * Rho**2))

    # Like spatial_weight, this takes about 20 seconds.
    W = spatial_weight / Rho2

    # Takes less than one second.
    Ave = np.average(av_good, axis=1, weights=W)

Any ideas why this takes so long, especially computing spatial_weight and W? Is there a faster way to do this? Or is there a way to make numpy.std ignore NaNs?

Thanks,
Eli Bressert

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
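The row-wise masked statistics in the snippet above, and the question of getting numpy.std to skip NaNs, can be illustrated with a minimal sketch on made-up data. It assumes the numpy.ma convention that True in a mask marks an entry as invalid; the names `values`, `invalid`, and `valid` are hypothetical stand-ins for `av[index]` and its mask. It computes the same per-row standard deviation three ways: with numpy.ma, with plain ndarrays plus a boolean mask (no Python loop and no masked-array overhead), and with `np.nanstd` (available in NumPy 1.8 and later) on an array where invalid entries are stored as NaN.

```python
import numpy as np
import numpy.ma as ma

# Made-up toy data standing in for av[index]: 4 rows of 5 neighbour values.
rng = np.random.default_rng(0)
values = rng.normal(size=(4, 5))

# numpy.ma convention: True in the mask marks an entry as invalid (excluded).
invalid = np.zeros(values.shape, dtype=bool)
invalid[0, :2] = True
invalid[2, 4] = True

# 1) Row-wise std computed by numpy.ma, ignoring masked entries (ddof=0).
masked_std = ma.array(values, mask=invalid).std(axis=1).filled(np.nan)

# 2) Same statistic with plain ndarrays: zero out invalid entries and divide
#    by the per-row count of valid entries, all vectorised.
valid = ~invalid
n_valid = valid.sum(axis=1)
mean = np.where(valid, values, 0.0).sum(axis=1) / n_valid
sq_dev = np.where(valid, (values - mean[:, np.newaxis]) ** 2, 0.0)
plain_std = np.sqrt(sq_dev.sum(axis=1) / n_valid)

# 3) If invalid entries are stored as NaN instead, np.nanstd skips them.
nan_std = np.nanstd(np.where(valid, values, np.nan), axis=1)

print(np.allclose(masked_std, plain_std), np.allclose(nan_std, plain_std))
```

The plain-ndarray version only works as written when every row has at least one valid entry; a fully masked row would divide by zero.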