Aronne made good suggestions. Here is another weapon for your arsenal:

1) I assume that the shape of your array is irrelevant (reshape if needed).

2) Depending on the structure of your data, np.unique can be handy:

    arr_unique, idx = np.unique(arr1d, return_inverse=True)

   then search arr_unique instead of arr1d.

3) Caveat: np.unique is a major memory hog, so be prepared to waste ~1GB.

Val

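P.S. A minimal sketch of point 2, assuming ABC and elem hold integers and that DEF has the same shape as ABC. The helper name group_by_unique and the small random test arrays are made up for illustration; only np.unique, np.searchsorted, np.argsort, np.bincount and np.cumsum are real NumPy calls. The idea is to search the short arr_unique array once, then use the inverse index to slice DEF into groups.

```python
import numpy as np

def group_by_unique(ABC, DEF, elem):
    """Return a list with DEF[ABC == e] for every e in elem (hypothetical helper)."""
    arr1d = ABC.ravel()
    def1d = DEF.ravel()

    # arr_unique is sorted; idx maps every element of arr1d to its slot in arr_unique.
    arr_unique, idx = np.unique(arr1d, return_inverse=True)
    idx = idx.ravel()  # harmless for 1-D input; guards against shape differences

    # Search the (much shorter) unique array instead of the full data.
    elem = np.asarray(elem)
    pos = np.minimum(np.searchsorted(arr_unique, elem), len(arr_unique) - 1)
    found = arr_unique[pos] == elem

    # Group DEF by slot: a stable sort keeps the original order inside each group.
    order = np.argsort(idx, kind="stable")
    def_sorted = def1d[order]
    counts = np.bincount(idx, minlength=len(arr_unique))
    ends = np.cumsum(counts)
    starts = ends - counts

    # Values of elem that do not occur in ABC get an empty slice.
    return [def_sorted[starts[p]:ends[p]] if ok else def_sorted[:0]
            for p, ok in zip(pos, found)]

# Small made-up data; the real arrays in this thread are 7000 by 4500.
rng = np.random.default_rng(0)
ABC = rng.integers(0, 50, size=(70, 45))
DEF = rng.random(ABC.shape)
elem = [3, 7, 42, 999]  # 999 is deliberately absent from ABC

DEF_distr = group_by_unique(ABC, DEF, elem)
```

Because the sort is stable, each entry should come out in the same order as DEF[ABC == e]. The memory caveat from point 3 still applies, since idx and order are both as large as the flattened data.
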
On Tue, Feb 7, 2012 at 10:34 PM, Aronne Merrelli <aronne.merre...@gmail.com> wrote:

>
> On Mon, Feb 6, 2012 at 11:44 AM, Naresh Pai <n...@uark.edu> wrote:
>
>> I have two large matrices, say, ABC and DEF, each with a shape of 7000 by
>> 4500. I have another list, say, elem, containing 850 values from ABC. I am
>> interested in finding out the corresponding values in DEF where ABC has
>> elem and store them *separately*. The code that I am using is:
>>
>> for i in range(len(elem)):
>>     DEF_distr = DEF[ABC==elem[i]]
>>
>> DEF_distr gets used for further processing before it gets cleared from
>> memory and the next round of the above loop begins. The loop above
>> currently takes about 20 minutes! I think the bottle neck is where elem is
>> getting searched repeatedly in ABC. So I am looking for a solution where
>> all elem can get processed in a single call and the indices of ABC be
>> stored in another variable (separately). I would appreciate if you suggest
>> any faster method for getting DEF_distr.
>>
>
> You'll need to mention some details about the contents of ABC/DEF in order
> to get the best answer (what range of values, do they have a certain
> structure, etc). I made the assumption that ABC and elem have integers (I'm
> not sure it makes sense to search for ABC==elem[n] unless they are both
> integers), and then used a sort followed by searchsorted. This has a side
> effect of reordering the elements in DEF_distr. I don't know if that
> matters. You can skip the .copy() calls if you don't care that ABC/DEF are
> sorted.
>
> ABC_1D = ABC.copy().ravel()
> ABC_1D_sorter = np.argsort(ABC_1D)
> ABC_1D = ABC_1D[ABC_1D_sorter]
> DEF_1D = DEF.copy().ravel()
> DEF_1D = DEF_1D[ABC_1D_sorter]
> ind1 = np.searchsorted(ABC_1D, elem, side='left')
> ind2 = np.searchsorted(ABC_1D, elem, side='right')
> DEF_distr = []
> for n in range(len(elem)):
>     DEF_distr.append( DEF_1D[ind1[n]:ind2[n]] )
>
>
> I tried this on the big memory workstation, and for the 7Kx4K size I get
> about 100 seconds for the simple method and 10 seconds for this more
> complicated sort-based method - if you are getting 20 minutes for that,
> maybe there is a memory problem, or a different part of the code that is
> the bottleneck?
>
> Hope that helps,
> Aronne
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>