On Sun, Feb 5, 2012 at 7:02 PM, David Verelst <david.vere...@gmail.com> wrote: > Just out of curiosity, what speed-up factor did you achieve? > > Regards, > David > > On 04/02/12 22:20, Naresh wrote: >> Warren Weckesser<warren.weckesser<at> enthought.com> writes: >> >>> >>> On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Root<ben.root<at> ou.edu> wrote: >>> >>> >>> On Saturday, February 4, 2012, Naresh Pai<npai<at> uark.edu> wrote:> I am >> somewhat new to Python (been coding with Matlab mostly). I am trying to >>>> simplify (and expedite) a piece of code that is currently a bottleneck in a >> larger >>>> code.> I have a large array (7000 rows x 4500 columns) titled say, abc, >>>> and >> I am trying> to find a fast method to count the number of instances of each >> unique value within> it. All unique values are stored in a variable, say, >> unique_elem. My current code >>> >>>> is as follows:> import numpy as np> #allocate space for storing element >> count> elem_count = zeros((len(unique_elem),1))> #loop through and count >> number >> of unique_elem> for i in range(len(unique_elem)): >>> >>>> elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x >> in [unique_elem[i]])))> This loop is bottleneck because I have about 850 >> unique >> elements and it takes> about 9-10 minutes. Can you suggest a faster way to >> do >> this? >>> >>>> Thank you,> Naresh> >>> no.unique() can return indices and reverse indices. It would be trivial to >> histogram the reverse indices using np.histogram(). >>> >>> Instead of histogram(), you can use bincount() on the inverse indices:u, >>> inv = >> np.unique(abc, return_inverse=True)n = np.bincount(inv)u will be an array of >> the >> unique elements, and n will be an array of the corresponding number of >> occurrences.Warren >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion<at> scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> The histogram() solution works perfect since unique_elem is ordered. I >> appreciate everyone's help. >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
np.histogram works pretty well. I'm getting speeds something like 1300 ms on float64 data. A hash table-based solution is faster (no big surprise here), about 800ms so in the ballpark of 40% faster. Whenever I get motivated enough I'm going to make a pull request on NumPy with something like khash.h and start fixing all the O(N log N) algorithms floating around that ought to be O(N). NumPy should really have a "match" function similar to R's and a lot of other things. - Wes _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion