On Fri, Aug 27, 2010 at 15:21, Nathaniel Smith <n...@pobox.com> wrote:
> On Fri, Aug 27, 2010 at 1:17 PM, Robert Kern <robert.k...@gmail.com> wrote:
>> But in any case, that would be very slow for large arrays since it
>> would invoke a Python function call for every value in ar. Instead,
>> iterate over the valid array, which is much shorter:
>>
>>   mask = np.zeros(ar.shape, dtype=bool)
>>   for good in valid:
>>       mask |= (ar == good)
>>
>> Wrap that up into a function and you're good to go. That's about as
>> efficient as it gets unless if the valid array gets large.
>
> Probably even more efficient if 'ar' is large and 'valid' is small,
> and shorter to boot:
>
>   np.in1d(ar, valid)
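For reference, here is a minimal, self-contained sketch of the quoted loop-over-`valid` idea wrapped into a function and checked against NumPy's own membership test. It assumes a recent NumPy, where `np.isin` is the documented spelling of `np.in1d` (the latter is deprecated as of NumPy 2.0). The function name `mask_in` and the sample data are illustrative only.

```python
import numpy as np


def mask_in(ar, valid):
    """Return a boolean mask that is True where `ar` equals any element of `valid`.

    The Python-level loop runs len(valid) times (one vectorized comparison
    per valid value), not ar.size times, so it stays cheap when `valid` is short.
    """
    ar = np.asarray(ar)
    mask = np.zeros(ar.shape, dtype=bool)
    for good in valid:
        mask |= (ar == good)
    return mask


# Illustrative data, similar in spirit to the setup used in the reply.
rng = np.random.default_rng(0)
ar = rng.integers(100, size=1000)
valid = np.arange(0, 100, 5)

mask = mask_in(ar, valid)
# The loop gives the same answer as NumPy's built-in membership test.
assert np.array_equal(mask, np.isin(ar, valid))
print(mask.sum(), "of", ar.size, "elements are in valid")
```

Like `np.isin`, the loop version preserves the shape of `ar`, so it also works on multi-dimensional arrays.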
Not according to my timings:

[~] |2> def kern_in(x, valid):
..>     mask = np.zeros(x.shape, dtype=bool)
..>     for good in valid:
..>         mask |= (x == good)
..>     return mask
..>

[~] |6> ar = np.random.randint(100, size=1000000)

[~] |7> valid = np.arange(0, 100, 5)

[~] |8> %timeit kern_in(ar, valid)
10 loops, best of 3: 115 ms per loop

[~] |9> %timeit np.in1d(ar, valid)
1 loops, best of 3: 279 ms per loop

As valid gets larger, in1d() will catch up, but for smallish sizes of
valid, which I suspect given the "non-numeric" nature of the OP's (Hi,
Brett!) request, kern_in() is usually better.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
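To re-run a comparison like the one in the IPython session outside IPython, the following sketch uses the standard-library `timeit` module and sweeps the size of `valid` to look for the crossover point. It uses `np.isin`, the current name for `np.in1d`. Absolute numbers and the crossover will depend on the machine and NumPy version. In particular, NumPy 1.24 added a `kind` parameter to `in1d`/`isin`, and for integer inputs with a small value range, the default may pick a lookup-table method that changes the picture considerably. The helper `best_of` and the chosen step sizes are illustrative assumptions.

```python
import timeit

import numpy as np


def kern_in(x, valid):
    """Loop over the (short) `valid` array, OR-ing one vectorized comparison per value."""
    mask = np.zeros(x.shape, dtype=bool)
    for good in valid:
        mask |= (x == good)
    return mask


def best_of(func, number=3, repeat=3):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


rng = np.random.default_rng(0)
ar = rng.integers(100, size=1_000_000)

# Grow `valid` from 5 to 100 values to see how each approach scales.
for step in (20, 5, 2, 1):
    valid = np.arange(0, 100, step)
    t_loop = best_of(lambda: kern_in(ar, valid))
    t_isin = best_of(lambda: np.isin(ar, valid))
    print(f"len(valid)={len(valid):3d}  loop: {t_loop * 1e3:7.2f} ms  isin: {t_isin * 1e3:7.2f} ms")
```

Each lambda is called immediately inside its own loop iteration, so it always sees the current `valid`.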