On Tue, Sep 21, 2010 at 1:29 PM, Gökhan Sever <[email protected]> wrote:
>
>
> On Tue, Sep 21, 2010 at 1:55 AM, Peter Schmidtke <[email protected]>
> wrote:
>>
>> Dear all,
>>
>> I'd like to know if there is a pythonic / numpy way of retrieving unique
>> lines of a 2d numpy array.
>>
>> In a way I have this :
>>
>> [[409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [409 152]
>>  [426 193]
>>  [431 129]]
>>
>> And I'd like to get this :
>>
>> [[409 152]
>>  [426 193]
>>  [431 129]]
>>
>>
>> How can I do this without workarounds like string concatenation or such
>> things? Numpy.unique flattens the whole array so it's not really of use
>> here.
>
> Here is one alternative:
> I[15]: a = np.array([[409, 152], [409, 152], [426, 193], [431, 129]])
> I[16]: np.array(list(set(tuple(i) for i in a.tolist())))
> O[16]:
> array([[409, 152],
>        [426, 193],
>        [431, 129]])
> I[6]: %timeit np.unique(a.view([('',a.dtype)]*a.shape[1])).view(a.dtype).reshape(-1,a.shape[1])
> 10000 loops, best of 3: 51 us per loop
> I[8]: %timeit np.array(list(set(tuple(i) for i in a.tolist())))
> 10000 loops, best of 3: 31.4 us per loop
> # Try with a bigger array
> I[9]: k = np.array((a.tolist()*50000))
> I[10]: %timeit np.array(list(set(tuple(i) for i in k.tolist())))
> 1 loops, best of 3: 324 ms per loop
> I[11]: %timeit np.unique(k.view([('',k.dtype)]*k.shape[1])).view(k.dtype).reshape(-1,k.shape[1])
> 1 loops, best of 3: 790 ms per loop
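The structured-view one-liner timed in the quoted session can be unpacked as below. This is a sketch only: the wrapper names unique_rows_view and unique_rows_axis are hypothetical, the np.ascontiguousarray call is an added assumption (the view trick needs each row to be contiguous in memory), and the axis=0 variant requires a newer NumPy (1.13 or later) than the one discussed in this thread.

```python
import numpy as np

a = np.array([[409, 152], [409, 152], [426, 193], [431, 129]])

def unique_rows_view(arr):
    # Hypothetical helper. Reinterpret each row as a single structured
    # (record) element with one field per column, so np.unique compares
    # whole rows instead of flattened scalars.
    # Assumption: make the array C-contiguous first, since .view() to a
    # larger itemsize needs contiguous rows.
    arr = np.ascontiguousarray(arr)
    row_dtype = [('', arr.dtype)] * arr.shape[1]
    uniq = np.unique(arr.view(row_dtype))
    # Turn the unique records back into a plain 2-D array.
    return uniq.view(arr.dtype).reshape(-1, arr.shape[1])

def unique_rows_axis(arr):
    # Hypothetical helper. NumPy 1.13+ does the same thing directly.
    return np.unique(arr, axis=0)
```

Both variants return the rows in sorted (lexicographic) order, because np.unique sorts its input before dropping duplicates.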
I'm a bit surprised. I think np.unique does some extra work to maintain the order.

The tolist() might not be necessary if you iterate over rows.

Josef

>
> Seems like faster on these tests comparing to the unique method. Also it is
> more readable. Still not uber Pythonic. Haskell has "nub" to remove
> duplicate list elements.
> http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Data-List.html#v%3Anub
> --
> Gökhan
>
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
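Josef's remark about iterating over rows instead of calling tolist(), plus an order-preserving variant in the spirit of Haskell's nub, might look like the sketch below. The function names are hypothetical. dict.fromkeys relies on dicts keeping insertion order (guaranteed from Python 3.7), and the return_index variant assumes NumPy 1.13 or later for axis=0. Skipping tolist() is not guaranteed to be faster: iterating over an ndarray creates a small array object per row, so it is worth timing both.

```python
import numpy as np

def unique_rows_set(arr):
    # Hypothetical helper. Iterate over the rows directly (no tolist()).
    # tuple(row) is hashable, so a set drops duplicate rows; the order of
    # the result is arbitrary.
    return np.array(list(set(tuple(row) for row in arr)))

def unique_rows_nub(arr):
    # Hypothetical helper, order-preserving like Haskell's nub: dict keys
    # keep insertion order, so each row appears where it was first seen.
    return np.array(list(dict.fromkeys(tuple(row) for row in arr)))

def unique_rows_first_seen(arr):
    # Hypothetical helper, vectorized: np.unique reports the index of each
    # unique row's first occurrence; sorting those indices restores the
    # original order of appearance.
    _, idx = np.unique(arr, axis=0, return_index=True)
    return arr[np.sort(idx)]
```

For example, with rows [[431, 129], [409, 152], [409, 152], [426, 193], [431, 129]], both unique_rows_nub and unique_rows_first_seen keep [431, 129] first, whereas np.unique on its own would move [409, 152] to the front.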
