Re: [Numpy-discussion] unique rows of array
Josef,

Many thanks for the example! It should become an official NumPy recipe :)

Thanks again,
Masha
liu...@usc.edu

On Aug 17, 2009, at 10:03 PM, josef.p...@gmail.com wrote:
> On Tue, Aug 18, 2009 at 12:59 AM, Maria Liukis liu...@usc.edu wrote:
>> On Aug 17, 2009, at 9:51 PM, Charles R Harris wrote:
>>> On Mon, Aug 17, 2009 at 10:30 PM, Maria Liukis liu...@usc.edu wrote:
>>>> Hello everybody, While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to the Matlab's unique(array, 'rows') to get unique rows of an array. Searching the web, I've found a similar discussion from couple of years ago with an example:
>>>
>>> Just to be clear, do you mean finding all rows that only occur once in the array?
>>
>> Yes.
>
> I interpreted your question as removing duplicates. It keeps rows that occur more than once. That's what my example is intended to do.
>
> Josef
>
>>> snip
>>>
>>> Chuck

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] unique rows of array
On Tue, Aug 18, 2009 at 2:01 AM, Maria Liukis liu...@usc.edu wrote:
> Josef,
>
> Many thanks for the example! It should become an official NumPy recipe :)
>
> Thanks again,
> Masha
> liu...@usc.edu

Actually, there is also an implementation of unique rows in scipy.stats._support. It uses loops (and array concatenation in the loop), but it preserves the order of the rows in the array.

In general, I don't recommend using scipy.stats._support, since many or most functions are not tested and only some are used in scipy.stats. These functions wait for a rewrite or removal. When I thought about a rewrite last year, I didn't know much about structured arrays and views.

Josef

    >>> cc
    array([[10,  1,  2],
           [ 3,  4,  5],
           [ 3,  4,  5],
           [ 9, 10, 11]])
    >>> scipy.stats._support.unique(cc)
    array([[10,  1,  2],
           [ 3,  4,  5],
           [ 9, 10, 11]])

unique columns using transpose:

    >>> cct = cc.T.copy()
    >>> cct
    array([[10,  3,  3,  9],
           [ 1,  4,  4, 10],
           [ 2,  5,  5, 11]])
    >>> scipy.stats._support.unique(cct.T).T
    array([[10,  3,  9],
           [ 1,  4, 10],
           [ 2,  5, 11]])

Josef

> On Aug 17, 2009, at 10:03 PM, josef.p...@gmail.com wrote:
>> On Tue, Aug 18, 2009 at 12:59 AM, Maria Liukis liu...@usc.edu wrote:
>>> On Aug 17, 2009, at 9:51 PM, Charles R Harris wrote:
>>>> On Mon, Aug 17, 2009 at 10:30 PM, Maria Liukis liu...@usc.edu wrote:
>>>>> Hello everybody, While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to the Matlab's unique(array, 'rows') to get unique rows of an array. Searching the web, I've found a similar discussion from couple of years ago with an example:
>>>>
>>>> Just to be clear, do you mean finding all rows that only occur once in the array?
>>>
>>> Yes.
>>
>> I interpreted your question as removing duplicates. It keeps rows that occur more than once. That's what my example is intended to do.
>>
>> Josef
>>
>>>> snip
>>>>
>>>> Chuck
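Note for readers who want the order-preserving behaviour without relying on the private scipy.stats._support module: the sketch below is one possible way to get it with public NumPy calls only. It assumes a NumPy version where np.unique accepts the axis argument (1.13 or later); the function name unique_rows_keep_order is made up for this illustration and is not part of NumPy or SciPy.

```python
import numpy as np

def unique_rows_keep_order(arr):
    """Drop duplicate rows, keeping rows in order of first appearance.

    Hypothetical helper for illustration; assumes NumPy >= 1.13 (axis=0).
    """
    arr = np.asarray(arr)
    # return_index gives, for each unique row, the index of its first occurrence.
    _, first_idx = np.unique(arr, axis=0, return_index=True)
    # np.unique sorts the rows; sorting the indices restores the original order.
    return arr[np.sort(first_idx)]

cc = np.array([[10, 1, 2],
               [3, 4, 5],
               [3, 4, 5],
               [9, 10, 11]])
print(unique_rows_keep_order(cc))
```

For the array cc from the message above, this keeps the row starting with 10 in first position, matching the scipy.stats._support.unique output shown there, but without a Python-level loop.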
[Numpy-discussion] unique rows of array
Hello everybody,

While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to Matlab's unique(array, 'rows') to get the unique rows of an array.

Searching the web, I've found a similar discussion from a couple of years ago with an example:

## A SNIPPET FROM THE DISCUSSION

[Numpy-discussion] Finding unique rows in an array [Was: Finding a row match within a numpy array]

On Tuesday 21 August 2007, Mark.Miller wrote:
> A slightly related question on this topic...
>
> Is there a good loopless way to identify all of the unique rows in an array? Something like numpy.unique() is ideal, but capable of extracting unique subarrays along an axis.

You can always do a view of the rows as strings and then use unique(). Here is an example:

    In [1]: import numpy
    In [2]: a=numpy.arange(12).reshape(4,3)
    In [3]: a[2]=(3,4,5)
    In [4]: a
    Out[4]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 3,  4,  5],
           [ 9, 10, 11]])

now, create the view and select the unique rows:

    In [5]: b=numpy.unique(a.view('S%d'%a.itemsize*a.shape[0])).view('i4')

and finally restore the shape:

    In [6]: b.reshape((len(b)/a.shape[1], a.shape[1]))
    Out[6]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 9, 10, 11]])

If you want to find unique columns instead of rows, do a transpose first on the initial array.

END OF DISCUSSION

The provided example works only because the array elements are row-sorted. Changing the tested array (in my case, it's 'c'):

    >>> c
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 3,  4,  5],
           [ 9, 10, 11]])
    >>> c[0] = (11, 10, 0)
    >>> c
    array([[11, 10,  0],
           [ 3,  4,  5],
           [ 3,  4,  5],
           [ 9, 10, 11]])
    >>> b = np.unique(c.view('S%s' %c.itemsize*c.shape[0]))
    >>> b
    array(['', '\x03', '\x04', '\x05', '\t', '\n', '\x0b'], dtype='|S4')
    >>> b.view('i4')
    array([ 0,  3,  4,  5,  9, 10, 11])
    >>> b.reshape((len(b)/c.shape[1], c.shape[1])).view('i4')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: total size of new array must be unchanged

since len(b) = 7.

The suggested approach would work if the whole row were converted to a single string, I guess. But from what I could gather, numpy.array.view() only changes the display element-wise. Before I start re-inventing the wheel, I was just wondering whether one could find unique rows in an array using existing numpy functionality.

Many thanks in advance!
Masha
liu...@usc.edu
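A note on why the quoted snippet behaves this way. The dtype='|S4' in the output above suggests that the view was taken one element at a time rather than one row at a time, so unique() was counting distinct scalars, not distinct rows. One plausible contributing factor is operator precedence in the format expression: % and * bind equally and group left to right, so 'S%d'%a.itemsize*a.shape[0] repeats the formatted string instead of multiplying the sizes. The sketch below only checks that arithmetic and the element counts; the variable names are illustrative, and the 4-byte item size is taken from the 'i4' in the quoted example.

```python
import numpy as np

itemsize, nrows, ncols = 4, 4, 3  # 'i4' elements in a 4x3 array, as in the quoted example

# The format string is filled in first and the result is then repeated.
as_written = 'S%d' % itemsize * nrows
# Parenthesised: one string wide enough to span a whole row of ncols elements.
row_wide = 'S%d' % (itemsize * ncols)

a = np.arange(12).reshape(4, 3)
a[2] = (3, 4, 5)
c = a.copy()
c[0] = (11, 10, 0)

# Number of distinct scalar values (not rows) in each array.
n_distinct_a = np.unique(a).size
n_distinct_c = np.unique(c).size
print(as_written, row_wide, n_distinct_a, n_distinct_c)
```

With the first array there happen to be 9 distinct values, which reshape cleanly into 3 rows of 3 and look like the right answer. With the modified array there are 7, which cannot be reshaped into rows of 3, hence the ValueError.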
Re: [Numpy-discussion] unique rows of array
On Tue, Aug 18, 2009 at 12:30 AM, Maria Liukis liu...@usc.edu wrote:
> Hello everybody,
>
> While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to the Matlab's unique(array, 'rows') to get unique rows of an array.
>
> snip
>
> Before I start re-inventing the wheel, I was just wondering if using existing numpy functionality one could find unique rows in an array.
>
> Many thanks in advance!
> Masha
> liu...@usc.edu

One way is to convert to a structured array:

    >>> c = np.array([[ 0, 1, 2], [ 3, 4, 5], [ 3, 4, 5], [ 9, 10, 11]])
    >>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 9, 10, 11]])

For an explanation, I asked a similar question last December about sortrows. (I never remember when I need the last reshape and when not.)

Josef
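For readers on a current NumPy: np.unique1d, used in the reply above, is no longer available, so the sketch below spells out the same structured-array idea with np.unique instead. It is a minimal illustration under a few assumptions: the input is a 2-D array, the helper name unique_rows and the field names f0, f1, ... are invented here, and the final comparison assumes NumPy 1.13 or later, where np.unique accepts axis=0 and does the row-wise work directly.

```python
import numpy as np

def unique_rows(arr):
    """Return the sorted unique rows of a 2-D array via a structured view.

    Hypothetical helper for illustration, not a NumPy function.
    """
    arr = np.ascontiguousarray(arr)  # the view below needs contiguous rows
    # One field per column, so each row becomes a single record.
    row_dtype = [('f%d' % i, arr.dtype) for i in range(arr.shape[1])]
    records = arr.view(row_dtype)    # shape (n_rows, 1)
    uniq = np.unique(records)        # 1-D array of unique records
    # Viewing back as plain numbers gives a flat array,
    # so a reshape is needed to recover the rows.
    return uniq.view(arr.dtype).reshape(-1, arr.shape[1])

c = np.array([[0, 1, 2], [3, 4, 5], [3, 4, 5], [9, 10, 11]])
c2 = np.array([[11, 10, 0], [3, 4, 5], [3, 4, 5], [9, 10, 11]])
print(unique_rows(c))
print(unique_rows(c2))
print(np.unique(c2, axis=0))  # built-in equivalent on NumPy >= 1.13
```

On the question of when the last reshape is needed: in this sketch the view back to the element dtype flattens the records into one long 1-D array, so the reshape is what restores one row per record. Note that rows come back sorted, not in their original order.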
Re: [Numpy-discussion] unique rows of array
On Mon, Aug 17, 2009 at 10:30 PM, Maria Liukis liu...@usc.edu wrote:
> Hello everybody, While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to the Matlab's unique(array, 'rows') to get unique rows of an array. Searching the web, I've found a similar discussion from couple of years ago with an example:

Just to be clear, do you mean finding all rows that only occur once in the array?

snip

Chuck
Re: [Numpy-discussion] unique rows of array
Josef,

Thanks, I'll try that and will search for your question from last December :)

Masha
liu...@usc.edu

On Aug 17, 2009, at 9:44 PM, josef.p...@gmail.com wrote:
> On Tue, Aug 18, 2009 at 12:30 AM, Maria Liukis liu...@usc.edu wrote:
>> snip
>>
>> Before I start re-inventing the wheel, I was just wondering if using existing numpy functionality one could find unique rows in an array.
>>
>> Many thanks in advance!
>> Masha
>> liu...@usc.edu
>
> one way is to convert to structured array
>
>     >>> c = np.array([[ 0, 1, 2], [ 3, 4, 5], [ 3, 4, 5], [ 9, 10, 11]])
>     >>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
>     array([[ 0,  1,  2],
>            [ 3,  4,  5],
>            [ 9, 10, 11]])
>
> for explanation, I asked a similar question last december about sortrows. (I never remember, when I need the last reshape and when not)
>
> Josef
Re: [Numpy-discussion] unique rows of array
On Aug 17, 2009, at 9:51 PM, Charles R Harris wrote:
> On Mon, Aug 17, 2009 at 10:30 PM, Maria Liukis liu...@usc.edu wrote:
>> Hello everybody, While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to the Matlab's unique(array, 'rows') to get unique rows of an array. Searching the web, I've found a similar discussion from couple of years ago with an example:
>
> Just to be clear, do you mean finding all rows that only occur once in the array?

Yes.

> snip
>
> Chuck
Re: [Numpy-discussion] unique rows of array
On Tue, Aug 18, 2009 at 12:59 AM, Maria Liukis liu...@usc.edu wrote:
> On Aug 17, 2009, at 9:51 PM, Charles R Harris wrote:
>> On Mon, Aug 17, 2009 at 10:30 PM, Maria Liukis liu...@usc.edu wrote:
>>> Hello everybody, While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to the Matlab's unique(array, 'rows') to get unique rows of an array. Searching the web, I've found a similar discussion from couple of years ago with an example:
>>
>> Just to be clear, do you mean finding all rows that only occur once in the array?
>
> Yes.

I interpreted your question as removing duplicates. It keeps rows that occur more than once. That's what my example is intended to do.

Josef

>> snip
>>
>> Chuck
Re: [Numpy-discussion] unique rows of array
On Tue, Aug 18, 2009 at 1:03 AM, josef.p...@gmail.com wrote:
> On Tue, Aug 18, 2009 at 12:59 AM, Maria Liukis liu...@usc.edu wrote:
>> On Aug 17, 2009, at 9:51 PM, Charles R Harris wrote:
>>> On Mon, Aug 17, 2009 at 10:30 PM, Maria Liukis liu...@usc.edu wrote:
>>>> Hello everybody, While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to the Matlab's unique(array, 'rows') to get unique rows of an array. Searching the web, I've found a similar discussion from couple of years ago with an example:
>>>
>>> Just to be clear, do you mean finding all rows that only occur once in the array?
>>
>> Yes.
>
> I interpreted your question as removing duplicates. It keeps rows that occur more than once. That's what my example is intended to do.
>
> Josef
>
>>> snip
>>>
>>> Chuck

Just a reminder about views on views, I don't think the recommendation to take the transpose to get unique columns works. We had the discussion some time ago, that views work on the original array data and not on the view, and in this case the transpose creates a view.

example below

Also, unique does a sort and doesn't preserve order.

Josef

    >>> c=np.array([[ 10, 1, 2], [ 3, 4, 5], [ 3, 4, 5], [ 9, 10, 11]])
    >>> cc = c.copy() #backup
    >>> c = cc.T
    >>> cc
    array([[10,  1,  2],
           [ 3,  4,  5],
           [ 3,  4,  5],
           [ 9, 10, 11]])
    >>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
    Traceback (most recent call last):
      File "<pyshell#46>", line 1, in <module>
        np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
    ValueError: new type not compatible with array.

    >>> c = cc.T.copy()
    >>> c
    array([[10,  3,  3,  9],
           [ 1,  4,  4, 10],
           [ 2,  5,  5, 11]])
    >>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
    array([[ 1,  4,  4, 10],
           [ 2,  5,  5, 11],
           [10,  3,  3,  9]])

    >>> c = np.ascontiguousarray(cc.T)
    >>> np.unique1d(c.view([('',c.dtype)]*c.shape[1])).view(c.dtype).reshape(-1,c.shape[1])
    array([[ 1,  4,  4, 10],
           [ 2,  5,  5, 11],
           [10,  3,  3,  9]])
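The point about transposes can be checked directly on a current NumPy. The sketch below is a small illustration, not a definitive recipe: it inspects the contiguity flags of cc.T, tries the structured view inside a try/except (current NumPy versions are expected to raise ValueError here, though the message differs from the 2009 one), and then shows np.unique with axis=1 as a way to get unique columns without any manual view. The field names f0, f1, ... and the variable names are made up for the example, and axis=1 assumes NumPy 1.13 or later.

```python
import numpy as np

cc = np.array([[10, 1, 2], [3, 4, 5], [3, 4, 5], [9, 10, 11]])
cct = cc.T  # a view on cc's data, not a copy

# One field per element of a row of cct (cct has cc.shape[0] columns).
row_dtype = [('f%d' % i, cc.dtype) for i in range(cc.shape[0])]

t_is_contig = cct.flags['C_CONTIGUOUS']  # False: rows of cct are strided

try:
    cct.view(row_dtype)
    view_failed = False
except ValueError:
    # The bytes of one row of cct are not adjacent in memory.
    view_failed = True

fixed = np.ascontiguousarray(cct)        # copies into row-major layout
fixed_is_contig = fixed.flags['C_CONTIGUOUS']
records = fixed.view(row_dtype)          # now each row is one record

# Unique columns of cct without any view tricks (sorted, order not preserved).
unique_cols = np.unique(cct, axis=1)
print(t_is_contig, view_failed, fixed_is_contig, records.shape)
print(unique_cols)
```

As in the session above, the copy (or np.ascontiguousarray) is what makes the view legal; the axis argument simply hides that step.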
Re: [Numpy-discussion] unique rows of array
On Aug 17, 2009, at 10:03 PM, josef.p...@gmail.com wrote:
> On Tue, Aug 18, 2009 at 12:59 AM, Maria Liukis liu...@usc.edu wrote:
>> On Aug 17, 2009, at 9:51 PM, Charles R Harris wrote:
>>> On Mon, Aug 17, 2009 at 10:30 PM, Maria Liukis liu...@usc.edu wrote:
>>>> Hello everybody, While re-implementing some Matlab code in Python, I've run into a problem of finding a NumPy function analogous to the Matlab's unique(array, 'rows') to get unique rows of an array. Searching the web, I've found a similar discussion from couple of years ago with an example:
>>>
>>> Just to be clear, do you mean finding all rows that only occur once in the array?

Sorry, I think it shows that I should stop working past 10pm :)

>> Yes.
>
> I interpreted your question as removing duplicates. It keeps rows that occur more than once.

Yes, I meant keeping only unique (without duplicates) rows.

> That's what my example is intended to do.
>
> Josef
>
>>> snip
>>>
>>> Chuck