On Tue, Aug 12, 2014 at 11:35 AM, Warren Weckesser <warren.weckes...@gmail.com> wrote:

> I created a pull request (https://github.com/numpy/numpy/pull/4958) that
> defines the function `count_unique`.  `count_unique` generates a
> contingency table from a collection of sequences.  For example,
>
> In [7]: x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
>
> In [8]: y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
>
> In [9]: (xvals, yvals), counts = count_unique(x, y)
>
> In [10]: xvals
> Out[10]: array([1, 2])
>
> In [11]: yvals
> Out[11]: array([3, 4, 5])
>
> In [12]: counts
> Out[12]:
> array([[3, 1, 0],
>        [1, 1, 3]])
>
>
> It can be interpreted as a multi-argument generalization of `np.unique(x,
> return_counts=True)`.
>
> It overlaps with Pandas' `crosstab`, but I think this is a pretty
> fundamental counting operation that fits in numpy.
>
> Matlab's `crosstab` (http://www.mathworks.com/help/stats/crosstab.html)
> and R's `table` perform the same calculation (with a few more bells and
> whistles).
>
>
> For comparison, here's Pandas' `crosstab` (same `x` and `y` as above):
>
> In [28]: import pandas as pd
>
> In [29]: xs = pd.Series(x)
>
> In [30]: ys = pd.Series(y)
>
> In [31]: pd.crosstab(xs, ys)
> Out[31]:
> col_0  3  4  5
> row_0
> 1      3  1  0
> 2      1  1  3
>
>
> And here is R's `table`:
>
> > x <- c(1,1,1,1,2,2,2,2,2)
> > y <- c(3,4,3,3,3,4,5,5,5)
> > table(x, y)
>    y
> x   3 4 5
>   1 3 1 0
>   2 1 1 3
>
>
> Is there any interest in adding this (or some variation of it) to numpy?
>
>
> Warren
>

While searching StackOverflow in the numpy tag for "count unique", I just
discovered that I basically reinvented Eelco Hoogendoorn's code in his
answer to
http://stackoverflow.com/questions/10741346/numpy-frequency-counts-for-unique-values-in-an-array.
Nice one, Eelco!

Warren
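
For anyone who wants to try the idea without checking out the pull request, here is a minimal sketch of the counting operation described above. It is not the code from the PR or from Eelco's answer; the name `count_unique_sketch` is made up for illustration, and it assumes every input is a sequence of the same length whose elements `np.unique` can sort. It relies only on `np.unique(..., return_inverse=True)`, `np.ravel_multi_index` and `np.bincount`.

```python
import numpy as np


def count_unique_sketch(*seqs):
    """Contingency table for equal-length sequences (illustrative only).

    Hypothetical helper, not the implementation in numpy PR 4958.
    Returns (tuple_of_unique_value_arrays, counts_array).
    """
    if not seqs:
        raise ValueError("at least one sequence is required")

    arrays = [np.asarray(s).ravel() for s in seqs]
    n = arrays[0].size
    if any(a.size != n for a in arrays):
        raise ValueError("all sequences must have the same length")

    uniques = []
    inverses = []
    for a in arrays:
        # vals: sorted unique values; inv: index into vals for each element
        vals, inv = np.unique(a, return_inverse=True)
        uniques.append(vals)
        inverses.append(inv.ravel())

    # One axis per input sequence, one slot per unique value on that axis.
    shape = tuple(len(v) for v in uniques)

    # Collapse each tuple of per-axis indices into a single flat index,
    # count how often each flat index occurs, then restore the table shape.
    flat = np.ravel_multi_index(tuple(inverses), shape)
    counts = np.bincount(flat, minlength=int(np.prod(shape))).reshape(shape)
    return tuple(uniques), counts


x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
(xvals, yvals), counts = count_unique_sketch(x, y)
print(xvals)   # unique values of x (row labels)
print(yvals)   # unique values of y (column labels)
print(counts)  # counts[i, j] = number of positions with x == xvals[i], y == yvals[j]
```

With the `x` and `y` from the quoted message, `counts` should agree with the table shown there. Going through a flat index and `np.bincount` keeps the whole thing vectorized; the cost is a dense table, so this approach is a poor fit when the inputs have many unique values and most combinations never occur.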
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion