On Mon, Apr 16, 2012 at 5:27 PM, Skipper Seabold <[email protected]> wrote:
> Hi,
>
> I have a pull request here [1] to add a cut function similar to R's
> [2]. It seems there are often requests for similar functionality. It's
> something I'm making use of for my own work and would like to use in
> statsmodels and in generating instances of pandas' Factor class, but
> is this generally something people would find useful to warrant its
> inclusion in numpy? It will be even more useful I think with an enum
> dtype in numpy.
>
> If you aren't familiar with cut, here's a potential use case. Going
> from a continuous to a categorical variable.
>
> Given a continuous variable
>
> [~/]
> [8]: age = np.random.randint(15,70, size=100)
>
> [~/]
> [9]: age
> [9]:
> array([58, 32, 20, 25, 34, 69, 52, 27, 20, 23, 51, 61, 39, 54, 39, 44, 27,
>        17, 29, 18, 66, 25, 44, 21, 54, 32, 50, 60, 25, 41, 68, 25, 42, 69,
>        50, 69, 24, 69, 69, 48, 30, 20, 18, 15, 50, 48, 44, 27, 57, 52, 40,
>        27, 58, 45, 44, 32, 54, 19, 36, 32, 55, 17, 55, 15, 19, 29, 22, 25,
>        36, 44, 29, 53, 37, 31, 51, 39, 21, 66, 25, 26, 20, 17, 41, 50, 27,
>        23, 62, 69, 65, 34, 38, 61, 39, 34, 38, 35, 18, 36, 29, 26])
>
> Give me a variable where people are in age groups (lower bound is not
> inclusive)
>
> [~/]
> [10]: groups = [14, 25, 35, 45, 55, 70]
>
> [~/]
> [11]: age_cat = np.cut(age, groups)
>
> [~/]
> [12]: age_cat
> [12]:
> array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3,
>        1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4,
>        3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3,
>        3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5,
>        3, 2, 3, 2, 1, 3, 2, 2])
>
> Skipper
>
> [1] https://github.com/numpy/numpy/pull/248
> [2] http://stat.ethz.ch/R-manual/R-devel/library/base/html/cut.html
>

Is this the same as `np.searchsorted` (with reversed arguments)?

In [292]: np.searchsorted(groups, age)
Out[292]:
array([5, 2, 1, 1, 2, 5, 4, 2, 1, 1, 4, 5, 3, 4, 3, 3, 2, 1, 2, 1, 5, 1, 3,
       1, 4, 2, 4, 5, 1, 3, 5, 1, 3, 5, 4, 5, 1, 5, 5, 4, 2, 1, 1, 1, 4, 4,
       3, 2, 5, 4, 3, 2, 5, 3, 3, 2, 4, 1, 3, 2, 4, 1, 4, 1, 1, 2, 1, 1, 3,
       3, 2, 4, 3, 2, 4, 3, 1, 5, 1, 2, 1, 1, 3, 4, 2, 1, 5, 5, 5, 2, 3, 5,
       3, 2, 3, 2, 1, 3, 2, 2])

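The match above is on random data, so it may be worth checking the bin edges explicitly. Here is a minimal sketch, assuming the intended convention is "lower bound exclusive, upper bound inclusive" as described in the quoted message. The small `age` array is a hypothetical sample I picked to land on the edges; it is not the data from the session above. `np.digitize` with `right=True` is included only as a second point of comparison, not as a claim about how the proposed `np.cut` is implemented.

```python
import numpy as np

# Bin edges from the quoted example; each bin is treated as (lower, upper].
groups = [14, 25, 35, 45, 55, 70]

# Hypothetical sample chosen so that several values sit exactly on an edge.
age = np.array([15, 25, 26, 35, 45, 46, 55, 69])

# side='left' (the default) returns the first index i with groups[i] >= value,
# so a value equal to an edge falls into the bin below that edge.
by_searchsorted = np.searchsorted(groups, age, side='left')

# np.digitize with right=True uses the same (lower, upper] convention.
by_digitize = np.digitize(age, groups, right=True)

print(by_searchsorted)
print(by_digitize)
print(np.array_equal(by_searchsorted, by_digitize))
```

If the two results agree on the edge values as well, the open question is mostly whether a separate function adds anything beyond labels and a friendlier name.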
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
