On Tue, Apr 23, 2013 at 6:16 PM, Sebastian Berg <[email protected]> wrote:
> On Tue, 2013-04-23 at 12:13 -0500, Jonathan Helmus wrote:
>> Back in December it was pointed out on the scipy-user list[1] that
>> numpy has a percentile function which has similar functionality to
>> scipy's stats.scoreatpercentile. I've been trying to harmonize these
>> two functions into a single version which has the features of both.
>> Scipy PR 374[2] introduced a version which took the parameters from
>> both the scipy and numpy percentile functions and was accepted into Scipy
>> with the plan that it would be deprecated when a similar function was
>> introduced into Numpy. Then I moved to enhancing the Numpy version with
>> Pull Request 2970 [3]. With some input from Sebastian Berg the
>> percentile function was rewritten with further vectorization, but
>> neither of us felt fully comfortable with the final product. Can
>> someone look at the implementation in the PR and suggest what should be
>> done from here?
>>
>
> Thanks! For me the main question is the vectorized usage when both
> haystack (`a`) and needle (`q`) are vectorized. What I mean is for:
>
>     np.percentile(np.random.randn(n1, n2, N), [25., 50., 75.], axis=-1)
>
> I would probably expect an output shape of (n1, n2, 3), but currently
> you will get the needle dimensions first, because it is roughly the same
> as
>
>     [np.percentile(np.random.randn(n1, n2, N), q, axis=-1)
>      for q in [25., 50., 75.]]
>
> so for the (probably rare) vectorization of both `a` and `q`, would it
> be preferable to do some kind of long term behaviour change, or just put
> the dimensions in `q` first, which should be compatible with the current
> list?
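For reference, the two layouts under discussion can be compared with a small sketch. It assumes a NumPy release in which a sequence `q` puts its dimensions first (the behaviour described above as current) and in which `np.moveaxis` is available; the sizes and variable names are made up for illustration.

```python
import numpy as np

# Illustrative sizes only; any n1, n2, N would do.
n1, n2, N = 4, 5, 100
a = np.random.randn(n1, n2, N)
q = [25., 50., 75.]

# Vectorized call: the q dimension comes first in the result.
res = np.percentile(a, q, axis=-1)

# Roughly equivalent list version, one percentile at a time.
looped = np.array([np.percentile(a, qi, axis=-1) for qi in q])

# With q first, the individual percentiles unpack directly.
lower, median, upper = res

# The other layout, (n1, n2, 3), is one axis move away.
q_last = np.moveaxis(res, 0, -1)

print(res.shape, looped.shape, q_last.shape)
```

With the q dimension first, unpacking works regardless of which axis was reduced; with the q dimension in place of the reduced axis, it only works directly when that axis is 0.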

I don't have much of a preference either way, but I'm glad this is going
into numpy. We can work with it either way.

In stats, the most common case will be axis=0, and then the two are the
same, aren't they?

What I like about the second version is unrolling (with 2 or 3
quantiles), which I think will work like

    u, l = np.random.randn(2,5)

or

    res = np.percentile(...)
    func(*res)

The first case will be nicer when there are lots of percentiles, but I
guess I won't need it much except for axis=0.

Actually, I would prefer the second version, because with the first it
might be a bit more cumbersome to get the individual percentiles out if
the axis is somewhere in the middle; however, I don't think I have a
case like that.

The first version would be consistent with reduceat, and that would be
more numpythonic. I would go for that in numpy.

my 2.5c

Josef

>
> Regards,
>
> Sebastian
>
>> Cheers,
>>
>> - Jonathan Helmus
>>
>>
>> [1] http://thread.gmane.org/gmane.comp.python.scientific.user/33331
>> [2] https://github.com/scipy/scipy/pull/374
>> [3] https://github.com/numpy/numpy/pull/2970
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
