On Wed, 2013-04-24 at 12:03 -0400, [email protected] wrote:
> On Wed, Apr 24, 2013 at 4:11 AM, Sebastian Berg
> <[email protected]> wrote:
> > On Tue, 2013-04-23 at 23:33 -0400, [email protected] wrote:
> >> On Tue, Apr 23, 2013 at 6:16 PM, Sebastian Berg
> >> <[email protected]> wrote:
> >> > On Tue, 2013-04-23 at 12:13 -0500, Jonathan Helmus wrote:
> >> >> Back in December it was pointed out on the scipy-user list[1] that
> >> >> numpy has a percentile function which has similar functionality to
> >> >> scipy's stats.scoreatpercentile. I've been trying to harmonize these
> >> >> two functions into a single version which has the features of both.
> >> >> Scipy PR 374[2] introduced a version which took the parameters from
> >> >> both the scipy and numpy percentile functions and was accepted into Scipy
> >> >> with the plan that it would be deprecated when a similar function was
> >> >> introduced into Numpy. Then I moved to enhancing the Numpy version with
> >> >> Pull Request 2970 [3]. With some input from Sebastian Berg the
> >> >> percentile function was rewritten with further vectorization, but
> >> >> neither of us felt fully comfortable with the final product. Can
> >> >> someone look at the implementation in the PR and suggest what should be
> >> >> done from here?
> >> >>
> >> >
> >> > Thanks! For me the main question is the vectorized usage when both
> >> > haystack (`a`) and needle (`q`) are vectorized. What I mean is for:
> >> >
> >> > np.percentile(np.random.randn(n1, n2, N), [25., 50., 75.], axis=-1)
> >> >
> >> > I would probably expect an output shape of (n1, n2, 3), but currently
> >> > you will get the needle dimensions first, because it is roughly the same
> >> > as
> >> >
> >> > [np.percentile(np.random.randn(n1, n2, N), q, axis=-1) for q in [25.,
> >> > 50., 75.]]
> >> >
> >> > so for the (probably rare) vectorization of both `a` and `q`, would it
> >> > be preferable to do some kind of long term behaviour change, or just put
> >> > the dimensions in `q` first, which should be compatible with the current
> >> > list?
> >>
> >> I don't have much of a preference either way, but I'm glad this is
> >> going into numpy.
> >> We can work with it either way.
> >>
> >> In stats, the most common case will be axis=0, and then the two are
> >> the same, aren't they?
> >>
> >> What I like about the second version is unrolling (with 2 or 3
> >> quantiles), which I think will work:
> >>
> >> u, l = np.random.randn(2,5)
> >> or
> >> res = np.percentile(...)
> >> func(*res)
> >>
> >> The first case will be nicer when there are lots of percentiles, but I
> >> guess I won't need it much except for axis=0.
> >>
> >> Actually, I would prefer the second version, because it might be a bit
> >> more cumbersome to get the individual percentiles out if the axis is
> >> somewhere in the middle, however I don't think I have a case like
> >> that.
> >>
> >
> > I never thought about the axis being where to insert the dimensions of
> > the quantiles. That would be a third option. It feels simpler to me to
> > just always use the end (or the start) though.
>
> If the choices are start or end, then I prefer start for unpacking.
>
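To make the two layouts concrete, here is a small sketch. The sizes n1=4, n2=5, N=100 are illustrative assumptions, since the thread leaves them unspecified. Stacking one np.percentile call per percentile reproduces the "needle dimensions first" layout, which is the list-comprehension equivalence described above. np.moveaxis then gives the (n1, n2, 3) layout Sebastian expected, and the last lines show the tuple unpacking Josef mentions. As far as I know, current NumPy releases document that the first axis of np.percentile's result corresponds to the percentiles, so the stacked version should match a direct vectorized call.

```python
import numpy as np

# Illustrative sizes (assumed; the thread leaves n1, n2, N unspecified).
n1, n2, N = 4, 5, 100
np.random.seed(0)
a = np.random.randn(n1, n2, N)
q = [25., 50., 75.]

# "Needle dimensions first": one percentile call per q, stacked on axis 0.
first = np.stack([np.percentile(a, qi, axis=-1) for qi in q])
print(first.shape)  # (3, 4, 5)

# A direct vectorized call gives the same layout in current NumPy.
print(np.allclose(np.percentile(a, q, axis=-1), first))  # True

# "Needle dimensions last": the (n1, n2, 3) shape Sebastian expected.
last = np.moveaxis(first, 0, -1)
print(last.shape)  # (4, 5, 3)

# With q first, a handful of percentiles unpack directly (Josef's case).
lower, median, upper = first
print(median.shape)  # (4, 5)
```

Either way, switching from one layout to the other is a single np.moveaxis call.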
I missed the reduceat argument; it kind of makes sense to me (and usually
we will have either axis=0 or axis=-1, I guess). I was going to check what
searchsorted does, but it doesn't vectorize :).

Sebastian

> Josef
>
> >
> > - Sebastian
> >
> >> The first version would be consistent with reduceat, and that would be
> >> more numpythonic. I would go for that in numpy.
> >>
> >> my 2.5c
> >>
> >> Josef
> >>
> >> >
> >> > Regards,
> >> >
> >> > Sebastian
> >> >
> >> >> Cheers,
> >> >>
> >> >> - Jonathan Helmus
> >> >>
> >> >>
> >> >> [1] http://thread.gmane.org/gmane.comp.python.scientific.user/33331
> >> >> [2] https://github.com/scipy/scipy/pull/374
> >> >> [3] https://github.com/numpy/numpy/pull/2970
> >> >> _______________________________________________
> >> >> NumPy-Discussion mailing list
> >> >> [email protected]
> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
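Josef's reduceat comparison can be illustrated with ufunc.reduceat. Its output keeps the input layout, except that the reduced axis is replaced by one entry per index. That is the "third option" of putting the percentile dimension where the reduced axis was. The helper percentile_at_axis below is hypothetical (not a NumPy function) and assumes a 1-d q. It moves the leading percentile dimension of np.percentile's result into the reduced axis's position. With axis=0 this coincides with the "q first" layout, consistent with the remark that the two choices agree for axis=0. With the last axis it gives the "q last" layout.

```python
import numpy as np

# Hypothetical data, chosen so every axis has a different length.
x = np.arange(40.).reshape(2, 5, 4)

# ufunc.reduceat: the reduced axis is replaced by one entry per index.
print(np.add.reduceat(x, [0, 2, 4], axis=1).shape)  # (2, 3, 4)
print(np.add.reduceat(x, [0, 1, 2], axis=2).shape)  # (2, 5, 3)

def percentile_at_axis(a, q, axis):
    """Hypothetical helper (not a NumPy API): reduceat-style layout.

    Assumes q is 1-d, so np.percentile puts exactly one percentile
    dimension first; that dimension is moved to where `axis` was.
    """
    axis = axis % a.ndim  # accept negative axes
    res = np.percentile(a, q, axis=axis)
    return np.moveaxis(res, 0, axis)

print(percentile_at_axis(x, [25., 50., 75.], axis=1).shape)   # (2, 3, 4)
print(percentile_at_axis(x, [25., 50., 75.], axis=-1).shape)  # (2, 5, 3)
```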
