Please correct me if I misunderstood, but the code in that commit does a full sort, somewhat similar to what `scipy.stats.scoreatpercentile` does. If that is correct, I will run some benchmarks first, but I think there is value in going forward with a numpy version that extends the current partitioning scheme.
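To make the comparison concrete, here is a rough sketch of the two approaches as I understand them. It is plain Python rather than numpy or C, both function names are made up for illustration, and it assumes non-negative real weights with a positive sum. It also assumes one particular definition of a weighted quantile: the smallest value whose cumulative weight reaches `q` times the total. The second function only illustrates the idea of comparing accumulated weight against the target instead of counting elements; it is not how the existing partition code is written.

```python
from itertools import accumulate


def weighted_quantile_sorted(values, weights, q):
    """Full-sort approach: sort, accumulate weights, scan for the target.

    Hypothetical helper; q is a fraction in [0, 1].
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    target = q * total
    running = accumulate(w for _, w in pairs)
    for (value, _), cum in zip(pairs, running):
        if cum >= target:
            return value
    return pairs[-1][0]  # guard against floating-point round-off


def weighted_quantile_select(values, weights, q):
    """Selection approach: quickselect driven by weight, not element count.

    Hypothetical helper; builds new lists instead of partitioning in place.
    """
    pairs = list(zip(values, weights))
    target = q * sum(w for _, w in pairs)
    below = 0  # weight already known to lie below the current candidates
    while True:
        pivot = pairs[len(pairs) // 2][0]
        lo = [p for p in pairs if p[0] < pivot]
        hi = [p for p in pairs if p[0] > pivot]
        lo_w = sum(w for _, w in lo)
        eq_w = sum(w for v, w in pairs if v == pivot)
        if lo and below + lo_w >= target:
            # Enough weight sits strictly below the pivot: look there.
            pairs = lo
        elif below + lo_w + eq_w >= target or not hi:
            # The pivot's own weight carries us past the target.
            return pivot
        else:
            # Discard the pivot and everything below it, remember its weight.
            below += lo_w + eq_w
            pairs = hi
```

With unit weights both reduce to an ordinary (lower) quantile, and with integer weights they agree with repeating each element that many times. Non-integer weights need no special handling, which is the main point of comparing weight ratios rather than counts.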
- Joe

On Tue, Feb 16, 2016 at 2:39 PM, <josef.p...@gmail.com> wrote:
>
>
> On Tue, Feb 16, 2016 at 1:41 PM, Joseph Fox-Rabinovitz
> <jfoxrabinov...@gmail.com> wrote:
>>
>> Thanks for pointing me to that. I had something a bit different in
>> mind but that definitely looks like a good start.
>>
>> On Tue, Feb 16, 2016 at 1:32 PM, Antony Lee <antony....@berkeley.edu>
>> wrote:
>> > See earlier discussion here: https://github.com/numpy/numpy/issues/6326
>> > Basically, naïvely sorting may be faster than a not-so-optimized
>> > version of quickselect.
>> >
>> > Antony
>> >
>> > 2016-02-15 21:49 GMT-08:00 Joseph Fox-Rabinovitz
>> > <jfoxrabinov...@gmail.com>:
>> >>
>> >> I would like to add a `weights` keyword to `np.partition`,
>> >> `np.percentile` and `np.median`. My reason for doing so is to allow
>> >> `np.histogram` to process automatic bin selection with weights.
>> >> Currently, weights are not supported for the automatic bin selection
>> >> and would be difficult to support in `auto` mode without having
>> >> `np.percentile` support a `weights` keyword. I suspect that there are
>> >> many other uses for such a feature.
>> >>
>> >> I have taken a preliminary look at the C implementation of the
>> >> partition functions that are the basis for `partition`, `median` and
>> >> `percentile`. I think that it would be possible to add versions (or
>> >> just extend the functionality of existing ones) that check the ratio
>> >> of the weights below the partition point to the total sum of the
>> >> weights instead of just counting elements.
>> >>
>> >> One of the main advantages of such an implementation is that it would
>> >> allow any real weights to be handled correctly, not just integers.
>> >> Complex weights would not be supported.
>> >>
>> >> The purpose of this email is to see if anybody objects, has ideas or
>> >> cares at all about this proposal before I spend a significant amount
>> >> of time working on it. For example, did I miss any functions in my
>> >> list?
>> >>
>> >> Regards,
>> >>
>> >> -Joe
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> statsmodels just got weighted quantiles
> https://github.com/statsmodels/statsmodels/pull/2707
>
> I didn't try to figure out its computational efficiency, and we would
> gladly delegate to whatever fast algorithm would be in numpy.
>
> Josef