[Numpy-discussion] Re: Percentile/Quantile "interpolation" refactor

Sebastian Berg Thu, 04 Nov 2021 15:16:05 -0700

On Wed, 2021-10-13 at 10:25 -0500, Sebastian Berg wrote:
> Hi all,
> 
> after a long time Abel has helped us and refactored the quantile and
> percentile functions' `interpolation` keyword.
>


This PR is now merged and will be included in the upcoming 1.22
release. Please don't hesitate to raise any concerns about it; all
notes from the old email remain unchanged.

There is a good chance that the documentation could use a bit of
revising, so input would be greatly appreciated!

The one thing that I definitely plan to do before the next release is
to rename the `interpolation` keyword argument to `method`. Method
seems a much clearer name, and it forces users who do not use the
default to consider switching to a more standard method.
(Only the default version is really described in the literature.)

Cheers,

Sebastian


> This was long overdue since NumPy implements three (the non-default)
> interpolation methods that appear to be very much non-standard.  On the
> other hand, NumPy currently has no unbiased methods (i.e. methods that
> estimate the population quantile).
> 
> There are two main questions right now with respect to the API:
> first, which names to use for the methods and, second, how to deal
> with "outliers".
> 
> 
> The PR
> https://github.com/numpy/numpy/pull/19857#issuecomment-939852134
> adds the methods and gives them (currently) the following names,
> ordered like R's `quantile` types 1 to 9 (the names will be used as
> string identifiers):
> 
> 1. inverted cdf
> 2. averaged inverted cdf
> 3. closest observation
> 4. interpolated inverted cdf
> 5. hazen  (name from Wolfram)
> 6. weibull  (name from Wolfram)
> 7. linear  (default!  Better name deferred)
> 8. median unbiased
> 9. normal unbiased
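A minimal sketch of how the continuous methods 4 to 9 differ, assuming the usual Hyndman & Fan plotting-position formulation behind R's `quantile` types: each method only changes the offset m(p) in the 1-based position h = N*p + m(p), and the estimate interpolates linearly between the two neighbouring sorted values, clipping positions outside the sample to the minimum or maximum. The snake_case keys are illustrative spellings of the names above, the discrete methods 1 to 3 are left out, and this is not NumPy's actual implementation.

```python
import math

# Offset m(p) for each continuous method; the 1-based position is h = N*p + m(p).
# Keys are illustrative spellings of the names in the list above (R types 4-9).
OFFSETS = {
    "interpolated_inverted_cdf": lambda p: 0.0,        # 4
    "hazen": lambda p: 0.5,                            # 5
    "weibull": lambda p: p,                            # 6
    "linear": lambda p: 1.0 - p,                       # 7 (default)
    "median_unbiased": lambda p: (p + 1.0) / 3.0,      # 8
    "normal_unbiased": lambda p: p / 4.0 + 3.0 / 8.0,  # 9
}


def quantile(data, p, method="linear"):
    """Estimate the p-quantile (0 <= p <= 1) of `data` with a continuous method."""
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one data point")
    h = n * p + OFFSETS[method](p)  # 1-based fractional position
    # Positions outside [1, N] are clipped to the sample minimum/maximum.
    if h <= 1:
        return float(x[0])
    if h >= n:
        return float(x[-1])
    j = math.floor(h)  # 1-based index of the lower neighbour
    g = h - j          # weight given to the upper neighbour
    return x[j - 1] + g * (x[j] - x[j - 1])


data = [1, 3, 4, 7, 10, 15]
for name in OFFSETS:
    print(name, quantile(data, 0.25, name))
```

Only the offset changes between methods, so the default "linear" is the one whose position (N - 1)p + 1 never leaves the sample.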
> 
> And additionally, the four we currently have:
> 
> * lower
> * higher
> * nearest
> * midpoint
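The four existing methods can be read as choices between the two sorted values that bracket the default (linear) position. A minimal sketch under that reading, using the 0-based position h = (N - 1) * p; the tie handling for "nearest" follows Python's `round` (ties go to the even index), which is an assumption here rather than something stated in this thread.

```python
import math


def legacy_quantile(data, p, method):
    """Sketch of the lower/higher/nearest/midpoint choices for 0 <= p <= 1."""
    x = sorted(data)
    h = (len(x) - 1) * p  # 0-based position used by the linear method
    below, above = math.floor(h), math.ceil(h)
    if method == "lower":
        return x[below]
    if method == "higher":
        return x[above]
    if method == "nearest":
        return x[round(h)]  # round() sends exact .5 ties to the even index
    if method == "midpoint":
        return (x[below] + x[above]) / 2
    raise ValueError(f"unknown method: {method!r}")


data = [1, 2, 3, 4]
for m in ("lower", "higher", "nearest", "midpoint"):
    print(m, legacy_quantile(data, 0.5, m))
```

None of these four corresponds to one of the nine R types, which is part of why they read as non-standard.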
> 
> Numbers 6 and 7 are named "exclusive" and "inclusive" by Python's
> `statistics.quantiles` in its `method` keyword argument.  While I like
> the name `method=` and may want to move to it, I am not sure I like
> "inclusive" and "exclusive".
> The current plan was to defer the kwarg rename to a follow-up,
> although it should be discussed before the next release.
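To make the correspondence with Python's naming concrete, here is a small standard-library check. `interp_at` is a hypothetical helper that interpolates at a 1-based position, and (N + 1)p and (N - 1)p + 1 are the usual plotting positions for methods 6 ("weibull") and 7 ("linear"). The quartiles are chosen so every position falls inside the sample, which sidesteps any difference in how the two implementations treat the tails.

```python
import statistics


def interp_at(x, h):
    """Linearly interpolate sorted `x` at 1-based position h, clipped to [1, len(x)]."""
    h = min(max(h, 1), len(x))
    j = int(h)  # 1-based index of the lower neighbour
    if j == len(x):
        return float(x[-1])
    return x[j - 1] + (h - j) * (x[j] - x[j - 1])


data = sorted([1, 3, 4, 7, 10, 15])
size = len(data)
ps = [0.25, 0.5, 0.75]

weibull = [interp_at(data, (size + 1) * p) for p in ps]     # method 6
linear = [interp_at(data, (size - 1) * p + 1) for p in ps]  # method 7, NumPy's default

exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(weibull, exclusive)
print(linear, inclusive)
```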
> 
> 
> The second main question is how to deal with "outliers" (this does not
> affect the default method 7, whose interpolation positions always lie
> within the sample; it finds the sample quantiles and not a population
> estimate).  Wikipedia says this:
> 
>     Packages differ in how they estimate quantiles beyond the lowest
>     and highest values in the sample, i.e. p < 1/N and p > (N − 1)/N.
>     Choices include returning an error value, computing linear
>     extrapolation, or assuming a constant value.
> 
> The current choice is clipping (assuming a constant value), but this
> could be modified.
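The three choices Wikipedia lists can be sketched for a single method. This assumes method 6 ("weibull"), whose 1-based position (N + 1)p falls outside the sample when p < 1/(N + 1) or p > N/(N + 1); the `tails` parameter is hypothetical and only illustrates the options, it is not a NumPy keyword.

```python
def weibull_quantile(data, p, tails="clip"):
    """Method 6 estimate with a selectable policy for positions outside the sample.

    `tails` is a hypothetical parameter for illustration, not a NumPy keyword.
    """
    x = sorted(data)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two data points")
    h = (n + 1) * p  # 1-based fractional position
    if 1 <= h <= n:
        j = min(int(h), n - 1)  # lower neighbour, kept below n so x[j] exists
        return x[j - 1] + (h - j) * (x[j] - x[j - 1])
    if tails == "clip":
        # Assume a constant value beyond the sample (the current choice).
        return float(x[0] if h < 1 else x[-1])
    if tails == "extrapolate":
        # Continue the line through the two outermost points.
        if h < 1:
            return x[0] + (h - 1) * (x[1] - x[0])
        return x[-1] + (h - n) * (x[-1] - x[-2])
    if tails == "error":
        raise ValueError(f"p={p} lies outside the range this sample can estimate")
    raise ValueError(f"unknown tails policy: {tails!r}")


data = [1, 2, 3, 4]
for policy in ("clip", "extrapolate"):
    print(policy, weibull_quantile(data, 0.125, policy), weibull_quantile(data, 0.875, policy))
```

Clipping never produces values outside the observed range, whereas extrapolation can, which is the trade-off being asked about.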
> 
> 
> Any feedback is appreciated!  Otherwise, this will probably move
> forward in the current state for the next release.
> 
> Cheers,
> 
> Sebastian
> _______________________________________________
> NumPy-Discussion mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: [email protected]
