Hi all,

there is a PR to merge very limited support for weights in quantiles,
which, absent further input, I will probably merge because the sklearn
devs say they will use it.  This means adding a `weights` kwarg [1].
See:

    https://github.com/numpy/numpy/pull/24254

Limited here means that it would only work for the "inverted_cdf"
method (which is not the default one).

Why is it very limited?  Because this limited version is the only form
that we (or at least I) are fairly confident about getting right.

There are various problems with making it broader:
1. Weights are not clearly defined and can have many meanings, e.g.:
   * frequency weights (repeated observations)
   * probability weights (removing sample biases)
   * "analytic"/"precision" weights (encoding observation
     precision/variance).

2. There is little to no literature on how to handle the following
   subtleties in the context of the various types of weights:
   * Interpolation (relevant to all interpolating methods)
   * Unbiasing (the main difference between the methods)

The PR adds the most minimal thing, where the different types of
weights are largely equivalent (no unbiasing issues, no
interpolation). [2]
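
To make the "no interpolation" point concrete, here is a minimal sketch
in plain Python of what a weighted inverted CDF computes: the smallest
observation whose cumulative share of the total weight reaches `q`.
This is not the code from the PR; the name `weighted_inverted_cdf` is
made up for illustration, and dropping zero-weight observations is an
assumption of the sketch.

```python
def weighted_inverted_cdf(values, weights, q):
    """Smallest value whose cumulative weight share is at least q.

    Illustrative sketch only (hypothetical helper, not NumPy's code).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    # Assumption of this sketch: zero-weight observations are ignored.
    # The remaining (value, weight) pairs are fully sorted by value.
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("need at least one positive weight")
    total = sum(w for _, w in pairs)
    running = 0
    for value, weight in pairs:
        running += weight
        # Always returns an actual observation: no interpolation.
        if running >= q * total:
            return value
    return pairs[-1][0]  # guard against floating-point round-off near q == 1


# Frequency weights behave like repeated observations.
data = [3, 1, 2]
freq = [2, 3, 1]
repeated = [3, 3, 1, 1, 1, 2]
for q in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert (weighted_inverted_cdf(data, freq, q)
            == weighted_inverted_cdf(repeated, [1] * len(repeated), q))
```

Only the ratio of cumulative weight to total weight enters, so
rescaling all weights by a constant leaves the result unchanged, which
is why frequency and probability weights coincide for this method.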

Due to these complexities (and the lack of many statistics specialists
looking at it) there is a case to be made that we just shouldn't add
this in NumPy, but if nobody else has an opinion, I will go with the
sklearn devs who want it :).
(Also with weights we have to rely on full sorting for now, which can
be slow, which I can live with personally.)

- Sebastian


[1] There are different styles of weights, and for some methods that
clearly matters.  Thus, if we ever expand the definition, it may be
that `weights` has to be mapped to one of these, or that the generic
`weights` kwarg would raise an error for those methods, telling you to
pick a specific one like `pweights=` or `fweights=`.

[2] I am not quite sure about "analytic weights" here, but to me these
do not really make sense in the context of a discrete interpolation
method.
