On Wed, Feb 27, 2013 at 6:46 AM, David Pine <djp...@gmail.com> wrote:

> As of NumPy v1.7, numpy.polyfit includes an option for providing weighting
> to data to be fit.  It's a welcome addition, but the implementation seems a
> bit non-standard, perhaps even wrong, and I wonder if someone can enlighten
> me.
>
> 1. The documentation does not specify what the weighting array "w" is
> supposed to be.  After some fooling around I figured out that it is
> 1/sigma, where sigma is the standard deviation uncertainty in the y
> data.  A better implementation would be to specify "sigma" directly
> rather than "w": that is typically what the user has at hand, the
> syntax is more transparent, and it would make polyfit consistent with
> the nonlinear fitting routine scipy.optimize.curve_fit.
>

Weights can be used for more than expressing sigma. Another common use is to
set the weight to zero for bad data points. Zero is a nicer number than inf,
although inf would probably also work with IEEE numbers; inf, and conventions
like 0/inf == 0, are modern innovations, so multiplicative weights are
somewhat traditional. Weights of zero do, however, need to be taken into
account when counting the degrees of freedom.
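
For instance, here is a minimal sketch of both uses (assuming NumPy >= 1.7,
where polyfit gained the w keyword; the data and the bad-point index are
made up for illustration):

    import numpy as np

    x = np.linspace(0.0, 10.0, 20)
    y = 3.0 * x + 1.0 + np.random.normal(0.0, 0.5, x.size)
    y[7] = 100.0                     # an obviously bad data point

    sigma = 0.5 * np.ones_like(y)    # per-point uncertainties
    w = 1.0 / sigma                  # Gaussian weighting, w = 1/sigma
    w[7] = 0.0                       # drop the bad point from the fit

    coef = np.polyfit(x, y, 1, w=w)

    # Note: with one zero weight the fit effectively uses 19 points,
    # not 20, which matters when dividing by the degrees of freedom.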


>
> 2. The definition of the covariance matrix in polyfit is peculiar (or
> maybe wrong, I'm not sure).  Towards the end of the code for polyfit, the
> standard covariance matrix is calculated and given the variable name
> "Vbase".  However, instead of returning Vbase as the covariance matrix,
> polyfit returns the quantity Vbase * fac, where fac = resids / (len(x) -
> order - 2.0).  fac scales as N, the number of data points, so the
> covariance matrix is about N times larger than it should be for large
> values of N; the increase is smaller for small N (of order 10).  Perhaps
> the authors of polyfit want to use a more conservative error estimate, but
> the references given in the polyfit code make no mention of it.  I think it
> would be much better to return the standard definition of the covariance
> matrix (see, for example, the book Numerical Recipes).
>

The resids scale as N, so fac is approximately constant. The effect of more
data points shows up in (A^T A)^-1, which shrinks like 1/N; that matrix is
probably what is being called the covariance, but it really isn't one until
it is scaled by fac.
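
A quick numerical check (a sketch, not polyfit's actual code, using the same
fac denominator as the 1.7 source and made-up synthetic data):

    import numpy as np

    np.random.seed(0)
    order = 2
    for N in (10, 100, 1000):
        x = np.linspace(0.0, 1.0, N)
        y = 1.0 + 2.0 * x + 3.0 * x**2 + np.random.normal(0.0, 0.1, N)

        A = np.vander(x, order + 1)            # Vandermonde matrix of the fit
        base = np.linalg.inv(np.dot(A.T, A))   # shrinks like 1/N

        coef, resids, rank, sv = np.linalg.lstsq(A, y, rcond=-1)
        fac = resids[0] / (N - order - 2.0)    # roughly constant in N

        cov = base * fac                       # a covariance only after scaling
        print(N, fac, cov[0, 0])

The printout shows fac settling near the residual variance while the scaled
covariance shrinks as the data grow, as it should.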

Chuck