On Thu, Jul 4, 2013 at 8:01 PM, David Schaich <[email protected]> wrote: > Hi all, > > I recently adopted python, and am in the process of replacing my old > analysis tools. For simple (e.g., linear) interpolations and > extrapolations, in the past I used gnuplot. Today I set up the > equivalent with polyfit in numpy v1.7.1, first running a simple test to > reproduce the gnuplot result. > > A discussion on this list back in February alerted me that I should use > 1/sigma for the weights in polyfit as opposed to 1/sigma**2. Fine -- > that's not what I'm used to, but I can make a note. > http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065649.html > > Another issue mentioned in that thread is scaling the covariance matrix by > fac = resids / (len(x) - order - 2.0) > This wreaked havoc on the simple test I mentioned above (and include > below), which fit three data points to a straight line. I spent hours > trying to figure out why numpy returned negative variances, before > tracking down this line. And, indeed, if I add a fake fourth data point, > I end up with inf's and nan's. > > There is some lengthy justification for subtracting that 2 around line > 590 in lib/polynomial.py. Fine -- it's nothing I recall seeing before > (and I removed it from my local installation), but I'm just a new user. > > However, I do think it is important to fix polyfit so that it doesn't > produce pathological results like those I encountered today. Here are a > couple of possibilities that would let the subtraction of 2 remain: > * Check whether len(x) > order + 2, and if it is not, either > ** Die with an error > ** Scale by resids / (len(x) - order) instead of resids / (len(x) - > order - 2.0)
I would throw out the -2, or at least make it optional like `ddof`. (It's not in the docstring AFAICS) returning a negative (!) definite covariance matrix is definitely a bug. (should return nan or raise exception) my 1.5 cents Josef > > * Don't bother with this scaling at all, leaving it to the users (who > can subtract 2 if they want). This is what scipy.optimize.leastsq does, > after what seems to be a good deal of discussion: "This matrix must be > multiplied by the residual variance to get the covariance of the > parameter estimates – see curve_fit." > http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html > https://github.com/scipy/scipy/pull/448 > > I leave it to those of you with more numpy experience to decide what > would be the best way to go. > > Cheers, > David > http://www-hep.colorado.edu/~schaich/ > > > +++ > Here's the simple example: > >>> import numpy as np > >>> m = np.array([0.008, 0.01, 0.015]) > >>> dat = np.array([1.0822582, 1.0805417, 1.0766624]) > >>> weight = np.array([1/0.000370, 1/0.000355, 1/0.000249]) > >>> out, cov = np.polyfit(m, dat, 1, full=False, w=weight, cov=True) > >>> print out, '\n', cov > [-0.79269957 1.08854252] > [[ -2.34965006e-04 2.84428412e-06] > [ 2.84428412e-06 -3.66283662e-08]] > >>> print np.sqrt(-1. * cov[0][0]) > 0.0153285682895 > >>> print np.sqrt(-1. * cov[1][1]) > 0.000191385386578 > +++ > > Gnuplot gives > +++ > Final set of parameters Asymptotic Standard Error > ======================= ========================== > A = -0.792719 +/- 0.01533 (1.934%) > B = 1.08854 +/- 0.0001914 (0.01758%) > +++ > so up to the negative sign, all is good. > > For its part, scipy.optimize.leastsq needs me to do the scaling: > +++ > >>> import numpy as np > >>> from scipy import optimize > >>> m = np.array([0.008, 0.01, 0.015]) > >>> dat = np.array([1.0822582, 1.0805417, 1.0766624]) > >>> err = np.array([0.000370, 0.000355, 0.000249]) > >>> linear = lambda p, x: p[0] * x + p[1] > >>> errfunc = lambda p, x, y, err: (linear(p, x) - y) / err > >>> p_in = [-1., 1.] > >>> all_out = optimize.leastsq(errfunc, p_in[:], args=(m, dat, err), > full_output = 1) > >>> out = all_out[0] > >>> cov = all_out[1] > >>> print out, '\n', cov > [-0.79269959 1.08854252] > [[ 3.40800756e-03 -4.12544212e-05] > [ -4.12544212e-05 5.31270007e-07]] > >>> chiSq_dof = ((errfunc(out, m, dat, err))**2).sum() / (len(m) - > len(out)) > >>> cov *= chiSq_dof > >>> print cov > [[ 2.34964190e-04 -2.84427528e-06] > [ -2.84427528e-06 3.66282716e-08]] > +++ > _______________________________________________ > NumPy-Discussion mailing list > [email protected] > http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
