Hi all,

I recently adopted Python, and am in the process of replacing my old analysis tools. For simple (e.g., linear) interpolations and extrapolations I used gnuplot in the past. Today I set up the equivalent with polyfit in numpy v1.7.1, first running a simple test to reproduce the gnuplot result.
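For context, here is a rough sketch of the kind of workflow I mean (the evaluation points 0.012 and 0.02 are arbitrary, just to show interpolation and extrapolation; the full test is below):

+++
import numpy as np

# Weighted linear fit. Note the weights: polyfit applies w to the
# residuals directly, so w = 1/sigma rather than 1/sigma**2 (see below).
x = np.array([0.008, 0.01, 0.015])
y = np.array([1.0822582, 1.0805417, 1.0766624])
sigma = np.array([0.000370, 0.000355, 0.000249])

coeffs = np.polyfit(x, y, 1, w=1./sigma)  # [slope, intercept]
interp = np.polyval(coeffs, 0.012)        # inside the data range
extrap = np.polyval(coeffs, 0.02)         # outside the data range
+++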
A discussion on this list back in February alerted me that I should use 1/sigma for the weights in polyfit, as opposed to 1/sigma**2. Fine -- that's not what I'm used to, but I can make a note of it.
http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065649.html

Another issue mentioned in that thread is the scaling of the covariance matrix by

fac = resids / (len(x) - order - 2.0)

This wreaked havoc on the simple test I mentioned above (and include below), which fits three data points to a straight line. I spent hours trying to figure out why numpy returned negative variances before tracking down this line. For a straight-line fit, order = deg + 1 = 2, so with three points the denominator is 3 - 2 - 2 = -1, which flips the sign of every element of the covariance matrix. And indeed, if I add a fake fourth data point, the denominator becomes zero and I end up with infs and nans. There is some lengthy justification for subtracting that 2 around line 590 in lib/polynomial.py. Fine -- it's nothing I recall seeing before (and I removed it from my local installation), but I'm just a new user.

However, I do think it is important to fix polyfit so that it doesn't produce pathological results like those I encountered today. Here are a couple of possibilities that would let the subtraction of 2 remain (a sketch of the first follows this list):

* Check whether len(x) > order + 2, and if it is not, either
** die with an error, or
** scale by resids / (len(x) - order) instead of resids / (len(x) - order - 2.0).
* Don't bother with this scaling at all, leaving it to the users (who can subtract 2 if they want). This is what scipy.optimize.leastsq does, after what seems to have been a good deal of discussion: "This matrix must be multiplied by the residual variance to get the covariance of the parameter estimates -- see curve_fit."
http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html
https://github.com/scipy/scipy/pull/448
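To make the first option concrete, here is a rough sketch of the check I have in mind (the helper name and signature are my own invention; the real fix would live inside polyfit itself):

+++
def scale_cov(cov_raw, resids, num_points, deg):
    # Apply the residual-variance scaling only when the denominator
    # (with the extra 2 subtracted) is positive; otherwise die with
    # an error instead of returning a pathological covariance.
    order = deg + 1
    dof = num_points - order - 2.0
    if dof <= 0:
        raise ValueError("polyfit needs more than order + 2 = %d points "
                         "to scale the covariance" % (order + 2))
    return cov_raw * resids / dof
+++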
I leave it to those of you with more numpy experience to decide what would be the best way to go.

Cheers,
David
http://www-hep.colorado.edu/~schaich/

+++
Here's the simple example:

>>> import numpy as np
>>> m = np.array([0.008, 0.01, 0.015])
>>> dat = np.array([1.0822582, 1.0805417, 1.0766624])
>>> weight = np.array([1/0.000370, 1/0.000355, 1/0.000249])
>>> out, cov = np.polyfit(m, dat, 1, full=False, w=weight, cov=True)
>>> print out, '\n', cov
[-0.79269957  1.08854252]
[[ -2.34965006e-04   2.84428412e-06]
 [  2.84428412e-06  -3.66283662e-08]]
>>> print np.sqrt(-1. * cov[0][0])
0.0153285682895
>>> print np.sqrt(-1. * cov[1][1])
0.000191385386578
+++

Gnuplot gives

+++
Final set of parameters       Asymptotic Standard Error
=======================      ==========================
A = -0.792719                +/- 0.01533      (1.934%)
B = 1.08854                  +/- 0.0001914    (0.01758%)
+++

so up to the negative sign, all is good. For its part, scipy.optimize.leastsq needs me to do the scaling myself:

+++
>>> import numpy as np
>>> from scipy import optimize
>>> m = np.array([0.008, 0.01, 0.015])
>>> dat = np.array([1.0822582, 1.0805417, 1.0766624])
>>> err = np.array([0.000370, 0.000355, 0.000249])
>>> linear = lambda p, x: p[0] * x + p[1]
>>> errfunc = lambda p, x, y, err: (linear(p, x) - y) / err
>>> p_in = [-1., 1.]
>>> all_out = optimize.leastsq(errfunc, p_in[:], args=(m, dat, err), full_output=1)
>>> out = all_out[0]
>>> cov = all_out[1]
>>> print out, '\n', cov
[-0.79269959  1.08854252]
[[  3.40800756e-03  -4.12544212e-05]
 [ -4.12544212e-05   5.31270007e-07]]
>>> chiSq_dof = ((errfunc(out, m, dat, err))**2).sum() / (len(m) - len(out))
>>> cov *= chiSq_dof
>>> print cov
[[  2.34964190e-04  -2.84427528e-06]
 [ -2.84427528e-06   3.66282716e-08]]
+++
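For my own scripts I'll probably wrap up that scaling boilerplate in a helper; a rough sketch (the function name is my own, not scipy's, and it assumes the fit succeeded so that cov is not None):

+++
import numpy as np
from scipy import optimize

def fit_with_errors(errfunc, p_in, args):
    # Run leastsq with full output, then apply the residual-variance
    # scaling by hand, as the scipy docs instruct.
    out, cov, infodict, mesg, ier = optimize.leastsq(
        errfunc, p_in, args=args, full_output=1)
    chiSq_dof = (errfunc(out, *args)**2).sum() / (len(args[0]) - len(out))
    cov = cov * chiSq_dof
    return out, cov, np.sqrt(np.diag(cov))  # parameters, covariance, errors
+++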
