On Jan 8, 2008 7:48 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
> Charles R Harris wrote:
> > Suppose you have a set of z_i and want to choose z to minimize the
> > average square error $\frac{1}{N} \sum_i |z_i - z|^2$. The solution is
> > $z = \bar{z} = \frac{1}{N} \sum_i z_i$, and the resulting average error
> > is given by 2). Note that I didn't mention Gaussians anywhere. No
> > distribution is needed to justify the argument, just the idea of
> > minimizing the squared distance. Leaving out the ^2 would yield another
> > metric, or one could ask for a minimax solution. It is a question of
> > the distance function, not probability. Anyway, that is one
> > justification for the approach in 2), and it is one that makes a lot
> > of applied math simple. Whether or not a least squares fit is useful
> > is a different question.
>
> If you're not doing probability, then what are you using var() for? I
> can accept that the quantity is meaningful for your problem, but I'm
> not convinced it's a variance.
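To make the quoted argument concrete, here is a quick numerical check. The
sample points are made up, and I'm taking 2) to mean
$\frac{1}{N} \sum_i |z_i - \bar{z}|^2$:

    import numpy as np

    # Made-up complex sample points, purely for illustration.
    z = np.array([1 + 2j, 3 - 1j, -2 + 0.5j, 0.5 + 4j])
    zbar = z.mean()

    def avg_sq_err(c):
        """Average squared distance from the points z_i to a center c."""
        return np.mean(np.abs(z - c) ** 2)

    # Perturbing the center in any direction strictly increases the
    # error, so the mean is the minimizer.
    for dz in (0.1, -0.1, 0.1j, -0.1j):
        assert avg_sq_err(zbar + dz) > avg_sq_err(zbar)

    # The minimum itself is mean(|z_i - zbar|^2) -- the single number
    # under discussion. No distribution was assumed anywhere above.
    print(avg_sq_err(zbar))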
Lots of fits don't involve probability distributions. For instance, one
might want to fit a polynomial to a mathematical curve; a quick sketch is
below. This sort of distinction between probability and distance goes back
to Gauss himself, although not in his original work on least squares.

Whether or not variance implies probability is a semantic question. I think
that if we are going to compute a single number, 2) is as good as anything,
even if it doesn't capture the shape of the scatter plot. A 2D covariance
wouldn't necessarily capture the shape either.
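Here the squared-distance objective is a pure approximation error; nothing
probabilistic enters. The curve and the polynomial degree below are
arbitrary choices of mine:

    import numpy as np

    # Fit a cubic to a deterministic curve -- no noise, no distribution.
    x = np.linspace(0, np.pi, 50)
    y = np.sin(x)                      # the "mathematical curve"

    coeffs = np.polyfit(x, y, 3)       # ordinary least-squares fit
    resid = y - np.polyval(coeffs, x)

    # The mean squared residual is a measure of approximation error,
    # not a statistical variance of anything.
    print(np.mean(resid ** 2))

Chuck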