On Jan 8, 2008 6:54 PM, Robert Kern <[EMAIL PROTECTED]> wrote:
> Neal Becker wrote:
> > I noticed that if I generate complex rv i.i.d. with var=1, that numpy says:
> >
> > var (<real part>) -> (close to 1.0)
> > var (<imag part>) -> (close to 1.0)
> >
> > but
> >
> > var (complex array) -> (close to complex 0)
> >
> > Is that not a strange definition?
>
> There is some discussion on this in the tracker.
>
>   http://projects.scipy.org/scipy/numpy/ticket/638
>
> The current state of affairs is that the implementation of var() just
> naively applies the standard formula for real numbers.
>
>   mean((x - mean(x)) ** 2)
>
> I think this is pretty obviously wrong prima facie. AFAIK, no one
> considers this a valid definition of variance for complex RVs or in fact
> a useful value. I think we should change this. Unfortunately, there is no
> single alternative but several.
>
> 1. Punt. Complex numbers are inherently multidimensional, and a single
> scale parameter doesn't really describe most distributions of complex
> numbers. Instead, you need a real covariance matrix which you can get
> with cov([z.real, z.imag]). This estimates the covariance matrix of a 2-D
> Gaussian distribution over RR^2 (interpreted as CC).
>
> 2. Take a slightly less naive formula for the variance which seems to
> show up in some texts:
>
>   mean(absolute(z - mean(z)) ** 2)
>
> This estimates the single parameter of a circular Gaussian over RR^2
> (interpreted as CC). It is also the trace of the covariance matrix above.
>
> 3. Take the variances of the real and imaginary components
> independently. This is equivalent to taking the diagonal of the
> covariance matrix above. This wouldn't be the definition of "*the*
> complex variance" that anyone else uses, but rather another form of
> punting. "There isn't a single complex variance to give you, but in the
> spirit of broadcasting, we'll compute the marginal variances of each
> dimension independently."
>
> Personally, I like 1 a lot. I'm hesitant to support 2 until I've seen an
> actual application of that definition. The references I have been given
> in the ticket comments are all early parts of books where the authors are
> laying out definitions without applications. Personally, it feels to me
> like the authors are just sticking in the absolute()'s ex post facto just
> so they can extend the definition they already have to complex numbers.
> I'm also not a fan of the expectation-centric treatments of random
> variables. IMO, the variance of an arbitrary RV isn't an especially
> important quantity. It's a parameter of a Gaussian distribution, and in
> this case, I see no reason to favor circular Gaussians in CC over general
> ones.
>
> But if someone shows me an actual application of the definition, I can
> amend my view.
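For concreteness, the three options above in numpy terms. This is only an
illustrative sketch, not what numpy.var() actually computes; the variable
names are mine:

    import numpy as np

    # A complex sample whose real and imaginary parts each have
    # variance close to 1.
    rng = np.random.RandomState(0)
    z = rng.randn(100000) + 1j * rng.randn(100000)

    # Option 1: the full 2x2 real covariance matrix of (real, imag).
    C = np.cov([z.real, z.imag])

    # Option 2: mean squared modulus of the deviations. This equals
    # trace(C) up to the bias correction; close to 2.0 here.
    v2 = np.mean(np.abs(z - z.mean()) ** 2)

    # Option 3: marginal variances of the two components, i.e. the
    # diagonal of C; each close to 1.0.
    v3 = np.array([z.real.var(), z.imag.var()])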
Suppose you have a set of z_i and want to choose z to minimize the average
squared error $\frac{1}{N} \sum_i |z_i - z|^2$. The solution is
$z = \bar{z} = \frac{1}{N} \sum_i z_i$, and the resulting average error is
the quantity in 2). Note that I didn't mention Gaussians anywhere. No
distribution is needed to justify the argument, just the idea of
minimizing the squared distance. Leaving out the ^2 would yield another
metric, or one could ask for a minimax solution; it is a question of the
distance function, not probability. Anyway, that is one justification for
the approach in 2), and it is one that makes a lot of applied math simple.
Whether or not a least-squares fit is useful is a different question.

Chuck
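A quick numerical check of the identity behind this argument (my own
sketch; the names are illustrative): expanding the square shows that for
any z, mean |z_i - z|^2 = mean |z_i - zbar|^2 + |z - zbar|^2, so the
minimum over z is attained at the sample mean and equals definition 2).

    import numpy as np

    rng = np.random.RandomState(1)
    zi = rng.randn(1000) + 1j * rng.randn(1000)

    def avg_sq_error(z):
        # Average squared distance from z to the sample points.
        return np.mean(np.abs(zi - z) ** 2)

    zbar = zi.mean()
    v2 = np.mean(np.abs(zi - zbar) ** 2)   # definition 2)

    # mean |z_i - z|^2 = v2 + |z - zbar|^2 for any z, so the minimum
    # is v2, attained at z = zbar.
    z_test = 0.3 - 0.7j
    assert np.allclose(avg_sq_error(z_test), v2 + abs(z_test - zbar) ** 2)
    assert np.allclose(avg_sq_error(zbar), v2)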