Neal Becker wrote:
> I noticed that if I generate complex rv i.i.d. with var=1, that numpy says:
> 
> var (<real part>) -> (close to 1.0)
> var (<imag part>) -> (close to 1.0)
> 
> but
> 
> var (complex array) -> (close to complex 0)
> 
> Is that not a strange definition?

There is some discussion on this in the tracker.

   http://projects.scipy.org/scipy/numpy/ticket/638

The current state of affairs is that the implementation of var() just naively 
applies the standard formula for real numbers.

   mean((x - mean(x)) ** 2)
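
A minimal sketch of what that formula does to complex data. It is not code from the ticket; the seed, sample size, and variable names are assumptions made for illustration. Expanding the square shows why the result sits near zero for i.i.d. components: the real part is var(real) - var(imag), and the imaginary part is twice the covariance between the two components.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 100_000

# i.i.d. real and imaginary parts, each with variance 1
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# The real-number formula applied unchanged to complex data
naive = np.mean((z - np.mean(z)) ** 2)

# Centered components: z - mean(z) == a + 1j * b
a = z.real - z.real.mean()
b = z.imag - z.imag.mean()
var_re = np.mean(a ** 2)   # close to 1.0
var_im = np.mean(b ** 2)   # close to 1.0
cov_ri = np.mean(a * b)    # close to 0.0

print(naive)                             # complex, both parts small
print(var_re - var_im + 2j * cov_ri)     # same value, written out by component
```

So the two variances cancel in the real part instead of adding up, which is why the result carries no information about the spread.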

I think this is pretty obviously wrong prima facie. AFAIK, no one considers
this a valid definition of variance for complex RVs or in fact a useful value.
I think we should change this. Unfortunately, there is no single alternative
but several.

1. Punt. Complex numbers are inherently multidimensional, and a single scale
parameter doesn't really describe most distributions of complex numbers.
Instead, you need a real covariance matrix, which you can get with
cov([z.real, z.imag]). This estimates the covariance matrix of a 2-D Gaussian
distribution over RR^2 (interpreted as CC).

2. Take a slightly less naive formula for the variance which seems to show up
in some texts:

   mean(absolute(z - mean(z)) ** 2)

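This estimates the single parameter of a circular Gaussian over RR^2
(interpreted as CC). It is also the trace of the covariance matrix above,
provided both use the same normalization (cov() divides by N-1 by default,
whereas mean() divides by N).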

3. Take the variances of the real and imaginary components independently. This 
is equivalent to taking the diagonal of the covariance matrix above. This 
wouldn't be the definition of "*the* complex variance" that anyone else uses, 
but rather another form of punting. "There isn't a single complex variance to 
give you, but in the spirit of broadcasting, we'll compute the marginal 
variances of each dimension independently."
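
The three options can be compared side by side in a short sketch. The test data (correlated components with unequal variances), the seed, and the variable names are assumptions chosen for illustration, and ddof=0 is passed to cov() only so that its normalization matches the mean()-based formulas.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 100_000

# Illustrative data: correlated real and imaginary parts, unequal variances
x = rng.standard_normal(n)
y = 0.5 * x + 0.5 * rng.standard_normal(n)
z = x + 1j * y

# Option 1: the full 2x2 real covariance matrix
C = np.cov([z.real, z.imag], ddof=0)

# Option 2: a single scalar from the absolute-value formula
v_abs = np.mean(np.absolute(z - np.mean(z)) ** 2)

# Option 3: marginal variances of the real and imaginary parts
v_marginal = np.array([np.var(z.real), np.var(z.imag)])

print(C)
print(v_abs, np.trace(C))        # option 2 equals the trace of option 1
print(v_marginal, np.diag(C))    # option 3 equals the diagonal of option 1
```

Only option 1 retains the off-diagonal term, i.e. the correlation between the real and imaginary parts.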

Personally, I like 1 a lot. I'm hesitant to support 2 until I've seen an actual 
application of that definition. The references I have been given in the ticket 
comments are all early parts of books where the authors are laying out 
definitions without applications. It feels to me like the authors are just
sticking in the absolute()'s ex post facto so they can extend the
definition they already have to complex numbers. I'm also not a fan of the 
expectation-centric treatments of random variables. IMO, the variance of an 
arbitrary RV isn't an especially important quantity. It's a parameter of a 
Gaussian distribution, and in this case, I see no reason to favor circular 
Gaussians in CC over general ones.
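
As a hedged illustration of that last point, the sketch below builds two hypothetical samples, one circular and one strongly elongated along the real axis, that give nearly the same value under definition 2 while having very different covariance matrices. The helper name abs_variance, the seed, and the chosen variances are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n = 100_000


def abs_variance(z):
    """Definition 2: mean squared distance from the mean (hypothetical helper)."""
    return np.mean(np.absolute(z - np.mean(z)) ** 2)


# Circular: variance 1 in each component, total 2
circular = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Elongated: variance 1.9 along the real axis, 0.1 along the imaginary axis
elongated = (np.sqrt(1.9) * rng.standard_normal(n)
             + 1j * np.sqrt(0.1) * rng.standard_normal(n))

print(abs_variance(circular), abs_variance(elongated))   # both close to 2
print(np.cov([circular.real, circular.imag]))            # roughly diag(1, 1)
print(np.cov([elongated.real, elongated.imag]))          # roughly diag(1.9, 0.1)
```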

But if someone shows me an actual application of the definition, I can amend my 
view.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
