On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey <[email protected]> wrote: > On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig > <[email protected]> wrote: >> Le 26/01/2012 15:57, Bruce Southey a écrit : >>> Can you please provide a >>> couple of real examples with expected output that clearly show what >>> you want? >>> >> Hi Bruce, >> >> Thanks for your ticket feedback ! It's precisely because I see a big >> potential impact of the proposed change that I send first a ML message, >> second a ticket before jumping to a pull-request like a Sergio Leone's >> cowboy (sorry, I watched "for a few dollars more" last weekend...) >> >> Now, I realize that in the ticket writing I made the wrong trade-off >> between conciseness and accuracy which led to some of the errors you >> raised. Let me try to use your example to try to share what I have in mind. >> >>> >> X = array([-2.1, -1. , 4.3]) >>> >> Y = array([ 3. , 1.1 , 0.12]) >> >> Indeed, with today's cov behavior we have a 2x2 array: >>> >> cov(X,Y) >> array([[ 11.71 , -4.286 ], >> [ -4.286 , 2.14413333]]) >> >> Now, when I used the word 'concatenation', I wasn't precise enough >> because I meant assembling X and Y in the sense of 2 vectors of >> observations from 2 random variables X and Y. >> This is achieved by concatenate(X,Y) *when properly playing with >> dimensions* (which I didn't mentioned) : >>> >> XY = np.concatenate((X[None, :], Y[None, :])) >> array([[-2.1 , -1. , 4.3 ], >> [ 3. , 1.1 , 0.12]]) > > In this context, I find stacking, np.vstack((X,Y)), more appropriate > than concatenate. > >> >> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)". >>> >> np.cov(XY) >> array([[ 11.71 , -4.286 ], >> [ -4.286 , 2.14413333]]) >> > Sure the resulting array is the same but whole process is totally different. > > >> (And indeed, the actual cov Python code does use concatenate() ) > Yes, but the user does not see that. Whereas you are forcing the user > to do the stacking in the correct dimensions. > > >> >> >> Now let me come back to my assertion about this behavior *usefulness*. >> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 >> simple scalars blocks). > No there are not '4' blocks just rows and columns.
Sturla showed the 4 blocks in his first message. > >> * diagonal blocks are just cov(X) and cov(Y) (which in this case comes >> to var(X) and var(Y) when setting ddof to 1) > Sure but variances are still covariances. > >> * off diagonal blocks are symetric and are actually the covariance >> estimate of X, Y observations (from >> http://en.wikipedia.org/wiki/Covariance) > Sure >> >> that is : >>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) >> -4.2860000000000005 >> >> The new proposed behaviour for cov is that cov(X,Y) would return : >> array(-4.2860000000000005) instead of the 2*2 matrix. > > But how you interpret an 2D array where the rows are greater than 2? >>>> Z=Y+X >>>> np.cov(np.vstack((X,Y,Z))) > array([[ 11.71 , -4.286 , 7.424 ], > [ -4.286 , 2.14413333, -2.14186667], > [ 7.424 , -2.14186667, 5.28213333]]) > > >> >> * This would be in line with the cov(X,Y) mathematical definition, as >> well as with R behavior. > I don't care what R does because I am using Python and Python is > infinitely better than R is! > > But I think that is only in the 1D case. I just checked R to make sure I remember correctly > xx = matrix((1:20)^2, nrow=4) > xx [,1] [,2] [,3] [,4] [,5] [1,] 1 25 81 169 289 [2,] 4 36 100 196 324 [3,] 9 49 121 225 361 [4,] 16 64 144 256 400 > cov(xx, 2*xx[,1:2]) [,1] [,2] [1,] 86.0000 219.3333 [2,] 219.3333 566.0000 [3,] 352.6667 912.6667 [4,] 486.0000 1259.3333 [5,] 619.3333 1606.0000 > cov(xx) [,1] [,2] [,3] [,4] [,5] [1,] 43.0000 109.6667 176.3333 243.0000 309.6667 [2,] 109.6667 283.0000 456.3333 629.6667 803.0000 [3,] 176.3333 456.3333 736.3333 1016.3333 1296.3333 [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000 > >> * This would save memory and computing resources. (and therefore help >> save the planet ;-) ) > Nothing that you have provided shows that it will. I don't know about saving the planet, but if X and Y have the same number of columns, we save 3 quarters of the calculations, as Sturla also explained in his first message. Josef > >> >> However, I do understand that the impact for this change may be big. >> This indeed requires careful reviewing. >> >> Pierre >> _______________________________________________ >> NumPy-Discussion mailing list >> [email protected] >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Bruce > _______________________________________________ > NumPy-Discussion mailing list > [email protected] > http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
