Hi Sudheer,

On 14/03/2013 10:18, Sudheer Joseph wrote:
> Dear Numpy/Scipy experts,
> Attached is a script
> which I made to test numpy.correlate (which is called by
> plt.xcorr) to see how the cross-correlation is calculated. From this
> it appears that if I call plt.xcorr(x,y),
> Y is slid back in time compared to x, i.e. if y is a process that
> causes a delayed response in x after 5 timesteps, then there should be
> a high correlation at lag 5. However, in the attached plot the response is
> seen only on the negative side of the lags.
> Can anyone advise me on how to see which way exactly the 2 series
> are slid back or forth, and understand the cause-result relation
> better? (I understand that correlation alone does not establish cause
> and result, but it is important to know which series is older
> in time at a given lag.)

You have indeed pointed out a gap in the documentation of the matplotlib xcorr function, which matters because the definition of the covariance can be ambiguous.
The way I would try to interpret the xcorr function (and its friends) is to go back to the theoretical definition of cross-correlation, which is a normalized version of the covariance.

In your example, you've created a time series X(k) and a lagged one:

    Y(k) = X(k-5)

Now, the covariance function of X and Y is commonly defined as:

    Cov_{X,Y}(h) = E(X(k+h) * Y(k))

where E is the expectation (assuming that X and Y are centered, for the sake of clarity). If I plug in the definition of Y, I get

    Cov_{X,Y}(h) = E(X(k+h) * X(k-5))

which is the autocovariance of X at lag h+5. Since an autocovariance is maximal at lag zero, the covariance is indeed maximal at h = -5 and not h = +5.

Note that the same reasoning yields the opposite result with a different definition of the covariance, i.e. Cov_{X,Y}(h) = E(X(k) * Y(k+h)) (and that's what I first did!).

Therefore, I think the matplotlib xcorr docstring should state which definition of cross-correlation it uses. R's acf documentation, for instance, says: "The lag k value returned by ccf(x, y) estimates the correlation between x[t+k] and y[t]." (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html)

Now, I believe the discussion above really belongs on the matplotlib mailing list. I'll open an issue on GitHub (I just spotted a mistake in the definition of the normalization anyway).

Coming back to numpy: there's a strange thing. The documentation of numpy.correlate seems to give the other definition,

    z[k] = sum_n a[n] * conj(v[n+k])

(http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html), although its actual behavior proves otherwise. What did I miss?

best,
Pierre
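The argument above can be checked numerically with a short sketch. It assumes white-noise input drawn from numpy's default_rng, and it uses hand-written estimators (cov_def1, cov_def2, formula_a, formula_b) that are hypothetical helpers for illustration only, not part of numpy or matplotlib. It builds Y(k) = X(k-5), shows that the two covariance conventions peak at h = -5 and h = +5 respectively (the first one appears to be the same convention as the R ccf quote), and then compares np.correlate(x, y, mode="full") against both candidate formulas for z[k].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal(n)   # white noise, so approximately centered
y = np.zeros(n)
y[5:] = x[:-5]               # Y(k) = X(k-5); first 5 samples padded with zeros


def cov_def1(x, y, h):
    """Hypothetical helper: sample estimate of E(X(k+h) * Y(k))."""
    n = len(x)
    if h >= 0:
        return np.sum(x[h:] * y[:n - h]) / n
    return np.sum(x[:n + h] * y[-h:]) / n


def cov_def2(x, y, h):
    """Hypothetical helper: the other convention, E(X(k) * Y(k+h))."""
    # E(X(k) * Y(k+h)) = E(Y(k+h) * X(k)), i.e. definition 1 with roles swapped
    return cov_def1(y, x, h)


lags = np.arange(-10, 11)
c1 = np.array([cov_def1(x, y, h) for h in lags])
c2 = np.array([cov_def2(x, y, h) for h in lags])
print("definition 1 peaks at h =", lags[np.argmax(c1)])  # -5
print("definition 2 peaks at h =", lags[np.argmax(c2)])  # 5

# numpy.correlate in "full" mode: output index i corresponds to lag i - (len(y) - 1)
full = np.correlate(x, y, mode="full")
full_lags = np.arange(-(len(y) - 1), len(x))
print("np.correlate(x, y) peaks at lag", full_lags[np.argmax(full)])  # -5


def formula_a(a, v, k):
    """Hypothetical helper: z[k] = sum_n a[n+k] * conj(v[n])."""
    return sum(a[m + k] * np.conj(v[m]) for m in range(len(v)) if 0 <= m + k < len(a))


def formula_b(a, v, k):
    """Hypothetical helper: z[k] = sum_n a[n] * conj(v[n+k]), as quoted above."""
    return sum(a[m] * np.conj(v[m + k]) for m in range(len(a)) if 0 <= m + k < len(v))


za = np.array([formula_a(x, y, k) for k in full_lags])
zb = np.array([formula_b(x, y, k) for k in full_lags])
print("matches sum_n a[n+k]*conj(v[n]):", np.allclose(full, za))  # True
print("matches sum_n a[n]*conj(v[n+k]):", np.allclose(full, zb))  # False
```

On this input, np.correlate(x, y) lines up with sum_n a[n+k] * conj(v[n]) rather than with the formula quoted from the docstring; formula_b is simply formula_a with the lag sign flipped, which is why the two conventions give mirror-image peaks.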
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion