On Mon, Mar 18, 2013 at 1:10 PM, Skipper Seabold <jsseab...@gmail.com> wrote:
> On Mon, Mar 18, 2013 at 1:00 PM, Pierre Haessig <pierre.haes...@crans.org>
> wrote:
>>
>> Hi Sudheer,
>>
>> On 14/03/2013 10:18, Sudheer Joseph wrote:
>>>
>>> Dear Numpy/Scipy experts,
>>> Attached is a script which I made to test numpy.correlate (which is
>>> called by plt.xcorr) to see how the cross-correlation is calculated. From
>>> this it appears that if I call plt.xcorr(x, y), y is slid back in time
>>> compared to x. That is, if y is a process that causes a delayed response
>>> in x after 5 time steps, then there should be a high correlation at lag 5.
>>> However, in the attached plot the response is seen only on the negative
>>> side of the lags.
>>> Can anyone advise me on how to see exactly which way the two series are
>>> slid back or forth, and understand the cause-and-effect relation better?
>>> (I understand that correlation alone does not establish a cause-and-effect
>>> relation, but it is important to know which series is older in time at a
>>> given lag.)
>>
>> You indeed pointed out a lack of documentation in the matplotlib xcorr
>> function, because the definition of covariance can be ambiguous.
>>
>> The way I would try to interpret the xcorr function (and its friends) is
>> to go back to the theoretical definition of cross-correlation, which is a
>> normalized version of the covariance.
>>
>> In your example you've created a time series X(k) and a lagged one:
>> Y(k) = X(k-5)
>>
>> Now, the covariance function of X and Y is commonly defined as:
>> Cov_{X,Y}(h) = E(X(k+h) * Y(k)), where E is the expectation
>> (assuming that X and Y are centered for the sake of clarity).
>>
>> If I plug in the definition of Y, I get Cov(h) = E(X(k+h) * X(k-5)). The
>> two factors are the same variable when k+h = k-5, so the covariance is
>> naturally maximal at h = -5, not h = +5.
>>
>> Note that this reasoning yields the opposite result with a different
>> definition of the covariance, i.e. Cov_{X,Y}(h) = E(X(k) * Y(k+h)) (and
>> that's what I first did!).
>>
>>
>> Therefore, I think there should be a definition of cross-correlation in
>> the matplotlib xcorr docstring. In R's acf doc, there is this mention:
>> "The lag k value returned by ccf(x, y) estimates the correlation between
>> x[t+k] and y[t]."
>> (see http://stat.ethz.ch/R-manual/R-devel/library/stats/html/acf.html)
>>
>> Now I believe this discussion really belongs on the matplotlib ML. I'll
>> open an issue on GitHub (I just spotted a mistake in the definition of
>> the normalization anyway).
>
>
>
> You might be interested in the statsmodels implementation, which should be
> similar to the R functionality.
>
> http://nbviewer.ipython.org/urls/raw.github.com/jseabold/tutorial/master/tsa_arma.ipynb
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.tsa.stattools.acf.html
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.tsaplots.plot_acf.html
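A small numerical check of Pierre's sign argument, offered only as a sketch. It assumes a white-noise X and builds Y(k) = X(k-5) by slicing one longer series. The helper cross_cov is hypothetical (it is not a numpy, matplotlib or statsmodels function); it estimates Cov_{X,Y}(h) = E(X(k+h) * Y(k)), the same convention as the quoted R ccf(x, y) description, and calling it with the arguments swapped gives the alternative definition E(X(k) * Y(k+h)).

```python
import numpy as np


def cross_cov(x, y, h):
    """Hypothetical helper: sample estimate of E(x[k+h] * y[k]) after centering."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(x)
    if h >= 0:
        # pairs x[k+h] * y[k] for k = 0 .. n-h-1
        return np.sum(x[h:] * y[:n - h]) / n
    # h < 0: pairs x[k+h] * y[k] for k = -h .. n-1
    return np.sum(x[:n + h] * y[-h:]) / n


# Assumed test signal: white noise X and Y(k) = X(k - delay)
rng = np.random.default_rng(0)
n, delay = 1000, 5
x_full = rng.standard_normal(n + delay)
x = x_full[delay:]     # x[k] = x_full[k + delay]
y = x_full[:-delay]    # y[k] = x_full[k], i.e. x[k - delay]

lags = np.arange(-10, 11)
# Pierre's definition: Cov_{X,Y}(h) = E(X(k+h) * Y(k))
c_xy = np.array([cross_cov(x, y, int(h)) for h in lags])
# Alternative definition: E(X(k) * Y(k+h)), written as E(Y(k+h) * X(k))
c_alt = np.array([cross_cov(y, x, int(h)) for h in lags])

peak_xy = int(lags[np.argmax(c_xy)])
peak_alt = int(lags[np.argmax(c_alt)])
print(peak_xy, peak_alt)  # -5 5
```

The data are identical in both lines; only the lag convention changes, which is why two docstrings can each be self-consistent and still put the peak on opposite sides.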
We don't have any cross-correlation (xcorr), AFAIR, but I guess it should
work the same way.

Josef

>
> Skipper
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
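The same check can be run on numpy.correlate itself, which the thread says is what plt.xcorr calls (assumed here to be np.correlate(x, y, mode="full")). The numpy docstring defines the result as c_k = sum_n a[n+k] * conj(v[n]); the lags array below is built by hand to label each output entry, and the white-noise series is again an assumed test signal.

```python
import numpy as np

# Assumed test signal: white noise X and Y(k) = X(k - delay)
rng = np.random.default_rng(1)
n, delay = 500, 5
x_full = rng.standard_normal(n + delay)
x = x_full[delay:]
y = x_full[:-delay]          # y[k] == x[k - delay]
x = x - x.mean()
y = y - y.mean()

# mode="full" returns every overlap; entry i holds
# c[lag] = sum_n x[n + lag] * y[n] with lag = i - (len(y) - 1)
c = np.correlate(x, y, mode="full")
lags = np.arange(-(len(y) - 1), len(x))

# Swapping the arguments mirrors the lag axis
c_swapped = np.correlate(y, x, mode="full")
lags_swapped = np.arange(-(len(x) - 1), len(y))

best = int(lags[np.argmax(c)])
best_swapped = int(lags_swapped[np.argmax(c_swapped)])
print(best, best_swapped)  # -5 5
```

Under this convention a peak at lag -d for np.correlate(x, y) means y(k) resembles x(k-d), i.e. y is the later series; a peak at +d means x is the later one.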