> On Thu, 18 Dec 2003 08:13:53 GMT, "jackson marshmallow"
> <[EMAIL PROTECTED]> wrote:
>
> > I need to write a program that will calculate a non-parametric correlation
> > between two time series. The series' length is usually about 1000 points.
> >
> > Let's say for Spearman's rho the minimal cost of calculation equals the
> > number of data points N. I will also need to compute the p-value. If I use
>
> Surely, that should say something like, "proportionate to N".
> But that's not very complete. The main cost of rank-order
> correlation is sorting, which is not linear in N; after that,
> you compute a correlation, which is merely linear.
>
OK, but you only have to sort once; if you perform randomization, the
correlation will have to be computed over and over again, so the cost of
sorting will be negligible. (A sketch of what I mean is in the P.S. below.)

> > randomization to determine the p-value, and there are P permutations, then
> > the cost is N*P operations.
>
> ... randomization ... and "there are P permutations ..."
> Surely, that is referring to the P 'randomizations' that
> were just invoked in the phrase before.

Surely.

> For N=1000, aren't you going to have to do 50 or 100
> thousand randomizations, in order to out-perform the
> precision of the 'approximate' test that you get by
> using the ordinary table for Pearson correlations?
> (the Spearman *is* a Pearson, performed on ranks).

I think (correct me if I'm wrong) the p-value should be fairly accurate if we
do a limited number of random permutations, say, 1000.

> > I understand that the more robust statistic is Kendall's tau (or, in this
> > case, its variant, Somers' D), but the cost is N-square/2.
>
> Well, 'robustness' is not the best word, in my opinion,
> when they are testing different hypotheses.

Effectively, I have series X and series Y. Let Yn = f(Xn). The hypothesis I
want to test is that f is a monotonic function.

> Spearman's tests for the sum of squared differences in
> ranks, whereas Kendall's tests for the simple absolute sum.
> Both tests are absolutely "robust" in terms of being
> full size, assuming that you start with untied ranks.
> (And, for tiny samples, under 30 or so, Kendall's is
> more awkward since it has the lumpier distribution.)
>
> That is rather similar to the difference between the tests on
> a contingency table as performed with Pearson's product-
> moment chisquared, in contrast to the Likelihood chisquared.
>
> > The question is this: can I select a limited number of random pairs to
> > calculate a valid estimate of Kendall's tau?
>
> That's curious. Instead of re-allocating all 1000, and
> doing that tens of thousands of times in order to get an
> exact p-level, you want to allocate just *part* of them?
> - I don't think this works. You could read up on
> 'bootstrapping', which usually uses samples that match
> the original N. I think there are reasons for keeping the
> sample relatively large, beyond the obvious symmetry,

I was less interested in this part of the question. I'm sorry...
(The pair subsampling I do mean is sketched in the P.P.S. below.)

> --
> Rich Ulrich, [EMAIL PROTECTED]
> http://www.pitt.edu/~wpilib/index.html
> "Taxes are the price we pay for civilization."
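
P.S. For concreteness, here is roughly what I mean by ranking once and then
only repeating the cheap step. This is just a sketch in Python/NumPy; the
function name and all the details are my own, not anything standard.

# Rough sketch: permutation p-value for Spearman's rho.
# The ranks (the sorting cost) are computed once; each of the n_perm
# randomizations only repeats the linear-time Pearson step on permuted ranks.
import numpy as np
from scipy.stats import rankdata

def spearman_perm_pvalue(x, y, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    rx = rankdata(x)                       # sorting, paid once
    ry = rankdata(y)                       # sorting, paid once
    observed = np.corrcoef(rx, ry)[0, 1]   # Spearman's rho = Pearson on ranks
    count = 0
    for _ in range(n_perm):
        r = np.corrcoef(rx, rng.permutation(ry))[0, 1]  # cheap step, repeated
        if abs(r) >= abs(observed):
            count += 1
    # "add one" keeps the estimated two-sided p-value away from exactly zero
    return observed, (count + 1) / (n_perm + 1)

With n_perm = 1000 the smallest p-value this can report is about 0.001, which
is the resolution I had in mind above; more permutations would only be needed
to resolve smaller p-values.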

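P.P.S. And this is the sort of pair subsampling I was asking about: estimate
Kendall's tau from a random sample of index pairs instead of all N*(N-1)/2 of
them. Again only a sketch with my own names; whether the sampling error of
such an estimate is acceptable is exactly the part I am unsure of.

# Rough sketch: Monte Carlo estimate of Kendall's tau-a from random pairs.
# x and y are NumPy arrays of equal length.
import numpy as np

def kendall_tau_subsampled(x, y, n_pairs=100_000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j                                     # drop self-pairs
    s = np.sign((x[i] - x[j]) * (y[i] - y[j]))[keep]
    return s.mean()        # +1 concordant, -1 discordant, 0 tied

As I understand it, the Monte Carlo error of this estimate relative to the
full-pairs tau shrinks like 1/sqrt(n_pairs); whether that is precise enough to
feed into the p-value calculation is the open question.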