I have trouble with some of the terminology here, and with the *need* for exact answers instead of merely using (accurate) approximations for the tests.
On Thu, 18 Dec 2003 08:13:53 GMT, "jackson marshmallow"
<[EMAIL PROTECTED]> wrote:

> I need to write a program that will calculate a non-parametric correlation
> between two time series. The series' length is usually about 1000 points.
>
> Let's say for Spearman's rho the minimal cost of calculation equals the
> number of data points N. I will also need to compute the p-value. If I use

Surely, that should say something like, "proportionate to N". But even that
is not complete. The main cost of a rank-order correlation is the sorting,
which is not linear in N; after that, you compute a correlation, which is
merely linear.

> randomization to determine the p-value, and there are P permutations, then
> the cost is N*P operations.

"... randomization ..." and "there are P permutations ..." Surely, those P
permutations are the same randomizations just invoked in the preceding
phrase. For N=1000, aren't you going to have to do 50 or 100 thousand
randomizations in order to out-perform the precision of the 'approximate'
test that you get by using the ordinary table for Pearson correlations?
(The Spearman *is* a Pearson, performed on ranks.)

> I understand that the more robust statistic is Kendall's tau (or, in this
> case, its variant, Somers' D), but the cost is N-square/2.

Well, 'robustness' is not the best word, in my opinion, when the two
statistics test different hypotheses. Effectively, Spearman's tests the sum
of squared differences in ranks, where Kendall's tests the simple absolute
sum. Both tests are absolutely "robust" in the sense of holding their
nominal size, assuming that you start with untied ranks. (And, for tiny
samples, under 30 or so, Kendall's is the more awkward one, since it has
the lumpier distribution.) That is rather similar to the difference between
the tests on a contingency table as performed with the Pearson
product-moment chi-squared, in contrast to the likelihood-ratio
chi-squared.

> The question is this: can I select a limited number of random pairs to
> calculate a valid estimate of Kendall's tau?

That's curious. Instead of re-allocating all 1000, and doing that tens of
thousands of times in order to get an exact p-level, you want to allocate
just *part* of them? I don't think this works. You could read up on
'bootstrapping', which usually uses samples that match the original N. I
think there are reasons for keeping the sample relatively large, beyond the
obvious symmetry; but I was less interested in this part of the question.

(Rough sketches of both calculations, in code, follow at the end of this
post.)

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."
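Here is a minimal sketch of the Spearman point -- Pearson's r on the ranks,
the ordinary t-approximation, and a randomization p-value from P
permutations -- assuming Python with numpy and scipy; the toy series, the
variable names, and the choice of P = 20,000 are mine, not the poster's:

    # Spearman's rho as Pearson's r on ranks, with the approximate t-based
    # p-value and a permutation p-value for comparison.  Illustrative only.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    N = 1000
    x = rng.normal(size=N)
    y = 0.1 * x + rng.normal(size=N)      # two weakly related series

    rx, ry = stats.rankdata(x), stats.rankdata(y)

    # Spearman's rho is Pearson's r computed on the ranks ...
    rho = stats.pearsonr(rx, ry)[0]
    # ... or, equivalently (no ties), 1 - 6*sum(d^2)/(N^3 - N): a statistic
    # built on the sum of squared rank differences.
    d = rx - ry
    rho_check = 1 - 6 * np.sum(d**2) / (N**3 - N)

    # Approximate test: treat rho like an ordinary Pearson r on N points.
    t = rho * np.sqrt((N - 2) / (1 - rho**2))
    p_approx = 2 * stats.t.sf(abs(t), df=N - 2)

    # Randomization test: P permutations, each costing about N operations
    # once the ranks are in hand -- the N*P cost in the question.
    P = 20_000
    hits = 0
    for _ in range(P):
        if abs(stats.pearsonr(rx, rng.permutation(ry))[0]) >= abs(rho):
            hits += 1
    p_perm = (hits + 1) / (P + 1)

    print(rho, rho_check, p_approx, p_perm)

For N this large the approximate and randomized p-values should agree
closely; the randomization only buys extra resolution when P is very large,
which is the point about needing tens of thousands of permutations before
it out-performs the table value.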

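And on the N-squared/2 cost for Kendall's tau: the naive statistic is just
a concordant-minus-discordant count over the N*(N-1)/2 pairs. A bare-bones
version of that loop, next to scipy's kendalltau (which, as far as I know,
uses a faster sort-based algorithm rather than the explicit pair loop),
again only as a sketch with made-up data:

    # Naive Kendall's tau: sign agreement over all N*(N-1)/2 pairs (ties
    # ignored, which is fine for continuous data).  The double loop is the
    # N^2/2 cost the original question worries about.
    import numpy as np
    from scipy import stats

    def kendall_tau_naive(x, y):
        x, y = np.asarray(x), np.asarray(y)
        n = len(x)
        s = 0.0
        for i in range(n - 1):
            for j in range(i + 1, n):
                s += np.sign(x[j] - x[i]) * np.sign(y[j] - y[i])
        return s / (n * (n - 1) / 2)

    rng = np.random.default_rng(1)
    x = rng.normal(size=300)
    y = 0.2 * x + rng.normal(size=300)

    print(kendall_tau_naive(x, y), stats.kendalltau(x, y)[0])

If a library routine like that is available, the quadratic cost -- and with
it the idea of sampling a subset of pairs -- may not even arise for N
around 1000.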