I have trouble with some of the terminology.
And with the *need*  for exact answers, instead of merely
using (accurate) approximations for the tests.

On Thu, 18 Dec 2003 08:13:53 GMT, "jackson marshmallow"
<[EMAIL PROTECTED]> wrote:

> I need to write a program that will calculate a non-parametric correlation
> between two time series. The series' length is usually about 1000 points.
> 
> Let's say for Spearman's rho the minimal cost of calculation equals the
> number of data points N. I will also need to compute the p-value. If I use

Surely, that should say something like,  "proportional to N".
But even that is not complete.  The main cost of rank-order
correlation is the sorting, which is O(N log N) rather than
linear in N; after that, computing the correlation on the
ranks is merely linear.
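
(To make the cost concrete:  here is a minimal sketch, in
Python with numpy -- my choice of language, and the names
x and y are just placeholders -- of Spearman's rho done as a
Pearson correlation on the ranks.  The argsort calls are the
O(N log N) part; the correlation on the ranks is linear.

  import numpy as np

  def ranks(v):
      # Rank via argsort: the O(N log N) step.
      # (No tie handling here; ties would need midranks.)
      r = np.empty(len(v))
      r[np.argsort(v)] = np.arange(1, len(v) + 1)
      return r

  def spearman_rho(x, y):
      # Pearson correlation applied to the ranks: O(N) once ranked.
      rx, ry = ranks(np.asarray(x)), ranks(np.asarray(y))
      rx -= rx.mean()
      ry -= ry.mean()
      return (rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry))
)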

> randomization to determine the p-value, and there are P permutations, then
> the cost is N*P operations.

... randomization ... and "there are P permutations ..." 
Surely, that is referring to the P 'randomizations'  that
were just invoked in the phrase before.

For N=1000, aren't you going to have to do 50 or 100 
thousand randomizations, in order to out-perform the 
precision of the 'approximate'  test that you get by
using the ordinary table for Pearson correlations?
(the Spearman *is*  a Pearson correlation, performed on the ranks).
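
(If you want to see the arithmetic of that comparison, here is
a rough sketch -- Python again, with scipy assumed, and P and
the seed being arbitrary choices of mine -- of the brute-force
permutation p-value next to the 'table' approximation through
the t distribution on N-2 d.f.:

  import numpy as np
  from scipy import stats

  def spearman_perm_p(x, y, P=20000, seed=0):
      # Permutation p-value: shuffle y, recompute rho, count extremes.
      # Cost is roughly P times the sort cost, for the re-ranking.
      rng = np.random.default_rng(seed)
      x, y = np.asarray(x), np.asarray(y)
      r_obs = stats.spearmanr(x, y)[0]
      hits = 0
      for _ in range(P):
          if abs(stats.spearmanr(x, rng.permutation(y))[0]) >= abs(r_obs):
              hits += 1
      return (hits + 1) / (P + 1)   # add-one keeps p away from zero

  def spearman_t_approx_p(x, y):
      # 'Table' approximation: treat rho like a Pearson r on N-2 d.f.
      n = len(x)
      r = stats.spearmanr(x, y)[0]
      t = r * np.sqrt((n - 2) / (1.0 - r * r))
      return 2 * stats.t.sf(abs(t), df=n - 2)

For N=1000 the two p-values should agree very closely, which
is the point.)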

> 
> I understand that the more robust statistic is Kendall's tau (or, in this
> case, its variant, Somer's D), but the cost is N-square/2.


Well, 'robustness'  is not the best word, in my opinion,
when the two statistics are testing different hypotheses.
Effectively, Spearman's tests the sum of squared differences
in ranks, whereas Kendall's tests the simple absolute sum.
Both tests are absolutely "robust"  in the sense of holding
their full nominal size, assuming that you start with untied
ranks.  (And, for tiny samples, under 30 or so, Kendall's is
more awkward since it has the lumpier distribution.)
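
(For reference, the N-squared/2 cost being discussed is just
the pairwise loop -- a small sketch, Python/numpy again, of
the plain tau-a with no tie correction:

  import numpy as np

  def kendall_tau_naive(x, y):
      # Loop over all N*(N-1)/2 pairs; count concordant minus
      # discordant.  This is the N-squared/2 cost in question.
      x, y = np.asarray(x), np.asarray(y)
      n = len(x)
      s = 0
      for i in range(n - 1):
          s += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
      return 2.0 * s / (n * (n - 1))

Though, for what it is worth, the concordance count can also
be done in O(N log N) with a merge-sort style inversion count,
so the N-squared/2 figure only applies to the obvious loop.)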

That is rather similar to the difference between the tests on
a contingency table as performed with the Pearson chi-squared,
in contrast to the likelihood-ratio chi-squared.


> 
> The question is this: can I select a limited number of random pairs to
> calculate a valid estimate of Kendall's tau?
> 

That's curious.  Instead of re-randomizing all 1000, and
doing that tens of thousands of times in order to get an
exact p-level, you want to use just *part*  of the pairs?
 - I don't think this works.  You could read up on
'bootstrapping', which usually uses resamples that match
the original N.  I think there are reasons for keeping the
sample relatively large, beyond the obvious symmetry, but
I was less interested in this part of the question.
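
(In case it helps to see what I mean by keeping the original N:
a rough sketch of bootstrapping Kendall's tau -- Python with
scipy again, B and the seed arbitrary, and purely illustrative,
since it describes sampling variability rather than giving an
exact permutation p-level:

  import numpy as np
  from scipy import stats

  def bootstrap_tau(x, y, B=1000, seed=0):
      # Resample (x, y) rows with replacement, keeping the original N,
      # and recompute Kendall's tau each time.
      rng = np.random.default_rng(seed)
      x, y = np.asarray(x), np.asarray(y)
      n = len(x)
      taus = np.empty(B)
      for b in range(B):
          idx = rng.integers(0, n, size=n)
          taus[b] = stats.kendalltau(x[idx], y[idx])[0]
      return taus   # e.g., np.percentile(taus, [2.5, 97.5]) for an interval

Note that the resampling introduces ties, which kendalltau
handles as tau-b; that is another reason this is only a sketch.)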


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." 