> On Thu, 18 Dec 2003 08:13:53 GMT, "jackson marshmallow"
> <[EMAIL PROTECTED]> wrote:
>
> > I need to write a program that will calculate a non-parametric
> > correlation between two time series. The series' length is usually
> > about 1000 points.
> >
> > Let's say for Spearman's rho the minimal cost of calculation equals
> > the number of data points N. I will also need to compute the p-value.
> > If I use
>
> Surely, that should say something like,  "proportionate to N".
> But that's not very complete.  The main cost of rank-order
> correlation is sorting, which is not linear in N; after that,
> you compute a correlation, which is merely linear.
>

OK, but you only have to sort once. If you then perform randomization, the
correlation has to be computed over and over again, so the cost of the
one-time sort becomes negligible.
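
To make that concrete, here is roughly what I have in mind (a rough
Python/NumPy sketch; the function name, the seed argument and the
no-ties shortcut are mine, not anything standard):

    import numpy as np

    def spearman_perm_test(x, y, n_perm=1000, seed=0):
        """Permutation p-value for Spearman's rho.

        Ranks are computed once (the O(N log N) sort); each permutation
        only shuffles the y-ranks and recomputes a Pearson correlation,
        which is O(N).  Ties are not handled.
        """
        rng = np.random.default_rng(seed)
        rx = np.argsort(np.argsort(x)).astype(float)  # ranks of x
        ry = np.argsort(np.argsort(y)).astype(float)  # ranks of y

        def pearson(a, b):
            a = a - a.mean()
            b = b - b.mean()
            return (a @ b) / np.sqrt((a @ a) * (b @ b))

        rho = pearson(rx, ry)
        count = 0
        for _ in range(n_perm):
            if abs(pearson(rx, rng.permutation(ry))) >= abs(rho):
                count += 1
        # add-one correction so the estimated p-value is never zero
        return rho, (count + 1) / (n_perm + 1)

Once the ranks are in hand, each permutation costs only the O(N) dot
products, so even tens of thousands of permutations stay cheap.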

> > randomization to determine the p-value, and there are P permutations,
> > then the cost is N*P operations.
>
> ... randomization ... and "there are P permutations ..."
> Surely, that is referring to the P 'randomizations'  that
> were just invoked in the phrase before.
>

Surely.

> For N=1000, aren't you going to have to do 50 or 100
> thousand randomizations, in order to out-perform the
> precision of the 'approximate'  test that you get by
> using the ordinary table for Pearson correlations?
> (the Spearman *is*  a Pearson, performed on ranks).
>

I think (correct me if I'm wrong) the p-value should be fairly accurate if
we do a limited number of random permutations, say, 1000.
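
To put a rough number on it: treating each random permutation as an
independent draw, the standard error of the estimated p-value is about
sqrt(p*(1-p)/P).  Near p = 0.05 with P = 1000 that is
sqrt(0.05*0.95/1000), roughly 0.007, which seems adequate for my
purposes, though resolving very small p-values would obviously take
many more permutations.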

> >
> > I understand that the more robust statistic is Kendall's tau (or, in
> > this case, its variant, Somer's D), but the cost is N-square/2.
>
>
> Well, 'robustness'  is not the best word, in my opinion,
> when they are testing different hypotheses.  Effectively,

I have series X and series Y. Let Yn = f(Xn). The hypothesis I want to test
is that f is a monotonic function.
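
For what it's worth, the N-square/2 cost comes from the brute-force
pair count, roughly like this (another rough Python/NumPy sketch; it
ignores ties, so it is tau-a rather than Somers' D proper):

    import numpy as np

    def kendall_tau_brute(x, y):
        """Brute-force Kendall's tau-a: compare every pair (i, j), i < j.

        Cost is N*(N-1)/2 comparisons.  Tied pairs count as neither
        concordant nor discordant, which is where Somers' D would
        differ (it conditions on pairs untied in the x variable).
        """
        x, y = np.asarray(x), np.asarray(y)
        n = len(x)
        concordant = discordant = 0
        for i in range(n - 1):
            s = np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i])
            concordant += int(np.sum(s > 0))
            discordant += int(np.sum(s < 0))
        return (concordant - discordant) / (n * (n - 1) / 2)

(I gather there are O(N log N) versions of tau based on counting
inversions during a merge sort, but at N = 1000 the brute-force count
is already fast enough.)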

> Spearman's tests for the sum of squared differences in
> ranks whereas Kendall's tests for the simple absolute sum.
> Both tests are absolutely "robust"  in terms of being
> full size, assuming  that you start with untied ranks.
> (And, for  tiny samples, under 30 or so, Kendall's is
> more awkward since it has the lumpier distribution.)
>
> That is rather similar to the difference between the tests on
> a contingency table as performed with Pearson's product-
> moment chisquared, in contrast to the Likelihood chisquared.
>
>
> >
> > The question is this: can I select a limited number of random pairs to
> > calculate a valid estimate of Kendall's tau?
> >
>
> That's curious.  Instead of re-allocating all 1000, and
> doing that tens of thousands of times in order to get an
> exact p-level,  you want to allocate just *part*  of them?
>  - I don't think this works.  You could read up on
> 'bootstrapping', which usually uses samples that match
> the original N.  I think there are reasons for keeping the
> sample relatively large,  beyond the obvious symmetry,
> I was less interested in this part of the question.
>

I'm sorry...
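
To clarify what I meant by selecting random pairs: the idea was to
estimate the concordance probability from a random subsample of the
N*(N-1)/2 pairs, something like the sketch below (the function name
and the n_pairs value are just illustrative; whether the estimate, and
especially a p-value built on it, is valid is exactly what I was
unsure about).  I will read up on bootstrapping as you suggest.

    import numpy as np

    def kendall_tau_sampled(x, y, n_pairs=100_000, seed=0):
        """Estimate Kendall's tau-a from a random sample of index pairs.

        Instead of all N*(N-1)/2 pairs, draw n_pairs random (i, j)
        pairs with replacement and average sign(xi - xj)*sign(yi - yj).
        This gives a point estimate only; it says nothing by itself
        about a permutation p-value.
        """
        rng = np.random.default_rng(seed)
        x, y = np.asarray(x), np.asarray(y)
        n = len(x)
        i = rng.integers(0, n, size=n_pairs)
        j = rng.integers(0, n, size=n_pairs)
        keep = i != j                       # discard self-pairs
        i, j = i[keep], j[keep]
        s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
        return float(np.mean(s))            # ties contribute zero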

>
> --
> Rich Ulrich, [EMAIL PROTECTED]
> http://www.pitt.edu/~wpilib/index.html
> "Taxes are the price we pay for civilization."

