Hi all,

I have a concern about the way Splus calculates the two-sample
Kolmogorov-Smirnov statistic.

Consider the following samples:
---
> ds1
 [1]  1  2  3  4  5  6  7  8  9 10
> ds2
 [1] 10 10 10 10 10 10 10 10 10 10
---

The output of the KS test is:

---
> ks.gof(ds1,ds2)

         Two-Sample Kolmogorov-Smirnov Test

data:  ds1 and ds2
ks = 1, p-value = 0
alternative hypothesis: cdf of ds1 does not equal the
              cdf of ds2 for at least one sample point.
---

However, the difference between the empirical cdfs of samples 1 and 2 is:

Interval: ]-oo;1[  [1;2[  [2;3[ ... [9;10[  [10;+oo[
cdf1-cdf2:   0      0.1    0.2  ...   0.9     0

So the KS statistic (the maximum absolute difference between the cdfs)
should be 0.9, not 1 as indicated by Splus (ks = 1 above).
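
As a cross-check outside Splus, here is a minimal Python sketch of the
same calculation. It assumes the usual right-continuous empirical cdf
(the fraction of observations <= x). The helper names ecdf and
ks_two_sample are my own, not part of Splus or any library.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the right-continuous empirical cdf of `sample` as a function."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right counts how many observations are <= x.
    return lambda x: bisect_right(ordered, x) / n


def ks_two_sample(a, b):
    """Largest absolute gap between the empirical cdfs of a and b."""
    fa, fb = ecdf(a), ecdf(b)
    # Both cdfs are step functions that only change at observed values,
    # so the maximum gap is attained at one of the pooled data points.
    points = sorted(set(a) | set(b))
    return max(abs(fa(x) - fb(x)) for x in points)


ds1 = list(range(1, 11))  # 1, 2, ..., 10
ds2 = [10] * 10           # ten copies of 10

print(ks_two_sample(ds1, ds2))  # 0.9
```

The largest gap occurs at x = 9, where cdf1 = 0.9 and cdf2 = 0. At
x = 10 both cdfs jump to 1, so the gap there is 0.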

Incidentally, the inflated statistic makes the reported p-value too small.

I would very much appreciate comments on this behaviour. In particular,
is this a bug in Splus, and if so, is it well known? There seem to be
implications for anyone relying on this test.

Pointers to appropriate forums/newsgroups would also be appreciated.

Thanks in advance and best regards from Grenoble,

    Cyril.

---
Cyril Goutte                                 [EMAIL PROTECTED]
INRIA Rhone-Alpes                             Tel: (+33) 4 76 61 55 13
Zirst - 655 avenue de l'Europe - Montbonnot   Fax: (+33) 4 76 61 54 77
38334 Saint Ismier Cedex - France                 www.inrialpes.fr/is2





