Re: closeness of distributions

Glen Wed, 17 Sep 2003 21:42:36 -0700

Rajarshi Guha <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> Hi,
>   I'm working on a problem in which I'd like to determine whether two
> sets of data come from the same distribution. It seems that the
> Kolmogorov-Smirnov test will give me the information I need.


No test can answer this question. Sometimes a test might give you 
an indication that two samples don't seem to have come from the 
same distribution, but it simply /can't/ tell you that they have
come from the same distribution.

But yes, the K-S two-sample test (often called the Smirnov test) is
a way of testing the null hypothesis that the two samples came from
the same distribution - it's just that failing to reject that null
doesn't mean that the test is telling you they do come from the
same distribution.

> However, I'd like to go further than just accepting or rejecting the H0
> for the test. Is there any way (using this test or some other test) to
> determine *how* similar the distributions of two sets are?

> Am I correct in thinking that a P value for the KS test would provide this
> information?

The test statistic (NOT the p-value) is a measure of /dis/similarity.
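
To make that concrete, here is a minimal sketch of the two-sample K-S
statistic in plain Python: the largest vertical gap between the two
empirical CDFs. The function name ks_statistic is mine, not from any
library, and this is an illustration rather than a tuned implementation.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample K-S statistic D: the largest absolute difference
    between the empirical CDFs of samples a and b."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both ECDFs are right-continuous step functions that only change at
    # observed values, so checking every observed value is sufficient.
    for v in xs + ys:
        fa = bisect_right(xs, v) / n   # fraction of a that is <= v
        fb = bisect_right(ys, v) / m   # fraction of b that is <= v
        d = max(d, abs(fa - fb))
    return d

# Illustrative (made-up) samples.
same = ks_statistic([1, 2, 3], [1, 2, 3])        # identical samples
apart = ks_statistic([1, 2, 3], [4, 5, 6])       # no overlap at all
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

D runs from 0 (identical empirical distributions) to 1 (no overlap), so
it is the statistic, not the p-value, that reads as "how different".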

There are many possible kinds of deviation between distributions. 
Many of these are targeted by available test statistics.

> I looked up Conover's book on nonparametric statistics for
> the algorithm of the KS test. However, it does not mention any way of
> calculating a P for the test. Is it possible?

Yes it is, but as I mentioned, it doesn't answer the question you're asking.
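
For what it's worth, one common route is the large-sample approximation
based on the limiting Kolmogorov distribution. The sketch below assumes
both samples are reasonably large and the data are continuous (no ties);
for small samples, exact tables or exact methods are more appropriate.
The function name and the cutoff for tiny arguments are my own choices.

```python
import math

def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Approximate two-sided p-value for a two-sample K-S statistic d
    with sample sizes n and m, using the limiting series
        Q(lam) = 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * lam**2),
    where lam = d * sqrt(n*m / (n+m)). Large-sample approximation only."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # Q(lam) is indistinguishable from 1 here, and the series
        # converges slowly, so return 1 directly.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Illustrative values: d = 0.136 with 200 observations per sample gives
# lam = 1.36, the familiar large-sample 5% critical point.
p = ks_pvalue_asymptotic(0.136, 200, 200)
```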

> Are there any other tests that would be able to tell me how similar the
> distributions of two sets of observations are?

The same comments apply to other tests - they will measure some aspect
or aspects of dissimilarity between the two samples. The test statistic
(not the p-value) tells you something about "how different". The p-value
tells you something about how unusual the samples were (as measured by the 
test statistic) /assuming the two samples were from the same distribution/.

The p-value is therefore a function of both the measured discrepancy and
the sample sizes, so it cannot serve as a measure of similarity by itself.
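
A small illustration of that last point, as a sketch only: the two pairs
of made-up samples below have exactly the same K-S statistic (D = 0.5),
but very different p-values because the sample sizes differ. The p-value
here comes from a permutation test, which directly asks "how unusual is
this D if the group labels are arbitrary?". Function names, the seed and
the number of permutations are my own assumptions.

```python
import random
from bisect import bisect_right

def ks_stat(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    return max(abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
               for v in xs + ys)

def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Share of random relabelings whose D is at least the observed D."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    observed = ks_stat(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point noise in D.
        if ks_stat(pooled[:len(a)], pooled[len(a):]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)   # +1 counts the observed labeling

def shifted_pair(n):
    """Two samples of size n whose ECDFs differ by exactly 0.5 (n even)."""
    return list(range(n)), [x + n / 2 - 0.5 for x in range(n)]

small_a, small_b = shifted_pair(6)
large_a, large_b = shifted_pair(60)
d_small, d_large = ks_stat(small_a, small_b), ks_stat(large_a, large_b)
p_small = permutation_pvalue(small_a, small_b)
p_large = permutation_pvalue(large_a, large_b)
print(d_small, p_small)
print(d_large, p_large)
```

Same discrepancy, different evidence: with 6 observations per group a D
of 0.5 is unremarkable under the null, while with 60 per group it is
very unusual.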

Glen
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
