Rajarshi Guha <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> Hi,
> I'm working on a problem in which I'd like to determine whether two
> sets of data come from the same distribution. It seems that the
> Kolmogorov-Smirnov test will give me the information I need.

No test can answer this question. Sometimes a test might give you an
indication that two samples don't seem to have come from the same
distribution, but it simply /can't/ tell you that they have come from
the same distribution.

But yes, the K-S two-sample test (often called the Smirnov test) is a
way of testing the null hypothesis that the two samples came from the
same distribution - it's just that failing to reject that null doesn't
mean that the test is telling you they do come from the same
distribution.

> However I'd like to go further than just accept or reject the H0 for the
> test. Is there any way (using this test or some other test) to determine
> *how* similar the distributions of two sets are?
> Am I correct in thinking that a P value for the KS test would provide this
> information?

The test statistic (NOT the p-value) is a measure of /dis/similarity.

There are many possible kinds of deviation between distributions. Many
of these are targeted by available test statistics.

> I looked up Conover's book on nonparametric statistics for
> the algorithm of the KS test. However it does not mention any way of
> calculating a P for the test. Is it possible?

Yes it is, but as I mentioned, it doesn't answer the question you're
asking.

> Are there any other tests that would be able to tell me how similar the
> distributions of two sets of observations are

The same comments apply to other tests - they will measure some aspect
or aspects of dissimilarity between the two samples.

The test statistic (not the p-value) tells you something about "how
different".

The p-value tells you something about how unusual the samples were (as
measured by the test statistic) /assuming the two samples were from the
same distribution/. This is a function of both the measure of
discrepancy and the sample size.

Glen
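
A minimal Python sketch of the statistic-versus-p-value distinction,
under stated assumptions: the function names ks_statistic and
ks_pvalue_asymptotic are made up for illustration, the data are toy
integer sequences rather than real samples, and the p-value uses the
standard large-sample (Kolmogorov series) approximation, which can be
rough for small samples.

```python
import math
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    max_gap = 0
    # The largest gap is attained at an observed value, so check each one.
    for v in xs + ys:
        cx = bisect_right(xs, v)  # number of x-values <= v
        cy = bisect_right(ys, v)  # number of y-values <= v
        # |cx/n - cy/m| scaled by n*m, kept in integers to avoid rounding noise.
        max_gap = max(max_gap, abs(cx * m - cy * n))
    return max_gap / (n * m)


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Approximate two-sided p-value from the large-sample Kolmogorov series."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here, and the p-value is essentially 1.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Toy data (an assumption for illustration): y is x shifted by one fifth
# of the sample size, so the statistic is the same at every n.
for n in (20, 200, 2000):
    x = list(range(n))
    y = [v + n // 5 for v in x]
    d = ks_statistic(x, y)
    p = ks_pvalue_asymptotic(d, len(x), len(y))
    print(f"n = {n:5d}   D = {d:.3f}   approx. p = {p:.3g}")
```

In this toy setup the statistic stays at 0.2 for every sample size,
while the approximate p-value falls from well above 0.5 at n = 20 to
below 0.001 at n = 200. The measured dissimilarity is the same; only
the amount of evidence against the null changes.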
