On 20 Oct 2003 20:16:52 -0700, [EMAIL PROTECTED] (Michael) wrote:

> > Reference? What do you mean by "removing
> > duplicates" and what is this K-S test about?
>
> I am testing goodness-of-fit for continuous distributions.
>
> I have been taught to delete any duplicate data points. For example, if
> we have 10,33,44,44,44,44,55,56 I would calculate my sample and
> theoretical probabilities, then delete all but ONE of the 44's (the
> duplicate data points), then calculate the differences, then take
> the absolute values of the differences, and my D-value is the
> largest of those.
>
> The question is which ONE do I leave behind? It's a choice that has an
> impact on the resulting D-value.
No, no, no. You have these items ranked, and then you compare
the ranks (as cumulative proportions) to a CDF, and you want to
find the maximum difference.

It is not *necessary* to compute a D for any rank in the middle
of a run of ties, because it can't possibly give the maximum D:
the CDF has the same value at every tied point, so the largest
difference has to fall at the first or the last rank of the run.
In that sense -- because it can't be useful -- you can 'delete'
the act of computing the D. But you certainly do not delete
the data.

[ ... ]

> What method do you use to test goodness-of-fit for continuous
> distributions?

K-S is designed for continuous distributions.
Shapiro-Wilk is popular (for testing normality), and its
principle is one of correlation.

Personally, I most often look at a scatterplot of a couple of
interesting variables. Outliers matter, but 'homogeneity of
variance' and 'linearity of regression' matter, too.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
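
A small illustration of the point about ties, for anyone who wants to check it numerically. This is only a sketch: the hypothesized distribution (normal, mean 40, s.d. 15) is an assumption made up for the example, and the function names are hypothetical, not from any package. It computes D three ways on Michael's data: at every rank, at the ends of each run of ties only, and after (wrongly) deleting the duplicate 44's.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, using only the standard library."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_d_all_ranks(data, cdf):
    """One-sample K-S statistic D, evaluated at every rank, ties included."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the CDF with the empirical step just after and just before rank i.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def ks_d_tie_ends(data, cdf):
    """Same D, but each run of ties is evaluated only at its bottom and top."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    below = 0  # number of observations strictly below the current value
    for v in sorted(set(xs)):
        count = xs.count(v)
        f = cdf(v)
        d = max(d, (below + count) / n - f, f - below / n)
        below += count
    return d

data = [10, 33, 44, 44, 44, 44, 55, 56]

# Assumed null distribution, chosen only for illustration.
def cdf(x):
    return normal_cdf(x, 40.0, 15.0)

d_all = ks_d_all_ranks(data, cdf)                 # every rank
d_ends = ks_d_tie_ends(data, cdf)                 # skip the middle of ties
d_dedup = ks_d_all_ranks(sorted(set(data)), cdf)  # duplicates deleted (not K-S)

print(f"all ranks: {d_all:.4f}")
print(f"tie ends:  {d_ends:.4f}")
print(f"deduped:   {d_dedup:.4f}")
```

Under these assumptions the first two figures agree, since skipping the ranks in the middle of the ties loses nothing. The third is a different number, because dropping three of the 44's changes n and every cumulative proportion, so it is no longer the K-S statistic for the sample.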
