In article <[EMAIL PROTECTED]>, Rich Ulrich <[EMAIL PROTECTED]> wrote:
>On Wed, 17 Oct 2001 15:50:35 +0200, Tobias Richter
><[EMAIL PROTECTED]> wrote:
>> We have collected variables that represent proportions (i. e., the
>> proportion of sentences in a number of texts that belong to a certain
>> category). The distributions of these variables are highly skewed (the
>> proportions for most of the texts are zero or rather low). So my

>Low proportions, and a lot at zero?

>There is no way you can transform to "symmetry" when
>there is a clump at one end and a long tail at the other.

If one has a continuous distribution, one can always
transform to symmetry.  It is by no means clear that it
does any good.

>First thought: the dichotomy of None/Some sometimes
>contains most of the information that is useful. Dummy Var1.

>Related thought: "none" is sometimes a separate dimension
>from what is implicitly measured by the continuous values above zero.
>If that dimension does seem useful: Dummy Var2.

>> question is: Is there a function that transforms the proportions into
>> symmetrically distributed variables? And is there a reliable statistics
>> text that discusses such transformations?

>"Symmetry" might happen, and it is good to have for the sake
>of testing. However, describing a scientific model with
>meaningful parameters is a better starting point, and you
>can devise tests from there.

This is VERY important.  The model should always come from
the subject field, not from the data, or from mathematical
convenience.  However, one can use robustness results, in
the real sense of the term, to see that procedures devised
for certain models are good for others.

>I mean: it is useful if you have a "Poisson model with a
>Poisson parameter", say, at the stage of setting up a model.
>You might want to take the square root before you do testing,
>and you know that is appropriate for the Poisson; but the raw
>Poisson parameter is a number that is ordinarily additive.

>I have not seen many texts that tackle transformations in
>the abstract. Finney's classic text on bioassay has a few pages.
>Or, I think, Mosteller and Tukey, "Data Analysis and Regression."

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED]         Phone: (765)494-6054   FAX: (765)494-0558
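
A minimal sketch of the symmetry point discussed above, assuming Python 3.8+
and only the standard library. The data and the helper names normal_scores
and skewness are invented for illustration; they do not come from the thread.
A rank-based normal-scores transform makes a sample without ties symmetric,
but a clump of zeros has to map to a single value, so the clump survives the
transform. The last line builds the None/Some dummy variable.

```python
from statistics import NormalDist


def normal_scores(values):
    """Map each value to the normal quantile of its mid-rank.

    Tied values share one mid-rank and therefore one score, which is
    what any transformation of the raw values must do.
    """
    n = len(values)
    first_pos, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)      # first 1-based position of v
        count[v] = count.get(v, 0) + 1    # number of copies of v
    nd = NormalDist()
    score = {}
    for v in first_pos:
        mid_rank = first_pos[v] + (count[v] - 1) / 2
        score[v] = nd.inv_cdf((mid_rank - 0.5) / n)
    return [score[v] for v in values]


def skewness(xs):
    """Moment-based sample skewness (third central moment, standardized)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical data: 20 distinct, right-skewed proportions (no ties).
continuous = [(i / 20) ** 3 for i in range(1, 21)]

# Hypothetical data: 12 texts with proportion zero, 8 with small positive values.
clumped = [0.0] * 12 + [0.02, 0.03, 0.05, 0.08, 0.12, 0.2, 0.35, 0.6]

skew_continuous = skewness(normal_scores(continuous))  # essentially zero
skew_clumped = skewness(normal_scores(clumped))        # still clearly positive

# None/Some dummy: 1 if the category occurs at all in the text, else 0.
some = [1 if p > 0 else 0 for p in clumped]

print(skew_continuous, skew_clumped, some)
```

The normal target is only one choice of symmetric shape; any strictly
increasing map of the ranks onto symmetric quantiles behaves the same way
with respect to the ties.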