David Delgado Gomez <[EMAIL PROTECTED]> wrote:

> Well, the data come from a population that follows a
> normal distribution; in my case it is just data from a disease,
> but part of these data come from another,
> different disease. The fact is that the second disease's values
> are always much bigger than the mean of the first one,
> and the problem is that they are not numerous enough to separate them
> with a mixture of Gaussians. Because they are so few,
> my algorithm does not work. And I need to separate them
> because they will be noise in my results. Looking at the
> histogram, they can be considered outliers (because they are
> far away from the peak), so by estimating the variance I can
> take the 90% of the data that belongs to the first disease
> and compute my results with them.
Well, from this description it seems clear that the observations are well described by a mixture model, with one bump for each disease. Making a hard assignment of each observation to one bump or the other is an approximation to the right solution, which is to count each observation in proportion to how well it belongs to each bump. If there is a lot of overlap, partial assignment yields substantially different results than all-or-nothing assignment.

It's not very difficult to work with partial assignments, so I don't see that there's much to gain by thinking up various hacks. Another consideration is that disease #2 may be of greater importance in some slightly different context; why not get in the habit of working carefully, so that you can say something interesting about both diseases?

For what it's worth,
Robert Dodier

--
... much of what is called "rational" seems more like rationalization, to me; a ruse intended to make something desired appear necessary. -- Jeff Inman
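
To make the partial-assignment idea above concrete, here is a minimal sketch of a two-component, one-dimensional Gaussian mixture fitted by EM, using only the Python standard library. Nothing in it comes from the original data: the function names (`fit_two_gaussians`, `responsibility`), the starting values (weights 0.9/0.1, means at the 25th and 95th percentiles) and the synthetic sample in the usage part are all assumptions chosen for illustration. Each observation is counted toward each bump in proportion to its posterior probability, rather than being assigned outright to one of them.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and std. dev."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def responsibility(x, weights, means, sds):
    """Posterior probability that x belongs to the second component."""
    p1 = weights[0] * normal_pdf(x, means[0], sds[0])
    p2 = weights[1] * normal_pdf(x, means[1], sds[1])
    total = p1 + p2
    # If both densities underflow to zero, fall back to an even split.
    return p2 / total if total > 0.0 else 0.5


def fit_two_gaussians(xs, n_iter=200, min_sd=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, sds). The second component is started at the
    larger mean. Starting values are illustrative assumptions.
    """
    n = len(xs)
    ordered = sorted(xs)
    grand_mean = sum(xs) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n) or 1.0

    weights = [0.9, 0.1]                                # assumed starting split
    means = [ordered[n // 4], ordered[(19 * n) // 20]]  # 25th and 95th percentiles
    sds = [grand_sd, grand_sd]

    for _ in range(n_iter):
        # E-step: partial assignment of every observation to each bump.
        r2 = [responsibility(x, weights, means, sds) for x in xs]
        r1 = [1.0 - r for r in r2]

        # M-step: each observation contributes in proportion to its
        # responsibility, instead of all-or-nothing.
        for k, r in enumerate((r1, r2)):
            total = max(sum(r), 1e-12)
            weights[k] = total / n
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / total
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / total
            sds[k] = max(math.sqrt(var), min_sd)  # guard against collapse

    return weights, means, sds


if __name__ == "__main__":
    # Synthetic data, made up for this example: a large bump and a small,
    # higher-valued one standing in for the second disease.
    random.seed(0)
    data = [random.gauss(10.0, 2.0) for _ in range(950)]
    data += [random.gauss(25.0, 3.0) for _ in range(50)]

    w, mu, sd = fit_two_gaussians(data)
    print("weights:", w)
    print("means:  ", mu)
    print("sds:    ", sd)
    print("P(second bump | x = 16):", responsibility(16.0, w, mu, sd))
```

The fitted means and standard deviations of the first component are then estimates for the first disease that already discount the second one, without a cutoff such as "keep 90% of the data". Whether EM behaves this well on the real data depends on how separated the bumps are and on the starting values, so treat the result as something to check against the histogram.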
