Konrad Halupka <[EMAIL PROTECTED]> wrote:
>Rich Ulrich wrote:
>> > I have several variables (X1, X2...) measuring various traits of
>> > individuals and one variable (Y) which is binary (survived/did not
>> > survive). I would like to check if the variation in survival can be
>> > explained with Xi variables.
>> >
>> > It looks like a typical logistic regression problem. However it bothers
>> > me that the Y variable has a non-random error. The group "survived"
>> > surely consists of individuals who *did survive*, but the group "did
>> > not survive" is likely to include some individuals which actually
>> > survived but were not detected by an observer. How to proceed with the
>> > analysis?
>>
>> How to proceed? -- just, proceed. Do you have any choice?
>> Do you have some data in hand that you have not mentioned?
>> There is no useful way to weight the data, if that is what you are
>> wondering: a regression on a dummy-variable scored 0/1 gives you
>> the same test as if it were scored as an (equivocating) 0/.5. And
>> the coefficients are easier to understand in the first one.
>>
>> When you describe your prediction equation, you might want to use a
>> score other than the computer-program's default cutoff to describe the
>> fit of prediction and outcome. But that is often the case.
>>
>> Do you have a hint about who covertly survived
>> (which might suggest using a 3-group classification)?
>> Do you have an extremely high rate of success, so that
>> someone might be relying on the accuracy of your predictions
>> for some purpose?
>>
>
>Thanks for the response. Indeed, I was wondering whether there were some
>methods of weighting the data.
>
>Birds are marked and after 12 months the observer attempts to find them
>again in the field. Of course it is impossible to search a very wide
>area. Those individuals who "covertly survive" have a tendency to
>disperse farther than those who survive "overtly" (i.e. can be relocated
>within a reasonable distance from the place where they were originally
>found).

IMO, this is an interesting problem, and probably quite a general one. I
encountered it in my area (hydrometeorology) some years ago, when trying
to predict the occurrence of thunderstorms over an area from remote
sensing and large-scale variables. A regression model is fitted to the
reports from a number of ground stations that are sparsely distributed
over the area. The specific feature of such ground observations is that
if a thunderstorm was reported, then it occurred (almost surely);
however, many thunderstorms are simply not noticed (e.g., they occur too
far from any station). The applications known to me simply ignore this
deficiency of the observations. I wonder whether it is tractable. Of
course, you need some quantitative estimate of the error.

Weighted regression seems to me the correct way to account for the fact
that some observations are less reliable than others. Thus, Rich's
comment confuses me. Am I wrong? Are there more sophisticated ways to
include such error information in a predictive model?

BTW: I am new to this NG. Is s.s.e. a good place to discuss questions
like the above, or would s.s.m. or s.s.c. be better?

Regards,
Greg
___
Grzegorz Jan Ciach
http://ia.net/~gciach

=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================
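Rich's point that rescoring the dummy variable does not change the test can be checked numerically. The sketch below fits a simple least-squares regression of a made-up 0/1 survival outcome on one hypothetical trait. It then refits after rescoring the outcome as 0/0.5 and as 0.5/1. Both recodings are linear transformations of the original scores, so the slope should be halved while its t-statistic stays the same. The toy data and the helper name `simple_ols` are illustrative assumptions, not anything from the thread, and the check covers ordinary least squares only.

```python
import math

def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares; return (slope b, t-statistic of b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual sum of squares and the usual standard error of the slope.
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b, b / se_b

# Hypothetical toy data: one trait X; 1 = found alive, 0 = not found.
x = [1.2, 2.3, 0.7, 3.1, 2.8, 1.9, 0.4, 3.5, 2.2, 1.1]
y01 = [0, 1, 0, 1, 1, 0, 0, 1, 1, 0]

# Two "equivocating" rescorings of the same dummy variable.
y_half = [0.5 * v for v in y01]          # scored 0 / 0.5
y_shift = [0.5 + 0.5 * v for v in y01]   # scored 0.5 / 1

for label, y in [("0/1", y01), ("0/0.5", y_half), ("0.5/1", y_shift)]:
    b, t = simple_ols(x, y)
    print(f"{label:6s} slope = {b:.4f}  t = {t:.4f}")
```

Any rescoring of the form new = k + c * old (with c not zero) behaves the same way: the slope is multiplied by c, the residuals and their standard error are also multiplied by |c|, and the ratio is unchanged.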

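The situation described above is that a positive record means the event really happened, but many true events are missed. One possible way to formalize this, assuming a *known*, constant probability d that a true survivor (or storm) is detected, is to model the probability of a positive record as d * p(x). Here p(x) is the usual logistic curve. That likelihood is then maximized instead of the ordinary logistic one. This is only a sketch of that idea and is not a method proposed in this thread. In the bird data, d would probably depend on dispersal distance and would itself have to be estimated, which this code does not attempt. The simulated data, parameter values and the names `sigmoid`, `log_lik`, `score`, `fit` and `DETECT` are all hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_lik(a, b, xs, ys, d):
    """Log-likelihood when P(record = 1 | x) = d * sigmoid(a + b*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        q = d * sigmoid(a + b * x)
        total += math.log(q) if y == 1 else math.log(1.0 - q)
    return total

def score(eta, y, d):
    """Derivative of one observation's log-likelihood w.r.t. eta = a + b*x."""
    p = sigmoid(eta)
    if y == 1:
        return 1.0 - p                              # d/d(eta) of log(d * p)
    return -d * p * (1.0 - p) / (1.0 - d * p)       # d/d(eta) of log(1 - d * p)

def fit(xs, ys, d, lr=0.5, iters=2000):
    """Plain gradient ascent on (a, b); d is assumed KNOWN, it is not estimated."""
    a = b = 0.0
    n = len(xs)
    for _ in range(iters):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            g = score(a + b * x, y, d)
            ga += g
            gb += g * x
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Hypothetical simulation: true survival model plus imperfect detection.
random.seed(1)
TRUE_A, TRUE_B, DETECT = 0.5, 1.5, 0.7
xs = [random.uniform(-2, 2) for _ in range(500)]
alive = [1 if random.random() < sigmoid(TRUE_A + TRUE_B * x) else 0 for x in xs]
found = [1 if s == 1 and random.random() < DETECT else 0 for s in alive]

naive = fit(xs, found, d=1.0)        # ordinary logistic regression on the records
adjusted = fit(xs, found, d=DETECT)  # allows for survivors that were never re-found
print("naive    (a, b):", naive)
print("adjusted (a, b):", adjusted)
```

With d = 1 the per-observation score reduces to y - p, which is the ordinary logistic regression score. That is why the same `fit` function also serves for the naive comparison.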