Konrad Halupka <[EMAIL PROTECTED]> wrote:
>Rich Ulrich wrote:

>> > I have several variables (X1, X2...) measuring various traits of
>> > individuals and one variable (Y) which is binary (survived/did not
>> > survive). I would like to check if the variation in survival can be
>> > explained with Xi variables.
>> >
>> > It looks like a typical logistic regression problem. However it bothers
>> > me that the Y variable has a non-random error. The group "survived"
>> > surely consists of individuals who *did survive*, but the group "did
>> > not survive" is likely to include some individuals which actually
>> > survived but were not detected by an observer. How to proceed with the
>> > analysis?
>> 
>> How to proceed? -- just, proceed.  Do you have any choice?
>> Do you have some data in hand that you have not mentioned?
>> There is no useful way to weight the data, if that is what you are
>> wondering:  a regression on a dummy-variable scored 0/1  gives you
>> the same test as if it were scored as an (equivocating)  0/ .5.  And
>> the coefficients are easier to understand in the first one.
>> 
>> When you describe your prediction equation, you might want to use a
>> score other than the computer-program's default cutoff to describe the
>> fit of prediction and outcome.  But that is often the case.
>> 
>> Do you have a hint about who covertly survived
>> (which might suggest using a 3-group classification)?
>> Do you have an extremely high rate of success, so that
>> someone might be relying on the accuracy of your predictions
>> for some purpose?
>> 
>
>Thanks for response. Indeed, I was concerned if there were some methods
>of weighting the data.
>
>Birds are marked and after 12 months the observer attempts to find them
>again in the field. Of course it is impossible to search a very wide
>area. Those individuals who "covertly survive" have a tendency to
>disperse farther than those who survive "overtly" (i.e. can be relocated
>within a reasonable distance from the place where they were originally
>found). 


IMO, this is an interesting problem, which is 
probably quite general.  I encountered it in my 
area (hydrometeorology) some years ago: 

  Trying to predict the occurrence of thunderstorms 
  over an area based on remote sensing and large 
  scale variables.  A regression model is fitted 
  based on the reports from a number of ground 
  stations that are sparsely distributed over 
  the area.  The specific feature of such ground 
  observations is that, if a thunderstorm was 
  reported, then it occurred (almost surely); 
  however, a lot of them are simply not noticed 
  (e.g., they occurred too far from any station).

The applications known to me simply ignore this 
specific deficiency of the observations.  I wonder 
whether the problem is tractable.  Of course, you 
need some quantitative estimate of the error.  
Weighted regression seems to me a correct way to 
account for the fact that some observations are 
less reliable than others.  Thus, the comment 
by Rich confuses me.  Am I wrong?  Are 
there more sophisticated ways to include such 
error information in a predictive model? 
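
To make the question concrete, here is a minimal sketch of one possible 
way to build the error estimate into the model itself rather than into 
weights.  It is an illustration only, not a method taken from this 
thread.  It assumes that a reported event is always real, that a real 
event is reported with a known probability d, and that d does not 
depend on the predictor.  The observed outcome then has probability 
d * p(x), where p(x) is the usual logistic curve, and the coefficients 
can be fitted by maximum likelihood.  All names and numbers below 
(DETECT = 0.7, the coefficients, the five predictor levels) are 
hypothetical and the data are simulated.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_with_detection(cells, detect_prob, steps=20000, lr=0.5):
    """Fit P(reported | x) = detect_prob * sigmoid(b0 + b1 * x).

    cells is a list of (x, n_reported, n_not_reported) tuples.
    detect_prob is ASSUMED known; detect_prob = 1.0 gives the
    ordinary (naive) logistic regression.
    Plain gradient ascent on the mean log-likelihood.
    """
    total = sum(rep + not_rep for _, rep, not_rep in cells)
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, rep, not_rep in cells:
            p = sigmoid(b0 + b1 * x)
            q = detect_prob * p  # probability that the event is reported
            # derivative of rep*log(q) + not_rep*log(1 - q)
            # with respect to the linear predictor b0 + b1*x
            g = rep * (1.0 - p) - not_rep * q * (1.0 - p) / (1.0 - q)
            g0 += g
            g1 += g * x
        b0 += lr * g0 / total
        b1 += lr * g1 / total
    return b0, b1


# Simulated data: hypothetical true coefficients and detection probability.
random.seed(0)
TRUE_B0, TRUE_B1, DETECT = 0.5, 1.0, 0.7
N_PER_LEVEL = 4000

cells = []
for x in [-2, -1, 0, 1, 2]:
    reported = 0
    for _ in range(N_PER_LEVEL):
        occurred = random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x)
        # an event can only be reported if it actually occurred
        if occurred and random.random() < DETECT:
            reported += 1
    cells.append((x, reported, N_PER_LEVEL - reported))

naive_b0, naive_b1 = fit_logistic_with_detection(cells, 1.0)
corr_b0, corr_b1 = fit_logistic_with_detection(cells, DETECT)

print("true      b0, b1:", TRUE_B0, TRUE_B1)
print("naive     b0, b1:", round(naive_b0, 3), round(naive_b1, 3))
print("corrected b0, b1:", round(corr_b0, 3), round(corr_b1, 3))
```

In this simulation the naive fit treats every unreported case as a true 
non-event, so it describes the reporting process rather than the events 
themselves.  The corrected fit relies entirely on the assumed value of 
d.  If the chance of being missed depends on the predictors (as with 
birds that disperse farther), a constant d is not adequate and the 
sketch would need to be extended.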


BTW: I am new to this NG. Is s.s.e. a good place 
to discuss questions like the above, or would 
s.s.m. or s.s.c. be better?

Regards,
Greg

___
Grzegorz Jan Ciach
http://ia.net/~gciach



