> >> > I have several variables (X1, X2...) measuring various traits of
> >> > individuals and one variable (Y) which is binary (survived/did not
> >> > survive). I would like to check if the variation in survival can be
> >> > explained with Xi variables.
> >> >
> >> > It looks like a typical logistic regression problem. However it bothers
> >> > me that the Y variable has a non-random error. The group "survived"
> >> > surely consists of individuals who *did survive*, but the group "did
> >> > not survive" is likely to include some individuals who actually
> >> > survived but were not detected by an observer. How to proceed with the
> >> > analysis?
> >>
---------
> >> How to proceed? -- just, proceed.  Do you have any choice?
> >> Do you have some data in hand that you have not mentioned?
> >> There is no useful way to weight the data, if that is what you are
> >> wondering:  a regression on a dummy-variable scored 0/1  gives you
> >> the same test as if it were scored as an (equivocating)  0/ .5.  And
> >> the coefficients are easier to understand in the first one.
> >>
> >> When you describe your prediction equation, you might want to use a
> >> score other than the computer-program's default cutoff to describe the
> >> fit of prediction and outcome.  But that is often the case.
> >>
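The remark about 0/1 versus 0/.5 scoring can be checked numerically. Below is a minimal sketch, assuming the "regression on a dummy variable" is an ordinary least-squares fit with a single predictor. The data are simulated and the helper name slope_t_statistic is made up for this illustration. Halving the scores halves the slope and its standard error together, so the t statistic does not move.

```python
import math
import random

def slope_t_statistic(x, y):
    """Return (slope, t) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and the usual standard error of the slope.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se

random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
# Simulated outcome (an assumption for the demo): larger x, better survival.
y01 = [1.0 if random.random() < 1 / (1 + math.exp(-xi)) else 0.0 for xi in x]
# Same outcome rescored as 0/.5.
y_half = [0.5 * yi for yi in y01]

b1, t1 = slope_t_statistic(x, y01)
b2, t2 = slope_t_statistic(x, y_half)
print("slopes:", b1, b2)        # second slope is half the first
print("t statistics:", t1, t2)  # identical up to rounding
```

This only shows that rescoring a whole group uniformly cannot carry information about which individuals were missed. It says nothing about models that treat the detection error explicitly.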
----------
> IMO, this is an interesting problem, which is
> probably quite general.  I encountered it in my
> area (hydrometeorology) some years ago:
> 
>   Trying to predict occurrence of thunderstorms
>   over an area based on remote sensing and large
>   scale variables.  A regression model is fitted
>   based on the reports from a number of ground
>   stations that are sparsely distributed over
>   the area.  The specific feature of such ground
>   observations is that, if a thunderstorm was
>   reported, then it occurred (almost surely),
>   however, a lot of them are simply not noticed
>   (e.g., occurred too far from any station).
> 
> The applications known to me simply ignore this
> specific deficiency of the observations.  I wonder
> whether it could be tractable?  Of course, you
> need some quantitative estimate of the error.
> Weighted regression seems to me a correct way to
> account for the fact that some observations are
> less reliable than the others.  Thus, the comment
> by Rich is confusing for me.  Am I wrong?  Are
> there more sophisticated ways to include such
> error information in a predictive model?
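One possible way to put such an error estimate into the model, offered only as a sketch and not as anything proposed in this thread: assume a true event is recorded with a known probability d, and a recorded event is always real. Then P(recorded | x) = d * sigmoid(a + b*x), and a and b can be fitted by maximum likelihood. Everything below is an assumption made for illustration: the value d = 0.7, the simulated data, the function names and the plain gradient-ascent fit.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(a, b, d, x, y):
    """Log-likelihood when a true event is recorded with probability d.

    P(recorded | x) = d * sigmoid(a + b*x); a recorded event is assumed
    to be a true event (no false positives).
    """
    total = 0.0
    for xi, yi in zip(x, y):
        q = d * sigmoid(a + b * xi)
        total += math.log(q) if yi == 1 else math.log(1.0 - q)
    return total

def gradient(a, b, d, x, y):
    """Gradient of log_likelihood with respect to (a, b)."""
    ga = gb = 0.0
    for xi, yi in zip(x, y):
        s = sigmoid(a + b * xi)
        if yi == 1:
            g = 1.0 - s
        else:
            g = -d * s * (1.0 - s) / (1.0 - d * s)
        ga += g
        gb += g * xi
    return ga, gb

def fit(x, y, d, steps=2000, rate=0.1):
    """Plain gradient ascent on the mean log-likelihood, with d held fixed."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        ga, gb = gradient(a, b, d, x, y)
        a += rate * ga / n
        b += rate * gb / n
    return a, b

random.seed(1)
d_true = 0.7  # assumed detection probability, taken as known
x = [random.gauss(0, 1) for _ in range(500)]
occurred = [random.random() < sigmoid(0.5 + 1.0 * xi) for xi in x]
# A true event is recorded with probability d_true; non-events are never recorded.
y = [1 if o and random.random() < d_true else 0 for o in occurred]

print("ignoring missed events:  ", fit(x, y, d=1.0))
print("with d = 0.7 taken as known:", fit(x, y, d=d_true))
```

Setting d = 1 gives back ordinary logistic regression. In practice d has to come from outside the data, for example from a subsample with complete observation, because d and the intercept are only weakly separable from the recorded outcomes alone.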
-------------- 

I received one more answer, which came directly to my mail account.
It said that a solution to the problem has already been worked out. It
is known as "censoring" and is explained in texts on survival analysis,
failure rates, proportional hazards models, etc.
I was informed that the topic is covered in the book by Lawless ('82). On
Amazon I found a book which seems to fit:
Statistical Models and Methods for Lifetime Data
by J. F. Lawless, Jerld T. Lawless. (Wiley, 1982)
There was also a book by
Kleinbaum DG 1996. Survival analysis: a self-learning text. (published by
Springer), which seems to be at a more elementary level and has received
enthusiastic reviews. However, there is no detailed information on its
contents, so I don't know whether it covers "censoring".

Regards
kh

