[R] Cox model -missing data.

2014-12-19 Thread aoife doherty
Hi all,

I have a data set like this:

Test.cox file:

V1   V2  V3       Survival  Event
ann  13  WTHomo   4         1
ben  20  *        5         1
tom  40  Variant  6         1


where * indicates that I don't know the value of V3 for Ben.

I've set up a Cox model to run like this:

#!/usr/bin/Rscript
library(bdsmatrix)
library(kinship2)
library(survival)
library(coxme)
death.dat <- read.table("Test.cox", header = TRUE)
# makekinship() needs the pedigree columns famid, ID, faid, moid in death.dat
deathdat.kmat <- 2 * with(death.dat, makekinship(famid, ID, faid, moid))
# send the printed model summary to a file
sink("Test.cox.R.Output")
Model <- coxme(Surv(Survival, Event) ~ strata(factor(V1)) +
               strata(factor(V2)) + factor(V3) +
               (1|ID), data = death.dat, varlist = deathdat.kmat)
Model
sink()



As you can see from the Test.cox file, I have a missing value (*). How and
where do I tell the R script to treat * as a missing value? If I can't
incorporate missing values into the model, I assume the alternative is to
remove all of the rows with missing data, which would greatly reduce my data
set, as most rows have at least one missing variable.

Thanks


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cox model -missing data.

2014-12-19 Thread Shouro Dasgupta
First recode the * as NA: death.dat$V3[death.dat$V3 == "*"] <- NA

Include this in your model: na.action = na.omit

Or you could create a new dataset: newdata <- na.omit(death.dat)
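A minimal, self-contained sketch of the recode-then-drop approach, using the three sample rows from the question typed in as text. It assumes the flattened rows split as Survival = 4, 5, 6 and Event = 1, and reads strings as character (not factors) so the comparison with "*" is straightforward.

```r
# Sample rows from the question (assumed split: Survival = 4/5/6, Event = 1)
txt <- "V1 V2 V3 Survival Event
ann 13 WTHomo 4 1
ben 20 * 5 1
tom 40 Variant 6 1"

death.dat <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Recode the "*" placeholder in V3 to a real missing value
death.dat$V3[death.dat$V3 == "*"] <- NA

# Complete-case data set: every row containing any NA is dropped
newdata <- na.omit(death.dat)

# Fraction of rows that survive complete-case filtering
mean(complete.cases(death.dat))
```

Here only Ben's row is lost, but with several partly observed covariates the complete-case fraction can shrink quickly, which is the concern raised in the question.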

Shouro






Re: [R] Cox model -missing data.

2014-12-19 Thread Ted Harding
Hi Aoife,
I think that if you simply replace each * in the data file
with NA, then it should work (NA is usually interpreted
as missing for those functions for which missingness is
relevant). How you subsequently deal with records which have
missing values is another question (or many questions ... ).

So your data should look like:

V1   V2  V3       Survival  Event
ann  13  WTHomo   4         1
ben  20  NA       5         1
tom  40  Variant  6         1
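Instead of editing the file by hand, the * can also be mapped to NA while the file is read, through read.table()'s na.strings argument. A minimal sketch on the sample rows (assumed split: Survival = 4, 5, 6 and Event = 1); for the real file the call would be read.table("Test.cox", header = TRUE, na.strings = c("NA", "*")).

```r
txt <- "V1 V2 V3 Survival Event
ann 13 WTHomo 4 1
ben 20 * 5 1
tom 40 Variant 6 1"

# na.strings replaces the default ("NA"), so list both markers to keep
# literal NA entries in the file working as missing values too
death.dat <- read.table(text = txt, header = TRUE,
                        na.strings = c("NA", "*"), stringsAsFactors = FALSE)

str(death.dat)
```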

Hoping this helps,
Ted.


-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 19-Dec-2014  Time: 10:21:23
This message was sent by XFMail



Re: [R] Cox model -missing data.

2014-12-19 Thread aoife doherty
Many thanks, I appreciate the response.

When I convert the missing values to NA and run the Cox model as described
in my previous post, the model seems to drop every row with a missing
value: the number of rows n in the Cox output is the same whether I first
delete every row with missing data myself or simply change the missing
values to NA.
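This behaviour can be checked directly. The sketch below uses survival::coxph() and the lung example data shipped with the survival package as stand-ins for coxme() and the poster's data (both are assumptions); coxph() applies na.action, by default na.omit, so its n equals the number of complete cases for the model variables, and the dropped row indices are stored in fit$na.action.

```r
library(survival)

# lung (shipped with survival) has missing values in ph.ecog and wt.loss
vars <- c("time", "status", "age", "ph.ecog", "wt.loss")
fit <- coxph(Surv(time, status) ~ age + ph.ecog + wt.loss, data = lung)

# Compare all rows, rows actually used by the fit, and complete cases
n_complete <- sum(complete.cases(lung[, vars]))
c(rows = nrow(lung), used = fit$n, complete = n_complete)

# Row indices removed by na.action (NULL if nothing was dropped)
dropped <- fit$na.action
```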

What I had hoped to do is not remove a row entirely because one covariate
is missing, but rather somehow censor or estimate the missing value.

In reality, I have ~600 people with survival data and, say, 6 variables
attached to them. After I add a 7th variable (which isn't available for
every individual), I have 400 people left. Since I still have survival data
and almost all of the other information for the remaining 200 people (only
that 7th variable is missing), it seems a waste to throw away the survival
data for 200 people over one covariate. So instead of removing those rows,
I was hoping to acknowledge in the model that this particular covariate is
missing for them. Does anyone know whether that is possible within the
model I described above?

Thanks





Re: [R] Cox model -missing data.

2014-12-19 Thread Ted Harding
Yes, your basic reasoning is correct. In general, the observed variables
carry information about the variables with missing values, so (in some
way) the missing values can be replaced with estimates (imputations)
and the standard regression method will then work as though the
replacements were there in the first place. To incorporate the inevitable
uncertainty about what the missing values really were, one approach
(multiple imputation) is to do the replacement many times over,
sampling the replacement values from a posterior distribution estimated
from the non-missing data. There are other approaches.
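A minimal multiple-imputation sketch, assuming the CRAN package mice is installed and again using survival's lung data as a stand-in. Following the mice documentation for nelsonaalen(), the cumulative baseline hazard and the event indicator (rather than raw time) are used as imputation predictors. The m fits are pooled by hand with Rubin's rules; the same pooling should apply to coxme fits via fixef() and vcov(), but that, and how to reflect the family structure in the imputation model, are untested assumptions here.

```r
library(survival)
library(mice)

# Stand-in data: lung subset with missing ph.ecog and wt.loss
dat <- lung[, c("time", "status", "age", "sex", "ph.ecog", "wt.loss")]

# Nelson-Aalen cumulative hazard at each subject's time, used as a predictor
dat$haz <- nelsonaalen(dat, time, status)

# Use haz and status, not raw time, to predict the missing covariates
pred <- make.predictorMatrix(dat)
pred[, "time"] <- 0

m <- 5
imp <- mice(dat, m = m, predictorMatrix = pred, seed = 2014, printFlag = FALSE)

# Fit the analysis model on each completed data set (all rows retained)
fits <- lapply(seq_len(m), function(i)
  coxph(Surv(time, status) ~ age + ph.ecog + wt.loss,
        data = mice::complete(imp, i)))

# Rubin's rules
Q    <- sapply(fits, coef)                        # p x m point estimates
U    <- sapply(fits, function(f) diag(vcov(f)))   # p x m within variances
qbar <- rowMeans(Q)                               # pooled estimate
ubar <- rowMeans(U)                               # mean within variance
b    <- apply(Q, 1, var)                          # between-imputation variance
tvar <- ubar + (1 + 1 / m) * b                    # total variance
pooled <- cbind(estimate = qbar, se = sqrt(tvar), HR = exp(qbar))
pooled
```

The (1 + 1/m) * b term is what carries the extra uncertainty about the unknown values into the standard errors.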

This is where the many questions kick in! I don't have time at the
moment to go into further detail (there's a lot of it, and several
R packages which deal with missing data in different ways), but I hope
that someone can meanwhile point you in the right direction.

With best wishes,
Ted.


Re: [R] Cox model -missing data.

2014-12-19 Thread Michael Dewey

Comment inline

On 19/12/2014 11:17, aoife doherty wrote:

Many thanks, I appreciate the response.

When I convert the missing values to NA and run the Cox model as described
in my previous post, the model seems to drop every row with a missing
value: the number of rows n in the Cox output is the same whether I first
delete every row with missing data myself or simply change the missing
values to NA.

What I had hoped to do is not remove a row entirely because one covariate
is missing, but rather somehow censor or estimate the missing value.


I think you are searching for some form of imputation here. A full 
answer would be way beyond the scope of this list as it depends on so 
many things including the mechanism driving the missingness.


Have a look at
http://missingdata.lshtm.ac.uk/
and see whether that helps.
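One cheap, partial look at the missingness mechanism is to regress a missingness indicator on the fully observed variables, including the survival outcome. A sketch on survival's lung data (a stand-in, with wt.loss playing the role of the partly observed 7th variable): clear associations argue against missing completely at random, but no such check can reveal missingness that depends on the unobserved value itself.

```r
library(survival)

dat <- lung
# 1 if wt.loss is missing, 0 otherwise
dat$miss_wt <- as.integer(is.na(dat$wt.loss))

# Does missingness in wt.loss depend on observed variables, incl. the outcome?
chk <- glm(miss_wt ~ age + sex + time + status, family = binomial, data = dat)
summary(chk)$coefficients
```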







--
Michael
http://www.dewey.myzen.co.uk
