[R] Cox model -missing data.
Hi all,

I have a data set like this:

Test.cox file:

V1   V2  V3       Survival  Event
ann  13  WTHomo   4         1
ben  20  *        5         1
tom  40  Variant  6         1

where * indicates that I don't know the value of V3 for Ben.

I've set up a Cox model to run like this:

    #!/usr/bin/Rscript
    library(bdsmatrix)
    library(kinship2)
    library(survival)
    library(coxme)

    death.dat <- read.table("Test.cox", header = TRUE)
    deathdat.kmat <- 2 * with(death.dat, makekinship(famid, ID, faid, moid))

    sink("Test.cox.R.Output")
    Model <- coxme(Surv(Survival, Event) ~ strata(factor(V1)) + strata(factor(V2)) +
                     factor(V3) + (1 | ID),
                   data = death.dat, varlist = deathdat.kmat)
    Model
    sink()

As you can see from the Test.cox file, I have a missing value *. How and where do I tell the R script to treat * as a missing value? If I can't incorporate missing values into the model, I assume the alternative is to remove all of the rows with missing data, which will greatly reduce my data set, as most rows have at least one missing variable.

Thanks

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cox model -missing data.
First recode the * as NA:

    death.dat$V3[death.dat$V3 == "*"] <- NA

Then tell the model how to handle the NAs. Model-fitting functions such as coxme take na.action (for example na.action = na.omit) rather than na.rm = TRUE.

Or you could create a new data set with the incomplete rows removed:

    newdata <- na.omit(death.dat)

Shouro

On Fri, Dec 19, 2014 at 11:12 AM, aoife doherty <aoife.m.dohe...@gmail.com> wrote:
> How and where do I tell the R script to treat * as a missing value?
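A minimal sketch of the recoding step, assuming V3 was read in as a character column. The toy data frame only mirrors the layout of the original post (the split of the last two columns into Survival and Event is an assumption); it is not the poster's real data.

```r
# Toy data mirroring the layout in the original post; values are illustrative.
death.dat <- data.frame(
  V1       = c("ann", "ben", "tom"),
  V2       = c(13, 20, 40),
  V3       = c("WTHomo", "*", "Variant"),
  Survival = c(4, 5, 6),
  Event    = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Recode the "*" placeholder in V3 as a true missing value
death.dat$V3[death.dat$V3 == "*"] <- NA

# Convert to a factor only after recoding, so "*" never becomes a level
death.dat$V3 <- factor(death.dat$V3)

# Complete-case copy: drops every row that has an NA in any column
newdata <- na.omit(death.dat)
```

Note that na.omit() removes a row if any column is missing, so with many partly observed covariates it can discard far more rows than expected.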
Re: [R] Cox model -missing data.
Hi Aoife,

I think that if you simply replace each * in the data file with NA, then it should work (NA is usually interpreted as missing by those functions for which missingness is relevant). How you subsequently deal with records which have missing values is another question (or many questions ...).

So your data should look like:

V1   V2  V3       Survival  Event
ann  13  WTHomo   4         1
ben  20  NA       5         1
tom  40  Variant  6         1

Hoping this helps,
Ted.

On 19-Dec-2014 10:12:00 aoife doherty wrote:
> As you can see from the Test.cox file, I have a missing value *. How and
> where do I tell the R script to treat * as a missing value?

E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
Date: 19-Dec-2014  Time: 10:21:23
This message was sent by XFMail
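As an alternative to editing the file by hand, read.table() can be told which strings mean "missing" through its na.strings argument. The sketch below inlines a three-row table with text = so that it runs on its own; the values are illustrative and the column split is an assumption based on the header in the original post.

```r
# The file contents are inlined via `text =` so the sketch is self-contained;
# with a real file you would pass the file name (e.g. "Test.cox") instead.
raw <- "V1 V2 V3 Survival Event
ann 13 WTHomo 4 1
ben 20 * 5 1
tom 40 Variant 6 1"

# Treat both the literal NA and the * placeholder as missing
death.dat <- read.table(text = raw, header = TRUE,
                        na.strings = c("NA", "*"),
                        stringsAsFactors = TRUE)

is.na(death.dat$V3)    # flags the rows where V3 is missing
levels(death.dat$V3)   # "*" does not appear among the factor levels
```

Handling the placeholder at read time keeps the raw file untouched and avoids a stray "*" level in factor(V3).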
Re: [R] Cox model -missing data.
Many thanks, I appreciate the response.

When I convert the missing values to NA and run the Cox model as described in my previous post, the model seems to remove all of the rows with a missing value: the number of rows n in the output is the same whether I delete every row with missing data myself or just change the missing values to NA.

What I had been hoping to do is not to remove a row with missing data for a covariate completely, but rather to somehow censor or estimate the missing value.

In reality, I have ~600 people with survival data and, say, 6 variables attached to them. After I incorporate a 7th variable (for which the information isn't available for every individual), I have 400 people left. Since I still have survival data and almost all of the information for the other 200 people (the only thing missing is that 7th variable), it seems a waste to remove all of the survival data for 200 people over one covariate.

So instead of removing the rows completely, I was hoping to somehow acknowledge in the model that the data for this particular covariate is missing while keeping the row. Does anyone know whether that can be incorporated into the model I described above?

Thanks

On Fri, Dec 19, 2014 at 10:21 AM, Ted Harding <ted.hard...@wlandres.net> wrote:
> How you subsequently deal with records which have missing values is
> another question (or many questions ... ).
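The behaviour described above, where the fit silently uses only the complete rows, is R's default na.action (na.omit). A small sketch that makes it visible, using coxph() and the lung data set shipped with the survival package as stand-ins for the poster's coxme model and data (both substitutions are assumptions made so the example runs on its own):

```r
library(survival)

# lung ships with the survival package; wt.loss has some missing values.
fit_na <- coxph(Surv(time, status) ~ age + wt.loss, data = lung)

# Same model after removing the incomplete rows by hand
keep   <- complete.cases(lung[, c("time", "status", "age", "wt.loss")])
fit_cc <- coxph(Surv(time, status) ~ age + wt.loss, data = lung[keep, ])

# Rows available, rows used with NAs left in, rows used after manual removal
c(rows = nrow(lung), used_with_NA = fit_na$n, used_complete = fit_cc$n)

# Number of rows dropped by the default na.action
length(fit_na$na.action)
```

The two fits use the same rows and give the same coefficients, which is why changing * to NA did not bring back the 200 incomplete records.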
Re: [R] Cox model -missing data.
Yes, your basic reasoning is correct. In general, the observed variables carry information about the variables with missing values, so (in some way) the missing values can be replaced with estimates (imputations), and the standard regression method will then work as though the replacements had been there in the first place.

To incorporate the inevitable uncertainty about what the missing values really were, one approach (multiple imputation) is to do the replacement many times over, sampling the replacement values from a posterior distribution estimated from the non-missing data. There are other approaches.

This is where the many questions kick in! I don't have time at the moment to go into further detail (there's a lot of it, and there are several R packages which deal with missing data in different ways), but I hope that someone can meanwhile point you in the right direction.

With best wishes,
Ted.

On 19-Dec-2014 11:17:27 aoife doherty wrote:
> What I had been hoping to do is not completely remove a row with missing
> data for a co-variable, but rather somehow censor or estimate a value for
> the missing value?
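One way the multiple-imputation idea can be sketched in R is with the mice package, which the thread does not name; it is only one of several options and is assumed to be installed here. The example uses coxph() and the lung data from the survival package rather than the poster's coxme model and data, and whether pooled coxme fits are supported is not something this sketch addresses.

```r
library(survival)
library(mice)   # assumption: the mice package is installed

# Stand-in data: a few columns of lung, some of which contain NAs
dat <- lung[, c("time", "status", "age", "sex", "ph.karno", "wt.loss")]
dat$event  <- as.integer(dat$status == 2)   # in lung, status 2 = death
dat$status <- NULL

# Create m = 5 completed data sets. time and event are columns of dat,
# so by default they are used as predictors when imputing the covariates.
imp <- mice(dat, m = 5, seed = 1, printFlag = FALSE)

# Fit the same Cox model in each completed data set
fits <- with(imp, coxph(Surv(time, event) ~ age + sex + ph.karno + wt.loss))

# Combine the five sets of estimates using Rubin's rules
pooled <- pool(fits)
summary(pooled)
```

All rows contribute to the pooled estimates, and the pooled standard errors reflect the extra uncertainty from not knowing the missing values. The choice of imputation model and of m deserves more thought than this sketch gives it.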
Re: [R] Cox model -missing data.
Comment inline

On 19/12/2014 11:17, aoife doherty wrote:
> What I had been hoping to do is not completely remove a row with missing
> data for a co-variable, but rather somehow censor or estimate a value for
> the missing value?

I think you are searching for some form of imputation here. A full answer would be way beyond the scope of this list, as it depends on so many things, including the mechanism driving the missingness. Have a look at http://missingdata.lshtm.ac.uk/ and see whether that helps.

--
Michael
http://www.dewey.myzen.co.uk
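A first look at the mechanism driving the missingness can be had by counting missing values per column and checking whether missingness in one covariate is associated with covariates that are observed. This is a rough diagnostic sketch, not a formal test; it uses the lung data from the survival package as a stand-in, with wt.loss playing the role of the partly observed 7th variable.

```r
library(survival)

dat <- lung   # stand-in for the poster's data

# How many values are missing in each column?
colSums(is.na(dat))

# Indicator: 1 if wt.loss is missing for that person, 0 otherwise
dat$miss <- as.integer(is.na(dat$wt.loss))

# Is missingness in wt.loss related to observed covariates?
miss_fit <- glm(miss ~ age + sex, data = dat, family = binomial)
summary(miss_fit)$coefficients
```

A clear association would argue against the values being missing completely at random. The absence of one proves little, and no check on the observed data can rule out missingness that depends on the unobserved value itself.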