[R] [R-pkgs] survival 3.1
Version 3.x of the survival package is now available from CRAN. This is a major update whose primary goal is to make the analysis of multi-state survival models as easy as simple Kaplan-Meier and Cox proportional hazards fits. The primary changes:

-- In Surv(time1, time2, status) the status variable can now be a factor that specifies which state a subject has transitioned into. From a user's point of view this is the single key change.
-- survfit() will then create multi-state Pr(state | time) curves, i.e., the Aalen-Johansen estimate. (Kaplan-Meier and cumulative incidence are special cases of the AJ.) Printout and plotting are unchanged.
-- coxph() will then create multi-state fits. There is an easy formula-like syntax for specifying shared coefficients or baseline hazards.
-- All use standard data, i.e., the (time1, time2, status) format that is common for time-dependent covariates; there is no need to create special intermediate data sets. (An id variable is necessary to specify which rows go with whom.)
-- Robust variance estimates are available throughout.

Many other CRAN packages depend on survival, which drove a second goal of "don't break it". Part of the testing involved running the test suites of all 679 reverse dependencies under the new version.

Terry Therneau
thern...@mayo.edu

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
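A minimal sketch of the new syntax, under stated assumptions: the data set mydata, its variables, and the state names below are all hypothetical, with one row per (time1, time2) interval and an id column linking rows of the same subject. The key point is that status is a factor whose first level denotes censoring.

```r
library(survival)
# Hypothetical multi-state data.  The first factor level is censoring;
# the remaining levels are the states a subject can transition into.
mydata$event <- factor(mydata$event,
                       levels = c("censor", "transplant", "death"))
ajfit <- survfit(Surv(time1, time2, event) ~ 1, data = mydata, id = id)
plot(ajfit, col = 1:2)   # Pr(state | time), one curve per state
```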
[R] Type III tests and Cox models
Someone asked a question about this on the list a couple of months ago. I replied that I didn't know any sound basis for it, and the counter-reply was "SAS has it". Grant applications were due and I got testy, so there was no further progress at that point. As the author of the survival package I really did want to understand the issue, however. I've now done my homework, and the result is contained in a 30+ page vignette that has been added to the most recent release of the package.

The short summary: I was suspicious before, now I know for certain that it is misguided, and the phreg implementation of the idea is worse. If you are interested, take a look.

If significant debate arises about the statistical issues therein, I don't think R-help is quite the right forum for discussion, and I will entertain any idea wrt a more appropriate venue. Rehashing long-standing controversies about type 3 for linear models is, however, not exciting to me.

Terry Therneau
Re: [R] Type III tests and Cox models
This last brought forth a good chuckle. Perhaps combine them into a single entry?

On a serious note, to have someone of significant experience say that they hadn't heard of SAS's method for doing x, for nearly any x, marks a welcome climate change. (And it took nearly that long.) The phreg procedure also added a Scheffe test of the overall hypothesis, by the way, which no one here has yet back-engineered to figure out what they are doing. An F-test?

Terry T.

On 01/20/2014 04:18 PM, peter dalgaard wrote:
> On 20 Jan 2014, at 23:05, Göran Broström goran.brost...@umu.se wrote:
>> On 01/20/2014 07:02 PM, peter dalgaard wrote:
>>> On 20 Jan 2014, at 18:47, Terry Therneau thern...@mayo.edu wrote:
>>>> The short summary: I was suspicious before, now I know for certain that it is misguided, and the phreg implementation of the idea is worse.
>>> A fortune candidate, if ever I saw one.
>> OK, but please state clearly that 'phreg' refers to a SAS procedure, not the function with the same name in the R package 'eha' (I should have chosen another name, had I ever heard of SAS).
>>
>> Göran
> That'll be the 2nd fortune candidate of this evening...
>
> -pd
Re: [R] coxph with time-dependent covariates
> R throws out this event due to the missing time-dependent covariate at day 6. Is there a way I can keep this event without filling in a covariate value at day 6?

No. The Cox model assesses the risk associated with each covariate by comparing, at each event time, the values of the subject who had the event to the values of all those who were also present at that time. You can't compare to missing.

Terry T.
Re: [R] tt() function (was Package dependencies in building R packages)
On 12/31/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Thanks for your kind response Duncan. To be more specific, I'm using the function mvrnorm from MASS. The issue is that MASS depends on survival, and I have a function in my package named tt() which conflicts with a function in survival of the same name. I can think of 2 alternative solutions to my problem, but I'm not an expert: 1) copy mvrnorm into my package, which I thought was not a good idea; 2) rename my tt() function to something else in my package, but this is painful as I have it all over the place in other functions. Any suggestions would be much appreciated. Best, Axel.

Version 2.37-5 of survival has just been submitted to CRAN. In this new release the tt() function was made externally invisible, so you should not have problems. This will take a few days to propagate out to the various mirrors (perhaps longer if the Windows build has issues).

This question of how best to deal with functions that only have meaning within a coxph model statement came up on R-devel just over a year ago, and I'm just now implementing the workaround found there. In time the cluster(), pspline(), ridge() and frailty() functions are likely to receive the same treatment.

Terry T
Re: [R] cumulative incidence for mstate in Survival package in R
> Question 1: How to get just 2 cumulative incidence curves when there are multiple covariates.

I don't understand what you want. Assume that we have liver transplant and death while waiting for a transplant as my two events. There are overall curves (2), or one can create curves separately for each sex, or for different institutions. What do you mean by a curve for "age"? If you want competing risks after Cox model adjustment, see the mstate package.

> Question 2: "mine" data.

There is no such data. This was a hypothetical example in the document, and I chose a poor name for the data set; "your_data_set" would have been better. I was using "mine" in the sense of "this data set is mine, it belongs to me", and now see that it could confuse someone. The file sourcecode.pdf is intended to document the computational algorithms, not how a user would approach the function. A vignette is planned, someday...

Terry Therneau

On 12/30/2013 04:04 PM, Jieyue Li wrote:
> Dear All,
>
> I want to have the cumulative incidence curves for 'mstate' data using the survival package in R. But I have some problems.
>
> Problem 1: If I only use an intercept without any covariates, I get the 'right' cumulative incidence curves (2 for 2 competing risks):
>
> library(survival)
> fitCI <- survfit(Surv(stop, status * as.numeric(event), type = "mstate") ~ 1,
>                  data = mgus1, subset = (start == 0))
> plot(fitCI)
>
> If I include one covariate ('sex'), I get 4 curves (attached; I guess because there are two levels in 'sex' and 2 competing risks):
>
> fitCI <- survfit(Surv(stop, status * as.numeric(event), type = "mstate") ~ sex,
>                  data = mgus1, subset = (start == 0))
> plot(fitCI)
>
> However, I want just 2 cumulative incidence curves estimated from several covariates (such as 'sex', 'age', 'alb', etc. in mgus1). Could you please help me to do that? Thank you very much!
>
> Problem 2: I tried an example from sourcecode.pdf:
>
> fit <- survfit(Surv(time, status, type = "mstate") ~ sex, data = mine)
>
> but where can I find the 'mine' data? Thank you!
Best, Jieyue
Re: [R] estimating survival function from Cox model
The standard error of the curve cannot be extracted from the summary information you have. The variance is made up of two terms, one of which is a sum over all the death times of a quadratic term per death time. That term involves the variance matrix of the Cox model coefficients, the target value of x (the curve you want to calculate), and the average value of x at that time in the data set from which the Cox model was created. Just like linear regression, the standard errors are higher when you predict far from the center of the original data set.

Terry Therneau

On 12/18/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Hi, I have some questions on how to estimate the survival function from a Cox model. I know how to do this in R using survfit(). But let's say the model was fit in other software, and I was only given the estimate of the baseline cumulative hazard A0(t=10) at the specified time t=10 (baseline cumulative hazard refers to covariate X=0) and the beta estimate b for the covariate X used in the Cox model. The survival function at time 10 for a given covariate value x can then be calculated as:
>
> A(t=10 | X=x) = exp(b*x) * A0(t=10), where A is the cumulative hazard when X=x
> S(t=10 | X=x) = exp(-A(t=10 | X=x)), where S is the survival function to be calculated
>
> Now I want to calculate a confidence interval for S(t=10 | X=x). I think I need to calculate the CI for the cumulative hazard A(t=10 | X=x) first and then exponentiate it to get the CI for S, based on the relationship S = exp(-A). To get the CI for A, I need to calculate the estimate of the standard error of A. I know the other software can give me the standard error of A0, the baseline cumulative hazard. Based on the relationship A = exp(b*x)*A0, I guess I'll also need the standard error for b. But how do I calculate the standard error for A from the standard errors for A0 and b? Any insights would be greatly appreciated!
John
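Under the relationships stated in the question, the point estimate itself is straightforward; the numeric values below are made up purely for illustration. As the reply explains, the standard error cannot be computed from A0 and b alone.

```r
# Hypothetical values reported by the other software:
A0 <- 0.25          # baseline cumulative hazard A0(t = 10)
b  <- 0.8           # Cox coefficient for covariate X
x  <- 1.5           # covariate value of interest

A <- exp(b * x) * A0    # cumulative hazard A(t = 10 | X = x)
S <- exp(-A)            # survival probability S(t = 10 | X = x)

# A confidence interval for S would additionally need the per-death-time
# variance terms from the original fit; given the full data, something like
#   sfit <- survfit(coxfit, newdata = data.frame(X = x))
# supplies the standard error directly (sfit$std.err).
```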
Re: [R] Coxph convergence
I'll re-enter the fray. The data set is an example where coxph is incorrect; due to round-off error it is treating a 5-column rank-3 matrix as if it were rank 4. This of course results in 0 digits of precision.

The immediate fix, for the user, is to add toler.chol=1e-10 to their coxph call. It is very likely that they will never have to change this value again.

I will look into changing the default from its current value of .Machine$double.eps^.75. However, I first have to check that this does not break anything else. All epsilon constants are a delicate dance between multiple data sets, and anyone with long experience in numerical analysis will tell you that it is impossible to find constants that will work for every data set. This is true for linear models, logistic, Cox, ... you name it.

In summary: I appreciate the example and will add it to my list of nasty problems. I may be able to fix this long term, and maybe not; changing the constant may break something else. I've given a good short-term fix.

Terry T.

On 12/18/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Your comprehension of the issue seems to be entirely wrong. Between r11513 and r11516, some tuning of internal parameters was done, so the process of finding the rank of a singular matrix no longer converges (within the time/tolerance implicitly specified). There are warnings issued, but then there are misc warnings before and after (and one gets desensitised to them). Also note the nature of the problem, which is to test for the possibility of interactions - or the lack thereof -
>
> outcome ~ factor A + factor B + factor A x factor B
>
> or just extra terms in
>
> outcome ~ factor A + factor B + ...
>
> as an exploration of auxiliary effects. More often than not the extra terms won't make any difference, and the matrix involved just isn't the nicest to manipulate; that is in the nature of that kind of exploratory work.
> Professor Therneau replied that it is possible to get the older convergent behaviour by manually tuning some of the convergence criteria parameters. I responded that while that is possible, one is often simultaneously exploring many models with many possible auxiliary effects (and the lack thereof), so manual tuning for each is neither feasible nor appropriate; and we sort of left it at that.
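For readers hitting the same symptom, the short-term fix described in the reply looks like the sketch below; the model formula and data set name are placeholders, not from the thread.

```r
library(survival)
# Loosen the Cholesky tolerance so near-zero singular values are
# treated as true redundancy rather than as an extra usable rank:
fit <- coxph(Surv(time, status) ~ A + B + A:B, data = mydata,
             toler.chol = 1e-10)
```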
Re: [R] freetype 2.5.2, problem with the survival package, build R 2.15.x with gcc 4.8.x
I was sent a copy of the data, and this is what I get on a different machine:

fit <- clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set), data=pscc)
Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights, ... :
  Loglik converged before variable 1,2,3,4 ; beta may be infinite.
2: In coxph(formula = Surv(rep(1, 13110L), cc) ~ addContr(A) + addContr(C) + ... :
  X matrix deemed to be singular; variable 5

fit
                      coef exp(coef) se(coef)         z p
addContr(A)2      -0.14250     0.867   110812 -1.29e-06 1
addContr(C)2       0.00525     1.005   110812  4.74e-08 1
addContr(A.C)1-2  -0.30097     0.740   110812 -2.72e-06 1
addContr(A.C)2-1  -0.47712     0.621   110812 -4.31e-06 1
addContr(A.C)2-2        NA        NA        0        NA NA

xmat <- model.matrix(fit)
svd(xmat)$d
[1] 1.932097e+02 2.700101e+01 1.624731e+01 6.049630e-15 2.031334e-15

The primary issue is that the covariate matrix is singular, having rank 3 instead of rank 5. The coxph routine prints two warning messages that things are not good with the matrix. Warning messages should not be ignored! The insane se(coef) values in the printed result are an even bigger clue that the model fit is suspect.

Unfortunately, some small change in the iteration path or numerics has put this data set over the edge from being seen as rank 3 (old run) to rank 4 (new run). Moral: coxph does pretty well at detecting redundant variables, but if you know of some it never hurts to help the routine out by removing them before the fit.

Singularity of the X matrix in a Cox model is very difficult to detect reliably; the current threshold is the result of long experience and experiment, chosen to give as few false messages as possible. (The rms package in particular uses truncated power basis functions for its splines, which leads to X matrices that look almost singular numerically but are not.) Setting a slightly less stringent threshold for declaring singularity in the Cholesky decomposition suffices for this data set.
fit2 <- clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set),
               data=pscc, toler.chol=1e-10)

I'll certainly add this to my list of test problems that I use to tune those constants.

Terry Therneau

On 12/11/2013 09:30 PM, Hin-Tak Leung wrote:
> Here is a rather long discussion about freetype 2.5.2, a problem with the survival package, and building R 2.15.x with gcc 4.8.x. Please feel free to skip forward.
>
> - freetype 2.5.2: the fix to cope with one of Mac OS X's system fonts just before the release of freetype 2.5.1 caused a regression, crashing on one of Microsoft Windows' system fonts. So there is a 2.5.2. There are new 2.5.2 bundles for Windows and Mac OS X. The official win/mac binaries of R were built statically with a 2+-years-old freetype with a few known problems. Most should upgrade/rebuild. http://sourceforge.net/projects/outmodedbonsai/files/R/
>
> - problem with the survival package: trying to re-run a vignette to get the same result as two years ago revealed a strange change. I went and bisected it down to r11513 and r11516 of the survival package.
> -- r11513 --
> clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set))
>
>                     coef exp(coef) se(coef)     z      p
> addContr(A)2      -0.620     0.538    0.217 -2.86 0.0043
> addContr(C)2       0.482     1.620    0.217  2.22 0.0270
> addContr(A.C)1-2  -0.778     0.459    0.275 -2.83 0.0047
> addContr(A.C)2-1      NA        NA    0.000    NA     NA
> addContr(A.C)2-2      NA        NA    0.000    NA     NA
>
> Likelihood ratio test=26 on 3 df, p=9.49e-06  n= 13110, number of events= 3524
>
> -- r11516 --
> clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set))
>
>                       coef exp(coef) se(coef)         z p
> addContr(A)2      -0.14250     0.867   110812 -1.29e-06 1
> addContr(C)2       0.00525     1.005   110812  4.74e-08 1
> addContr(A.C)1-2  -0.30097     0.740   110812 -2.72e-06 1
> addContr(A.C)2-1  -0.47712     0.621   110812 -4.31e-06 1
> addContr(A.C)2-2        NA        NA        0        NA NA
>
> Likelihood ratio test=26 on 4 df, p=3.15e-05  n= 13110, number of events= 3524
Re: [R] Coxme time dependent
On 12/13/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Hi all, I am using the coxme function to fit random effects survival models. The base survival models show a typical J-shaped curve where survival drops sharply and then plateaus over the time interval. I want to include a time covariate which could account for this non-linear decrease in survival over time. However, I have not found a clear way to do this in the coxme models. Any suggestions would be greatly appreciated. Below is some sample code and data. -Jared

I don't understand what you need. One of the advantages of the Cox model is that the baseline hazard function automatically accommodates such features, for both coxph and coxme.

Terry Therneau

PS: I can't make out the pattern of your sample data. Perhaps you were depending on the html formatting? R-help uses simple text only.
Re: [R] How to stop Kaplan-Meier curve at a time point
Here is the simplest answer that I know.

outcome6 <- ifelse(Survival_days > 2190, 0, Outcome)
survfit(Surv(Survival_days, outcome6) ~ Gender)

This assumes that the variable Outcome is coded as 0/1. If it were instead FALSE/TRUE, then change the 0 to FALSE in the first line.

The logrank test adds a term to the computation at each death time, so changing all the deaths after time 2190 to censored effectively ignores any information after that point. There is no need to modify the time variable.

David W's suggestion to add subset = (Survival_days <= 2190) was incorrect, effectively removing all of the most successful outcomes from the study. It will thus lead to an underestimate of tooth survival. (This was a surprise - David usually bats 1000 on survival questions, and I greatly appreciate his input to this list.)

Terry Therneau (author of the package)

On 11/21/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Hello R users,
>
> I have a question about Kaplan-Meier curves with respect to my research. We have done a retrospective study on fillings in teeth and their survival in relation to the many influencing factors. We had a long follow-up time (up to 8 years for some variables). However, we decided to stop the analysis at the 6-year follow-up time, so that we can have a uniform follow-up time for all the variables. I did play a bit with the formula and tried to stop the Kaplan-Meier curve at my desired time (2190 days, or roughly 6 years). However, my question is that I want to find the significance (log rank test) at this time point between the two curves, because I am not able to find a way to stop the survfit at this time point with my knowledge. Below is the code I used.
>
> Gender2 <- survfit(Surv(Survival_days, Outcome) ~ Gender)
> plot(Gender2, xmax=2190, mark.time=FALSE, col=c(1:2),
>      xlab="Survival time (Days)", ylab="Survival probability", main="Gender")
>
> # mark.time=FALSE will remove the censoring marks in the graph.
> If censoring marks are needed, then remove the mark.time argument from the call.
>
> legend("topright", c("Females", "Males"), col=1:2, lwd=0.5)
>
> I am sure the code in the first line has to be modified. Please help me with this and I will be very thankful to you. Thanks in advance.
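Putting the reply together for the log-rank test the poster asked about, a sketch using the poster's own variable names (Survival_days, Outcome, Gender, assumed to exist in the workspace):

```r
library(survival)
# Recode deaths after day 2190 as censored; the log-rank test then
# uses only the first six years of follow-up:
outcome6 <- ifelse(Survival_days > 2190, 0, Outcome)
survdiff(Surv(Survival_days, outcome6) ~ Gender)
```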
Re: [R] issues with calling predict.coxph.penal (survival) inside a function
Thanks for the reproducible example. I can confirm that it fails on my machine using survival 2.37-5, the next soon-to-be-released version. The issue is with NextMethod, and my assumption that the called routine inherited everything from the parent, including the environment chain. A simple test this AM showed me that the assumption is false. It might have been true for Splus. Working this out may take some time -- every other one of my wrestling matches with predict inside a function has -- and there is a reasonable chance that it won't make this already overdue release.

In the meantime, here is a workaround that I have sometimes used in other situations. Inside your function, fit a new coxph model with fixed coefficients, and do the prediction on that:

myfun <- function(oldfit, subset) {
    newX <- model.matrix(oldfit)[subset, ]
    newY <- oldfit$y[subset]
    newfit <- coxph(newY ~ newX, iter=0, init=coef(oldfit))
    newfit$var <- oldfit$var
    predict(newfit)
}

If the subset is all of a particular stratum, as you indicated, then all of the predictions will be correct. If not, then those that make use of the baseline hazard (type="expect") will be incorrect, but all others are ok.

Terry Therneau

On 11/14/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Hello everyone,
>
> I have an issue with calling predict.coxph.penal inside a function.
>
> Regarding the context: my original problem is that I wrote a function that uses predict.coxph and survfit(model) to predict a lot of survival curves, using only the basis curves for the strata (as delivered by survfit(model)) and then adapting them with the predicted risk scores. Because there are cases where my new data has strata which didn't exist in the original model, I exclude them using a Boolean vector inside the function.
> I end up with a call like this:
>
> predict(coxph_model, newdata[subscript_vector, ])
>
> This works fine for a coxph model, but when I fit a model with a spline (class coxph.penal), I get an error:
>
> Error in `[.data.frame`(newdata, subscript_vector, ) :
>   object 'subscript_vector' not found
>
> I suppose this is because of NextMethod, but I am not sure how to work around it. I also read a little bit about all those matching-and-frame issues, but must confess I am not really into it.
Re: [R] Survival analysis with truncated data
I think that your data is censored, not truncated.

For a fault introduced 1/2005 and erased 2/2006, duration = 13 months.
For a fault introduced 4/2010 and still in existence at the last observation 12/2010, duration = 8 months so far.
For a fault introduced before 2004, erased 3/2005, in a machine installed 2/1998, the duration is somewhere between 15 and 87 months.
For a fault introduced before 2004, machine installed 5/2000, still present 11/2010 at last check, the duration is 126 months.

For type="interval2" the data would be (13,13), (8,NA), (15,87), (126,NA).

Terry T.

On 11/14/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Hi, I would like to know how to handle truncated data. My intent is to estimate the survival curve of a software fault in order to have some information about fault lifespans. I have some observations of a software system between 2004 and 2010. The system was first released in 1994. The event considered is the disappearance of a software fault. The faults can have been introduced at any time between 1994 and 2010, but for faults introduced before 2004 there is no way to know their age. I used the Surv and survfit functions with type "interval2". For the faults that are first observed in 2004, I set the lower bound to the lifespan observed between 2004 and 2010. How could I set the upper bound? Using 1994 as a starting point does not seem meaningful; neither is using only the lower bound. Should I consider another survival estimator? Thanks in advance.
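The four example durations from the reply, coded as interval-censored data; a minimal sketch, assuming only the numbers given above:

```r
library(survival)
# (lower, upper) pairs from the reply; an NA upper bound marks
# right censoring, equal bounds an exactly observed duration
lower <- c(13,  8, 15, 126)
upper <- c(13, NA, 87,  NA)
fit <- survfit(Surv(lower, upper, type = "interval2") ~ 1)
plot(fit)   # nonparametric estimate of the fault-lifespan curve
```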
Re: [R] How to obtain nonparametric baseline hazard estimates in the gamma frailty model?
I have responded to this particular misconception so often that I begin to grow grumpy about it (not the particular fault of YH). The cumulative hazard function from

fit <- coxph( some model )
sfit <- survfit(fit, newdata = set of covariate values)

gives the survival curve and cumulative hazard for that particular set of covariate values. There is nothing special about a "baseline" hazard. Any cumulative hazard is for some particular set of covariate values, including "all values = 0", which is what most textbooks refer to as the baseline. The survfit routine will by default return one centered at the means of the predictor variables, simply because there is less round-off error if one stays near the center. Any baseline is as good as any other:

cum haz at covariate values z = (cum haz at covariate values x) * exp(beta * (z-x))

Note that for a random effect, the survfit routine uses 0 as the centering value.

Terry Therneau

On 11/05/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Hi Dr. Therneau,
>
> Yes, -log(sfit$surv) gives me the cumulative hazard but not the baseline cumulative hazard. I know that Nielsen and Klein have SAS macros to get such estimates using an EM approach. I'm wondering if I can obtain the baseline hazard estimates from coxph() for the gamma frailty model, since I think coxph() is a very powerful function. I feel there may be some way that I don't know.
>
> Thanks, YH
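As a concrete sketch of the point above, using the lung data that ships with survival (the covariate values 60 and 1 are chosen arbitrarily for illustration):

```r
library(survival)
fit  <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
# Cumulative hazard at a chosen covariate set -- not "the" baseline:
sfit <- survfit(fit, newdata = data.frame(age = 60, ph.ecog = 1))
H    <- -log(sfit$surv)
# Shifting to any other covariate values z only rescales this curve by
# exp(beta %*% (z - x)), per the relation in the reply above.
```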
Re: [R] How to obtain nonparametric baseline hazard estimates in the gamma frailty model?
On 11/03/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> I can easily get the other parameter estimates by using coxph() but don't know how to get the baseline hazard estimates from it. Does anyone know how to obtain nonparametric baseline hazard estimates in the gamma frailty model? Thanks, YH

I don't see what the problem is.

fit1 <- coxph(Surv(time, status) ~ age + ph.ecog + frailty(inst), data=lung)
sfit <- survfit(fit1)
plot(sfit)

Please give an actual example of the problem.

Terry Therneau
[R] Fwd: Re: How to obtain nonparametric baseline hazard estimates in the gamma frailty model?
Original Message
Subject: Re: How to obtain nonparametric baseline hazard estimates in the gamma frailty model?
Date: Mon, 04 Nov 2013 17:27:04 -0600
From: Terry Therneau therneau.te...@mayo.edu
To: Y yuhan...@gmail.com

The cumulative hazard is just -log(sfit$surv). The hazard is essentially a density estimate, and that is much harder. You'll notice that everyone does CDF curves for survival data (Kaplan-Meier = estimate of 1-CDF), but no one does histograms, which estimate a density. That isn't because we wouldn't like density estimates.

Terry T.

On 11/04/2013 04:47 PM, Y wrote:
> Hi Dr. Therneau,
>
> Thanks very much for your kind help! Does survfit() just give me the survival curve? What I wanted is the baseline hazard estimate, i.e., lambda_0(t). How can I obtain this estimate from coxph()? Or using basehaz()?
>
> Thanks, YH
>
> On Mon, Nov 4, 2013 at 5:29 PM, Terry Therneau thern...@mayo.edu wrote:
>> On 11/03/2013 05:00 AM, r-help-requ...@r-project.org wrote:
>>> I can easily get the other parameter estimates by using coxph() but don't know how to get the baseline hazard estimates from it. Does anyone know how to obtain nonparametric baseline hazard estimates in the gamma frailty model? Thanks, YH
>>
>> I don't see what the problem is.
>>
>> fit1 <- coxph(Surv(time, status) ~ age + ph.ecog + frailty(inst), data=lung)
>> sfit <- survfit(fit1)
>> plot(sfit)
>>
>> Please give an actual example of the problem.
>>
>> Terry Therneau
Re: [R] How to obtain restricted estimates from coxph()?
On 10/16/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Hello, I'm trying to use the coxph() function to fit a very simple Cox proportional hazards regression model (only one covariate), but the parameter space is restricted to the open set (0, 1). Can I still obtain a valid estimate by using the coxph function in this scenario? If yes, how? Any suggestion would be greatly appreciated. Thanks!

Easily:

1. Fit the unrestricted model. If the solution is in (0, 1) you are done.

2. If it is outside, fix the coefficient. Say that the solution is 1.73; then the optimal solution under the constraint is 1. Redo the fit adding the parameters init=1, iter=0. This forces the program to give the loglik etc. for the fixed coefficient of 1.0.

Terry Therneau
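The two-step recipe as a sketch; the data set mydata and covariate x are placeholders, and the init/iter arguments follow the usage in the reply.

```r
library(survival)
# Step 1: unrestricted fit
fit <- coxph(Surv(time, status) ~ x, data = mydata)
# Step 2: if the estimate falls at or above the upper boundary,
# refit with the coefficient pinned there (no iterations, fixed init);
# the reported loglik etc. are then for the constrained coefficient
if (coef(fit) >= 1)
    fit <- coxph(Surv(time, status) ~ x, data = mydata, init = 1, iter = 0)
```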
Re: [R] frailtypack
I can't comment on frailtypack issues, but would like to mention that coxme will handle nested models, contrary to the statement below that frailtypack is "perhaps the only" platform for nested survival data. To reprise the original post's model:

cgd.nfm <- coxme(Surv(Tstart, Tstop, Status) ~ Treatment + (1 | Center/ID),
                 data=cgd.ag)

And a note to the poster -- you should reprise the original message to which you are responding.

Terry Therneau

On 10/09/2013 05:00 AM, r-help-requ...@r-project.org wrote:
> Hello, I am encountering very similar problems with frailtypack (only, it seems, 3 years later). I would be incredibly grateful if you would be willing to share any solutions you happened upon for the problem you mention. I have a data file that is even longer than yours (over 15000 events). Events follow a two-level nesting structure. When I attempt to run the nested survival model in frailtypack, the program either stops running entirely or produces an error message. I have contacted the frailtypack maintainer but have not yet received a response. It seems that frailtypack is perhaps the only widely-available platform for the analysis of nested survival data, so the seeming bugginess of this program is troubling. Please let me know if you ended up coming up with a solution to your problem or, alternatively, another program able to run the nested analysis. Thank you so much!
Re: [R] Interpreting results of glmnet and coxph plot, Brier score and Harrell's C-Index - am I doing something wrong ???
To elaborate on Frank's response, the analysis plan of
1. Look at the data and select important variables
2. Put that truncated list into your favorite statistical procedure
3. Ask: are the p-values (c-statistic, coefficients, ...) reliable?
is a very old plan. The answer to the last question is always NO. The traditional step 1 involved graphs and t-tests, but replacing it by something modern does not change the overall outcome.

Terry T
Re: [R] Should I wrap more package examples in \dontrun{} ?
To give a specific example, the simple code for my test suite is given at the bottom of this message -- a simpler (simple-minded, maybe) approach than creating a new package for testing. I now run this on the survival package every time that I submit a new version to CRAN. It takes a while, since there are over 200 dependencies. It creates a file "progress" containing each package name as it is run, followed by either Ok or Failed, along with a directory "tests" containing the results. Almost every run generates 1-3 hits. I have not automated this further because many runs also lead to exceptions, often packages that won't load because I don't have some ancillary piece of software installed that they depend on. (I can't seem to get Java set up sufficiently to satisfy everyone, for example, and have very low motivation to work harder at the task.) And a small number have made it to the "bad actors: I give up, don't even bother to test" list. Note that any package I want to fully test was installed on this local machine using
    install.packages(xxx, dependencies=TRUE, INSTALL_opts="--install-tests")
where xxx is the name of the package.

Terry T.

On 09/04/2013 05:00 AM, r-help-requ...@r-project.org wrote:
On 03/09/2013 1:53 PM, Hadley Wickham wrote:
As a user of your package, I would find it irritating if example(foo) didn't run anything. It would be more irritating (and would indicate sloppiness on your part) if the examples failed when I cut and pasted them. These both suggest leaving the examples running. As the author of your package, it sounds as though you find it quite irritating when other authors break your code. Isn't the right solution to this to work with the other package authors to come up with code that is unlikely to break? If that's not possible, then maybe don't use those packages that cause you trouble. It was my understanding that package authors are responsible for not breaking other CRAN packages without warning.
For example, before I release a new version of plyr or ggplot2, I run R CMD check on every package that depends on my package. I then let the maintainers know if something is broken - sometimes it's because I introduced a bug, and other times it's because I'm enforcing a stricter check than I did previously.

It sounds as though you're doing the right thing. Can you describe how you determine the set of packages to check, and how you do your checks? It would be great if we could convince everyone to follow those steps.

Duncan Murdoch

tmt% cat checkdeps.R
require(tools)
# First set a repository to look at
#chooseCRANmirror()    # do it graphically
#chooseBioCmirror()
options(repos=c(CRAN="http://streaming.stat.iastate.edu/CRAN/",
                BioC="http://bioconductor.org/packages/2.11/bioc/"))

# This function is provided by Uwe Ligges
reverse <- function(packages, which = c("Depends", "Imports", "LinkingTo"),
                    recursive = FALSE) {
    description <- sprintf("%s/web/packages/packages.rds",
                           getOption("repos")["CRAN"])
    con <- if (substring(description, 1L, 7L) == "file://")
               file(description, "rb")
           else url(description, "rb")
    on.exit(close(con))
    db <- readRDS(gzcon(con))
    rownames(db) <- NULL
    rdepends <- package_dependencies(packages, db, which,
                                     recursive = recursive, reverse = TRUE)
    rdepends <- sort(unique(unlist(rdepends)))
    pos <- match(rdepends, db[, "Package"], nomatch = 0L)
    db[pos, c("Package", "Version", "Maintainer")]
}

survdep <- reverse("survival")[, 1]

# I don't want to check coxme (since I maintain a more up to date
#  local copy), and there are a few known bad actors
avoid <- c("coxme", "STAR", "compareGroups")
survdep <- survdep[is.na(match(survdep, avoid))]

# Some packages may have failed to install, don't test those
inplace <- installed.packages()[, "Package"]   # ones we already have
missed <- is.na(match(survdep, inplace))
if (any(missed)) {
    message("Unable to load packages ",
            paste(survdep[missed], collapse=", "), "\n")
    survdep <- survdep[!missed]
}

# Do the long list of tests
unlink("progress")
unlink("tests", recursive=TRUE)
system("mkdir tests")
pfile <- file("progress", open="w")
for (testpkg in survdep) {
    z <- testInstalledPackage(testpkg, outDir="tests")
    cat(testpkg, c("Ok", "Failed")[z+1], "\n", file=pfile)
}
Re: [R] help
The tt function is documented for coxph, and you are using cph. They are not the same.

On 09/03/2013 05:00 AM, r-help-requ...@r-project.org wrote:
tt <- function(x) {
    obrien <- function(x) {
        r <- rank(x)
        (r - 0.5)/(0.5 + length(r) - r)
    }
    unlist(tapply(x, riskset, obrien))
}
hi, i am new to R. when dealing with a survival data set, i found the variable Progression did not meet the PH assumption. the picture shows the residuals against time. so i use a Cox model for time-dependent variables. i use the default tt in function coxph, but when i use tt in
f <- cph(Surv(os$Stime, os$Status==1) ~ Metastasis + Surgery + Post.chem. + Age + tt(Progression) + ALP, data=os, x=T, y=T, surv=TRUE, time.inc=60),
it didn't work. i don't know what the argument riskset is. i beg your help. can you help me write down an appropriate tt expression to let me use in cph? thanks.
Zhongxin Dong
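For comparison, the documented tt mechanism in coxph looks like this (editor's sketch on the survival package's bundled lung data; as the reply notes, this is a coxph feature, not a cph one):

```r
library(survival)

# A time-transform term: the coefficient for age is allowed to vary with
# log(time). tt() marks the variable in the formula and the tt argument
# supplies the transform.
fit <- coxph(Surv(time, status) ~ sex + age + tt(age), data = lung,
             tt = function(x, t, ...) x * log(t))
coef(fit)   # sex, age, and the tt(age) interaction with log(time)
```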
[R] POSIXlt and months
The help page for as.POSIXlt suggests using it as a way to extract month, day, and year. However, I can't find any documentation on the results and am a bit surprised by the month portion. An example, run about 6:21 AM on Sept 3:

> unlist(unclass(as.POSIXlt(Sys.time())))
      sec       min      hour      mday       mon      year      wday      yday     isdst
 43.24545  21.00000   6.00000   3.00000   8.00000 113.00000   2.00000 245.00000   1.00000

So: it appears that I need to
  add 1900 to year
  add 1 to month
but other components are as I would expect.

> unlist(unclass(as.POSIXlt(as.Date("1953/03/10"))))
  sec   min  hour  mday   mon  year  wday  yday isdst
    0     0     0    10     2    53     2    68     0

This supports a 0 origin for everything except year and mday. A pointer to formal documentation of this would make me feel easier about using the function.

Terry Therneau
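The offsets are documented on the ?DateTimeClasses help page (mon counts from 0, year from 1900, mday from 1). Extraction then looks like this (editor's sketch):

```r
lt <- as.POSIXlt(as.Date("1953-03-10"))
c(year  = lt$year + 1900,  # year counts from 1900
  month = lt$mon + 1,      # mon is 0-based: 0 = January
  day   = lt$mday)         # mday is already 1-based
```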
Re: [R] coxph diagnostics
That's the primary reason for the plot: so that you can look and think. The test statistic is based on whether an LS line fit to the plot has zero slope. For larger data sets you can sometimes have a significant p-value but good agreement with proportional hazards. It's much like an example from Lincoln Moses' beginning statistics book (now out of print, so I'm rephrasing from memory). Suppose that you flip a coin 10,000 times and get 5101 heads. What can you say?
a. The coin is not perfectly fair (p < .05).
b. But it is darn close to perfect! As a referee I would be comfortable using that coin to start a football game.
The Cox model gives an average hazard ratio, averaged over time. When proportional hazards holds, that value is a complete summary -- nothing else is needed. When it does not hold, the average may still be useful, or not, depending on the degree of change over time.

Terry Therneau

On 08/13/2013 05:00 AM, r-help-requ...@r-project.org wrote:
Thanks to Bert and Göran for your responses. To answer Göran's comment, yes I did plot the Schoenfeld residuals using plot.cox.zph and the lines look horizontal (slope = 0) to me, which makes me think that it contradicts the results of cox.zph. What alternatives do I have if I assume the proportional hazards assumption of coxph does not hold? Thanks!
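The look-and-think step can be reproduced with a minimal sketch (editor's illustration on the survival package's bundled lung data, not the original poster's model):

```r
library(survival)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
zp <- cox.zph(fit)   # score test: slope of scaled Schoenfeld residuals vs time
print(zp)            # a small p-value need not mean a large departure from PH
# plot(zp[1])        # look at the smooth: roughly horizontal suggests PH holds
```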
Re: [R] Repeated measures Cox regression "coxph"
Two choices. If this were a linear model, would you prefer the GEE approach or a mixed effects approach? Assume that subject is a variable containing a per-subject identifier.
GEE approach: add + cluster(subject) to the model statement in coxph.
Mixed models approach: add + (1|subject) to the model statement in coxme.
When only a very few subjects have multiple events, the mixed model (random effect) approach may not be reliable, however. Multiple events per group are the fuel for estimation of the variance of the random effect, and with few of these the profile likelihood of the random effect will be very flat. You can get essentially a random estimate of the variance of the subject effect. I'm still getting my arms around this issue, and it has taken me a long time.
"Frailty" is an alternate label for random effects when all we have is a random intercept. Multiple labels for the same idea add confusion, but nothing else.

Terry Therneau

On 07/25/2013 08:14 PM, Marc Schwartz wrote:
On Jul 25, 2013, at 4:45 PM, David Winsemius <dwinsem...@comcast.net> wrote:
On Jul 25, 2013, at 12:27 PM, Marc Schwartz wrote:
On Jul 25, 2013, at 2:11 PM, John Sorkin <jsor...@grecc.umaryland.edu> wrote:
Colleagues, Is there any R package that will allow one to perform a repeated measures Cox Proportional Hazards regression? I don't think coxph is set up to handle this type of problem, but I would be happy to know that I am not correct. I am doing a study of time to hip joint replacement. As each person has two hips, a given person can appear in the dataset twice, once for the left hip and once for the right hip, and I need to account for the correlation of data from a single individual. Thank you, John
John, See Terry's 'coxme' package: http://cran.r-project.org/web/packages/coxme/index.html
When I looked over the description of coxme, I was concerned it was not really designed with this in mind.
Looking at Therneau and Grambsch, I thought section 8.4.2 in the 'Multiple Events per Subject' chapter fit the analysis question well. There they compared the use of coxph(... + cluster(ID), ...) with coxph(... + strata(ID), ...). Unfortunately I could not tell for sure which one was being described as superior, but I think it was the cluster() alternative. I seem to remember there are discussions in the archives.
David, I think that you raise a good point. The example in the book (I had to wait to get home to read it) is potentially different, however, in that the subject's eyes were randomized to treatment or control, which would seem to suggest comparable baseline characteristics for each pair of eyes, as well as an active intervention on one side where a difference in treatment effect between each eye is being analyzed. It is not clear from John's description above if there is one hip that will be treated versus one as a control and whether the extent of disease at baseline is similar in each pair of hips. Presumably the timing of hip replacements will be staggered at some level, even if there is comparable disease, simply due to post-op recovery time and surgical risk. In cases where the disease between each hip is materially different, that would be another factor to consider; however I would defer to orthopaedic physicians/surgeons from a subject matter expertise consideration. It is possible that the bilateral hip replacement data might be more of a parallel to bilateral breast cancer data, if each breast were to be tracked separately. I have cc'd Terry here, hoping that he might jump in and offer some insights into the pros/cons of using coxme versus coxph with either a cluster or strata based approach, or perhaps even a frailty based approach as in 9.4.1 in the book.
Regards, Marc
-- David.
You also might find the following of interest:
http://bjo.bmj.com/content/71/9/645.full.pdf
http://www.ncbi.nlm.nih.gov/pubmed/6885
http://www.ncbi.nlm.nih.gov/pubmed/22078901
Regards, Marc Schwartz

David Winsemius
Alameda, CA, USA
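The two modelling routes in the answer above can be sketched side by side (editor's illustration; the hips data frame and its variable names are simulated stand-ins, not the poster's data):

```r
library(survival)
set.seed(1)
n <- 100                                   # subjects, two hips each
hips <- data.frame(subject = rep(1:n, each = 2),
                   treat   = rbinom(2 * n, 1, 0.5),
                   time    = rexp(2 * n),
                   status  = rbinom(2 * n, 1, 0.8))

# GEE-style marginal model: robust (sandwich) variance clustered on subject
fit1 <- coxph(Surv(time, status) ~ treat + cluster(subject), data = hips)

# Mixed-model (frailty) analogue; needs the coxme package:
#   fit2 <- coxme(Surv(time, status) ~ treat + (1 | subject), data = hips)

sqrt(diag(fit1$var))   # robust SE; fit1$naive.var holds the model-based one
```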
Re: [R] prediction survival curves for coxph-models; how to extract the right strata per individual
It would help me to give advice if I knew what you wanted to do with the new curves. Plot, print, extract? A more direct solution to your question will appear in the next release of the code, btw.

Terry T.

On 07/25/2013 05:00 AM, r-help-requ...@r-project.org wrote:
My problem is: I have a coxph model with several strata and other covariables. Now I want to fit the estimated survival curves for new data, using survfit.coxph. But this returns a prediction for each stratum per individual. So if I have 15 new individuals and 10 strata, I have 150 fitted survival curves (or if I don't use the subscripts, I have 15 predictions with the curves for all strata pasted together). Is there any possibility to get only the survival curves for the stratum the new individual belongs to?
Re: [R] multcomp on significant interaction in coxme model
--- most of the message deleted ---
Performing exactly the same routine with the same data on a logistic model with family=binomial does not give this error message. So my question is, what am I missing here? Thanks for any possible input.

I'm not a user of multcomp, so will follow with one question. Results from coxph and coxme do not contain an intercept column. Could this be the problem? A second possibility is that coxme will in some cases return a bdsmatrix object as the variance matrix (to save space). In such a case
    fit$var <- as.matrix(fit$var)
might correct the problem.

Terry Therneau
Re: [R] Lexical scoping is not what I expect
I second Ellison's sentiments of "almost never". One main reason is readability on later viewing. Yes, as Duncan says, global variables can sometimes be handy and make functions quick to write, but using a formal argument in the call is always clearer.

Terry Therneau

On 06/27/2013 05:00 AM, r-help-requ...@r-project.org wrote:
On 13-06-26 6:57 AM, S Ellison wrote:
-----Original Message-----
It may be helpful not to worry about the technical details, just to look at the source code defining the function: if it is defined in a place where a variable can be seen, it can see that variable.
I too find R's lexical scoping rules straightforward. However, I'd say that if your code relies on lexical scoping to find something, you should probably rewrite your code. The number of times I've seen new R users get unexpected results because they haven't noticed that their function is referencing a parent environment instead of a locally defined variable or argument is past counting. Of course there are times when it's useful and sensible to have globally defined variables that can be accessed within a function. But they are very rare; as a default, I'd recommend avoiding it if at all possible. If your function needs something from outside, pass it as an argument.
I would say the meaning of "probably" in your 2nd sentence depends quite a bit on the user. For beginners, it's "almost certainly". For people who are comfortable with the concept, it's just "maybe". I would agree that in most cases the only global objects that functions should reference are other functions, but small nested functions are quite safe, and it's sometimes useful to create functions in a local environment so they have persistent memory.
Duncan Murdoch
Re: [R] censor=FALSE and id options in survfit.coxph
Yes, it is a bug. Thanks for providing a complete example. I'll look into it, but I leave for a week's vacation in a couple of hours and have some other pressing tasks.

Terry T.

Terry, I recently noticed the censor argument of survfit. For some analyses it greatly reduces the size of the resulting object, which is a nice feature. However, when combined with the id argument, only 1 prediction is made. Predictions can be made individually but I'd prefer to do them all at once if that change can be made. Chris
Re: [R] Lexical scoping is not what I expect
Duncan, "I disagree with Duncan" was not at all the intent of my note, but on rereading it does have that flavor. Chastisement accepted. Due to the documentation angle I'd simply change your original "maybe" to "sometimes maybe". A bit more caution but the same message.

Terry T.

On 06/27/2013 07:24 AM, Duncan Murdoch wrote:
On 13-06-27 8:18 AM, Terry Therneau wrote:
I second Ellison's sentiments of "almost never". One main reason is readability on later viewing. Yes, as Duncan says, global variables can sometimes be handy and make functions quick to write, but using a formal argument in the call is always clearer.
I didn't say that. I said that in most cases global variables other than functions should not be used. I would generalize that a little: global constants are okay, changing globals is usually a bad idea.
Duncan Murdoch
Terry Therneau
On 06/27/2013 05:00 AM, r-help-requ...@r-project.org wrote:
On 13-06-26 6:57 AM, S Ellison wrote:
-----Original Message-----
It may be helpful not to worry about the technical details, just to look at the source code defining the function: if it is defined in a place where a variable can be seen, it can see that variable.
I too find R's lexical scoping rules straightforward. However, I'd say that if your code relies on lexical scoping to find something, you should probably rewrite your code. The number of times I've seen new R users get unexpected results because they haven't noticed that their function is referencing a parent environment instead of a locally defined variable or argument is past counting. Of course there are times when it's useful and sensible to have globally defined variables that can be accessed within a function. But they are very rare; as a default, I'd recommend avoiding it if at all possible. If your function needs something from outside, pass it as an argument.
I would say the meaning of "probably" in your 2nd sentence depends quite a bit on the user. For beginners, it's "almost certainly".
For people who are comfortable with the concept, it's just "maybe". I would agree that in most cases the only global objects that functions should reference are other functions, but small nested functions are quite safe, and it's sometimes useful to create functions in a local environment so they have persistent memory.
Duncan Murdoch
Re: [R] survreg with measurement uncertainties
I will assume that you are talking about uncertainty in the response. Then one simple way to fit the model is to use case weights that are proportional to 1/variance, along with + cluster(id) in the model statement to get a correct variance for this case. In linear models this would be called the White or Horvitz-Thompson or GEE working-independence variance estimate, depending on which literature you happen to be reading (economics, survey sampling, or biostatistics). Now if you are talking about errors in the predictor variables, that is a much harder problem.

Terry Therneau

On 06/12/2013 05:00 AM, Kyle Penner wrote:
Hello, I have some measurements that I am trying to fit a model to. I also have uncertainties for these measurements. Some of the measurements are not well detected, so I'd like to use a limit instead of the actual measurement. (I am always dealing with upper limits, i.e. left censored data.) I have successfully run survreg using the combination of well detected measurements and limits, but I would like to include the measurement uncertainty (for the well detected measurements) in the fitting. As far as I can tell, survreg doesn't support this. Does anyone have a suggestion for how to accomplish this? Thanks, Kyle
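A sketch of that suggestion (editor's illustration with simulated data; y is the measurement or upper limit, det = 1 for well detected and 0 for an upper limit, i.e. left censored, and s2 is the measurement variance -- all names invented):

```r
library(survival)
set.seed(2)
d <- data.frame(id = 1:50, x = rnorm(50), s2 = runif(50, 0.5, 2))
d$y   <- 1 + 2 * d$x + rnorm(50, sd = sqrt(d$s2))   # true intercept 1, slope 2
d$det <- rbinom(50, 1, 0.8)                         # 0 = upper limit only

# case weights proportional to 1/variance; cluster(id) then gives the
# robust (working independence) variance for the weighted fit
fit <- survreg(Surv(y, det, type = "left") ~ x + cluster(id),
               weights = 1 / s2, data = d, dist = "gaussian")
summary(fit)$table   # robust standard errors from the cluster() term
```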
Re: [R] Survival aareg problem
On 06/05/2013 12:33 AM, r-help-requ...@r-project.org wrote:
Dear friends - I'm on Windows 7, R 2.15.2. When I run the example for aareg in the survival package I see this:
plot(lfit[4], ylim=c(-4,4))
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but does not have components 'x' and 'y'
Is that a matter of an old R? Best wishes, Troels Ring, MD, Aalborg, Denmark

It's an error in the survival package: the line S3method('[', aareg) was missing from the NAMESPACE file. Without it R doesn't find the correct code to perform the subscripting. (In earlier versions of R a declaration of every S3 method was not necessary.) You are the first to run into this. This will be fixed in the next (soon) release. In the meantime, simply read a copy of the subscripting code into your session, either with source() or cut and paste. The code is below.

Terry Therneau

"[.aareg" <- function(x, ..., drop=FALSE) {
    if (!inherits(x, 'aareg')) stop("Must be an aareg object")
    i <- ..1
    if (is.matrix(x$coefficient)) {
        x$coefficient <- x$coefficient[, i, drop=drop]
        x$tweight <- x$tweight[, i, drop=drop]
    }
    else stop("Subscripting impossible, coefficient component not a matrix")

    if (!is.null(x$dfbeta)) {
        x$dfbeta <- x$dfbeta[, i, , drop=drop]
        x$test.var2 <- x$test.var2[i, i, drop=drop]
    }
    x$test.statistic <- x$test.statistic[i, drop=drop]
    x$test.var <- x$test.var[i, i, drop=drop]
    x
}
Re: [R] Mixed effects model with a phylogenetic tree/ distance, matrix as a random effect
Take a look at lmekin() in the coxme package. The motivating data set for my development of coxme was the Minnesota Family Breast Cancer project: 24050 subjects in 462 families. The random effect is an intercept per subject with sigma^2 K as its variance, where K is the kinship matrix (1 for self-self, .5 for parent-child or sib-sib, .25 for uncle-niece, etc.). lmekin is a linear models front end to the same underlying routines. I think you want
    lmekin(y ~ x1 + x2 + (1 | subject), data=yourdata, varlist=D)
or some such, where D is the similarity or correlation form of your distance matrix. A downside is that lmekin is sort of the poor cousin to coxme -- with finite time I've never gotten around to writing predict, residuals, plot, ... methods for it. The basic fit is fine though.

Terry Therneau
(In general I agree with Bert and Ben to try the other list, but I don't happen to read it.)

On 06/04/2013 05:00 AM, r-help-requ...@r-project.org wrote:
Hi, I'm trying to build a mixed-effects model in which I'd like to include either a distance matrix or a phylogenetic tree as a random effect. The troubles I've had are that:
1. Function lmer() in package lme4 only accepts a data frame column as a random factor and not a distance matrix.
2. Function MCMCglmm() in package MCMCglmm only accepts a rooted and ultrametric phylogenetic tree as a pedigree argument, while my tree is neither (and for various reasons I cannot construct one or coerce mine to be a rooted, ultrametric tree).
Is there any way around it? I'd appreciate mostly a solution to problem 1. Roey
Re: [R] How to compute a P-value for a complex mixture of chi-squared distributions in R
You need to be more explicit about what you are doing. For this problem
    y = (x1 + x2)/2
where x1 and x2 are chi-square random variables, you want to use the pchisqsum() routine found in the survey package. This is not a trivial computation. For the alternate problem where y is a random choice of either x1 or x2,
    y = ifelse(z, x1, x2)
where z is binomial and x1, x2 are chi-square, the suggestion by Peter Dalgaard is correct. Which of these two are you trying to solve?

Terry Therneau

On 06/02/2013 05:00 AM, r-help-requ...@r-project.org wrote:
On 01-06-2013 05:26, Tiago V. Pereira wrote:
Hello, R users! I am struggling with the following problem: I need to compute a P-value for a mixture of two chi-squared distributions. My P-value is given by:
    P = 0.5*Pr(sqrt(chisq(1)) >= x) + 0.5*Pr(sqrt(chisq(2)) >= x)
In words, I need to compute the p-value for a 50:50 mixture of the square root of a chi-squared random variable with 1 degree of freedom and the square root of a chi-squared with two degrees of freedom. Although I can quickly simulate data, the P-values I am looking for are at the tail of the distribution, that is, alpha levels below 10^-7. Hence, simulation is not efficient. Are you aware of a smart approach? All the best, Tiago
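For P as the poster wrote it (the "random choice" mixture of distributions), the tail probability can be computed directly from pchisq, since Pr(sqrt(X) >= x) = Pr(X >= x^2) (editor's sketch):

```r
# 50:50 mixture of sqrt(chisq(1)) and sqrt(chisq(2)):
#   P(Y >= x) = 0.5 * P(chisq(1) >= x^2) + 0.5 * P(chisq(2) >= x^2)
pmix <- function(x)
    0.5 * pchisq(x^2, df = 1, lower.tail = FALSE) +
    0.5 * pchisq(x^2, df = 2, lower.tail = FALSE)

pmix(6)   # a deep-tail value, accurate with no simulation needed
```

Using lower.tail = FALSE keeps full precision in the far tail, which is exactly where simulation breaks down.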
Re: [R] counting process in survival
It is hard to know exactly what you mean with such a generic question. If you mean "treat survival as a counting process", then the answer is yes. The survival package in S (which is the direct ancestor of the S-Plus package, which is the direct ancestor of the R package) was the very first to do this. I created the feature in 1984.

Terry Therneau

On 05/31/2013 05:00 AM, r-help-requ...@r-project.org wrote:
Hi, I have a question: is there a package to do counting process in survival analysis with R?
Re: [R] tt function and coxph
On 05/18/2013 05:00 AM, r-help-requ...@r-project.org wrote:
hi all, this command used the tt function for all variables. How can i define a different function for each variable?
exCox.all <- coxph(Surv(SURVT, STATUS) ~ RX + LOGWBC + SEX + tt(RX) + tt(LOGWBC) + tt(SEX), data=rem.data, tt=function(x,t,...) log(t)*x)
thank you

There is currently no way to do what you ask. Could you give me an example of why it is necessary? I'd never thought of adding such a feature.

Terry Therneau
Re: [R] R help - bootstrap with survival analysis
This comes up regularly. Type ?print.survfit and look at the comments there under "Value".

Terry T.

----- begin included message -----
Hi, I'm not sure if this is the proper way to ask questions, sorry if not. But here's my problem: I'm trying to do a bootstrap estimate of the mean for some survival data. Is there a way to specifically call upon the rmean value, in order to store it in an object? I've used print(..., print.rmean=T) to print the summary of survfit, but I'm not sure how to access only rmean because it does not show up under attributes for survfit. Thanks for any help in advance! Fayaaz Khatri
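Concretely, the restricted mean lands in the table component of summary(), from which it can be stored in an object (editor's sketch on the bundled lung data; the exact column name has varied across survival versions -- "*rmean" vs "rmean" -- hence the grep):

```r
library(survival)
fit <- survfit(Surv(time, status) ~ 1, data = lung)
tab <- summary(fit)$table          # named vector for a single curve
rmean <- tab[grep("rmean", names(tab))][1]   # first match is the mean itself,
rmean                                        # the second is se(rmean)
```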
Re: [R] Comparing two different 'survival' events for the same subject using survdiff?
-----Original Message-----
I have a dataset which for the sake of simplicity has two endpoints. We would like to test if two different end-points have the same eventual meaning. To try and take an example that people might understand better: let's assume we had a group of subjects who all received a treatment. They could stop treatment for any reason (side effects, treatment stops working, etc.). Getting that data is very easy. Measuring if treatment stops working is very hard to capture... so we would like to test if duration on treatment (easy) is the same as time to treatment failure (hard).
--- End ---

The problem you describe is known as "surrogate endpoints" and addressing it is harder than you think. You will need to look in the literature to gain an understanding of the issues before you proceed. Your question is an important one and lots of folks have thought about it more deeply than I.

Cohn JN (2004). Introduction to Surrogate Markers. Circulation (American Heart Association) 109 (25 Suppl 1): IV20-1. doi:10.1161/01.CIR.133441.05780.1d
Fleming TR, DeMets DL. Surrogate End Points in Clinical Trials: Are We Being Misled? Ann Intern Med. 1996 Oct 1;125(7):605-13.

Terry T.
Re: [R] Trouble Computing Type III SS in a Cox Regression
You've missed the point of my earlier post, which is that type III is not an answerable question.
1. There are lots of ways to compare Cox models; the LRT is normally considered the most reliable by serious authors. There is usually not much difference between score, Wald, and LRT tests, though, and the other two are more convenient in many situations.
2. Type III is a question that can't be addressed. SAS prints something out with that label, but since they don't document what it is, and people with in-depth knowledge of Cox models (like me) cannot figure out what a sensible definition could actually be, there is nowhere to go. "How to do this in R" can't be answered. (It has nothing to do with interactions.)
3. If you have customers who think that the earth is flat, global warming is a conspiracy, or that type III has special meaning, this is a re-education issue, and I can't much help with that.

Terry T.

On 04/25/2013 07:59 AM, Paul Miller wrote:
Hi Dr. Therneau, Thanks for your reply to my question. I'm aware that many on the list do not like type III SS. I'm not particularly attached to the idea of using them but often produce output for others who see value in type III SS. You mention the problems with type III SS when testing interactions. I don't think we'll be doing that here though. So my type III SS could just as easily be called type II SS I think. If the SS I'm calculating are essentially type II SS, is that still problematic for a Cox model? People using type III SS generally want a measure of whether or not a variable is contributing something to their model or if it could just as easily be discarded. Is there a better way of addressing this question than by using type III (or perhaps type II) SS? A series of model comparisons using a LRT might be the answer. If it is, is there an efficient way of implementing this approach when there are many predictors?
Another approach might be to run models through step or stepAIC in order to determine which predictors are useful and to discard the rest. Is that likely to be any good?

Thanks,
Paul
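Paul's "series of model comparisons using a LRT" can be done with anova() on nested coxph fits. A minimal sketch using the lung data that ships with survival; the covariates here are stand-ins, not Paul's variables:

```r
# Hedged sketch: per-term likelihood ratio tests for a Cox model via
# nested-model comparison, on the built-in lung data.
library(survival)

full  <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
noage <- update(full, . ~ . - age)

anova(noage, full)            # LRT for age, adjusted for the other terms
drop1(full, test = "Chisq")   # the same idea, one line per predictor
```

drop1() automates the full set of single-term deletions, which answers the "many predictors" part of the question.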
Re: [R] Trouble Computing Type III SS in a Cox Regression
I should hope that there is trouble, since type III is an undefined concept for a Cox model. Since SAS Inc fostered the cult of type III, they have recently added it as an option for phreg, but I am not able to find any hints in the phreg documentation of what exactly they are doing when you invoke it. If you can unearth this information, then I will be happy to tell you
 a. whether using the test (whatever it is) makes any sense at all for your data set
 b. if a is true, how to get it out of R

I use the word "cult" on purpose -- an entire generation of users who believe in the efficacy of this incantation without having any idea what it actually does. In many particular instances the SAS type III corresponds to a survey sampling question, i.e., reweight the data so that it is balanced wrt factor A and then test factor B in the new sample. The three biggest problems with type III are that 1: the particular test has been hyped as better when in fact it sometimes is sensible and sometimes not, 2: SAS implemented it as a computational algorithm which unfortunately often "works" even when the underlying rationale does not hold, and 3: they explain it using a notation that completely obscures the actual question. This last leads to the nonsense phrase "test for main effects in the presence of interactions".

There is a survey-reweighted approach for Cox models, very closely related to the work on causal inference (marginal structural models), but I'd bet dollars to donuts that this is not what SAS is doing. (Per 2 -- type III was a particular order of operations of the sweep algorithm for linear models, and for backwards compatibility that remains the core definition even as computational algorithms have left sweep behind. But Cox models can't be computed using the sweep algorithm.)

Terry Therneau

On 04/24/2013 12:41 PM, r-help-requ...@r-project.org wrote:
Hello All,

Am having some trouble computing Type III SS in a Cox Regression using either drop1 or Anova from the car package.
Am hoping that people will take a look to see if they can tell what's going on.

Here is my R code:

cox3grp <- subset(survData,
    Treatment %in% c("DC", "DA", "DO"),
    c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2))
cox3grp <- droplevels(cox3grp)
str(cox3grp)
coxCV <- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2,
    data = cox3grp, method = "efron")
coxCV
drop1(coxCV, test = "Chisq")
require(car)
Anova(coxCV, type = "III")

And here are my results:

> cox3grp <- subset(survData,
+     Treatment %in% c("DC", "DA", "DO"),
+     c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2))
> cox3grp <- droplevels(cox3grp)
> str(cox3grp)
'data.frame':   227 obs. of  6 variables:
 $ PTNO        : int  1195997 104625 106646 1277507 220506 525343 789119 817160 824224 82632 ...
 $ Treatment   : Factor w/ 3 levels "DC","DA","DO": 1 1 1 1 1 1 1 1 1 1 ...
 $ PFS_CENSORED: int  1 1 1 0 1 1 1 1 0 1 ...
 $ PFS_MONTHS  : num  1.12 8.16 6.08 1.35 9.54 ...
 $ AGE         : num  72 71 80 65 72 60 63 61 71 70 ...
 $ PS2         : Ord.factor w/ 2 levels "Yes"<"No": 2 2 2 2 2 2 2 2 2 2 ...
> coxCV <- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2,
+     data = cox3grp, method = "efron")
> coxCV
Call:
coxph(formula = Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2,
    data = cox3grp, method = "efron")

          coef exp(coef) se(coef)      z     p
AGE    0.00492     1.005  0.00789  0.624 0.530
PS2.L -0.34523     0.708  0.14315 -2.412 0.016

Likelihood ratio test=5.66  on 2 df, p=0.0591  n= 227, number of events= 198

> drop1(coxCV, test = "Chisq")
Single term deletions

Model:
Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2
       Df    AIC    LRT Pr(Chi)
<none>    1755.2
AGE     1 1753.6 0.3915 0.53151
PS2     1 1758.4 5.2364 0.02212 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> require(car)
> Anova(coxCV, type = "III")
Analysis of Deviance Table (Type III tests)
    LR Chisq Df Pr(>Chisq)
AGE   0.3915  1    0.53151
PS2   5.2364  1    0.02212 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Both drop1 and Anova give me a different p-value than I get from coxph for both my two-level PS2 variable and for age.
This is not what I would expect based on experience using SAS to conduct similar analyses. Indeed SAS consistently produces the same p-values, namely the ones I get from coxph. My sense is that I'm probably misusing R in some way, but I'm not sure what I'm likely to be doing wrong. SAS produces Wald chi-square results for its type III tests. Maybe that has something to do with it. Ideally, I'd like to get type III values that match those from coxph. If anyone could help me understand better, that would be greatly appreciated.

Thanks,
Paul
Re: [R] Weighted Kaplan-Meier estimates with R
There are two ways to view weights. One is to treat them as case weights, i.e., a weight of 3 means that there were actually three identical observations in the primary data, which were collapsed to a single observation in the data frame to save space. This is the assumption of survfit. (Most readers of this list will be too young to remember when computer memory was so small that we had to use tricks like this.)

The second assumption is that the weights are sampling weights, and a Horvitz-Thompson-like estimator should be used for the variance. This is the assumption of the svykm function in the survey package.

It appears you want the second behavior.

Terry Therneau

On 03/26/2013 06:00 AM, r-help-requ...@r-project.org wrote:
As part of a research paper, I would like to draw both weighted and unweighted Kaplan-Meier estimates, the weight being the "importance" of each project to the mass of projects whose survival I'm trying to estimate. I know that the function survfit in the package survival accepts weights and produces confidence intervals. However, I suspect that the confidence intervals may not be correct. The reason why I suspect this is that depending on how I define the weights, I get very different confidence intervals, e.g.

require(survival)
s <- Surv(c(50,100), c(1,1))
sf <- survfit(s ~ 1, weights=c(1,2))
plot(sf)

vs.

require(survival)
s <- Surv(c(50,100), c(1,1))
sf <- survfit(s ~ 1, weights=c(100,200))
plot(sf)

Any suggestions would be more than welcome!
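The two interpretations can be put side by side on the poster's toy data. A sketch only; the svykm call (from the survey package) is left as a comment in case that package isn't installed:

```r
# Case weights vs. sampling weights for a Kaplan-Meier curve.
library(survival)

df <- data.frame(time = c(50, 100), status = c(1, 1), w = c(1, 2))

# survfit treats w = 2 as two identical subjects collapsed into one row
fit1 <- survfit(Surv(time, status) ~ 1, data = df, weights = w)
fit1$surv   # 2/3 at t = 50, 0 at t = 100

# For sampling weights (Horvitz-Thompson style variance), the survey
# package would be used instead -- assumed installed, hence commented:
# library(survey)
# dsgn <- svydesign(ids = ~1, weights = ~w, data = df)
# fit2 <- svykm(Surv(time, status) ~ 1, design = dsgn, se = TRUE)
```

The point estimates agree; it is the variance (and hence the confidence interval) that differs between the two views.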
Re: [R] confidence interval for survfit
The first thing you are missing is the documentation -- try ?survfit.object.

fit <- survfit(Surv(time, status) ~ 1, data)
fit$std.err

will contain the standard error of the cumulative hazard, or -log(survival).

The standard error of the survival curve is approximately S(t) * std(hazard), by the delta method. This is what is printed by the summary function, because it is what users expect, but it has very poor performance for computing confidence intervals. A much better one is exp(-1 * confidence interval for the cumulative hazard), which is the default. In fact there are lots of better ones, whose relative ranking depends on the details of your simulation study. About the only really consistent result is that anything thoughtful beats S(t) +- 1.96 se(S), easily. The default in R is the one that was best in the most recent paper I had read at the time I set the default. If I were to rank them today using an average over all the comparison papers it would be second or third, but the good methods are so close that in practical terms it hardly matters.

Terry Therneau

On 03/15/2013 06:00 AM, r-help-requ...@r-project.org wrote:
Hi,

I am wondering how the confidence interval for the Kaplan-Meier estimator is calculated by survfit(). For example:

> summary(survfit(Surv(time,status)~1,data), times=10)
Call: survfit(formula = Surv(rtime10, rstat10) ~ 1, data = mgi)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10    168      55    0.761  0.0282        0.707        0.818

I am trying to reproduce the upper and lower CI by using the standard error. As far as I understand, the default method for survfit() to calculate the confidence interval is on the log survival scale, so:

upper CI = exp(log(0.761)+qnorm(0.975)*0.0282) = 0.804
lower CI = exp(log(0.761)-qnorm(0.975)*0.0282) = 0.720

they are not the same as the output from survfit(). Am I missing something?
Thanks,
John
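The gap in John's calculation is that the printed std.err is the se of S(t), while the default log-scale interval uses the se of the cumulative hazard, which is what fit$std.err holds. A sketch with the built-in lung data, since John's data set isn't available:

```r
# Reproduce survfit's default (conf.type = "log") lower limit by hand.
library(survival)

fit <- survfit(Surv(time, status) ~ 1, data = lung)

# fit$std.err is se(cumulative hazard) = se(log S(t)), NOT se(S(t))
lower <- exp(log(fit$surv) - qnorm(0.975) * fit$std.err)

all.equal(lower, fit$lower)   # matches survfit's own limit
# The upper limit is exp(log(S) + z*se) in the same way.
```

Dividing the printed std.err by the survival estimate recovers fit$std.err, which is why John's two formulas disagree with the printout.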
Re: [R] Comparing Cox model with Competing Risk model
--- begin included message ---
I have competing risks data where a patient may die from either AIDS or cancer. I want to compare the Cox model for each event of interest with a competing risk model. In the competing risk model the cumulative incidence function is used directly.
--- end inclusion ---

If you do want to pursue the Fine-Gray model, I would suggest using software that already exists. Find the Task Views tab on CRAN, follow it to Survival, and then look at the competing risks section. There is a lot to offer. I would trust it more than rolling your own function.

As an aside, modeling the subdistribution function is ONE way of dealing with competing risks, but not everyone thinks that it is the best way to proceed. The model corresponds to a biology that I find unlikely, though it makes for nice math. Since the alternative is discussed in a vignette that I haven't-yet-quite-written, we won't pursue that any further, however. :-)

Terry Therneau
Re: [R] survfit plot question
On 03/05/2013 05:00 AM, r-help-requ...@r-project.org wrote:
Hello, I create a plot from a coxph object called fit.ads4:
plot(survfit(fit.ads4))
Questions:
1. What is the cross mark in the plot?
2. How does the cross mark in the plot relate to either the rmean or the median from survfit?

Try help(plot.survfit). Read the description of mark.time.

3. What is the meaning of the restricted mean? The upper limit noted in the output is the end of the observation period (i.e., it is always the Stop value in the "Censored observation").

help(print.survfit)
Re: [R] Results from clogit out of range?
I'm late to this discussion, but let me try to put it in another context.

Assume that I wanted to know whether kids who live west of their school or east of their school are more likely to be early (some hypothesis about walking slower if the sun is in their eyes). So I create a 0/1 east/west variable and get samples of 10 student arrival times at each of 100 different schools. Fit the model

lm(arrive ~ factor(school) + east.west)

where arrive is in some common scale like minutes since midnight. Since different schools could have different starting times for their first class, we need an intercept per school. Two questions:

1. Incremental effect: the coefficient of east/west measures the incremental effect across all schools. With an n of 1000 it is likely estimated with high precision.
2. Absolute: predict the average arrival time (on the clock) for students.

Conditional logistic regression is very much like this. We have a large number of strata (schools) with a small number of observations in each (often only 2 per stratum). One can ask incremental questions about variables common to all strata, but absolute prediction is pretty worthless: a. you can only do it for schools (strata) that have already been seen, and b. there are so few subjects in each of them that the estimates are very noisy.

The default prediction from clogit is focused on questions of type 1. The documentation doesn't even bother to mention predictions of type 2, which would be probabilities of events. I can think of a way to extract such output from the routine (being the author gives some insight), but why would I want to?

Terry Therneau
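A simulated version of the school analogy (all names and numbers invented) shows the type 1 question being answered cleanly even though the per-school intercepts stay noisy:

```r
# Toy data for the east/west example: 100 schools, 10 students each,
# a true incremental effect of 2 minutes.
set.seed(1)
school    <- factor(rep(1:100, each = 10))
east.west <- rbinom(1000, 1, 0.5)
arrive    <- 480 + 5 * rnorm(100)[school] + 2 * east.west + rnorm(1000)

fit <- lm(arrive ~ school + east.west)
coef(fit)["east.west"]   # close to 2, with a small standard error
```

The east.west coefficient is pooled over all 1000 observations; each school's intercept rests on only 10, which is the clogit situation in miniature.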
Re: [R] Cox model convergence
On 02/16/2013 05:00 AM, r-help-requ...@r-project.org wrote:
Then I perform Cox regression as follows

m2_1 <- coxph(Surv(X_t0, X_t, vlsupp) ~ nvp + as.factor(cd4pccat) +
    as.factor(vlcat) + as.factor(agecat) + as.factor(whostage) +
    as.factor(hfacat) + as.factor(wfacat) + as.factor(wfhcat) +
    as.factor(resistance) + as.factor(postrantb) + cluster(id),
    data=myimp$imputations[[1]], method="breslow", robust=TRUE)
summary(m2_1)

Then I get the following warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
  Ran out of iterations and did not converge

Failure to converge is highly unusual for coxph unless the model is nearly singular. A good rule for Cox models is to have 10-20 events for each coefficient. When models get below 2 events/coef the results can be unreliable, both numerically and biologically, and some version of a shrinkage model is called for. What are the counts for your data set?

A vector of initial values, if supplied, needs to be of the same length as the coefficients. Make it the same length and in the same order as the printed coefficients from your run that did not converge.

Terry Therneau
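The events-per-coefficient count is easy to read off the fitted object. A sketch; the lung data and these covariates are stand-ins for the poster's model:

```r
# Rough check of the 10-20 events per coefficient rule.
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno,
             data = lung)
fit$nevent / length(coef(fit))   # events per estimated coefficient
```

Note that each level of a factor beyond the first counts as its own coefficient, so a model full of as.factor() terms uses up events much faster than the variable count suggests.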
Re: [R] NA/NaN/Inf in foreign function call (arg 6) error from coxph function
The NaN/Inf message is almost certainly from a singular fit. I would be interested in a copy of the data set to test it: data sets that cause this are rare, and help me to tune the convergence/singularity criteria in the program.

Note: it is much easier to use

coxph(survobj ~ therapy + ReceptorA + ReceptorB, data=sample.data)

than to put sample.data$ in front of every variable name; and easier to read as well.

Terry Therneau (author of the coxph function)

On 02/14/2013 05:00 AM, r-help-requ...@r-project.org wrote:
I am trying to fit a multivariate Cox proportional hazards model, modelling survival outcome as a function of treatment and receptor status. The data look like below:
[R] Avoid re-building a vignette
I once knew the answer to this, but can't find where I found it. Say I have a vignette that I don't want R to automatically rebuild, perhaps because it uses some data I can't share, or takes too long to run, or whatever. Say it is

mypackage/vignettes/charlie.Rnw

I can create the pdf from an R session that actually has the data, place that result in inst/doc/charlie.pdf, then run

R CMD build --no-vignettes mypackage

So far so good. But then when I do

R CMD INSTALL mypackage.tar.gz

the install process tries to run Sweave, resulting in a failure. How do I get it to skip the Sweave call?

Terry Therneau
Re: [R] Calculating the cumulative incidence function
You can do this directly with the survfit function (survival version 2.37 or greater). If status is a factor variable with levels "censored", "relapse" and "death":

fit <- survfit(Surv(time, status) ~ sex, data=mydata)
plot(fit, xlab=...

The primary advantage of this over the cuminc function (cmprsk package) is that all of the usual options for survival plots carry forward.

Terry Therneau

On 02/05/2013 05:00 AM, r-help-requ...@r-project.org wrote:
Hello, I have a problem regarding calculation of the Cumulative Incidence Function. The event of interest is failure of bone-marrow transplantation, which may occur due to relapse or death in remission. The data set that I have consists of: a lifetime variable; two indicator variables, one for relapse and one for death in remission; and the other variables are donor type (having 3 categories), disease type (having 3 categories), disease stage (having 3 categories) and Karnofsky score (having 2 categories). I want to create an R function for the cumulative incidence function for the event (failure of bone-marrow transplantation) due to relapse (cause 1) and also for the event (failure of bone-marrow transplantation) due to death in remission (cause 2). Can anyone help me please?

Tas.
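Tas's two indicator variables fold into a single factor in the way the reply describes. A sketch using the mgus2 data from the survival package as a stand-in for the transplant data (the first factor level must be the censoring code):

```r
# Cumulative incidence (Aalen-Johansen) curves via survfit with a
# factor-valued status.  pcm and death play the roles of the two causes.
library(survival)

etime <- with(mgus2, ifelse(pstat == 0, futime, ptime))
event <- with(mgus2, ifelse(pstat == 0, 2 * death, 1))
event <- factor(event, 0:2, labels = c("censor", "pcm", "death"))

fit <- survfit(Surv(etime, event) ~ sex, data = mgus2)
plot(fit, col = c(1, 1, 2, 2), lty = 1:2,
     xlab = "Months", ylab = "Pr(state)")
```

The same construction applies to the transplant question: code the censored rows as the first level and the relapse and death-in-remission rows as the other two.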
Re: [R] First R package, advice
On 02/07/2013 05:00 AM, r-help-requ...@r-project.org wrote:
I'd argue that there's an important distinction between documenting a function (how to use it) vs. documenting an algorithm (how it works). I think noweb can work well for describing how something works, but it's not so good for describing how to use it.

I'm a big fan of noweb for the source code; take a look at the coxme package for extensive use of this. That contains the most complex mathematics of any code I have written, and it is a big help to have real equations and discussion mixed in with the code. The sparse matrix algorithms aren't very transparent either, so it is a plus on that front too. A big help to me whenever I look back at something.

That given, I agree with the above that documentation of the algorithm and documentation of usage are disjoint tasks. The pdf file from my noweb source is of critical importance to someone changing the code, and of essentially no use to anyone else. Use vignettes for the latter.

Terry Therneau
Re: [R] obtain survival curves for single strata
Stephen,

I can almost but not quite get my arms around what you are asking. A bit more detail would help, like "cph.approve = what kind of object, generated by function __". But let me make a guess: cph is the result of coxph, and the model has both covariates and a strata term. You want predicted survival curves, more than one, of the type

covariates = a, b, c, strata = 1
covariates = d, e, f, strata = 2
...

for arbitrary covariates and strata.

Now, the predicted survival curves for different strata are different lengths. The survfit.coxph routine gets around this by being verbose: it expects you to give it covariate sets, and returns all of the strata for each covariate set. This allows it to give a compact result. You can always do:

newpred <- survfit(cox-model-fit, newdata=something)
newpred[5,17]   # survival curve for the 5th stratum, covariates from the 17th row of newdata

But you want to put the results into a matrix, for some pre-specified set of time points. Take it one step further.

mytimepoints <- seq(0, 5*365, by=30)   # every 30 days
z <- summary(newpred[5,17], times=mytimepoints, extend=TRUE)$surv

The summary.survfit function's times argument was originally written for people who only wanted to print certain time points, but it works just as well for those who only want to extract certain ones. It correctly handles the fact that the curve is a step function.

Terry Therneau

On 02/01/2013 05:00 AM, r-help-requ...@r-project.org wrote:
What is the syntax to obtain survival curves for single strata on many subjects? I have a model based on a Surv(time, response) object, so there is a single row per subject, no start/stop, and no switching of strata. The newdata has many subjects, each subject has a stratum, and the survival based on the subject's risk and the subject's stratum is needed. If I do

newpred <- survfit(cph.approve, new=newapp, se=F)

I get all strata for every subject.
Attempting

newpred <- survfit(cph.approve, new=newapp, id=CertId, se=F)
Error in survfit.coxph(cph.approve, new = newapp, id = CertId, se = F) :
  The individual option is only valid for start-stop data

newpred <- survfit(cph.approve, new=newapp, indi=T, se=F)
Error in survfit.coxph(cph.approve, new = newapp, indi = T, se = F) :
  The individual option is only valid for start-stop data

Please advise whether obtaining a single stratum for a basic (time, response) model is possible. Due to the differing lengths of the surv component for different strata, this will not go in a wide data.frame without padding.

Thanks everybody and have a great day.
Stephen B
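The summary(..., times=) extraction step can be sketched on its own with ordinary Kaplan-Meier curves (lung data as a stand-in); the same idiom applies to the [stratum, subject] subscripts of a survfit.coxph result:

```r
# Stack step-function curves into a matrix on a common time grid.
library(survival)

fit  <- survfit(Surv(time, status) ~ sex, data = lung)
grid <- seq(0, 720, by = 30)   # every 30 days for two years

smat <- sapply(seq_along(fit$strata), function(k)
    summary(fit[k], times = grid, extend = TRUE)$surv)

dim(smat)   # length(grid) rows, one column per stratum
```

Because each column is evaluated on the same grid, the differing native lengths of the strata no longer matter, which was Stephen's padding problem.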
Re: [R] predicted HR in coxph with pspline
The normalization is the same as is found when you have type="terms" from a gam model: each term is centered so that mean(predicted) = 0. For a simple linear term beta*age this implies that the predicted value will be 0 at the mean of age; for a polynomial or spline this does not translate to a defined centering point. The key idea is that if you were to replace age with age + 10, the figure should not change.

By the way, thanks for including such clear example code with your question.

Terry T.

----- begin included message -----
I have some questions about the predicted HR in the coxph function when including a pspline covariate. If just fitting the covariate in linear form, by default, the reference value for each of the prediction types (linear predictor, risk and terms) is the mean covariate within strata. If pspline is specified for the covariate, what is the reference value for the predictions? I did some test code below:

par(mfrow = c(1,3))
test  = coxph(Surv(time, status) ~ age, cancer)
test2 = coxph(Surv(time, status) ~ pspline(age, df=2), cancer)
test3 = coxph(Surv(time, status) ~ pspline(age, df=5), cancer)
pptest3 = predict(test3, type="lp")
pptest2 = predict(test2, type="lp")
pptest  = predict(test, type="lp")
plot(cancer$age, exp(pptest))
abline(v=mean(cancer$age))
abline(h=1)
plot(cancer$age, exp(pptest2))
abline(v=mean(cancer$age))
abline(h=1)
plot(cancer$age, exp(pptest3))
abline(v=mean(cancer$age))
abline(h=1)

In the first model, age is just fitted as linear, and in the plot, HR = 1 when age equals the mean of age. However, that is not the case when age is fitted using pspline. So what is the reference value of age when pspline is used? Any help or idea will be very much appreciated!

Best,
Carrie
Re: [R] nesting in CoxPH with survival package
In a Cox model, the baseline hazard takes the place of an intercept. Try the following:

dummy <- rnorm(nrow(goodexp))
testfit <- lm(dummy ~ ExpTemp + Stability + Period, data=goodexp, x=TRUE)

Now look at testfit$x, which is the design matrix for the linear model with an intercept. Examination of it will first of all clarify for you exactly what dummy variable coding was used for the model. If this X matrix is singular, the coxph fit will be singular in exactly the same way and at the same place. If you regress the last column of testfit$x on the others, you should get a perfect fit, given the output below. Other than the intercept, the X matrix within coxph is the same as that for lm; the intercept is still there in the form of a baseline hazard, it just cannot be summarized as a single coefficient. (All R modeling functions use the same internal routine to generate the design matrix.)

My guess is that the row sums of testfit$x are constant, but that's just a guess.

Terry Therneau

PS -- use the spacebar more when showing an example. It makes it a lot easier for the rest of us to read.

On 01/24/2013 05:00 AM, r-help-requ...@r-project.org wrote:
Thank you for the suggestions. Just to clarify, my first question was more about what actual coding I should be using to indicate a nested variable when using the coxph() function. I asked this after consulting several times with a local statistician, but unfortunately neither of us is very familiar with R. After further consultation, I have changed the design to a 2*2 design (2 levels each of ExpTemp and Stability) with blocking (Period). I am still getting the "x matrix deemed to be singular" error.
> LOEmod3alt = coxph(LOE.fit ~ ExpTemp + Stability + Period, data=goodexp)
Warning message:
In coxph(LOE.fit ~ ExpTemp + Stability + Period, data = goodexp) :
  X matrix deemed to be singular; variable 5
> summary(LOEmod3alt)
Call:
coxph(formula = LOE.fit ~ ExpTemp + Stability + Period, data = goodexp)

  n= 184, number of events= 105

                     coef exp(coef) se(coef)      z Pr(>|z|)
ExpTemp         -3.17825   0.04166  0.53105 -5.985 2.17e-09 ***
StabilityStatic -0.84129   0.43115  0.20470 -4.110 3.96e-05 ***
PeriodB          1.06794   2.90937  0.22859  4.672 2.98e-06 ***
PeriodC          1.23853   3.45054  0.58457  2.119   0.0341 *
PeriodD                NA        NA  0.00000     NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                exp(coef) exp(-coef) lower .95 upper .95
ExpTemp           0.04166    24.0047   0.01471     0.118
StabilityStatic   0.43115     2.3194   0.28866     0.644
PeriodB           2.90937     0.3437   1.85877     4.554
PeriodC           3.45054     0.2898   1.09723    10.851
PeriodD                NA         NA        NA        NA

Concordance= 0.833  (se = 0.03 )
Rsquare= 0.591   (max possible= 0.995 )
Likelihood ratio test= 164.4  on 4 df,   p=0
Wald test            = 111.1  on 4 df,   p=0
Score (logrank) test = 179.9  on 4 df,   p=0

> with(redo, table(LOEStatusfull, Period, ExpTemp))
, , ExpTemp = FIVE

             Period
LOEStatusfull  A  B  C  D
            0 42  0 35  0
            1  4  0 11  0

, , ExpTemp = FOUR

             Period
LOEStatusfull  A  B  C  D
            0  0  0  0  2
            1  0 46  0 44

As best as I can tell, none of my variables are collinear. Are there any other suggestions of how to deal with this error, or any more information I can provide to help understand why I would be getting this?

Thank you for your time and your help,
Katie O
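The dummy-regression trick from the reply can be seen on a small simulated data set where one factor is completely determined by another (all names invented to mimic Katie's variables):

```r
# Period is made perfectly predictable from ExpTemp, mirroring the
# crosstab above, so the lm design matrix is singular in exactly the
# same way the coxph X matrix would be.
set.seed(2)
goodexp <- data.frame(
    ExpTemp   = rep(c("FIVE", "FOUR"), each = 20),
    Stability = rep(c("Flow", "Static"), 20)
)
goodexp$Period <- ifelse(goodexp$ExpTemp == "FIVE", "A", "B")

dummy   <- rnorm(nrow(goodexp))
testfit <- lm(dummy ~ ExpTemp + Stability + Period, data = goodexp, x = TRUE)

coef(testfit)    # the PeriodB coefficient comes back NA
alias(testfit)   # names the exact linear dependency in the X matrix
```

Here the PeriodB column of testfit$x equals the ExpTempFOUR column, which is the analogue of the "variable 5" that coxph declared singular.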
Re: [R] coxph with smooth survival
For your first question -- read the manual. ?survfit.coxph will reveal the censor argument, which controls the inclusion of points where the curve does not drop.

For your second, "smooth" is in the eye of the beholder, literally. If the reason for a smooth curve is to plot it, you need to decide how many x-values you want across the range of the plotting window; I find 50 to 100 to be enough. You also need to decide how many degrees of freedom for the spline. If you are using the interpolated values for prediction, then you need to think through what the statistical properties of the smoothed estimates might be. Here's an example where the default df of smooth.spline doesn't work so well:

testfit <- coxph(Surv(time, status) ~ age + sex, data=lung, subset=(ph.ecog==2))
testcurve <- survfit(testfit, data.frame(age=65, sex=2), censor=FALSE)  # prediction for a 65-year-old female
testspl <- smooth.spline(testcurve$time, testcurve$surv)
plot(testcurve)
xx <- seq(0, 800, length=100)
lines(predict(testspl, xx), col=2)

Terry T.

--- begin included message ---
Hello users,

I would like to obtain a survival curve from a Cox model that is smooth and does not have zero differences due to no events on those particular days. I have:

sum((diff(surv))==0)
[1] 18

So you can see 18 days where the survival curve did not drop due to no events. Is there a way to ask survfit to fit a nice spline for the survival? Note: I tried survreg and it did not work, but maybe I did not do it properly?

Thank you very much.
Stephen B
--- end inclusion ---
[R] update.packages problem
I've updated to R-devel on my development machine and have lots of packages. The update.packages() script ends up with 33 failures, all due to out-of-order reloading. That is, if package abc depends on package xyz, then the reinstall of abc fails with a message that the version of xyz was "built before R 3.0.0: please re-install it". So I ran it a second time, and got 32 failures. There should be a better way.
Re: [R] Plot survival analysis with time dependent variables
On 01/01/2013 05:00 AM, r-help-requ...@r-project.org wrote:
Dear all, is there an implementation of the Simon and Makuch method of plotting the survival function with time-dependent variables? I'm only able to find event.chart in Hmisc for the purpose, and I would prefer the Simon and Makuch method. I believe Stata has it implemented for this purpose, but I cannot find it on CRAN.

Simon R, Makuch RW. A non-parametric graphical representation of the relationship between survival and the occurrence of an event: application to responder versus non-responder bias. Statistics in Medicine 1984; 3: 35-44.

Short answer: it's easy, but I don't think the result is a survival curve.

Long answer:

1. Write out your data in (start, stop] form, exactly as you would for a time-dependent Cox model. For example:

id  time1  time2  status  response
 1      0     24       1         0
 2      0     15       0         0
 2     15     43       1         1
 3      0     31       0         0
...

Subject 1 has died without response at time 24. Subject 2 responded at time 15 and then died at time 43. Subject 3 is censored at 31 and has as yet not responded, etc. The time-dependent Cox model is

fit <- coxph(Surv(time1, time2, status) ~ response, data=yourdata)

Everyone agrees that this is the right model, and if you read the score test line of summary(fit) you will have the test discussed in equations 1 and 2 of the Simon and Makuch paper. (Well, almost -- if there are 2 deaths tied on the same day there is an n-1 vs n-2 disagreement between the authors with respect to the variance formula, but the effect is trivial.)

2. Hazard functions make sense for time-dependent covariates. It is what the Cox model is founded on, and we all agree. Simon and Makuch now make the argument:
 a. Hazard functions are nice, but my clients are used to survival curves.
 b. When there aren't time-dependent covariates, S(t) = exp(-H(t)), where S = survival and H = hazard.
 c. Let's pretend that the formula in b still makes sense for this case, and apply it.

The R solution for c is trivial: replace coxph with survfit in the code above.
You will get their curves (with proper confidence intervals) and can make plots in the usual way.

3. I don't agree with the argument. On the positive side, if H1 = H2 (no impact of response) then the Cox model will say nothing here, and the Simon curves will give a picture that agrees, as opposed to some of the more boneheaded (but common in the medical literature) curves that they show in their figures 2b and 3b. The problem is that the result is not a survival curve --- I've never figured out exactly WHAT it is. For something to be interpreted as a survival curve there needs to be a set of subjects for whom I could hold up the graph and say "this is your future". Put another way, assume that you have the above curves in hand and there is a new patient sitting across the desk from you. What is your answer when he/she says "Doc, which curve am I?"

I tend to view these curves as primum non nocere, if faced with a client who absolutely won't bend when told the truth that there is no good survival curve for time-dependent covariates.

Terry Therneau
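Step 1 and the "replace coxph with survfit" step as code, on a tiny made-up data set laid out exactly like the example in the post:

```r
# (start, stop] data: four rows, three subjects, as in the post.
library(survival)

yourdata <- data.frame(
    id       = c(1, 2, 2, 3),
    time1    = c(0, 0, 15, 0),
    time2    = c(24, 15, 43, 31),
    status   = c(1, 0, 1, 0),
    response = c(0, 0, 1, 0)
)

# The time-dependent Cox model (too few events in this toy data for a
# stable fit, so shown for form only):
# fit <- coxph(Surv(time1, time2, status) ~ response, data = yourdata)

# Step c: the Simon-Makuch style display
curves <- survfit(Surv(time1, time2, status) ~ response, data = yourdata)
plot(curves, lty = 1:2)
```

The survfit call handles the delayed entry automatically: subject 2 joins the response=1 risk set only at time 15, which is what distinguishes this display from the naive responder-vs-nonresponder plot.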
Re: [R] R-help Digest, Vol 118, Issue 20
I don't know of one. If building your own, you could use rpart with maxdepth=1 as the tool to find the best split at each node. Terry Therneau

On 12/20/2012 05:00 AM, r-help-requ...@r-project.org wrote: Hi, I've searched R-help and haven't found an answer. I have a set of data from which I can create a classification tree using rpart. However, what I'd like to do is predefine the blank structure of the binary tree (i.e., which nodes to include) and then use a package like rpart to fit for the optimal splitting criteria at each of the predefined nodes. Does such a package exist? Thanks, Lee
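A minimal sketch of that suggestion. The function name and the `node_rows` argument are hypothetical: the user's predefined tree determines which rows reach each node, and rpart is then asked only for the single best split among those rows:

```r
library(rpart)

# Find the single best split for the observations reaching one node of a
# predefined tree; 'node_rows' (row indices for that node) comes from the
# user's own tree structure, which rpart knows nothing about.
best_split <- function(formula, data, node_rows) {
  fit <- rpart(formula, data = data[node_rows, , drop = FALSE],
               control = rpart.control(maxdepth = 1, cp = 0))
  fit$splits  # chosen variable, cut point, and improvement
}
```

Calling this once per predefined node fills in the splitting criteria while leaving the tree shape fixed.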
Re: [R] fitting a gamma frailty model (coxph)
I looked at your data:

   table(x, cluster)
        1  2  3  4  5  6
     0  0 48  0 48 48  0
     1 48  0 48  0  0 48

Your covariate x is perfectly predicted by the cluster variable. If you fit a fixed effects model, coxph(Surv(time, event) ~ factor(cluster) + x), then the x variable is declared redundant. The same happens in the gamma frailty model when the variance of the random effect is sufficiently large. Your model approaches this limit, and the solution fails. As mentioned in the manual page, the coxme function is now preferred. Last, your particular error message is caused by an invalid value for sparse; I'll add a check to the program. You likely want sparse=10 to force non-sparse computation. Terry Therneau

On 12/04/2012 05:00 AM, r-help-requ...@r-project.org wrote: Dear all, I have a data set http://yaap.it/paste/c11b9fdcfd68d02b#gIVtLrrme3MaiQd9hHy1zcTjRq7VsVQ8eAZ2fol1lUc= with 6 clusters, each containing 48 (possibly censored, in which case event = 0) survival times. The x column contains a binary explanatory variable. I try to describe that data with a gamma frailty model as follows:

   library(survival)
   mod <- coxph(Surv(time, event) ~ x + frailty.gamma(cluster, eps=1e-10, method="em", sparse=0),
                outer.max=1000, iter.max=1, data=data)

Here is the error message:

   Error in if (history[2, 3] > (history[1, 3] + 1)) theta <- mean(history[1:2, :
     missing value where TRUE/FALSE needed

Does anyone have an idea on how to debug? Yours sincerely, Marco
Re: [R] Fitting and plotting a coxph with survfit, package(surv)
I answered a similar question yesterday: The survfit routine will produce predicted survival curves for any requested combination of the covariates in the original model. This is not the same thing as an adjusted survival curve. Confusion on this is prevalent, however. True adjustment requires a population average over the confounding factors and is closely related to the standardized incidence ratio concept found in epidemiology. To do this you need to define a population of ages. See chapter 10 of the book by Therneau and Grambsch for an explanation of the issues and examples of how to get the population value. It's hard to distill 20 pages down into an email message. Terry Therneau

-- begin included message --
I have a database with 18000 observations and 20 variables. I am running Cox regression on five variables and trying to use survfit to plot the survival based on a specific variable, without success. Let's say I have the following coxph:

   library(survival)
   fit <- coxph(Surv(futime, fustat) ~ age + rx, data = ovarian)
   fit

What I am trying to do is plot a survival curve comparing subjects based on rx. Using this

   plot(survfit(fit, newdata=data.frame(rx=c(1:2), age=c(60))),
        xscale=365.25, xlab="Years", ylab="Survival")

I get the survival for patients at age 60, but is there an option to get a survfit for the patients regardless of the value in variable age?
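One common recipe for the population average Therneau points to: predict a curve for every subject in the data with rx forced to each treatment in turn, then average the curves across subjects. This averaging step is a sketch of the chapter-10 idea, not code from the email:

```r
library(survival)
fit <- coxph(Surv(futime, fustat) ~ age + rx, data = ovarian)

# Force every subject to treatment 'trt', predict one curve per subject,
# then average the curves over the observed age distribution
adj_surv <- function(trt) {
  nd <- transform(ovarian, rx = trt)
  sf <- survfit(fit, newdata = nd)
  list(time = sf$time, surv = rowMeans(sf$surv))
}

s1 <- adj_surv(1); s2 <- adj_surv(2)
plot(s1$time, s1$surv, type = "s", ylim = c(0, 1),
     xlab = "Days", ylab = "Adjusted survival")
lines(s2$time, s2$surv, type = "s", lty = 2)
```

The average is over the covariate distribution of the reference data set, so curves for rx=1 and rx=2 differ only through the treatment effect, not through any age imbalance.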
Re: [R] Plotting an adjusted survival curve
First a statistical issue: The survfit routine will produce predicted survival curves for any requested combination of the covariates in the original model. This is not the same thing as an adjusted survival curve. Confusion on this is prevalent, however. True adjustment requires a population average over the confounding factors and is closely related to the standardized incidence ratio concept found in epidemiology.

To answer your technical question:

   fit <- coxph(Surv(...
   mysurv <- survfit(fit, newdata=mydata)

This will give a set of predicted curves, one for each observation in mydata. If we assume 2 treatments and 4 ethnicities, this means that there are 8 possible predicted curves. You can certainly take the curves for (trt=1, white) and (trt=2, white), plot them together on one graph, and call this your adjusted survival curves; the mydata data set would have two observations. This is not a correct label, but it is certainly common. Terry Therneau

On 11/26/2012 05:00 AM, r-help-requ...@r-project.org wrote: Dear R-users, I am trying to make an adjusted Kaplan-Meier curve (using the survival package) but I am having difficulty with plotting it so that the plot only shows the curves for the adjusted results. My data come from a randomised controlled trial, and I would like the adjusted Kaplan-Meier curve to show only two curves for the adjusted survival: one for those on treatment (Treatment==1) and another curve for those on placebo (Treatment==0). My problem is that when I plot the survfit of my coxph, I think it displays a curve for every single individual factor in my coxph, whereas I would like it to display only the adjusted curves for Treatment==1 and Treatment==0. How can I do this?
A simplified example of my code with only one effect-modifier is:

   simple.cox.ethnicity <- coxph(Surv(whenfailed, failed) ~ factor(Treatment) + factor(ethnicity))  # my data are attached already
   survfit.simple.cox.ethnicity <- survfit(simple.cox.ethnicity, survmat)  # survmat is a data.frame that contains Treatment and ethnicity
   plot(survfit.simple.cox.ethnicity, col=c("red","black"), main="survfit.simple.cox",
        xlab="survival time", ylab="proportion surviving")

Thank you so much for your help. Yours gratefully, Brent Caldwell
Re: [R] survfit number of variables != number of variable names
I can't reproduce the problem. Tell us what version of R and what version of the survival package. Create a reproducible example. I don't know if some variables are numeric and some are factors, how/where the surv object was defined, etc. Terry Therneau

On 11/17/2012 05:00 AM, r-help-requ...@r-project.org wrote: This works ok:

   cox = coxph(surv ~ bucket*(today + accor + both) + activity, data = data)
   fit = survfit(cox, newdata=data[1:100,])

but using strata leads to problems:

   cox.s = coxph(surv ~ bucket*(today + accor + both) + strata(activity), data = data)
   fit.s = survfit(cox.s, newdata=data[1:100,])
   Error in model.frame.default(data = data[1:100, ], formula = ~bucket + :
     number of variables != number of variable names

Note that the following gives rise to the same error:

   fit.s = survfit(cox.s, newdata=data)
   Error in model.frame.default(data = data, formula = ~bucket + today + :
     number of variables != number of variable names

but if I use data implicitly, all is working fine:

   fit.s = survfit(cox.s)

Any idea on how I could solve this? Best, and thank you, ge
Re: [R] survreg gompertz
The short answer is "sort of." Medium is that survreg implements the model framework found in Kalbfleisch and Prentice, The Statistical Analysis of Failure Time Data, chapter 2.2. The time variable T has

   f(T) ~ X' beta + sigma * W

where W is an error term from some distribution and f is a fixed function (usually identity or log). Any distribution that can be shoehorned into this framework can be fit with survreg. A longer and more detailed discussion is found in Division of Biostatistics Technical Report #53, Mayo Clinic, "A Package for Survival Analysis in S." One day I need to update this and make it a vignette for the package. http://mayoresearch.mayo.edu/mayo/research/biostat/techreports.cfm Terry Therneau

--- begin included message ---
Hi all, Sorry if this has been answered already, but I couldn't find it in the archives or on the general internet. Is it possible to implement the Gompertz distribution as a survreg.distribution to use with survreg of the survival library? I haven't found anything, and recent attempts from my side weren't successful so far. I know that other packages like 'eha' and 'flexsurv' offer functions similar to survreg with Gompertz support. However, due to the run-time environment where this needs to be running in the end, I can't use these packages :( Same question for the Gompertz-Makeham distribution. Many thanks! Matthias
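To make the location-scale framework concrete, here is a sketch with a distribution that does fit it (Weibull, via f = log); the point of the reply is that Gompertz does not slot into f(T) ~ X'beta + sigma*W this way, so Weibull stands in purely as an illustration:

```r
library(survival)

# Weibull accelerated failure time model:
#   log(T) = X' beta + sigma * W,  W ~ standard minimum extreme value
fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
summary(fit)  # "Scale" in the printout is the sigma of the framework
```

Any survreg.distribution object must express its distribution in exactly this form, which is why Gompertz requires the workarounds the poster mentions (eha, flexsurv).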
Re: [R] FW: replace repeated id in a pedigree list
> I use the kinship package of R and it doesn't create a kinship matrix with repeated id.

The kinship package is out of date; use the kinship2 and coxme packages (it was split into two parts). Then the problem you describe no longer exists -- the kinship function no longer requires each subject id to be unique across all pedigrees.

   pedlist <- with(mydata, pedigree(id, fa_id, mo_id, sex, famid=famid))
   kmat <- kinship(pedlist)
   fit <- coxme(Surv(time, status) ~ x1 + x2 + ... + (1 | famid/id), varlist=kmat)

The pedlist object contains all the families; plot(pedlist[4]) would pull out and plot the 4th family. The column labels of kmat will be of the form famid/id. To solve your original problem (though you don't need to), create a single unified id:

   uid <- paste(famid, id, sep='/')
   idlist <- unique(uid)
   newid <- match(uid, idlist)
   newmom <- match(paste(famid, mo_id, sep='/'), idlist)
   newdad <- match(paste(famid, fa_id, sep='/'), idlist)

Terry Therneau, author of kinship and coxme, but not :-) the maintainer of kinship2
Re: [R] Kaplan Meier Post Hoc?
No, you didn't miss anything in the survival package. I've never found post-hoc tests interesting, so have little motivation to add such (and a very long to-do list of things I would like to add). If you simply must have them, why not do all pairwise tests?

   chisq <- matrix(0., 4, 4)
   for (i in 1:4) {
       for (j in (1:4)[-i]) {
           temp <- survdiff(Surv(time, status) ~ group, data=mydata,
                            subset=(group %in% (unique(group))[c(i,j)]))
           chisq[i,j] <- temp$chisq
       }
   }

Terry Therneau

On 10/25/2012 05:00 AM, r-help-requ...@r-project.org wrote: This is more of a general question without data. After doing survdiff, from the survival package, on strata including four groups (so 4 curves on a Kaplan-Meier plot), you get a chi-squared p-value for whether to reject the null hypothesis or not. Is there a method to follow up with pairwise testing on the respective groups? I have searched the library but have come up with nothing. Perhaps I am mistaken in something here.
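Each entry of the chisq matrix above is a 1-df log-rank statistic, so converting to p-values is a one-liner. The Bonferroni adjustment below is my addition for multiplicity, not part of the original reply:

```r
# Convert pairwise log-rank chi-square statistics (1 df) to p-values,
# then Bonferroni-adjust for the 6 distinct pairs among 4 groups
pvals <- pchisq(chisq, df = 1, lower.tail = FALSE)
pvals.adj <- pmin(1, pvals * choose(4, 2))
diag(pvals.adj) <- NA  # the diagonal is a group vs. itself, not a test
```

Note the matrix is symmetric up to tie-handling, so reporting the upper triangle is enough.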
Re: [R] Expected number of events, Andersen-Gill model fit via coxph in package survival
Therneau doesn't know the answer either. The predictions are positively correlated, since they all depend on the same beta-hat from the original model. The same will be true for any model: logistic, Poisson, linear, ... Terry T

On 10/20/2012 06:06 PM, Omar De la Cruz C. wrote: I have a follow-up question (for either Dr. Therneau, or anybody who might know). sum(zz) (see below) estimates the number of events for the cohort. Now, how can I compute a confidence interval for sum(zz)? Or a standard error? My obvious choice, the square root of the sum of the squares of the standard errors for zz provided by predict.coxph, turns out to be too small. Thank you for any suggestions. Omar.
Re: [R] Looping survdiff
The number of recent questions from umn.edu makes me wonder if there's homework involved... Simpler for your example is to use get and subset:

   dat <- structure(...)  # as found below
   var.to.test <- names(dat)[4:6]  # variables of interest
   nvar <- length(var.to.test)
   chisq <- double(nvar)
   for (i in 1:nvar) {
       tfit <- survdiff(Surv(time, completion==2) ~ get(var.to.test[i]),
                        data=dat, subset=(group==3))
       chisq[i] <- tfit$chisq
   }
   write.csv(data.frame(var.to.test, chisq))

On 10/19/2012 05:00 AM, r-help-requ...@r-project.org wrote: Hello, I am trying to set up a loop that can run the survdiff function, with the ultimate goal of generating a csv file with the p-values reported. However, whenever I try a loop I get an error such as "invalid type (list) for variable 'survival_data_variables[i]'". This is a subset of my data:

   structure(list(time = c(1.516667, 72, 72, 25.78333, 72, 72, 72, 72, 72, 72,
       1.18, 0.883, 1.15, 0.867, 72, 1.03, 72, 1.05, 72, 22.76667),
     group = c(2L, 1L, 3L, 3L, 3L, 4L, 4L, 1L, 3L, 3L, 3L, 3L, 4L, 3L, 3L, 4L, 3L, 4L, 3L, 4L),
     completion = structure(c(2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
       .Label = c("1", "2"), class = "factor"),
     var1 = structure(c(2L, 2L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 3L, 2L, 2L, 4L, 3L, 2L, 4L, 2L, 4L, 2L, 4L),
       .Label = c("1", "2", "3", "4"), class = "factor"),
     var2 = structure(c(3L, 3L, 1L, 1L, 2L, 4L, 3L, 3L, 2L, 4L, 2L, 1L, 2L, 1L, 2L, 2L, 4L, 4L, 2L, 3L),
       .Label = c("1", "2", "3", "4"), class = "factor"),
     var3 = structure(c(4L, 2L, 3L, 1L, 3L, 4L, 4L, 2L, 2L, 4L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 3L, 4L, 1L),
       .Label = c("1", "2", "3", "4"), class = "factor")),
     .Names = c("time", "group", "completion", "var1", "var2", "var3"),
     row.names = c(NA, 20L), class = "data.frame")

The loop I have been trying for just group 3 is:

   d = data.frame()
   for (i in 4:6) {
       a = assign(paste("p-value", i, sep=""),
                  survdiff(Surv(time, completion==2) ~ dat[i], data=dat[group==3,], rho=0))
       b = as.matrix(a$chisq)
       d = rbind(d, b)
       write.csv(d, file="C:/.../junk.csv", quote=FALSE)
   }

Perhaps I am making this more difficult than it needs to be. Thanks for any help, Charles
Re: [R] Kaplan-Meier plotting quirks
Better would be to use interval censored data. Create your data set so that you have (time1, time2) pairs, each of which describes the interval of time over which the tag was lost. So an animal first captured at time 10 sans tag would be (0,10); with tag at 5 and without at 20 would be (5,20); and last seen with tag at 30 would be (30, NA). Then

   survfit(Surv(time1, time2, type='interval2') ~ 1, data=yourdata)

will give a curve that accounts for interval censoring. As a prior poster suggested, if the times are very sparse then you may be better off assuming a smooth curve. Use the survreg function with the same equation as above; see help(predict.survreg) for an example of how to draw the resulting survival curve. Terry Therneau

On 10/18/2012 05:00 AM, r-help-requ...@r-project.org wrote: -Original Message- From: Michael Rentz [mailto:rent0...@umn.edu] Sent: Tuesday, October 16, 2012 12:36 PM To: r-help@r-project.org Subject: [R] R Kaplan-Meier plotting quirks? Hello. I apologize in advance for the VERY lengthy e-mail. I endeavor to include enough detail. I have a question about survival curves I have been battling off and on for a few months. No one local seems to be able to help, so I turn here. The issue seems to be either how R calculates Kaplan-Meier plots, or something with the underlying statistic itself that I am misunderstanding. Basically, longer survival times are yielding steeper drops in survival than a set of shorter survival times, but with the same number of loss and retention events. As a minor part of my research I have been comparing tag survival in marked wild rodents. I am comparing a standard ear tag with a relatively new technique. The newer tag clearly "wins" using survival tests, but the resultant Kaplan-Meier plot does not seem to make sense. Since I am dealing with a wild animal and only trapped a few days out of a month, the data is fairly messy, with gaps in capture history that require assumptions of tag survival.
An animal that is tagged and recaptured 2 days later with a tag and 30 days later without one could have an assumed tag retention of 2 days (minimum confirmed) or 30 days (maximum possible). Both are significant with a survival test, but the K-M plots differ. A plot of minimum confirmed (overall harsher data, lots of 0 days and 1 or 2 days) yields a curve with a steep initial drop in "survival", but then a leveling off and a straight line thereafter at about 80% survival. Plotting the maximum possible dates (same number of losses/retentions, but retention times are longer, the length to the next capture without a tag, typically 25-30 days or more) does not show as steep a drop in the first few days, but at about the point the minimum estimate levels off this one begins dropping steeply. 400 days out, the plot with minimum possible estimates has tag survival of about 80%, whereas the plot with the same loss rate but longer assumed survival times shows only a 20% assumed survival at 400 days. Complicating this, of course, is the fact that the great majority of the animals die before the tag is lost; survival of the rodents is on the order of months. I really am not sure what is going on, unless somehow the high number of events in the first few days followed by few events thereafter leads to the assumption that after the initial few days survival of the tag is high. The plotting of maximum lengths has a more even distribution of events, rather than a clumping in the first few days, so I guess the model assumes relatively constant hazards? As an aside, a plot of the mean between the minimum and maximum almost mirrors the maximum plot. Adding five days to the minimum, when the minimum plus 5 is less than the maximum, returns a plot with a steeper initial drop but then constant thereafter, mimicking the minimum plot, but at a lower final survival rate. Basically, I am at a loss why surviving longer would *decrease* the survival rate???
My co-author wants to drop the K-M graph given the confusion, but I think it would be odd to publish a survival paper without one. I am not sure which graph to use? They say very different things, while the actual statistics do not differ that greatly. I am more than happy to provide the data and code for anyone who would like to help, if the above is not explanation enough. Thank you in advance. Mike. -- Michael S. Rentz, PhD Candidate, Conservation Biology, University of Minnesota, 5122 Idlewild Street, Duluth, MN 55804, (218) 525-3299, rent0...@umn.edu
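A minimal sketch of the interval-censored recipe from the reply, with a hypothetical three-animal data set following the (time1, time2) scheme described (first captured at 10 without tag; tagged at 5, tagless at 20; last seen with tag at 30):

```r
library(survival)

# Hypothetical tag-loss intervals: the tag was lost somewhere in
# (time1, time2]; time2 = NA means still tagged at last sighting
tags <- data.frame(time1 = c(0,  5, 30),
                   time2 = c(10, 20, NA))

fit <- survfit(Surv(time1, time2, type = "interval2") ~ 1, data = tags)
summary(fit)
plot(fit, xlab = "Days", ylab = "Tag retention")
```

This sidesteps the minimum-vs-maximum dilemma entirely: instead of picking one endpoint, the Turnbull estimator that survfit computes uses the whole interval of uncertainty for each animal.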
Re: [R] Problems with coxph and survfit in a stratified model with interactions
Your df object (newdata) has to have ALL the variables in it. Normally you wouldn't need the strata variable, but in this case cov1 is also a covariate. Look at the example I gave earlier. Terry T.

On 10/15/2012 05:00 AM, r-help-requ...@r-project.org wrote: Many thanks for your very quick reply! I tried replacing "~ strata(cov1) + cov1:cov2" with "~ strata(cov1):cov2". As a result, I get an error message saying that "object 'cov1' not found". I guess R refers to the data frame. Any idea how to fix the error? Complete code follows. Best regards, Roland

   require(survival)
   data(lung)
   #
   lung$cov1 <- as.factor(lung$ph.ecog)
   lung$cov2 <- as.factor(lung$sex)
   levels(lung$cov1)[levels(lung$cov1)=="0"] <- "zero"
   levels(lung$cov1)[levels(lung$cov1)=="1"] <- "one"
   levels(lung$cov1)[levels(lung$cov1)=="2"] <- "two"
   levels(lung$cov1)[levels(lung$cov1)=="3"] <- "three"
   levels(lung$cov2)[levels(lung$cov2)=="1"] <- "male"
   levels(lung$cov2)[levels(lung$cov2)=="2"] <- "female"
   #
   df <- data.frame(cov2=factor("female", levels=levels(lung$cov2)))
   sCox <- coxph(Surv(time, status) ~ strata(cov1):cov2, data=lung)
   sfCox <- survfit(sCox, newdata=df)
Re: [R] Problems with coxph and survfit in a stratified model with interactions
First, here is your message as it appears on R-help.

On 10/14/2012 05:00 AM, r-help-requ...@r-project.org wrote: I'm trying to set up a proportional hazards model that is stratified with respect to covariate 1 and has an interaction between covariate 1 and another variable, covariate 2. Both variables are categorical. In the following, I try to illustrate the two problems that I've encountered, using the lung dataset. The first problem is the warning: To me, it seems that there are too many dummies generated. The second problem is the error:

Please try to fix this in the future (a Nabble issue?). As to the problems: handling strata by covariate interactions turns out to be a bit of a pain in the posterior in the survival code. It would have worked, however, if you had done the following:

   fit <- coxph(Surv(time, status) ~ strata(cov1) * cov2, data=...)

or ~ strata(cov1):cov2, or ~ strata(cov1):cov2 + cov2. But by using ~ strata(cov1) + cov1:cov2 you fooled the program into thinking that there was no strata by covariate interaction, and so it did not follow the special logic necessary for that case.

Second issue: The model.matrix function of R, common to nearly all the modeling functions (including coxph), tries to guess which dummy variables will be redundant and thus can be removed from the X matrix before the fit. Such an approach is doomed to failure. I'm actually surprised at how often R guesses correctly, because until a matrix decomposition is actually performed, the only thing possible is an informed guess. Your particular case gives rise to a larger than usual number of NA coefs (redundant columns), but short of building your own X matrix by hand there isn't anything to be done about it. Just ignore them.
Terry Therneau
Re: [R] Question on survival
It is easy to get a cumulative hazard curve. First, decide what values of age, a, and b you want curves for:

   tdata <- data.frame(age=55, a=2, b=6)

Get the curves; there will be one for each stratum in the output:

   sfit <- survfit(coxPhMod, newdata=tdata)

Plot them:

   plot(sfit, fun='cumhaz', col=1:4, xlab= etc etc)

Hazard functions are something else again; estimating these rigorously is akin to density estimation. A quick and dirty method is to use smooth.spline:

   temp <- sfit[1]  # grab the first curve
   tfit <- smooth.spline(temp$time, -log(temp$surv), df=5)  # smooth the cum haz
   plot(predict(tfit, deriv=1))

That value of df=5 is made up -- you need to decide for yourself how much smoothing to do. I make no claims that this is statistically well grounded; it's just a good way to get a quick idea.

PS: There is no such thing as THE baseline hazard function; predictions are always for some particular value of the covariates. In a book it is sometimes useful to pick a particular set of x values as a default in order to simplify notation, often x=0, and label that as a baseline. But in actual computation all zeros is usually crazy (age=0, weight=0, blood pressure=0, etc). Terry Therneau

> Hi, I'm going crazy trying to plot a quite simple graph. I need to plot the estimated hazard rate from a Cox model. Supposing the model is like this:

   coxPhMod <- coxph(Surv(TIME, EV) ~ AGE + A + B + strata(C), data=data)

with 4 levels for C, how can I obtain a graph with 4 estimated (better, smoothed) hazard curves (baseline hazard + 3 proportional) to highlight the effect of C? Thanks!! laudan
Re: [R] Survival prediction
> Dear All, I have built a survival Cox model, which includes a covariate * time interaction (non-proportionality detected). I am now wondering how I could most easily get survival predictions from my model. My model was specified:
>
>   coxph(formula = Surv(event_time_mod, event_indicator_mod) ~ Sex + ageC + HHcat_alt +
>         Main_Branch + Acute_seizure + TreatmentType_binary + ICH + IVH_dummy +
>         IVH_dummy:log(event_time_mod)
>
> And now I was hoping to get a prediction using survfit and providing new.data for the combination of variables I am doing the predictions: survfit(cox, new.data=new)

Some comments:

1. Even though it is in the SAS manual and some literature, I have myself never used X * log(time) as a fix for lack of proportionality. Is it really true that when you use

   fit <- coxph(Surv(event_time_mod, event_indicator_mod) ~ Sex + ageC + HHcat_alt +
                Main_Branch + Acute_seizure + TreatmentType_binary + ICH + IVH_dummy)
   zfit <- cox.zph(fit, transform=log)
   plot(zfit[8])

the estimated function is linear? I have not yet seen such a simple time effect and would find it interesting.

2. The code you wrote does not fit the time-dependent model that you suppose; it treats event_time_mod as a fixed covariate. To fit the model, see the relevant vignette for the survival package. Essentially the program has to build a large (start, stop) data set behind the scenes. (SAS does the same thing.) Defining proper residuals for said data set is hard, and the R code does not yet do this. (Last I checked, the same was true of SAS.)

3. The survival curve for a time-dependent covariate is something that is not easily defined. Read chapter 10.2.4 of the Therneau and Grambsch book for a discussion of this (largely informed by the many mistakes I've myself made). Terry Therneau
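A sketch of the (start, stop) construction from point 2, using survSplit. The data set and variable names (mydata, event_time_mod, IVH_dummy, ...) are the poster's, and the two-term formula is a cut-down illustration, not the full model:

```r
library(survival)

# Expand each subject into one row per event time, so that
# log(current time) is a legitimate time-dependent covariate
long <- survSplit(Surv(event_time_mod, event_indicator_mod) ~ .,
                  data  = mydata,
                  cut   = sort(unique(mydata$event_time_mod[mydata$event_indicator_mod == 1])),
                  start = "tstart")

# The interaction now uses the current interval's end time, row by row
fit <- coxph(Surv(tstart, event_time_mod, event_indicator_mod) ~
               IVH_dummy + IVH_dummy:log(event_time_mod),
             data = long)
```

With the expanded data, log(event_time_mod) changes across a subject's rows, which is exactly what the one-row-per-subject fit quoted above fails to do.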
Re: [R] Expected number of events, Andersen-Gill model fit via coxph in package survival
> I am interested in producing the expected number of events, in a recurring events setting. I am using the Andersen-Gill model, as fit by the function coxph in the package survival. I need to produce expected numbers of events for a cohort, cumulatively, at several fixed times. My ultimate goal is: to fit an AG model to a reference sample, then use that fitted model to generate expected numbers of events for a new cohort; then comparing the expected vs. the observed numbers of events would give us some idea of whether the new cohort differs from the reference one. From my reading of the documentation and the text by Therneau and Grambsch, it seems that the function survexp is what I need. But using it I am not able to obtain expected numbers of events that match reasonably well the observed numbers *even for the same reference population.* So, I think I am misunderstanding something quite badly.

You've hit a common confusion. Observed versus expected event computations are done on the cumulative hazard scale H, not the survival scale S; S = exp(-H). Relating this back to simple Poisson models, H(t) would be the expected number of events by time t and S(t) the probability of no events before time t. G. Berry (Biometrics, 1983) has a classic and readable article on this (especially if you ignore the proofs). Using your example:

   cphfit <- coxph(Surv(start, stop, event) ~ rx + number + size + cluster(id), data=bladder2)
   zz <- predict(cphfit, type='expected')
   c(sum(zz), sum(bladder2$event))
   [1] 112 112

   tdata <- bladder2[1:10,]  # new data set (lazy way)
   predict(cphfit, type='expected', newdata=tdata)
   [1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665
   [8] 0.8864808 0.2932915 0.5190647

You can also do this using survexp and the cohort=FALSE argument, which would return S(t) for each subject; we would then use -log(result) to get H. This is how it was done when I wrote the book, but the newer predict function is easier.
Terry Therneau
Re: [R] variances of random effects in coxme
You are right, those look suspicious. What version of R and of the coxme package are you running? Later versions of coxme use multiple starting estimates, due precisely to this kind of problem. Also, when the true MLE is variance=0 the program purposely never quite gets there, in order to avoid log(0). Compare the log-likelihood to a fixed effects model with those covariates. I can't do more than guess without a reproducible example. Terry Therneau

On 10/08/2012 05:00 AM, r-help-requ...@r-project.org wrote: Dear R users, I'm using the function coxme of the package coxme in order to build Cox models with complex random effects. Unfortunately, I sometimes get surprising estimations of the variances of the random effects. I ran models with different fixed covariates, but always with the same 3 random effects, defined by the argument varlist=coxmeMlist(list(mat1,mat2,mat3), rescale=F, pdcheck=F, positive=F). I get a few times exactly the same estimations of the parameters of the random effects, whereas the fixed effects of the models are different:

   Random effects
    Group  Variable  Std Dev     Variance
    idp    Vmat.1    0.1000      0.0100
           Vmat.2    0.02236068  0.0005
           Vmat.3    0.02449490  0.0006

The variances are round figures, so I have the feeling that the algorithm didn't succeed in fitting the model. Has anyone ever faced this problem? Thanks, Hugo
Re: [R] problems with printing and plotting aareg
It's a bug in summary.aareg which no one found until now. What's wrong: if dfbeta=TRUE then there is a second estimate of variance calculated, labeled as test.var2. If maxtime is set, then both estimates of variance need to be recalculated by the summary routine. An incorrect if-then-else flow led it to look for test.var2 when it wasn't relevant. My test cases with maxtime also happened to have dfbeta=TRUE. Short term solution: set dfbeta=TRUE. A bit more computation time, though. Long term: I'll fix it, and a new version of survival will one day appear. (With 200+ packages that depend on survival, new releases now require a lot of checking -- no overnight fixes.) Terry T

On 10/05/2012 05:00 AM, r-help-requ...@r-project.org wrote: Hi all, I've ventured into the world of nonparametric survival and I would like to use the maxtime option for printing and plotting my aareg fit. However, my fit does not have test.var2 and this stops the print and plot when adding a maxtime. My code is as follows:

   Response <- Surv(Time, Event)
   Model <- aareg(Response ~ Factor1*Factor2)
   Model2 <- aareg(Response ~ Factor1+Factor2)  # just did this to see if the interaction term had anything to do with it

Model, print(Model), summary(Model), and plot(Model) seem to work fine, but as soon as I try summary/print/plot(Model, maxtime=400) it tells me that test.var2 is not found, and when I look at summary(Model) there is indeed a NULL under test.var2. Anyone know why it doesn't include test.var2? Is this a compatibility problem? I'm using R version 2.13 (I know it's quite old, but updating is a pain when you don't have admin rights to your computer) and just updated the survival package (no warning messages). Any input would be much appreciated. Cheers, Freya
[R] erasing a polygon
I'm updating some (very) old code, and one particular option of its plot method depends on a once-was-true trick: polygon(x, y, border=TRUE, col=0) or polygon(x, y, border=TRUE, density=0) would draw the polygon AND erase whatever was underneath it back to the background color. Is there a reliable way to do this with current R (standard graphics)? Terry Therneau PS For the inquiring, the routine is text.rpart with the fancy=T option, and the original target was the postscript driver on Splus 3.4. (I said it was old.) The plot.rpart routine draws the branches, and text.rpart then wants to lay down some ellipses, erasing what is underneath them.
Re: [R] erasing a polygon
I was able to puzzle it out with the help of the book R Graphics (Murrell). When par("bg") is "transparent" one needs to use col="white"; otherwise the old code col=0 works correctly. The default for pdf and x11, the two I use, is transparent. Terry Therneau On 09/27/2012 08:48 AM, PIKAL Petr wrote: Hi It seems that it still works. x <- c(3,7,7,3) y <- c(4,4,6,6) par(bg="pink") plot(1:10,1:10) polygon(x, y, border=TRUE, col=0) Sent: Thursday, September 27, 2012 2:37 PM To: r-help@r-project.org Subject: [R] erasing a polygon I'm updating some (very) old code, and one particular option of its plot method depends on a once-was-true trick: polygon(x, y, border=TRUE, col=0) or polygon(x, y, border=TRUE, density=0) would draw the polygon AND erase whatever was underneath it back to the background color. Is there a reliable way to do this with current R (standard graphics)? Terry Therneau PS For the inquiring, the routine is text.rpart with the fancy=T option, and the original target was the postscript driver on Splus 3.4. (I said it was old.) The plot.rpart routine draws the branches, and text.rpart then wants to lay down some ellipses, erasing what is underneath them.
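A minimal sketch of the resulting rule (standard graphics; the coordinates are arbitrary):

```r
# Erase-under-polygon that works whether or not the device background
# is "transparent" (the default for pdf and x11).
plot(1:10, 1:10)
x <- c(3, 7, 7, 3)
y <- c(4, 4, 6, 6)
bg <- par("bg")
fill <- if (bg == "transparent") "white" else bg
polygon(x, y, border = TRUE, col = fill)  # outline plus blanked interior
```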
Re: [R] Extrapolating Cox predicted risk
On 09/26/2012 05:00 AM, r-help-requ...@r-project.org wrote: I generated the predicted risk of death for each subject in the study by means of a Cox proportional hazards model at 8 years of follow-up, a time point at which follow-up was more than 90% complete. Is it possible to extrapolate the predicted risk of each subject to 10 years by assuming an exponential distribution? Any help would be greatly appreciated. Thanks for your consideration. The ability to extrapolate over long distances is exactly why industrial reliability work uses parametric models (Weibull or log-normal, see survreg) instead of the Cox model. If any of your data goes out to 10 years, then the predictions from coxph will go out that far, just like they would for a Kaplan-Meier. But, just like the KM, if there are only a handful of people out that far the extrapolated curve will be very noisy. Terry Therneau
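A hedged sketch of the parametric alternative the reply mentions, using survreg with a Weibull on the lung data (the covariates and the chosen time points are illustrative, not from the original question):

```r
library(survival)
pfit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
# survreg uses a location-scale form: S(t) = exp(-exp((log(t) - lp) / scale)),
# so a parametric fit extrapolates smoothly to any horizon, even past the
# last observed follow-up time.
nd <- data.frame(age = 60, sex = 1)           # a hypothetical subject
lp <- predict(pfit, newdata = nd, type = "lp")
days <- c(8, 10) * 365.25
exp(-exp((log(days) - lp) / pfit$scale))      # predicted 8- and 10-year survival
```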
Re: [R] Correlation between random effects in the package coxme
On 09/14/2012 05:00 AM, r-help-requ...@r-project.org wrote: Hello, Why is the correlation between the random effects negative? library(coxme) rats1 <- coxme(Surv(time, status) ~ (1|litter), rats) random.effects(rats1)[[1]] # one value for each of the 50 litters print(rats1) rats2 <- lmekin(time ~ (1|litter), rats) fixed.effects(rats2) random.effects(rats2)[[1]] # one value for each of the 50 litters print(rats2) cor(random.effects(rats1)[[1]], random.effects(rats2)[[1]]) Thanks Because coxme (like coxph) models the death rate, so large coefficients indicate bad news, while lmekin (like lme) models the survival time, where large coefficients are good news. Terry T
Re: [R] p value from lmekin()
On 09/06/2012 05:00 AM, r-help-requ...@r-project.org wrote: Hi, R experts I am currently using the lmekin() function in the coxme package to fit a mixed effect model for family based genetic data. How can I extract the p value from an lmekin object? When I print the object in the R console, I can see the p value and Z value are right there, but I cannot extract them with the coef() function. kinfit$coefficient$fixed (kinfit is the name of the lmekin object) includes only the intercept and the values of the fixed effects. Where are the p and Z values? Thank you! Similar to lme and lmer, lmekin and coxme have: fixef(fit), which returns the coefficients of the fixed effects, and vcov(fit), which returns the variance-covariance matrix of the fixed effects; so the Z statistics are fixef(fit) / sqrt(diag(vcov(fit)))
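The last line of the reply can be completed into Z and p values; a small sketch, assuming `fit` is an lmekin or coxme object:

```r
# Wald statistics for the fixed effects: z = beta / se(beta),
# with a two-sided p-value from the normal distribution.
z <- fixef(fit) / sqrt(diag(vcov(fit)))
p <- 2 * pnorm(-abs(z))
cbind(z = z, p = p)
```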
Re: [R] Coxph not converging with continuous variable
On 09/03/2012 05:00 AM, r-help-requ...@r-project.org wrote: The coxph function in R is not working for me when I use a continuous predictor in the model. Specifically, it fails to converge, even when bumping up the number of max iterations or setting reasonable initial values. The estimated hazard ratio from the model is incorrect (verified by an AFT model). I've isolated it to the x1 variable in the example below, which is log-normally distributed. The x1 here has extreme values, but I've been able to reproduce the problem intermittently with less extreme values. It seemed odd that I couldn't find this question asked anywhere, so I'm wondering if I'm just not seeing a mistake I've made. Alex Keil UNC Chapel Hill Congratulations -- it's been a long time since someone managed to break the iteration code in coxph. I used your example, but simplified to n=1000 and a one-variable model. The quantiles of your x1 variable are round(quantile(xx, c(0, 5:10/10)), 2) 0% 50% 60% 70% 80% 90% 100% 0.06 2.67 3.75 5.74 8.93 15.04 98.38 For a coefficient of 1 (close to the real solution) you have one subject with a risk of death that is 999 times the average risk (he should die before his next heartbeat) and another with a relative risk of 1.99e-40 (an immortal). The key components of a Cox model iteration are, it turns out, weighted means and variances. For this data 99.99929% of the weight falls on a single observation, i.e., at beta=1 you have an effective sample size of 1. The optimal coefficient is the one that best predicts that single subject's death time. Due to the computational round-off error that results, the routine takes a step of size 1.7 from a starting estimate of 1.0 when it should take a step of size about .05, then falls into step halving to overcome the mistake. Rinse and repeat. I could possibly make coxph resistant to this data set, but at the cost of a major rewrite and significantly slower performance. 
Can you convince me that this data set has any relevance? Terry Therneau
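One workaround, not suggested in the reply itself but a common response to a heavily right-skewed covariate, is to fit the covariate on the log scale, which removes the handful of observations that dominate the weighted means and variances of the Newton iteration. A toy sketch (the data generation is invented for illustration):

```r
library(survival)
set.seed(42)
n <- 1000
x1 <- exp(rnorm(n, 1, 1.5))            # lognormal covariate with extreme values
time <- rexp(n, rate = 0.05 * x1^0.1)  # toy event times, weakly tied to x1
status <- rep(1, n)
# coxph(Surv(time, status) ~ x1) can struggle for data like this because a
# few huge x1 values carry essentially all the weight; the log-scale fit
# is numerically tame.
coxph(Surv(time, status) ~ log(x1))
```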
Re: [R] basehaz() in package survival and warnings with coxph
Since fit3.1 and fit2 are based on different data sets, why would I expect the same number of events? Also, when you have a large number of variables, are observations being deleted due to missing values? And to echo David W's comments -- it is hard for me to imagine a data set where this many variables can be looked at simultaneously and a meaningful result obtained. Terry Therneau On 08/09/2012 07:52 PM, Nasib Ahmed wrote: My sessionInfo is as follows: R version 2.15.1 (2012-06-22) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] mi_0.09-16 arm_1.5-05 foreign_0.8-50 abind_1.4-0 [5] R2WinBUGS_2.1-18 coda_0.15-2 lme4_0.99-0 Matrix_1.0-6 [9] lattice_0.20-6 car_2.0-12 nnet_7.3-4 MASS_7.3-20 [13] MuMIn_1.7.11 survival_2.36-14 loaded via a namespace (and not attached): [1] grid_2.15.1 nlme_3.1-104 stats4_2.15.1 It will be difficult to reproduce an example here as the data set I am using is very large. 
I can give you an example: fit3.1 <- coxph(formula = y ~ sex + ns(ageyrs, df = 2) + AdmissionSource + X1 + X2 + X3 + X5 + X6 + X7 + X11 + X12 + X13 + X14 + X15 + X16 + X17 + X18 + X19 + X20 + X22 + X24 + X25 + X26 + X27 + X28 + X29 + X32 + X33 + X35 + X38 + X39 + X40 + X41 + X42 + X43 + X44 + X47 + X49 + X53 + X54 + X55 + X58 + X59 + X62 + X68 + X69 + X78 + X80 + X81 + X84 + X85 + X86 + X93 + X95 + X98 + X100 + X101 + X102 + X105 + X107 + X108 + X109 + X110 + X112 + X113 + X114 + X115 + X116 + X117 + X121 + X122 + X125 + X127 + X128 + X129 + X131 + X132 + X133 + X134 + X138 + X140 + X143 + X145 + X146 + X148 + X150 + X151 + X153 + X157 + X158 + X159 + X164 + X197 + X200 + X202 + X203 + X204 + X205 + X211 + X214 + X217 + X224 + X228 + X233 + X237 + X244 + X249 + X254 + X258 + X259 + X260 + CharlsonIndex + ethnic + day + season + ln, data = dat2) haz <- basehaz(fit3.1) # gives 507 unique haz$time time points fit2 <- coxph(y ~ ns(ageyrs, df=2) + day + ln + sex + AdmissionSource + season + CharlsonIndex, data=dat1) haz <- basehaz(fit2) # gives 611 unique haz$time time points I get the following warnings() with fit3.1: Warning message: In fitter(X, Y, strats, offset, init, control, weights = weights, : Loglik converged before variable ; beta may be infinite. Also the coefficients of the variables that the error occurs for are very high. The Wald test suggests dropping these terms whereas the LRT suggests keeping them. What should I do in terms of model selection? On Thu, Aug 9, 2012 at 2:00 PM, Terry Therneau thern...@mayo.edu mailto:thern...@mayo.edu wrote: I've never seen this, and have no idea how to reproduce it. For resolution you are going to have to give me a working example of the failure. Also, per the posting guide, what is your sessionInfo()? 
Terry Therneau On 08/09/2012 04:11 AM, r-help-requ...@r-project.org mailto:r-help-requ...@r-project.org wrote: I have a couple of questions with regards to fitting a coxph model to a data set in R: I have a very large dataset and wanted to get the baseline hazard using the basehaz() function in the package 'survival'. If I use all the covariates then the output from basehaz(fit), where fit is a model fit using coxph(), gives 507 unique values for the time and the corresponding cumulative hazard function. However if I use a subset of the variables, basehaz() gives 611 values for the time and cumulative hazard.
Re: [R] basehaz() in package survival and warnings with coxph
I've never seen this, and have no idea how to reproduce it. For resolution you are going to have to give me a working example of the failure. Also, per the posting guide, what is your sessionInfo()? Terry Therneau On 08/09/2012 04:11 AM, r-help-requ...@r-project.org wrote: I have a couple of questions with regards to fitting a coxph model to a data set in R: I have a very large dataset and wanted to get the baseline hazard using the basehaz() function in the package 'survival'. If I use all the covariates then the output from basehaz(fit), where fit is a model fit using coxph(), gives 507 unique values for the time and the corresponding cumulative hazard function. However if I use a subset of the variables, basehaz() gives 611 values for the time and cumulative hazard.
[R] Building a web risk calculator based on a Cox PH model -- definitive method for calculating probability?
Here is an example of how to do it. library(survival) vfit <- coxph(Surv(time, status) ~ celltype + trt, data=veteran) userinput <- data.frame(celltype="smallcell", trt=1) usercurve <- survfit(vfit, newdata=userinput) # the entire predicted survival curve user2 <- summary(usercurve, time=2*365.25) # 2 year time point user2$surv [1] 0.0007438084 Some comments: 1. The summary function for survival curves was written so that people could print out shorter summaries, but it works nicely for picking off a particular time point. Since the curve is a step function and there isn't likely to be a step at exactly x years, this is a bit more work to do yourself. You might want to include the confidence limits in your web report as well. 2. Put the whole formula into your coxph call. I have never ever understood why people use the construct tempvar <- with(data, Surv(time, status)) coxph(tempvar ~ age + sex + It leaves you with harder to read code, poorer documentation (the printout from coxph no longer shows the actual response variable), leads to hard-to-diagnose failures for certain uses of predict, ... the list goes on. I have not yet thought of a single good reason for doing it other than because you can. 3. Make the user data the same as the original. In the veteran cancer data set trt is a numeric 0/1 variable; you had it as a factor in the new data set. 4. You should get your keyboard fixed -- it appears that the spacebar is disabled when writing code :-) 5. If you plot the survival curve for the veterans cancer data set it only reaches to about 2 1/2 years, so a summary at 5 years will return NULL. Terry Therneau On 07/18/2012 05:00 AM, r-help-requ...@r-project.org wrote: I am a medical student and as a capstone for my summer research project I am going to create a simple online web calculator for users to input their relevant data, and a probability of relapse within 5 years will be computed and returned based on the Cox PH model I have developed. 
The issue I'm having is finding a definitive method/function to feed the user's newdata and return the probability of relapse within 5 years. I have googled this and the answers seem to be inconsistent; I have variously seen people recommend survest(), survfit(), and predict.coxph(). Terry had a useful answer http://r.789695.n4.nabble.com/how-to-calculate-predicted-probability-of-Cox-model-td4515265.html here but I didn't quite understand what he meant in his last sentence. Here is some code for you to quickly illustrate what you suggest. library(rms) library(survival) library(Hmisc) data(veteran) dd = datadist(veteran) options(datadist='dd') options(digits=4) obj = with(veteran, Surv(time,status)) vetcoxph = coxph(obj ~ celltype + trt, data=veteran) # I will fit models from both the survival and rms packages so you can use what you like vetcph = cph(obj ~ celltype + trt, data=veteran, surv=TRUE, time.inc=5*365, x=T, y=T) # let's say the user inputted that their cell type was smallcell and their treatment was 1. userinput = data.frame(celltype='smallcell', trt=factor(1)) I really appreciate your recommendations Best, Jahan
Re: [R] Power analysis for Cox regression with a time-varying covariate
Marc gave the reference for Schoenfeld's article. It's actually quite simple. Sample size for a Cox model has two parts: 1. Easy part: how many deaths do I need? d = (za + zb)^2 / [var(x) * coef^2] za = cutoff for your alpha, usually 1.96 (.05 two-sided) zb = cutoff for power, often 0.84 = qnorm(.8) = 80% power var(x) = variance of the covariate you are testing. For a yes/no variable like treatment this would be p(1-p), where p = fraction on the first arm. coef = the target coefficient in your Cox model. For an increase in survival of 50% we need exp(coef)=1.5, or coef=.405. All leading to the value I've memorized by now of (1.96 + 0.84)^2 / (.25 * .405^2) = 191 deaths for a balanced two-arm study to detect a 50% increase in survival. 2. Hard part: how many patients will I need to recruit, over what interval of time, and with how much total follow-up, to achieve this number of events? I never use the canned procedures for sample size because this second part is so study specific. And frankly, it's always a guesstimate. Death rates for a condition will usually drop by 1/3 as soon as you start enrolling subjects. Terry T.
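The "easy part" translates directly into a few lines of R; a sketch, with the worked example's numbers as defaults:

```r
# Required number of deaths for a two-arm Cox/logrank comparison
# (Schoenfeld's formula, as laid out in the post above).
ndeaths <- function(alpha = 0.05, power = 0.80, p = 0.5, hazard.ratio = 1.5) {
  za <- qnorm(1 - alpha/2)   # 1.96 for two-sided .05
  zb <- qnorm(power)         # 0.84 for 80% power
  (za + zb)^2 / (p * (1 - p) * log(hazard.ratio)^2)
}
ndeaths()   # about 191 deaths, matching the hand calculation above
```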
Re: [R] R-help Digest, Vol 113, Issue 13
http://www.ncbi.nlm.nih.gov/pubmed/21418051 for the full reference. I don't have an electronic copy, but I do have that issue of Biometrics in my office. I'll have a copy sent over. Terry
Re: [R] differences between survival models between STATA and R
Without more information, we can only guess what you did, or what you are seeing on the page that is different. I'll make a random guess though. There are about 5 ways to parameterize the Weibull distribution. The standard packages that I know, however, tend to use the one found in the Kalbfleisch and Prentice book The Statistical Analysis of Failure Time Data. This includes the survreg function in R and lifereg in SAS, and likely Stata too, though I don't know that package. The aftreg function in the eha package uses something different. About half the Weibull questions I see are due to a difference in parameterization. Terry T. begin included message - Dear Community, I have been using two types of survival programs to analyse a data set. The first one is an R function called aftreg. The second one is a Stata function called streg. Both of them include the same analysis with a Weibull distribution. Yet the results are very different. Shouldn't the results be the same? Kind regards, J
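To illustrate the parameterization trap, here is a hedged sketch of how survreg's output maps onto the "textbook" Weibull (shape, scale) used by rweibull; other programs may report yet another of the half-dozen equivalent forms:

```r
library(survival)
wfit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "weibull")
# survreg reports an intercept and a scale on the log-time scale;
# the corresponding rweibull parameters are:
shape <- 1 / wfit$scale        # rweibull's "shape"
scale <- exp(coef(wfit)[1])    # rweibull's "scale" (baseline, no covariates)
c(shape = shape, scale = scale)
```

Comparing fits across packages on the S(t) scale, rather than coefficient by coefficient, sidesteps the ambiguity entirely.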
Re: [R] How to specify newdata in a Cox-Modell with a time dependent interaction term?
I'm finally back from vacation and looking at your email. 1. The primary mistake is in your call, where you say fit <- survfit(mod.allison.5, newdata.1, id="Id") This will use the character string "Id" as the value of the identifier, not the data. The effect is exactly the same as the difference between print(x) and print('x'). 2. In reply to John's comment that all the id values are the same: it is correct. Normally the survfit routine is used to produce multiple curves, one curve per line of the input data, for time-independent variables. The presence of an id argument is used to tell it that there are multiple lines per subject in the data, e.g. time-dependent covariates. So even though there is only one curve being produced, we need an id statement to trigger the behavior. If you only want one curve for one individual, then individual=TRUE is an alternative, as John pointed out. 3. "It's very important to specify the Surv object and the formula directly in the coxph function ..." Yes, I agree. I always use your suggested form because it gives better documentation -- variable names are directly visible in the coxph call. I don't understand the attraction of the other form, but lots of people use it. Why did it go wrong? Because the survfit function was evaluating Surv(Rossi.2$start, Rossi.2$stop, Rossi.2$arrest.time) ~ fin + age + age:stop + prio, data=newdata.1 The lengths of the variables will be different. The error message comes from the R internals, not my program. Terry Therneau On 06/16/2012 08:04 AM, Jürgen Biedermann wrote: Dear Mr. Therneau, Mr. Fox, or whoever has some time... I can't find a way to use the survfit function (package: survival) for a defined pattern of covariates with a Cox model including a time-dependent interaction term. Somehow the definition of my newdata argument seems to be erroneous. I already googled the problem and found many people having the same or a similar problem, but still no solution. 
I want to stress that my time-dependent covariate does not depend on the failure of an individual (in that case it wouldn't seem sensible to predict a survivor function for an individual). Rather, one of my effects declines with time (a time-dependent coefficient). For illustration, I use the example of John Fox's paper "Cox Proportional-Hazards Regression for Survival Data". http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf Do you know any help? See code below. Thanks very much in advance Jürgen Biedermann # #Code Rossi <- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt", header=T) Rossi.2 <- fold(Rossi, time='week', event='arrest', cov=11:62, cov.names='employed') # see below for the fold function from John Fox # modeling an interaction with time (Page 14) mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~ fin + age + age:stop + prio, data=Rossi.2) mod.allison.5 # Attempt to get the survivor function of a person with age=30, fin=0 and prio=5 newdata.1 <- data.frame(unique(Rossi.2[c("start","stop")]), fin=0, age=30, prio=5, Id=1, arrest.time=0) fit <- survfit(mod.allison.5, newdata.1, id="Id") Error message: Fehler in model.frame.default(data = newdata.1, id = "Id", formula = Surv(start, : Variablenlängen sind unterschiedlich (gefunden für '(id)') -- i.e., the lengths of the variables are different. 
#- fold <- function(data, time, event, cov, cov.names=paste('covariate', '.', 1:ncovs, sep=''), suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){ vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)]) xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag) all.cov <- unlist(cov) if (!is.list(cov)) cov <- list(cov) ncovs <- length(cov) nrow <- nrow(data) ncol <- ncol(data) ncov <- length(cov[[1]]) nobs <- nrow*ncov if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1) stop(paste("all elements of cov must be of the same length and \n", "cov.times must have one more entry than each element of cov.")) var.names <- names(data) subjects <- rownames(data) omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov keep.cols <- (1:ncol)[-omit.cols] nkeep <- length(keep.cols) if (is.numeric(event)) event <- var.names[event] times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T) else data[, cov.times] new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep) rownames <- rep('', nobs) colnames(new.data) <- c('start', 'stop', paste(event, suffix, sep=''), var.names[-omit.cols], cov.names) end.row <- 0 for (i in 1:nrow){ start.row <- end.row + 1 end.row <- end.row + ncov start <- times
Re: [R] How to specify newdata in a Cox-Modell with a time dependent interaction term?
I've been out for a week, with one more to go. I'll look at this in earnest when I return. Terry T On 06/17/2012 04:07 AM, Jürgen Biedermann wrote: Dear John, Thank you very much for your help! It was very important for getting along further. I found out some additional things which I want to give back to you as feedback (maybe other persons will have the same problems). It's also possible to leave out the individual=T argument. Instead you can use: newdata.1 <- data.frame(unique(Rossi.2[c("start","stop")]), fin=0, age=30, prio=5, Id=1, arrest.time=0) fit <- survfit(mod.allison.5, newdata.1, id=Id) *but not* fit <- survfit(mod.allison.5, newdata.1, id="Id") so you have to leave out the quotes when specifying the id variable -- maybe the help file could be more precise here. Additionally I found out some more things while dealing with my original data. It's very important to specify the Surv object and the formula directly in the coxph function, *so don't do this:* SurvO <- Surv(Rossi.2$start, Rossi.2$stop, Rossi.2$arrest.time) form <- as.formula(SurvO ~ fin + age + age:stop + prio) mod.allison.5 <- coxph(form, data=Rossi.2) mod.allison.5 newdata.1 <- data.frame(unique(Rossi.2[c("start","stop")]), fin=0, age=30, prio=5, arrest.time=0) fit <- survfit(mod.allison.5, newdata.1, individual=T) Fehler in model.frame.default(data = newdata.1, formula = SurvO ~ fin + : Variablenlängen sind unterschiedlich (gefunden für 'fin') -- i.e., the lengths of the variables are different. I think the error message is not optimal here, anyway. The problem obviously consists in the survfit function not recognizing the names of the variables anymore. It seems completely logical, but anyway, it took me many hours. Thanks again, Jürgen - Correspondence -- Subject: Re: How to specify newdata in a Cox model with a time dependent interaction term? 
From: John Fox j...@mcmaster.ca To: Jürgen Biedermann juergen.biederm...@googlemail.com Date: 16.06.2012 17:22 Dear Jürgen, fit <- survfit(mod.allison.5, newdata.1, individual=TRUE) fit Call: survfit(formula = mod.allison.5, newdata = newdata.1, individual = TRUE) records n.max n.start events median 0.95LCL 0.95UCL 19809 432 432 114 NA NA NA plot(fit) # shows one survival curve See ?survfit.coxph. Best, John On Sat, 16 Jun 2012 16:23:41 +0200 Jürgen Biedermann juergen.biederm...@googlemail.com wrote: Dear John, Thank you very much for the quick answer. I will use the mentioned citation of your book in the future. However: The Id argument is necessary for the function to know that the different rows belong to one individual. Or as the help file says: id: optional variable name of subject identifiers. If this is present, then each group of rows with the same subject id represents the covariate path through time of a single subject, and the result will contain one curve per subject. If the coxph fit had strata then that must also be specified in newdata. If missing, then each individual row of newdata is presumed to represent a distinct subject and there will be nrow(newdata) times the number of strata curves in the result (one for each strata/subject combination). I tried your proposal anyway, but, as expected, fifty different survival curves appeared. So, there is still no solution. Best regards Jürgen -- --- Jürgen Biedermann Bergmannstraße 3 10961 Berlin-Kreuzberg Mobile: +49 176 247 54 354 Home: +49 30 940 53 044 e-mail: juergen.biederm...@gmail.com - Correspondence -- Subject: Re: How to specify newdata in a Cox model with a time dependent interaction term? From: John Fox j...@mcmaster.ca To: Jürgen Biedermann juergen.biederm...@googlemail.com Date: 16.06.2012 15:55 Dear Jürgen, All the values of your Id variable are equal to 1; how can that make sense? Removing the argument id="Id" may get you what you want. 
By the way, the document to which you refer was an appendix to a 2002 book that has been superseded by Fox and Weisberg, An R Companion to Applied Regression, Second Edition (Sage, 2011). Appendices for that book are available at http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix.html.
I hope this helps,
John

John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

On Sat, 16 Jun 2012 15:04:48 +0200 Jürgen Biedermann juergen.biederm...@googlemail.com wrote:
Dear Mr. Therneau, Mr. Fox, or whoever has some time...
I don't find a solution for using the survfit function (package: survival) for a defined pattern of covariates with a Cox model including a time dependent interaction term. Somehow the definition of my newdata argument seems to be erroneous. I
Re: [R] (bug?) in survfit in R-2-15.0
The predict methods are designed to work with a data frame. In the case of survfit(cox model) the code has, for backwards compatibility purposes, the ability to use a vector as the newdata argument. This feature is not documented in the help file, on purpose, and is expected to go away one day. This backwards compatibility works in all my test cases, but not in the one you present. This may accelerate my plans for removal.

The best solution for you is to use a data frame for both the coxph fit and the predict call. Avoid using objects that are in the global environment or are attached. Below I reprise your example in safer code. The coxph/survfit calls both work from a temporary data frame called dummy. Your use of the coxph/survfit pair to get a survival curve for glmnet is clever, by the way. I like it.

library(glmnet)
library(survival)
load(system.file("doc", "VignetteExample.rdata", package="glmnet"))
fit <- with(patient.data, glmnet(x[-1,], Surv(time[-1], status[-1]),
                                 family="cox", alpha=.05, maxit=1))
max.dev.index <- which.max(fit$dev.ratio)
optimal.beta  <- fit$beta[, max.dev.index]
nonzero.coef  <- (optimal.beta != 0)
dummy <- with(patient.data, data.frame(time=time, status=status, x=x[,nonzero.coef]))
coxfit <- coxph(Surv(time, status) ~ ., data=dummy, subset= -1,
                iter=0, init=optimal.beta[nonzero.coef])
sfit <- survfit(coxfit, newdata=dummy[1,])

Terry Therneau

--- begin included message ---
Dear all, I am using glmnet + survival, and due to the latest release of glmnet 1.7.4 I was forced to use the latest version of R, 2.15.0. My previous version of R was 2.10.1. I changed the glmnet version and the R version, and when I started to get weird results I was not sure where the bug was. After putting glmnet back to the previous version, I found that the bug or weird behaviour happens in survival's survfit. My script is below in the email and it uses glmnet 1.7.3.
I get two completely different answers in different versions of R, and in my opinion the older version of the survival package returns the correct value.

In R 2.10.1 I get
  cat(dim(sfit$surv))      # (nothing -- surv is a vector)
  cat(length(sfit$surv))   # 32

In R 2.15.0 I get
  cat(dim(sfit$surv))      # 62 99
  cat(length(sfit$surv))   # 6138

Kind regards
DK

library(survival)
library(glmnet)
load(system.file("doc", "VignetteExample.rdata", package="glmnet"))
attach(patient.data)
trainX      <- x[-1,]
trainTime   <- time[-1]
trainStatus <- status[-1]
fit <- glmnet(trainX, Surv(trainTime, trainStatus), family="cox", alpha=0.5, maxit=1)
max.dev.index  <- which.max(fit$dev.ratio)
optimal.lambda <- fit$lambda[max.dev.index]
optimal.beta   <- fit$beta[,max.dev.index]
nonzero.coef <- abs(optimal.beta) > 0
selectedBeta   <- optimal.beta[nonzero.coef]
selectedTrainX <- trainX[,nonzero.coef]
coxph.model <- coxph(Surv(trainTime, trainStatus) ~ selectedTrainX, init=selectedBeta, iter=0)
selectedTestX <- x[1,nonzero.coef]
sfit <- survfit(coxph.model, newdata=selectedTestX)
cat("\ndim(sfit$surv) ")
cat(dim(sfit$surv))
cat("\nlength(sfit$surv) ")
cat(length(sfit$surv))

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lmekin bug
Version 2.2-3 of the coxme package has been posted to CRAN. It should propagate to the mirrors over the next 1-2 days.

Version 2.2-2 had a serious bug in the lmekin function. If the variance matrix of the random effects was not diagonal the answers were wrong: at one spot an upper triangular matrix U was being used where its transpose t(U) was called for. This affects genetic models in particular. A new test case with a known solution has now been added to the package as well. There was no issue with any of the other functions, e.g., coxme.

This was pointed out to me by Paola Sebastiani just after 2.2-2 was posted and just before I left on a trip. I really appreciate the bug report, but the timing was clearly Murphy's law at work.

Terry Therneau
Re: [R] Estimating survival times with glmnet and coxph
What you are doing is correct. However, because it is not informed about the penalty parameters used in creating the fit, the coxph command is not able to correctly reconstruct the variance matrix; this is the source of all of the "singular" messages. The confidence bands for the survival curves will be incorrect (likely too wide), but the curves themselves are fine.

Terry Therneau

begin included message
Dear all, I am using glmnet (Coxnet) for building a Cox model and to make an actual prediction, i.e. to estimate the survival function S(t,Xn) for a new subject Xn. If I am not mistaken, glmnet (coxnet) returns beta, beta*X and exp(beta*X), which on its own cannot generate S(t,Xn). We miss the baseline survival function So(t). Below is my code, which takes the beta coefficients from glmnet and creates a coxph object (iter=0, init=beta) in order to later calculate survival estimates with survfit. I would be grateful if someone could confirm that my solution with the coxph object (iter=0, init=beta) is correct, and that the warning I get when creating the coxph object, 'X matrix deemed to be singular', is of no concern. Below is my code, which uses the example from Coxnet: Regularized Cox Regression. Thanks in advance. DK ...
Re: [R] need help with avg.surv (Direct Adjusted Survival Curve)
Well, I would suggest using the code already in place in the survival package. Here is my code for your problem. I'm using a copy of the larynx data as found in the web resources for the Klein and Moeschberger book.

larynx <- read.table("larynx.dat", skip=12,
                     col.names=c("stage", "time", "age", "year", "death"))
larynx$stage <- factor(larynx$stage)
larynx$age.grp <- cut(larynx$age, c(0, 55, 70, 90),
                      labels=c("40-55", "55-70", "70-90"))
fit1 <- coxph(Surv(time, death) ~ age.grp + stage + I(year >= 74), larynx)  # fit a Cox model
km <- survfit(Surv(time, death) ~ stage, data=larynx)    # KM
plot(km)
direct <- survexp(~ stage, data=larynx, ratetable=fit1)  # direct adjusted survival
lines(direct, col=2)

---
A few further comments:
1. It would help those of us who advise if you would simplify your code. I didn't need to see 20+ lines of data set manipulation and curve drawing. Give the essence of the problem.
2. The direct estimate above was first created for expected survival curves based on population rate tables and published by Ederer in 1961. The same idea was rediscovered in the context of the Cox model and renamed the "direct adjusted" estimate; I like to give credit to the original.
3. I did not try to debug your function.

Terry Therneau

--- begin included message ---
Hello R users, I am trying to obtain a direct adjusted survival curve. I am sending my whole code (see below). It's basically the larynx cancer data with Stage 1-4. I am using the Cox model via the coxph option; see the fit3 coxph.
When I use the avg.surv option on fit3, I get the following error:

fits <- avg.surv(fit3, var.name="stage.fac", var.values=c(1,2,3,4), data=larynx)
Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]) :
  contrasts apply only to factors

If I try using the var.values = c("1","2","3","4") option, I get the following error:

Error in xmat %*% cfit$coef : non-conformable arguments
In addition: Warning message:
In model.matrix.default(Terms, mf, contrasts = contrast.arg) :
  variable 'stage.fac' converted to a factor

Please advise me on what I am doing wrong!
Regards,
Manish Dalwani
University of Colorado

library(survival)
library(ggplot2)
setwd("/Users/manishdalwani/surv_6646/")
#file.show("Larynx.txt")
#Stage of disease (1=stage 1, 2=stage 2, 3=stage 3, 4=stage 4)
#Time to death or on-study time, months
#Age at diagnosis of larynx cancer
#Year of diagnosis of larynx cancer
#Death indicator (0=alive, 1=dead)
larynx <- read.table(file="larynx.dat",
                     col.names=c("stage","time","age","year","death"),
                     colClasses=c("factor","numeric","numeric","numeric","numeric"))
#the death indicator needs to be numeric for the survfit function to work
# splitting age into three intervals: <55, 55-70, >70
age.larynx <- rep(0, length(larynx[,1]))
age.larynx[larynx$age < 55] <- 1
age.larynx[larynx$age >= 55 & larynx$age <= 70] <- 2
age.larynx[larynx$age > 70] <- 3
larynx <- cbind(larynx, age.larynx)
larynx$age.larynx <- factor(larynx$age.larynx, labels=c(1, 2, 3))
larynx$stage.fac <- as.factor(larynx$stage)
year.larynx <- rep(0, length(larynx[,1]))
year.larynx[larynx$year <= 74] <- 1
year.larynx[larynx$year > 74] <- 2
larynx <- cbind(larynx, year.larynx)
head(larynx)
summary(larynx)
kapmeier.fit <- survfit(Surv(time, death) ~ stage, data=larynx, type="kaplan-meier")
sumkap <- summary(kapmeier.fit)
attributes(kapmeier.fit)
attributes(sumkap)
plot(kapmeier.fit, lty=2:3, fun="cumhaz", xlab="Months", ylab="Cumulative Hazard of death")
legend(4, .4, c("stage1","stage2","stage3","stage4"), lty=1:2)
plot(kapmeier.fit, lty=2:3, xlab="Months", ylab="Survival Function")
#construct a data frame for the plots
plotdata <- data.frame(time=sumkap$time, surv=sumkap$surv, strata=sumkap$strata)
fact.stage <- factor(larynx$stage)
fit1 <- coxph(Surv(time, death) ~ fact.stage + age.larynx + factor(year >= 74),
              data=larynx, method="efron")
summary(fit1)

avg.surv <- function(cfit, var.name, var.values, data, weights) {
    if(missing(data)) {
        if(!is.null(cfit$model)) mframe <- cfit$model
        else mframe <- model.frame(cfit, sys.parent())
    }
    else mframe <- model.frame(cfit, data)
    var.num <- match(var.name, names(mframe))
    data.patterns <- apply(data.matrix(mframe[, -c(1, var.num)]), 1, paste, collapse=",")
    data.patterns <- factor(data.patterns, levels=unique(data.patterns))
    if(missing(weights)) weights <- table(data.patterns)
    else weights <- tapply(weights, data.patterns, sum)
    kp <- !duplicated(data.patterns)
    mframe <- mframe[kp,]
    obs.var <- mframe[,var.num]
    lps <- (cfit$linear.predictor)[kp]
    tframe <- mframe[rep(1, length(var.values)),]
    tframe[,var.num] <- var.values
    xmat <- model.matrix(cfit, data=tframe)[,-1]
    tlps <- as.vector(xmat %*% cfit$coef)
    names(tlps) <- var.values
    obs.off <- tlps[as.character(obs.var)]
    explp.off <- exp(outer(lps - obs.off, tlps, "+"))
    bfit <- survfit.coxph(cfit, se.fit=F)
    fits <- outer(bfit$surv, explp.off, function(x,y) x^y)
    avg.fits <- apply(sweep(fits, 2, weights
Re: [R] Survreg
On 04/22/2012 05:00 AM, r-help-requ...@r-project.org wrote:
I am trying to run a Weibull PH model in R. Assume in the data set I have x1, a continuous variable, and x2, a categorical variable with two classes (0=sick and 1=healthy). I fit the model in the following way:

Test = survreg(Surv(time, cens) ~ x1 + x2, dist="weibull")

My questions are: 1. Is it a Weibull PH model or a Weibull AFT model?

Call: survreg(formula = Surv(time, delta) ~ x1 + x2, data = nn, dist = "weibull")
                Value Std. Error      z          p
(Intercept)  5.652155   3.54e-02  159.8   0.00e+00
x1           0.492592   1.92e-02   25.7  3.65e-145
x2          -0.000212   5.64e-06  -37.6   0.00e+00
Log(scale)  -0.269219   1.57e-02  -17.1   1.24e-65
Scale= 0.764

The Weibull model can be viewed as either. The cumulative hazard for a Weibull is t^p; viewed as an AFT model we have (at)^p [multiply time], viewed as PH we have a(t^p) [multiply the hazard]. The survreg routine uses the AFT parameterization found in Chapter 2 of Kalbfleisch and Prentice, The Statistical Analysis of Failure Time Data. For the routine our multiplier a above is exp(X beta), for the usual reason that negative multipliers should be avoided -- it would correspond to time running backwards. In the above, x1 makes time run faster, x2 makes time run slower.

Terry T
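Terry's AFT/PH remark can be illustrated numerically. The following sketch is my own (simulated data, since the poster's data set is not shown): it fits a Weibull survreg model and maps the location-scale estimates to the (shape, scale) form used by rweibull and to a PH hazard ratio; the exp(-coef/scale) conversion follows the parameterization described in help('survreg').

```r
library(survival)
set.seed(1)
n  <- 500
x1 <- rnorm(n)
# simulate from a Weibull AFT model: log(T) = 1 + 0.5*x1 + sigma*W
time <- rweibull(n, shape=1.3, scale=exp(1 + 0.5*x1))
cens <- rweibull(n, shape=1.3, scale=6)          # random censoring
fit <- survreg(Surv(pmin(time, cens), time <= cens) ~ x1, dist="weibull")
shape  <- 1/fit$scale                   # Weibull shape p (should be near 1.3)
scale0 <- exp(coef(fit)[1])             # baseline scale at x1 = 0
hr.x1  <- exp(-coef(fit)[2]/fit$scale)  # PH view: hazard ratio per unit of x1
c(shape=shape, scale0=scale0, hr.x1=hr.x1)
```

The same fit thus reads either way: a positive survreg coefficient stretches time (AFT), which is the same thing as a hazard ratio below one (PH).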
Re: [R] R-help Digest, Vol 110, Issue 23
Yes, the (start, stop] formalism is the easiest way to deal with time dependent data. Each individual only needs to have sufficient data to describe them, so if id number 4 is in house 1, their housemate #1 was eaten at time 2, and they were eaten at time 10, the following is sufficient data for that subject:

id house time1 time2 status discovered
 4     1     0     2      0      FALSE
 4     1     2    10      1      TRUE

We don't need observations for each intermediate time, only that from 0-2 they were not yet discovered and that from 2-10 they were. The status variable tells whether an interval ended in disaster. Use Surv(time1, time2, status) on the left side of the equation.

Since the time scale is discrete you should technically use method='exact' in a Cox model, but the default Efron approximation will be very close.

Interval censoring isn't necessary. You will have a model of time to discovery instead of time to eaten, but with a fixed examination schedule such as you have there is no information in the data to help you move from one to the other. The standard interval approach would just assume deaths happened at the midpoint between examinations.

Terry T.

On 04/21/2012 05:00 AM, r-help-requ...@r-project.org wrote:
Dear R users, I fear this is terribly trivial but I'm struggling to get my head around it. First of all, I'm using the survival package in R 2.12.2 on Windows Vista with the RExcel plugin. You probably only need to know that I'm using survival for this. I have data collected from 180 or so individuals that were checked 7 times throughout a trial with set start and end times. Once the event happens (death by predator) there are no more checks for that individual. This means that I check on each individual up to 7 times with either an event recorded or the final time being censored. At the moment, I have a data sheet with one observation per individual; that is either the event time (the observation time when the individual had had an event) or the censored time.
However, I'd like to add a time dependent factor and I also wonder if this data should be treated as interval censored. The time dependent factor is like this: the individuals are grouped in houses and once one individual in a group has an event, it makes biological sense that the rest of them should be at greater risk, as the predator is likely to have discovered the others in the house as well (the predator is able to consume many individuals). At the moment I'm coding this as a normal two level factor (discovered) where all individuals alive after the first event in that house are TRUE and the first individuals in a house to be eaten are FALSE. All individuals in houses that were not discovered at all are also FALSE. Obviously, all individuals that were eaten were first discovered, then eaten. However, the first individuals in a house to be eaten had not been previously discovered by the predator (not observably so, anyway). Should I write up this data set with a start and stop time for every check I made, so each individual has up to 7 records, one for each time I checked? Is there a quick and easy way to do this in R or would I have to go through the data set manually? Does coding the discovered factor the way I have make statistical sense? Should I worry about proportional hazards of the discovered factor? It seems to me that it would often turn out not proportional because of its nature. Sorry, lots of stats questions. I don't mind if you don't answer all of these. Just knowing how best to feed this data into R would help me no end. The rest I can probably glean from the millions of survival analysis books I have lying about.
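The (start, stop] layout from Terry's reply can be sketched in a few lines. This toy data set is invented (the names and values are mine, not from the thread): each subject gets one row per period over which the discovered covariate is constant, and status marks whether the interval ended with the subject being eaten.

```r
library(survival)
# one row per subject per constant-covariate period
toy <- data.frame(
  id         = c(1, 2, 2, 3, 4, 4, 5),
  house      = c(1, 1, 1, 2, 2, 2, 3),
  time1      = c(0, 0, 2, 0, 0, 4, 0),
  time2      = c(2, 2, 10, 4, 4, 10, 10),
  status     = c(1, 0, 1, 1, 0, 0, 0),   # 1 = eaten at time2
  discovered = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE))
fit <- coxph(Surv(time1, time2, status) ~ discovered, data=toy)
summary(fit)
```

On a data set this tiny the coefficient is of course meaningless; the point is only the shape of the data frame and the three-argument Surv call.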
Re: [R] Test-Predict R survival analysis
On 04/18/2012 05:00 AM, r-help-requ...@r-project.org wrote:
Hi, I'm trying to use R survival analysis on a Windows 7 system. The input data format is described at the end of this mail.
1/ I tried to perform a survival analysis including stratified variables using the following formula:

cox.xtab_miR = coxph(Surv(time, status) ~ miR + strata(sex, nbligne, age), data=matrix)

and obtained the following error message:

Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
  Ran out of iterations and did not converge

Is this due to the model (error in formula) or is the number of stratified variables fixed?

The Cox model compares the deaths to the non-deaths, separately within each stratum, then adds up the result. Your data set and model combination puts each subject into their own stratum, so there is no one to compare them to. The fit has no data to use and so must fail. (I admit the error message is misleading, but I hadn't ever seen someone make this particular mistake before.) The following model works much better:

coxph(Surv(time, status) ~ miR + age + nbligne + strata(sex))

            coef exp(coef) se(coef)     z      p
miR     2.75e-05      1.00 9.35e-06 2.941 0.0033
age     3.39e-03      1.00 1.01e-02 0.334 0.7400
nbligne 7.14e-02      1.07 1.32e-01 0.542 0.5900

Likelihood ratio test=5.87 on 3 df, p=0.118
n= 70, number of events= 59 (1 observation deleted due to missingness)

Terry Therneau
Re: [R] Censoring data
On 04/14/2012 05:00 AM, r-help-requ...@r-project.org wrote:
Hello, I want to estimate Weibull parameters with 30% censored data. I have below the code for the censoring, but how it must be put into the likelihood equation to obtain the desired estimates is where I have a problem; can somebody help? My likelihood equation is for random type-I censoring where the time for the censored units is different for each unit.

n <- 50; r <- 35
p <- 0.8; b <- 1.5
t <- rweibull(50, shape=p, scale=b)
meantrue <- gamma(1 + (1/p)) * b
meantrue
d <- meantrue/0.30
cen <- runif(50, min=0, max=d)
cen
s <- ifelse(t <= cen, 1, 0)
s

Use the survreg function in the 'survival' package to get the estimates. It uses a location-scale parameterization; read the examples in the manual page, help('survreg'), to see the transformation to and from the form used by rweibull.

Terry T.
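Continuing the simulation above, here is a hedged sketch (mine, not from Terry's reply) of the survreg fit and the back-transformation he points to: for dist="weibull", the rweibull shape is 1/fit$scale and the scale is exp(intercept), per the examples in help('survreg').

```r
library(survival)
set.seed(42)
p <- 0.8; b <- 1.5
t   <- rweibull(50, shape=p, scale=b)
cen <- runif(50, min=0, max=gamma(1 + 1/p) * b / 0.30)
obs <- pmin(t, cen)                 # observed time: failure or censoring
s   <- ifelse(t <= cen, 1, 0)       # 1 = failure observed, 0 = censored
fit <- survreg(Surv(obs, s) ~ 1, dist="weibull")
shape.hat <- 1/fit$scale            # estimate of p
scale.hat <- exp(coef(fit))         # estimate of b
c(shape=shape.hat, scale=scale.hat)
```

The Surv(obs, s) object carries the censoring into the likelihood automatically, which is exactly the piece the poster was trying to write by hand.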
Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
On 04/14/2012 05:00 AM, r-help-requ...@r-project.org wrote:
I am replicating in R an analysis I did earlier using SAS. See this as a test of whether I'm ready to start using R in my day-to-day work. I just finished replicating a Kaplan-Meier analysis. Everything seems to work out fine except for one thing: the 95% CI around my estimate for the median is substantially larger in R than in SAS. For example, in SAS I have a median of 3.29 with a 95% CI of [1.15, 5.29]. In R, I get a median of 3.29 with a 95% CI of [1.35, 13.35]. Can anyone tell me why I get this difference?

The confidence interval for the median is based on the confidence intervals for the curves. There are several methods for computing confidence intervals for the curves: plain, log, log-log, or logit scale. There are opinions on which is best, and it is a close race among all except the first of these: the plain intervals are awful; it's like putting me in one lane of a championship 100 meter dash. Until about version 9 the only option in SAS was plain, then for a time it was still the default. By 9.2 they finally went to log-log.

Terry Therneau
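A quick way to see the effect Terry describes is to compute the same curve under two conf.type settings. This sketch uses the aml data shipped with the survival package (not the poster's data); the median itself is unchanged while its confidence interval moves with the scale.

```r
library(survival)
fit.loglog <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="log-log")
fit.plain  <- survfit(Surv(time, status) ~ 1, data=aml, conf.type="plain")
summary(fit.loglog)$table   # median with log-log CI
summary(fit.plain)$table    # same median, different CI
```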
Re: [R] Survreg output - interpretation
--- begin included message ---
Hello R users, I am analyzing survival data (mostly uncensored) and want to extract the most out of it. Since I have more than one factor, I've read that survival regression can help to test the interactions between factors, and then decide how to do the comparisons using the log-rank test (survdiff).
1- If I choose the Weibull distribution, does the output inform the goodness of fit to it? Perhaps in this part of the output...

Weibull distribution
Loglik(model)= -1302.8   Loglik(intercept only)= -1311
Chisq= 16.49 on 11 degrees of freedom, p= 0.12
Number of Newton-Raphson Iterations: 7
n= 873

2- One of my factors is gender (2 levels). With survreg, it appears as significant, but if I compare them with log-rank it turns out not significant. Are they comparing different things? Or is it a test power issue?
--- end inclusion ---

1. To understand goodness of fit you need to look at the residuals in multiple ways. (The same answer applies to ordinary linear regression.)
2. You have not given us enough information to answer the question. If the data is p=.049 vs p=.051, then the answers are in agreement even though the artificial label of "significant" changes. The log-rank test and survreg are not the same model. If the data is p=.02 vs p=.8, then you have an error in the code.

Terry Therneau
Re: [R] How to get the confidence interval of area under the time dependent roc curve
On 04/07/2012 05:00 AM, r-help-requ...@r-project.org wrote:
It is possible to calculate the c-index for time dependent outcomes (such as disease) using the survivalROC package in R. My question is: is it possible to produce a p-value for the c-index that is calculated (at a specific point in time)? How to get the confidence interval of the area under the time dependent ROC curve (or by bootstrap)?

The current version of survival computes the c-index and its standard error for time-dependent covariates. Use summary(cox model fit) or the survConcordance function.

Terry T.
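A short sketch of the two routes Terry names, using the lung data from the survival package rather than the poster's. (survConcordance was the interface in survival versions of that era; newer releases provide concordance() instead.)

```r
library(survival)
lung2 <- na.omit(lung[, c("time", "status", "age", "ph.ecog")])
fit <- coxph(Surv(time, status) ~ age + ph.ecog, data=lung2)
summary(fit)$concordance                            # c-index and its standard error
survConcordance(Surv(time, status) ~ predict(fit),  # same statistic, direct call
                data=lung2)
```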
Re: [R] How to calculate the Deviance for test data based on a Cox model
kevinchiang...@hotmail.com wrote:
Dear List, if I have a Cox model fit on a training set, how should I calculate the Cox log partial likelihood for the test data? Actually I am trying to calculate the deviance on the test dataset to evaluate the performance of the prediction model; the equation is as follows:

D = -2 * { L(test)[beta_train] - L(test)[0] }

It means using the beta coefficients obtained from the training set to calculate the likelihood of the test data. I know I can get the log likelihood for the training model, but how do I get it for the test data?
--
If you are attending the ENAR meeting next week, go find Cindy Crowson's poster, which has a very nice discussion of how to evaluate a prior Cox model on new data. Given the timeline for statistics journals, her draft paper (sitting on my desk) might yet be years away from print even if reviewers like it.

Terry T.
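One common way to compute the quantity the poster asks about is the same init=/iter=0 device that appears elsewhere in these threads: refit on the test data with the training coefficients held fixed, then read both log partial likelihoods off the fit. This is my own hedged sketch (lung data, arbitrary split), not Crowson's method:

```r
library(survival)
set.seed(1)
lung2 <- na.omit(lung[, c("time", "status", "age", "ph.ecog")])
idx   <- sample(nrow(lung2), nrow(lung2) %/% 2)
train <- lung2[idx, ]
test  <- lung2[-idx, ]
fit.train <- coxph(Surv(time, status) ~ age + ph.ecog, data=train)
# refit on the test data with beta fixed at the training estimates
fit.test <- coxph(Surv(time, status) ~ age + ph.ecog, data=test,
                  init=coef(fit.train), control=coxph.control(iter.max=0))
# loglik[1] = L(test)[0],  loglik[2] = L(test)[beta_train]
D <- -2 * (fit.test$loglik[2] - fit.test$loglik[1])
D
```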
Re: [R] how to calculate predicted probability of Cox model
-- begin included message ---
Because the Cox proportional hazards model doesn't give the baseline hazard function, how do I calculate the predictive probability for each test sample at a special time point, such as 5-year or 10-year? In the survival package, the predict.coxph() function gives three different types of predicted value; for the type "expected", does it mean the expected number of events for a special sample during the total follow-up time? What's the relationship between this value and the predicted probability?
-- end inclusion ---

The default printout for coxph does not include the baseline hazard (too lengthy), but it is lurking behind the scenes. Yes, the type "expected" is the expected number of events (= value of the cumulative hazard) for a given set of covariates at a particular time. Prob(no event by time t) = exp(- cumulative hazard at time t). To get a prediction of type = "expected" your data set has to have both covariate AND survival time values for each subject.

Terry T.
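The relation just described can be checked directly. A small sketch using the survival package's lung data (not the poster's):

```r
library(survival)
lung2 <- na.omit(lung[, c("time", "status", "age", "ph.ecog")])
fit <- coxph(Surv(time, status) ~ age + ph.ecog, data=lung2)
H <- predict(fit, type="expected")  # cumulative hazard at each subject's own time
p.eventfree <- exp(-H)              # Prob(no event by that time)
head(cbind(H, p.eventfree))
```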