[R] [R-pkgs] survival3.1

2019-11-07 Thread Terry Therneau
Version 3.x of the survival package is now available from CRAN.  This is a
major update whose primary goal is to make the analysis of multi-state
survival models as easy to do as simple Kaplan-Meier and Cox proportional
hazard fits. The primary changes are

-- in Surv(time1, time2, status) the status variable can now be a
factor that specifies which state a subject has transitioned into.  From a
user's point of view this is the single key change.

-- survfit() will then create multi-state Pr(state | time) curves,
i.e., the Aalen-Johansen estimate.  (Kaplan-Meier and cumulative incidence
are special cases of the AJ.)  Printout and plotting are unchanged.  (A short
sketch follows this list.)

-- coxph() will then create multi-state fits.  There is an easy
formula-like syntax for specifying shared coefficients or baseline hazards.

-- all use standard data, i.e. the (time1, time2, status) format that
is common for time-dependent covariates; there is no need to create special
intermediate data sets.  (An id variable is necessary to specify which rows
go with whom.)

-- robust variance estimates available throughout
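
As a rough illustration, here is a minimal sketch in the spirit of the package's
competing-risks examples; the recoding of the mgus2 data below is my own
illustration, not part of the announcement:

    library(survival)   # assumes version >= 3.0
    # Competing risks in mgus2: progression to PCM, or death without PCM.
    # The status variable is a factor whose levels name the state entered.
    etime <- with(mgus2, ifelse(pstat == 0, futime, ptime))
    event <- with(mgus2, ifelse(pstat == 0, 2 * death, 1))
    event <- factor(event, 0:2, labels = c("censor", "pcm", "death"))

    # Aalen-Johansen estimate of Pr(state | time); the Kaplan-Meier is the
    # special case with a single event type.
    ajfit <- survfit(Surv(etime, event) ~ sex, data = mgus2)
    plot(ajfit, col = 1:4, lty = 1:2, xlab = "Months", ylab = "Pr(state)")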

Many other CRAN packages depend on survival, which drove a second goal of
"don't break it".  Part of the testing involved running the test suites of
all 679 reverse dependencies under the new version.

Terry Therneau
thern...@mayo.edu



[R] Type III tests and Cox models

2014-01-20 Thread Terry Therneau
 Someone asked a question about this on the list a couple of months ago.  I replied that I
didn't know any sound basis for it, and the counter-reply was "SAS has it".  Grant
applications were due and I got testy, so there was no further progress at that point.
  As the author of the survival package I really did want to understand the issue,
however.  I've now done my homework, and the result is contained in a 30+ page vignette
that has been added to the most recent release of the package.  The short summary: I was
suspicious before; now I know for certain that it is misguided, and the phreg
implementation of the idea is worse.
  If you are interested take a look.  If significant debate arises about the statistical
issues therein I don't think R-help is quite the right forum for discussion, and will
entertain any idea wrt a more appropriate venue.  Rehashing long-standing
controversies about type 3 for linear models is, however, not exciting to me.


Terry Therneau



Re: [R] Type III tests and Cox models

2014-01-20 Thread Terry Therneau

This last brought forth a good chuckle. Perhaps combine them into a single 
entry?

On a serious note, to have someone of significant experience say that they hadn't heard of 
SAS's method for doing x, for nearly any x, marks a welcome climate change.  (And took 
nearly that long.)
The phreg procedure also added a Scheffe test of the overall hypothesis, by the way, 
which no one here has yet back-engineered to figure out what they are doing.  An F-test?


Terry T.

On 01/20/2014 04:18 PM, peter dalgaard wrote:


On 20 Jan 2014, at 23:05 , Göran Broström goran.brost...@umu.se wrote:


On 01/20/2014 07:02 PM, peter dalgaard wrote:


On 20 Jan 2014, at 18:47 , Terry Therneau thern...@mayo.edu wrote:


The short summary: I was suspicious before, now I know for certain
that it is misguided, and the phreg implementation of the idea is
worse.


A fortune candidate, if ever I saw one.


OK, but please state clearly that 'phreg' refers to a SAS procedure, not the
function with the same name in the R package 'eha' (I should have chosen
another name, had I ever heard of SAS).


That'll be the 2nd fortune candidate of this evening...



Göran



-pd







Re: [R] coxph with time-dependent covariates

2014-01-14 Thread Terry Therneau

R throws out this event due to the missing time-dependent covariate at day
6. Is there a way I can keep this event without filling in a covariate
value at day 6?


No.

The Cox model assesses the risk associated with each covariate by comparing, at each event 
time, the values of the subject who had the event to all those who were also present at 
that time.  You can't compare to missing.


Terry T.



Re: [R] tt() function (was Package dependencies in building R packages)

2014-01-09 Thread Terry Therneau



On 12/31/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Thanks for your kind response Duncan. To be more specific, I'm using the
function mvrnorm from MASS. The issue is that MASS depends on survival and
I have a function in my package named tt() which conflicts with a function
in survival of the same name. I can think of 2 alternative solutions to my
problem, but I'm no expert:

1) Copy mvrnorm into my package, which I thought was not a good idea
2) Rename my tt() function to something else in my package, but this is
painful as I have it all over the place in other functions.

Any suggestions would be much appreciated.

Best,
Axel.



Version 2.37-5 of survival has just been submitted to CRAN. In this new release the tt()
function was made externally invisible, so you should not have problems.  This will take a
few days to propagate out to the various mirrors (perhaps longer if the Windows build has
issues).


This question of how best to deal with functions that only have meaning within a coxph
model statement came up on R-devel just over a year ago, and I'm just now implementing the
workaround found there.  In time the cluster(), pspline(), ridge() and frailty() functions
are likely to receive the same treatment.


Terry T



Re: [R] cumulative incidence for mstate in Survival package in R

2013-12-31 Thread Terry Therneau

Question 1: How to get just 2 cumulative incidence curves when there are
multiple covariates.
  I don't understand what you want.  Assume that we have liver transplant, and death
while waiting for a transplant, as my two events.  There are overall curves (2), or one
can create curves separately for each sex, or for different institutions.  What do you
mean by "a curve for age"?

  If you want competing risks after Cox model adjustment, see the mstate 
package.

Question 2: the "mine" data.  There is no such data set.  This was a hypothetical example
in the document, and I chose a poor name for the data set; "your_data_set" would have been
better.  I was using "mine" in the sense of "this data set is mine, it belongs to me", and
now see that it could confuse someone.  The file sourcecode.pdf is intended to document
the computational algorithms, not how a user would approach the function.  A vignette
is planned, someday...


Terry Therneau


On 12/30/2013 04:04 PM, Jieyue Li wrote:

Dear All,

I want to have the cumulative incidence curves for 'mstate' data using the survival
package in R. But I got some problems:
I. Problem 1:
I. Problem 1:
1. If I only use intercept without any covariates, I can have 'right' 
cumulative incidence
curves (2 for 2 competing risks):
library(survival)
fitCI <- survfit(Surv(stop, status*as.numeric(event), type="mstate") ~ 1,
                 data=mgus1, subset=(start==0))
plot(fitCI)
2. If I include one covariate ('sex'), I get 4 curves (attached; I guess because there are
two levels in 'sex' and 2 competing risks):
fitCI <- survfit(Surv(stop, status*as.numeric(event), type="mstate") ~ sex,
                 data=mgus1, subset=(start==0))
plot(fitCI)
However, I want to just have 2 cumulative incidence curves estimated from 
several
covariates (such as 'sex', 'age', 'alb', etc. in mgus1). Could you please help 
me to do
that? Thank you very much!
II. Problem 2:
I try using an example from sourcecode.pdf:
fit <- survfit(Surv(time, status, type="mstate") ~ sex, data=mine)
but where can I have the 'mine' data? Thank you!

Best,

Jieyue





Re: [R] estimating survival function from Cox model

2013-12-18 Thread Terry Therneau

The standard error of the curve cannot be extracted from the summary information you have.
The variance is made up of two terms, one of which is a sum, over all the death times, of a
quadratic term per death time.  That term involves the variance matrix of the Cox model
coefficients, the target value for x (the curve you want to calculate) and the average
value of x at that time in the data set from which the Cox model was created.
  Just like linear regression, the se are higher when you predict far from the center
of the original data set.
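
A small sketch of the point, using the lung data shipped with the package (my own
illustration, not part of the original exchange):

    library(survival)
    fit <- coxph(Surv(time, status) ~ age, data = lung)
    # se of the predicted curve near the center of the data (mean age is
    # about 62) versus far from it
    near <- summary(survfit(fit, newdata = data.frame(age = 62)), times = 300)
    far  <- summary(survfit(fit, newdata = data.frame(age = 90)), times = 300)
    c(se.near = near$std.err, se.far = far$std.err)   # se.far is larger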


Terry Therneau


On 12/18/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hi, I have some questions on how to estimate the survival function from a Cox 
model. I know how to do this in R using survfit().


But let's say the model was fit in another software package, and I was only given the
estimate of the baseline cumulative hazard A0(t=10) at the specified time t=10 (baseline
cumulative hazard refers to when covariate X=0) and the beta estimate b for the covariate
used in the Cox model, X.


So the survival function at time 10 for a given covariate value x can be 
calculated as:

A(t=10|X=x) = exp(b*x)*A0(t=10) where A is cumulative hazard when X=x
S(t=10|X=x) = exp(-A(t=10|X=x)) where S is survival function to be calculated

Now I want to calculate confidence interval for S(t=10|X=x). I think I need to 
calculate the CI for cumulative hazard A(t=10|X=x) first and then exponentiate 
it to get CI for S, based on the relationship S = exp(-A).

To get CI for A, I need to calculate the estimate of standard error of A. I 
know the other software can give me the standard error of A0, the baseline 
cumulative hazard. Based on the relationship A = exp(b*x)*A0, I guess I'll need 
the standard error for b. But how do I calculate the standard error for A based 
on standard errors for A0 and b?

Any insights would be greatly appreciated!

John




Re: [R] Coxph convergence

2013-12-18 Thread Terry Therneau

I'll re-enter the fray.
The data set is an example where coxph is incorrect; due to round off error it 
is treating
a 5 column rank 3 matrix as if it were rank 4.  This of course results in 0 digits of 
precision.

  Immediate fix, for the user, is to add toler.chol=1e-10 to their coxph call.  It is
very likely that they will never ever have to change this value.
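
In code, the suggested fix looks something like this (the model and data frame names
are placeholders, not the poster's):

    # hypothetical model; the toler.chol argument is the point
    fit <- coxph(Surv(time, status) ~ x1 + x2 + x3, data = mydata,
                 toler.chol = 1e-10)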

I will look into changing the default from its current value of .Machine$double.eps^.75.
However, I first have to check that this does not break anything else.  All epsilon
constants are a delicate dance between multiple data sets, and anyone with long experience
in numerical analysis will tell you that it is impossible to find constants that will work
for every data set.  This is true for linear models, logistic, Cox, ... you name it.

In summary:
I appreciate the example.
I'll add it to my list of nasty problems.
I may be able to fix it long term, and maybe not.  Changing the constant may break
something else.
I've given a good short term fix.

Terry T.



On 12/18/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Your comprehension of the issue seems to be entirely wrong. Between r11513 and r11516,
some tuning of internal parameters was done, so the process of finding the rank of a
singular matrix no longer converges (within the time/tolerance implicitly specified).
There are warnings issued, but then there are misc warnings before and after (and one
gets desensitised about them). Also, the nature of the problem is to
test for the possibility of interactions - or the lack thereof -

outcome ~ factor A + factor B + factor A x factor B

or just extra terms in outcome ~ factor A + factor B + ... as an exploration
of auxiliary effects; more often than not the extra terms won't make
any difference, and the matrix involved just isn't the nicest to manipulate. It
is in the nature of that kind of exploratory work.

Professor Therneau replied that it is possible to get the older convergent
behaviour by manual tuning of some of the convergence criteria parameters; I
have responded that while that is possible, often one is simultaneously
exploring many models with many possible auxiliary effects (and the lack
thereof), and manual tuning for each is neither feasible nor appropriate; and we
sort of left it at that.




Re: [R] freetype 2.5.2, problem with the survival package, build R 2.15.x with gcc 4.8.x

2013-12-13 Thread Terry Therneau

I was sent a copy of the data, and this is what I get on a different machine:

fit <- clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set),
              data=pscc)
Warning messages:
1: In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Loglik converged before variable  1,2,3,4 ; beta may be infinite.
2: In coxph(formula = Surv(rep(1, 13110L), cc) ~ addContr(A) + addContr(C) +  :
  X matrix deemed to be singular; variable 5


fit

 coef exp(coef) se(coef) z  p
addContr(A)2 -0.14250 0.867   110812 -1.29e-06  1
addContr(C)2  0.00525 1.005   110812  4.74e-08  1
addContr(A.C)1-2 -0.30097 0.740   110812 -2.72e-06  1
addContr(A.C)2-1 -0.47712 0.621   110812 -4.31e-06  1
addContr(A.C)2-2   NANA0NA NA


xmat <- model.matrix(fit)
svd(xmat)$d

[1] 1.932097e+02 2.700101e+01 1.624731e+01 6.049630e-15 2.031334e-15

The primary issue is that the covariate matrix is singular, having rank 3
instead of rank 5.
The coxph routine prints two warning messages that things are not good about the matrix.
Warning messages should not be ignored!  The insane se(coef) values in the printed result
are an even bigger clue that the model fit is suspect. Unfortunately, some small change in
the iteration path or numerics has put this data set over the edge from being seen as rank
3 (old run) to rank 4 (new run).  Moral: coxph does pretty well at detecting redundant
variables, but if you know of some it never hurts to help the routine out by removing them
before the fit.


Singularity of the X matrix in a Cox model is very difficult to detect reliably; the
current threshold is the result of long experience and experiment to give as few false
messages as possible.  (The rms package in particular uses truncated power basis functions
for its splines, which leads to X matrices that look almost singular numerically, but are
not.)  Setting a slightly less stringent threshold for declaring singularity in the
Cholesky decomposition suffices for this data set.


fit2 <- clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set),
               data=pscc, toler.chol=1e-10)

I'll certainly add this to my list of test problems that I use to tune those 
constants.

Terry Therneau

On 12/11/2013 09:30 PM, Hin-Tak Leung wrote:

Here is a rather long discussion etc about freetype 2.5.2, problem with the 
survival
package, and build R 2.15.x with gcc 4.8.x. Please feel free to skip forward.

- freetype 2.5.2:

the fix to cope with one of the Mac OS X's system fonts just before the release 
of
freetype 2.5.1 caused a regression, crashing over one of Microsoft windows' 
system fonts.
So there is a 2.5.2. There are new 2.5.2 bundles for windows & Mac OS X. The
official
win/mac binaries of R were built statically with 2+-years-old freetype with a 
few known
problems. Most should upgrade/rebuild.

http://sourceforge.net/projects/outmodedbonsai/files/R/

- problem with the survival package:

Trying to re-run a vignette to get the same result as two years ago
revealed a strange change. I went and bisected it down to
r11513 and r11516 of the survival package.

-- r11513 
clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set))


coef exp(coef) se(coef) z  p
addContr(A)2 -0.620 0.5380.217 -2.86 0.0043
addContr(C)2  0.482 1.6200.217  2.22 0.0270
addContr(A.C)1-2 -0.778 0.4590.275 -2.83 0.0047
addContr(A.C)2-1 NANA0.000NA NA
addContr(A.C)2-2 NANA0.000NA NA

Likelihood ratio test=26  on 3 df, p=9.49e-06  n= 13110, number of events= 3524
--

- r11516 -
clogit(cc ~ addContr(A) + addContr(C) + addContr(A.C) + strata(set))


  coef exp(coef) se(coef) z  p
addContr(A)2 -0.14250 0.867   110812 -1.29e-06  1
addContr(C)2  0.00525 1.005   110812  4.74e-08  1
addContr(A.C)1-2 -0.30097 0.740   110812 -2.72e-06  1
addContr(A.C)2-1 -0.47712 0.621   110812 -4.31e-06  1
addContr(A.C)2-2   NANA0NA NA

Likelihood ratio test=26  on 4 df, p=3.15e-05  n= 13110, number of events= 3524
--





Re: [R] Coxme time dependent

2013-12-13 Thread Terry Therneau



On 12/13/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hi all,



I am using the coxme function to fit random effects survival models.  The
base survival models show a typical J-shaped curve where survival drops
sharply then plateaus over the time interval.  I want to include a time
covariate which could account for this non-linear decrease in survival over
time.  However, I have not found a clear way to do this in the coxme
models.  Any suggestions would be greatly appreciated.  Below is some
sample code and data.



-Jared



I don't understand what you need.  One of the advantages of the Cox model is that the 
baseline hazard function automatically accommodates such features, for both coxph and coxme.


Terry Therneau

PS: I can't make out the pattern of your sample data.  Perhaps you were depending on the 
html formatting?  R-help uses simple text only.




Re: [R] How to stop Kaplan-Meier curve at a time point

2013-11-21 Thread Terry Therneau

Here is the simplest answer that I know.

   outcome6 <- ifelse(Survival_days > 2190, 0, Outcome)
   survfit(Surv(Survival_days, outcome6) ~ Gender)

This assumes that the variable Outcome is coded as 0/1.  If it were instead FALSE/TRUE
then change the 0 to FALSE in the first line.
The logrank test adds a term to the computation at each death time, so changing all the
deaths after time 2190 to censored effectively ignores any information after that point.
There is no need to modify the time variable.
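
The corresponding log-rank test at the 6-year mark is then just survdiff() on the
recoded status (a sketch; the data frame name is a placeholder):

    # log-rank test that ignores events after day 2190
    survdiff(Surv(Survival_days, outcome6) ~ Gender, data = mydata)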


David W's suggestion to add subset = (Survival_days <= 2190) was incorrect, effectively
removing all of the most successful outcomes from the study.  It will thus lead to an
underestimate of tooth survival.  (This was a surprise - David usually bats 1000 on
survival questions, and I greatly appreciate his input to this list.)


Terry Therneau
(author of the package)


On 11/21/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hello R users

I have a question with the Kaplan-Meier curve with respect to my research. We
have done a retrospective study on fillings in the tooth and their survival
in relation to the many influencing factors. We had a long follow-up time
(up to 8 yrs for some variables). However, we decided to stop the analysis at
the 6-year follow-up time, so that we can have a uniform follow-up time for
all the variables.

I did play a bit with the formula and tried to stop the Kaplan-Meier curve
at my desired time (2190 days), or roughly 6 years. However, my question is that I
want to find the significance (log rank test) at this time point between
the two curves, because I am not able to find a way to stop the survfit at
this time point with my knowledge. Below is the code I used.

Gender2 <- survfit(Surv(Survival_days, Outcome) ~ Gender)

plot(Gender2, xmax=2190, mark.time=FALSE, col=c(1:2),
     xlab="Survival time (Days)", ylab="Survival probability",
     main="Gender")  # mark.time=FALSE will remove the censoring lines in
                     # the graph. If censoring lines are needed, then
                     # remove the mark.time argument from the call.

legend("topright", c("Females", "Males"), col=(1:2), lwd=0.5)

I am sure the code in the first line has to be modified. Please help me with
this and I will be very thankful to you.

Thanks in advance




Re: [R] issues with calling predict.coxph.penal (survival) inside a function

2013-11-14 Thread Terry Therneau
Thanks for the reproducible example.  I can confirm that it fails on my machine using
survival 2.37-5, the next soon-to-be-released version.


The issue is with NextMethod, and my assumption that the called routine inherited 
everything from the parent, including the environment chain.  A simple test this AM showed 
me that the assumption is false.  It might have been true for Splus.  Working this out may 
take some time -- every other one of my wrestling matches with predict inside a function 
has -- and there is a reasonable chance that it won't make this already overdue release.


In the meantime, here is a workaround that I have sometimes used in other situations. 
Inside your function do the following: fit a new coxph model with fixed coefficients, and 
do prediction on that.


myfun <- function(oldfit, subset) {
   newX <- model.matrix(oldfit)[subset,]
   newY <- oldfit$y[subset]
   newfit <- coxph(newY ~ newX, iter=0, init=coef(oldfit))
   newfit$var <- oldfit$var
   predict(newfit)
   }

If the subset is all of a particular stratum, as you indicated, then all of the
predictions will be correct.  If not, then those that make use of the baseline hazard
(type="expect") will be incorrect but all others are ok.
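
A hypothetical usage of the workaround, with the lung data standing in for the poster's
model (my illustration, not part of the original reply):

    library(survival)
    fit <- coxph(Surv(time, status) ~ age + strata(sex), data = lung)
    # predictions for the rows of one stratum only
    myfun(fit, which(lung$sex == 1))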


Terry Therneau


On 11/14/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hello everyone,



I got an issue with calling predict.coxph.penal inside a function.



Regarding the context: My original problem is that I wrote a function that
uses predict.coxph and survfit(model) to predict

a lot of survival-curves using only the basis-curves for the strata (as
delivered by survfit(model) ) and then adapts them with

the predicted risk-scores. Because there are cases where my new data has
strata which didn't exist in the original model I exclude

them, using a Boolean vector inside the function.

I end up with a call like this: predict(coxph_model, newdata[subscript_vector, ])



This works fine for a coxph model, but when I fit a model with a spline
(class coxph.penal), I get an error:

Error in `[.data.frame`(newdata, subscript_vector, ) : object
'subscript_vector' not found



I suppose this is because of NextMethod, but I am not sure how to work
around it. I also read a little bit about all those
matching-and-frame issues, but must confess I am not really into it.





Re: [R] Survival analysis with truncated data

2013-11-14 Thread Terry Therneau

I think that your data is censored, not truncated.
  For a fault introduced 1/2005 and erased 2/2006, duration = 13 months.
  For a fault introduced 4/2010 and still in existence at the last observation 12/2010,
duration > 8 months.
  For a fault introduced before 2004, erased 3/2005, in a machine installed 2/1998, the
duration is somewhere between 15 and 87 months.
  For a fault introduced before 2004, machine installed 5/2000, still present 11/2010 at
last check, the duration is > 126 months.

For type="interval2" the data would be (13,13), (8,NA), (15,87), (126, NA).
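
As a sketch, those four observations could be fit directly; for interval-censored data
survfit computes the Turnbull estimate:

    library(survival)
    lower <- c(13,  8, 15, 126)   # the four durations above, in months
    upper <- c(13, NA, 87,  NA)   # NA upper bound = right censored
    fit <- survfit(Surv(lower, upper, type = "interval2") ~ 1)
    plot(fit, xlab = "Months", ylab = "Fraction of faults remaining")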

Terry T.


On 11/14/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hi,

I would like to know how to handle truncated data.
My intent is to have the survival curve of a software fault in order
to have some information about fault lifespan.

I have some observations of a software system between 2004 and 2010.
The system was first released in 1994.
The event considered is the disappearance of a software fault. The
faults can have been introduced at any time, between 1994 and 2010.
But for faults introduced before 2004, there is no way to know their age.

I used the Surv and survfit functions with type "interval2".
For the faults that are first observed in 2004, I set the lower bound
to the lifespan observed between 2004 and 2010.

How could I set the upper bound?  Using 1994 as a starting point does not seem
to be meaningful. Neither is using only the lower bound.

Should I consider another survival estimator ?

Thanks in advance.




Re: [R] How to obtain nonparametric baseline hazard estimates in the gamma frailty model?

2013-11-05 Thread Terry Therneau
I have responded to this particular misconception so often I begin to grow grumpy about it 
(not the particular fault of YH).  The cumulative hazard function from

fit <- coxph( some model )
sfit <- survfit(fit, newdata = set of covariate values)

gives the survival curve and cumulative hazard for that particular set of
covariate values.

There is nothing special about a "baseline" hazard.  Any cumulative hazard is for some
particular set of covariate values, including all values = 0, which is what most textbooks
refer to as the baseline.  The survfit routine will by default return one centered at the
means of the predictor variables, simply because there is less round off error if one
stays near the center.  Any baseline is as good as any other.


  cum haz at covariate values z = (cum haz at covariate values x) * exp(beta * (z-x))

Note that for a random effect, the survfit routine uses 0 as the centering 
value.
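
A small sketch of the idea with the lung data (my illustration): any newdata defines
its own "baseline".

    library(survival)
    fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
    # cumulative hazard for one particular covariate set; any other set is
    # this one multiplied by exp(beta * (z - x))
    sfit <- survfit(fit, newdata = data.frame(age = 60, ph.ecog = 1))
    head(cbind(time = sfit$time, cumhaz = -log(sfit$surv)))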

Terry Therneau


On 11/05/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hi Dr. Therneau,

Yes, -log(sfit$surv) gives me the cumulative hazard but not the baseline
cumulative hazard. I know that Nielsen and Klein have SAS Macros to get
such estimates by using EM approach. I'm wondering if I can obtain the
baseline hazard estimates from coxph() for gamma frailty model since I
think coxph() is a very powerful function. I feel there may be some way
that I don't know.

Thanks,
YH




Re: [R] How to obtain nonparametric baseline hazard estimates in the gamma frailty model?

2013-11-04 Thread Terry Therneau



On 11/03/2013 05:00 AM, r-help-requ...@r-project.org wrote:

I can easily get the other parameter estimates by using coxph() but don't
know how to get the baseline hazard estimates from it. Does anyone know how
to obtain nonparametric baseline hazard estimates in the gamma frailty
model?

Thanks,
YH



I don't see what the problem is.

 fit1 <- coxph(Surv(time, status) ~ age + ph.ecog + frailty(inst), data=lung)
 sfit <- survfit(fit1)
 plot(sfit)


Please give an actual example of the problem.

Terry Therneau



[R] Fwd: Re: How to obtain nonparametric baseline hazard estimates in the gamma frailty model?

2013-11-04 Thread Terry Therneau



 Original Message 
Subject: Re: How to obtain nonparametric baseline hazard estimates in the gamma 
frailty model?
Date: Mon, 04 Nov 2013 17:27:04 -0600
From: Terry Therneau therneau.te...@mayo.edu
To: Y yuhan...@gmail.com

The cumulative hazard is just -log(sfit$surv).
The hazard is essentially a density estimate, and that is much harder.  You'll 
notice that
everyone does CDF curves for survival data ( Kaplan-Meier = estimate of 1-CDF), 
but no one
does histograms, which estimate a density.  That isn't because we wouldn't like 
density
estimates.

Terry T.


On 11/04/2013 04:47 PM, Y wrote:

Hi Dr. Therneau,

Thanks very much for your kind help! Does survfit() just give me the survival 
curve? What
I wanted is the baseline hazard estimates, i.e., lambda_{0} (t). How can I 
obtain this
estimate from coxph()? Or using basehaz()?

Thanks,
YH

On Mon, Nov 4, 2013 at 5:29 PM, Terry Therneau thern...@mayo.edu wrote:



On 11/03/2013 05:00 AM, r-help-requ...@r-project.org wrote:

I can easily get the other parameter estimates by using coxph() but 
don't
know how to get the baseline hazard estimates from it. Does anyone know 
how
to obtain nonparametric baseline hazard estimates in the gamma frailty
model?

Thanks,
YH


I don't see what the problem is.

fit1 <- coxph(Surv(time, status) ~ age + ph.ecog + frailty(inst), data=lung)
sfit <- survfit(fit1)
plot(sfit)


Please give an actual example of the problem.

Terry Therneau






Re: [R] How to obtain restricted estimates from coxph()?

2013-10-16 Thread Terry Therneau



On 10/16/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,

I'm trying to use coxph() function to fit a very simple Cox proportional
hazards regression model (only one covariate) but the parameter space is
restricted to an open set (0, 1). Can I still obtain a valid estimate by
using coxph function in this scenario? If yes, how? Any suggestion would be
greatly appreciated. Thanks!!!


Easily:
   1.  Fit the unrestricted model.  If the solution is in (0, 1) you are done.
   2.  If it is outside, fix the coefficient.  Say that the solution is 1.73; then the
optimal solution under constraint is 1.
   Redo the fit adding the parameters  init=1, iter=0.  This forces the program to
give the loglik etc. for the fixed coefficient of 1.0.
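
As a sketch (the data frame and covariate names are placeholders):

    fit <- coxph(Surv(time, status) ~ x, data = mydata)  # step 1: unrestricted
    if (coef(fit) > 1) {
        # step 2: pin the coefficient at the boundary (similarly, use 0 for
        # the lower boundary)
        fit <- coxph(Surv(time, status) ~ x, data = mydata, init = 1, iter = 0)
    }
    fit$loglik   # log-likelihood at the constrained solution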


Terry Therneau



Re: [R] frailtypack

2013-10-09 Thread Terry Therneau
I can't comment on frailtypack issues, but would like to mention that coxme will handle
nested models, contrary to the statement below that frailtypack is "perhaps the only
widely-available platform" for nested survival data.

To reprise the original post's model

   cgd.nfm <- coxme(Surv(Tstart, Tstop, Status) ~ Treatment + (1 | Center/ID),
data=cgd.ag)


And a note to the poster-- you should reprise the original message to which you are 
responding.



Terry Therneau

On 10/09/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,

I am encountering very similar problems with frailtypack (only, it seems, 3
years later).  I would be incredibly grateful if you would be willing to
share any solutions you happened upon for the problem you mention. I have a
datafile that is even longer than yours (over 15000 events). Events follow a
two-level nesting structure. When I attempt to run the nested survival model
in frailtypack the program either stops running entirely or produces an
error message. I have contacted the frailtypack maintainer but have not yet
received a response.  It seems that frailtypack is perhaps the only
widely-available platform for the analysis of nested survival data, so the
seeming bugginess of this program is troubling.  Please let me know if you
ended up coming up with a solution to your problem or, alternatively,
another program able to run the nested analysis. Thank you so much!





Re: [R] Interperting results of glmnet and coxph plot, Brier score and Harrel's C-Index - am I doing something wrong ???

2013-09-30 Thread Terry Therneau

To elaborate on Frank's response, the analysis plan of
  1. Look at the data and select important variables
  2. Put that truncated list into your favorite statistical procedure
  3. Ask: are the p-values (c-statistic, coefficients, ...) reliable?

is a very old plan.  The answer to the last question is always NO.
The traditional step 1 involved graphs and t-tests, but replacing it by something modern
does not change the overall outcome.


Terry T



Re: [R] Should I wrap more package examples in \dontrun{} ?

2013-09-04 Thread Terry Therneau
To give a specific example, the simple code for my test suite is given at the bottom of
this message.  A simpler (simple-minded, maybe) approach than creating a new package for
testing.  I now run this on the survival package every time that I submit a new version to
CRAN.  It takes a while, since there are over 200 dependencies.  It creates a file
"progress" containing each package name as it is run, followed by either "Ok" or "Failed",
along with a directory "tests" containing the results.  Almost every run generates 1-3 hits.
  I have not automated this further because many runs also lead to exceptions, often
packages that won't load because I don't have some ancillary piece of software installed
that they depend on.  (I can't seem to get JAVA set up sufficiently to satisfy everyone,
for example, and have very low motivation to work harder at the task.)  And a small number
have made it to the "bad actors, I give up, don't even bother to test" list.


Note that any package I want to fully test was installed on this local machine using
    install.packages(xxx, dependencies=TRUE, INSTALL_opts="--install-tests")
where xxx is the name of the package.

Terry T.


On 09/04/2013 05:00 AM, r-help-requ...@r-project.org wrote:

On 03/09/2013 1:53 PM, Hadley Wickham wrote:

As a user of your package, I would find it irritating if example(foo) 
didn't
run anything.   It would be more irritating (and would indicate sloppiness
on your part) if the examples failed when I cut and pasted them.  These 
both
suggest leaving the examples running.
  
As the author of your package, it sounds as though you find it quite
irritating when other authors break your code.
  
Isn't the right solution to this to work with the other package authors to
come up with code that is unlikely to break?  If that's not possible, then
maybe don't use those packages that cause you trouble.


  It was my understanding that package authors are responsible for not
  breaking other CRAN packages without warning.  For example, before I
  release a new version of plyr or ggplot2, I run R CMD check on every
  package that depends on my package. I then let the maintainers know if
  something is broken - sometimes it's because I introduced a bug, and
  other times it's because I'm enforcing a stricter check than I did
  previously

It sounds as though you're doing the right thing.   Can you describe how
you determine the set of packages to check, and how you do your checks?
It would be great if we could convince everyone to follow those steps.

Duncan Murdoch

tmt% cat checkdeps.R
require(tools)

# First set a repository to look at
#chooseCRANmirror() # do it graphically
#chooseBioCmirror()
options(repos=c(CRAN="http://streaming.stat.iastate.edu/CRAN/",
                BioC="http://bioconductor.org/packages/2.11/bioc/"))

# This function is provided by Uwe Ligges
reverse <-
function(packages, which = c("Depends", "Imports", "LinkingTo"),
         recursive = FALSE)
{
    description <- sprintf("%s/web/packages/packages.rds",
                           getOption("repos")["CRAN"])
    con <- if(substring(description, 1L, 7L) == "file://")
        file(description, "rb")
    else
        url(description, "rb")
    on.exit(close(con))
    db <- readRDS(gzcon(con))
    rownames(db) <- NULL

    rdepends <- package_dependencies(packages, db, which,
                                     recursive = recursive,
                                     reverse = TRUE)
    rdepends <- sort(unique(unlist(rdepends)))
    pos <- match(rdepends, db[, "Package"], nomatch = 0L)

    db[pos, c("Package", "Version", "Maintainer")]
}

survdep <- reverse("survival")[,1]

# I don't want to check coxme (since I maintain a more up-to-date
# local copy), and there are a few known bad actors
avoid <- c("coxme", "STAR", "compareGroups")
survdep <- survdep[is.na(match(survdep, avoid))]

# Some packages may have failed to install, don't test those
inplace <- installed.packages()[,"Package"]   # ones we already have
missed <- is.na(match(survdep, inplace))
if (any(missed)) {
    message("Unable to load packages ",
            paste(survdep[missed], collapse=", "), "\n")
    survdep <- survdep[!missed]
}

# Do the long list of tests
unlink("progress")
unlink("tests", recursive=TRUE)
system("mkdir tests")
pfile <- file("progress", open="w")
for (testpkg in survdep) {
    z <- testInstalledPackage(testpkg, outDir="tests")
    cat(testpkg, c("Ok", "Failed")[z+1], "\n", file=pfile)
}



Re: [R] help

2013-09-03 Thread Terry Therneau

The tt function is documented for coxph, and you are using cph.   They are not 
the same.
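
For reference, a minimal sketch of tt() as coxph() itself uses it; the log-time
transform below is only an illustration, not a recommendation for these data:

    library(survival)
    # a time-varying effect of age: the coefficient multiplies age * log(t)
    fit <- coxph(Surv(time, status) ~ ph.ecog + tt(age), data = lung,
                 tt = function(x, t, ...) x * log(t))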

On 09/03/2013 05:00 AM, r-help-requ...@r-project.org wrote:

tt <- function(x) {
     obrien <- function(x) {
       r <- rank(x)
       (r - 0.5)/(0.5 + length(r) - r)
     }
     unlist(tapply(x, riskset, obrien))
     }
Hi, I am new to R.  When dealing with survival data, I found that the variable
Progression did not meet the PH assumption (the picture shows the residuals against
time), so I want to use a Cox model with time-dependent variables.  I used the default
tt in the function coxph, but when I use tt in
f <- cph(Surv(os$Stime, os$Status==1) ~ Metastasis + Surgery + Post.chem. + Age +
tt(Progression) + ALP, data=os, x=T, y=T, surv=TRUE, time.inc=60),
it didn't work.  I don't know what the argument riskset is.  I beg your help: can you
help me write down an appropriate tt expression that I can use in cph?  Thanks.

   Zhongxin Dong





[R] POSIXlt and months

2013-09-03 Thread Terry Therneau

The help page for as.POSIXlt suggests using it as a way to extract month, day, 
and year.
However, I can't find any documentation on the results and am a bit surprised by the month 
portion.


An example, run about 6:21 AM on Sept 3.

 unlist(unclass(as.POSIXlt(Sys.time())))
      sec       min      hour      mday       mon      year      wday      yday     isdst
 43.24545     21.00      6.00      3.00      8.00    113.00      2.00    245.00      1.00


So: it appears that I need to
  add 1900 to year
  add 1 to month
but other components are as I would expect.

 unlist(unclass(as.POSIXlt(as.Date("1953/03/10"))))
  sec   min  hour  mday   mon  year  wday  yday isdst
    0     0     0    10     2    53     2    68     0

Supports a 0 origin for everything except year and mday.
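
So a sketch of the needed adjustments is simply:

    lt <- as.POSIXlt(as.Date("1953/03/10"))
    c(year = lt$year + 1900,   # 0 origin: years since 1900
      month = lt$mon + 1,      # 0 origin: January is 0
      day = lt$mday)           # already 1-based
    #  year month   day
    #  1953     3    10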

 A pointer to formal documentation of this would make me feel easier about using the 
function.


Terry Therneau



Re: [R] coxph diagnostics

2013-08-13 Thread Terry Therneau

That's the primary reason for the plot: so that you can look and think.

The test statistic is based on whether a LS line fit to the plot has zero slope.  For
larger data sets you can sometimes have a significant p-value but good agreement with
proportional hazards.  It's much like an example from Lincoln Moses' beginning statistics
book (now out of print, so rephrasing from memory).

   "Suppose that you flip a coin 10,000 times and get 5101 heads.  What can you say?
   a. The coin is not perfectly fair (p < .05).  b. But it is darn close to perfect!"
As a referee I would be comfortable using that coin to start a football game.

The Cox model gives an average hazard ratio, averaged over time.  When proportional
hazards holds, that value is a complete summary -- nothing else is needed.  When it does
not hold, the average may still be useful, or not, depending on the degree of change over
time.
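
A sketch of the look-and-think loop with the lung data (my illustration, not part of
the original reply):

    library(survival)
    fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
    zp <- cox.zph(fit)
    zp           # score tests: is the slope of the scaled residuals zero?
    plot(zp[1])  # then look: is any departure from a flat line large enough
                 # to matter?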


Terry Therneau



On 08/13/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Thanks to Bert and Göran for your responses.

To answer Göran's comment, yes I did plot the Schoenfeld residuals using
plot.cox.zph and the lines look horizontal (slope = 0) to me, which makes
me think that it contradicts the results of cox.zph.

What alternatives do I have if the proportional hazards assumption of coxph
does not hold?

Thanks!




Re: [R] Repeated measures Cox regression ??coxph??

2013-07-26 Thread Terry Therneau

Two choices.
If this were a linear model, would you prefer the GEE approach or a mixed effects
approach?  Assume that "subject" is a variable containing a per-subject identifier.

GEE approach: add + cluster(subject) to the model statement in coxph.
Mixed models approach: add + (1|subject) to the model statement in coxme.
(A sketch of both follows.)
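
A sketch with placeholder names (hipdata, treat and subject are hypothetical):

    library(survival)
    library(coxme)
    # GEE-style: working independence plus a robust variance
    fit1 <- coxph(Surv(time, status) ~ treat + cluster(subject), data = hipdata)
    # mixed model: a random intercept (frailty) per subject
    fit2 <- coxme(Surv(time, status) ~ treat + (1 | subject), data = hipdata)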

When only a very few subjects have multiple events, the mixed model (random effect)
approach may not be reliable, however.  Multiple events per group are the fuel for
estimation of the variance of the random effect, and with few of these the profile
likelihood of the random effect will be very flat.  You can get essentially a random
estimate of the variance of the subject effect.  I'm still getting my arms around this
issue, and it has taken me a long time.


"Frailty" is an alternate label for random effects when all we have is a random
intercept.  Multiple labels for the same idea add confusion, but nothing else.


Terry Therneau

On 07/25/2013 08:14 PM, Marc Schwartz wrote:

On Jul 25, 2013, at 4:45 PM, David Winsemius dwinsem...@comcast.net  wrote:


On Jul 25, 2013, at 12:27 PM, Marc Schwartz wrote:


On Jul 25, 2013, at 2:11 PM, John Sorkin jsor...@grecc.umaryland.edu  wrote:


Colleagues,
Is there any R package that will allow one to perform a repeated measures Cox 
Proportional Hazards regression? I don't think coxph is set up to handle this 
type of problem, but I would be happy to know that I am not correct.
I am doing a study of time to hip joint replacement. As each person has two 
hips, a given person can appear in the dataset twice, once for the left hip and 
once for the right hip, and I need to account for the correlation of data from 
a single individual.
Thank you,
John



John,

See Terry's 'coxme' package:

http://cran.r-project.org/web/packages/coxme/index.html


When I looked over the description of coxme, I was concerned it was not really
designed with this in mind. Looking at Therneau and Grambsch, I thought section
8.4.2 in the 'Multiple Events per Subject' chapter fit the analysis question
well. There they compared the use of coxph(... + cluster(ID), ...) with
coxph(... + strata(ID), ...). Unfortunately I could not tell for sure which one was
being described as superior, but I think it was the cluster() alternative. I seem
to remember there are discussions in the archives.


David,

I think that you raise a good point. The example in the book (I had to wait to
get home to read it) is potentially different however, in that the subject's
eyes were randomized to treatment or control, which would seem to suggest
comparable baseline characteristics for each pair of eyes, as well as an active
intervention on one side, where a difference in treatment effect between each
eye is being analyzed.

It is not clear from John's description above if there is one hip that will be 
treated versus one as a control and whether the extent of disease at baseline 
is similar in each pair of hips. Presumably the timing of hip replacements will 
be staggered at some level, even if there is comparable disease, simply due to 
post-op recovery time and surgical risk. In cases where the disease between 
each hip is materially different, that would be another factor to consider, 
however I would defer to orthopaedic physicians/surgeons from a subject matter 
expertise consideration. It is possible that the bilateral hip replacement data 
might be more of a parallel to bilateral breast cancer data, if each breast 
were to be tracked separately.

I have cc'd Terry here, hoping that he might jump in and offer some insights 
into the pros/cons of using coxme versus coxph with either a cluster or strata 
based approach, or perhaps even a frailty based approach as in 9.4.1 in the 
book.

Regards,

Marc



--
David.

You also might find the following of interest:

http://bjo.bmj.com/content/71/9/645.full.pdf

http://www.ncbi.nlm.nih.gov/pubmed/6885

http://www.ncbi.nlm.nih.gov/pubmed/22078901



Regards,

Marc Schwartz


David Winsemius
Alameda, CA, USA





Re: [R] prediction survival curves for coxph-models; how to extract the right strata per individual

2013-07-26 Thread Terry Therneau
It would help me to give advice if I knew what you wanted to do with the new curves.  
Plot, print, extract?


A more direct solution to your question will appear in the next release of the 
code, btw.

Terry T.


On 07/25/2013 05:00 AM, r-help-requ...@r-project.org wrote:

My problem is:



I have a coxph model with several strata and other covariables.

Now I want to fit the estimated survival curves for new data, using
survfit.coxph.

But this returns a prediction for each stratum per individual. So if I
have 15 new individuals and 10 strata, I have 150 fitted survival curves
(or if I don't use the subscripts I have 15 predictions with the curves
for all strata pasted together).



Is there any possibility to get only the survival curves for the stratum
the new individual belongs to?






Re: [R] multcomp on significant interaction in coxme model

2013-07-08 Thread Terry Therneau

--- most of the message deleted

Performing exactly the same routine with the same data on a logistic
model with family=binomial does not give this error message.
So my question is, what am I missing here?

Thanks, for any possible input



I'm not a user of multcomp, so will follow with one question.  Results from coxph and 
coxme do not contain an intercept column.  Could this be the problem?


 A second possibility is that coxme will in some cases return a bdsmatrix object as the
variance matrix (to save space).  In such a case  fit$var <- as.matrix(fit$var)  might
correct the problem.


Terry Therneau



Re: [R] Lexical scoping is not what I expect

2013-06-27 Thread Terry Therneau
I second Ellison's sentiment of "almost never".  One main reason is readability on later
viewing.
Yes, as Duncan says, global variables can sometimes be handy and make functions quick to
write, but using a formal argument in the call is always clearer.


Terry Therneau

On 06/27/2013 05:00 AM, r-help-requ...@r-project.org wrote:

On 13-06-26 6:57 AM, S Ellison wrote:




  -Original Message-
  It may be helpful not to worry about the technical details,
  just to look at the source code defining the function:  if it
  is defined in a place where a variable can be seen, it can
  see that variable.


  I too find R's lexical scoping rules straightforward.
  However, I'd say that if your code relies on lexical scoping to find 
something, you should probably rewrite your code.

  The number of times I've seen new R users get unexpected results because 
they haven't noticed that their function is referencing a parent environment 
instead of a locally defined variable or argument is past counting.

  Of course there are times when it's useful and sensible to have globally 
defined variables that can be accessed within a function. But they are very rare; 
as a default, I'd recommend avoiding it if at all possible. If your function needs 
something from outside, pass it as an argument.


I would say the meaning of probably in your 2nd sentence depends quite
a bit on the user.  For beginners, it's almost certainly.  For people
who are comfortable with the concept, it's just maybe.

I would agree that in most cases the only global objects that functions
should reference are other functions, but small nested functions are
quite safe, and it's sometimes useful to create functions in a local
environment so they have persistent memory.
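
For instance, the "persistent memory" case looks like this (a standard closure
example, not from the original thread):

    make_counter <- function() {
        n <- 0               # lives in the enclosing environment
        function() {
            n <<- n + 1      # found by lexical scoping, then updated
            n
        }
    }
    counter <- make_counter()
    counter()   # 1
    counter()   # 2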

Duncan Murdoch





Re: [R] censor=FALSE and id options in survfit.coxph

2013-06-27 Thread Terry Therneau
Yes, it is a bug.  Thanks for providing a complete example.  I'll look into it, but leave 
for a week's vacation in a couple of hours and have some other pressing tasks.


Terry T.


Terry,

I recently noticed the censor argument of survfit.  For some analyses it 
greatly reduces the size of the resulting object, which is a nice feature.

However, when combined with the id argument, only 1 prediction is made.  
Predictions can be made individually but I'd prefer to do them all at once if 
that change can be made.

Chris

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Lexical scoping is not what I expect

2013-06-27 Thread Terry Therneau

Duncan,
"I disagree with Duncan" was not at all the intent of my note, but on rereading it does
have that flavor.  Chastisement accepted.  Due to the documentation angle I'd simply
change your original "maybe" to "sometimes maybe".  A bit more caution but the same message.

Terry T.


On 06/27/2013 07:24 AM, Duncan Murdoch wrote:

On 13-06-27 8:18 AM, Terry Therneau wrote:

I second Ellison sentiments of almost never.  One main reason is readability 
on later
viewing.
Yes, as Duncan says global variables can sometimes be handy and make functions 
quick to
write, but using a formal argument in the call is always clearer.


I didn't say that.  I said that in most cases global variables other than functions 
should not be used.


I would generalize that a little:  global constants are okay, changing globals is 
usually a bad idea.


Duncan Murdoch




Terry Therneau




Re: [R] survreg with measurement uncertainties

2013-06-12 Thread Terry Therneau
I will assume that you are talking about uncertainty in the response.  Then one simple way
to fit the model is to use case weights that are proportional to 1/variance, along with
+cluster(id) in the model statement to get a correct variance for this case.  In linear
models this would be called the "White" or "Horvitz-Thompson" or "GEE working
independence" variance estimate, depending on which literature you happen to be reading
(economics, survey sampling, or biostat).
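
A sketch of that setup, with hypothetical names (y, event, x, se, id, mydata):

    library(survival)
    # inverse-variance case weights, plus cluster(id) for a robust variance
    fit <- survreg(Surv(y, event) ~ x + cluster(id),
                   weights = 1/se^2, data = mydata)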


Now if you are talking about errors in the predictor variables, that is a much 
harder problem.

Terry Therneau


On 06/12/2013 05:00 AM, Kyle Penner wrote:

Hello,

I have some measurements that I am trying to fit a model to.  I also
have uncertainties for these measurements.  Some of the measurements
are not well detected, so I'd like to use a limit instead of the
actual measurement.  (I am always dealing with upper limits, i.e. left
censored data.)

I have successfully run survreg using the combination of well detected
measurements and limits, but I would like to include the measurement
uncertainty (for the well detected measurements) in the fitting.  As
far as I can tell, survreg doesn't support this.  Does anyone have a
suggestion for how to accomplish this?

Thanks,

Kyle




Re: [R] Survival aareg problem

2013-06-05 Thread Terry Therneau



On 06/05/2013 12:33 AM, r-help-requ...@r-project.org wrote:

Dear friends - I'm on windows 7, R 2.15.2

when I run the example for aareg in survival package I see this:

   plot(lfit[4], ylim=c(-4,4))
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' is a list, but does not have components 'x' and 'y'

Is that a matter of an old R?

Best wishes

Troels Ring, MD
Aalborg, Denmark


It's an error in the survival package: the line S3method('[', aareg) was missing from 
the NAMESPACE file.
Without it R doesn't find the correct code to perform the subscripting.  (In earlier 
versions of R a declaration of every S3 method was not necessary.)  You are the first to 
run into this.


  This will be fixed in the next (soon) release.  In the meantime, simply read in a copy 
of the subscripting code into your session, either with source() or cut and paste.   The 
code is below.


   Terry Therneau

"[.aareg" <- function(x, ..., drop=FALSE) {
    if (!inherits(x, 'aareg')) stop("Must be an aareg object")
    i <- ..1
    if (is.matrix(x$coefficient)) {
        x$coefficient <- x$coefficient[,i, drop=drop]
        x$tweight <- x$tweight[,i, drop=drop]
    }
    else stop("Subscripting impossible, coefficient component not a matrix")

    if (!is.null(x$dfbeta)) {
        x$dfbeta <- x$dfbeta[,i,, drop=drop]
        x$test.var2 <- x$test.var2[i,i, drop=drop]
    }
    x$test.statistic <- x$test.statistic[i, drop=drop]
    x$test.var <- x$test.var[i,i, drop=drop]
    x
}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Mixed effects model with a phylogenetic tree/ distance, matrix as a random effect

2013-06-04 Thread Terry Therneau
Take a look at lmekin() in the coxme package.  The motivating data set for my development 
of coxme was the Minnesota Family Breast Cancer project: 24050 subjects in 462 families.  
The random effect is an intercept per subject with sigma^2 K as its variance where K is 
the kinship matrix (1 for self-self, .5 for parent-child or sib-sib, .25 for uncle-niece, 
etc).  lmekin is a linear models front end to the same underlying routines.


   I think you want lmekin(y ~ x1 + x2 + (1 | subject), data=yourdata, varlist=D)
or some such, where D is the similarity or correlation form of your distance matrix.

   A downside is that lmekin is sort of the poor cousin to coxme -- with finite time I've 
never gotten around to writing predict, residuals, plot, ... methods for it.  The basic 
fit is fine though.


Terry Therneau

(In general I agree with Bert & Ben to try the other list, but I don't happen 
to read it.)

On 06/04/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hi,
I'm trying to build a mixed-effects model in which I'd like to include
either a distance matrix or a phylogenetic tree as a random effect.
The troubles I've had are that:
1. Function lmer() in package lme4 only accepts a data frame column as a
random factor and not a distance matrix.
2. Function MCMCglmm() in package MCMCglmm only accepts a rooted and
ultrametric phylogenetic tree as a pedigree argument while my tree is
neither (and for various reasons I cannot construct one or coerce mine
to be a rooted, ultrametric tree).

Is there any way around it?
I'd appreciate mostly a solution to problem 1.

Roey


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to compute a P-value for a complex mixture of chi-squared distributions in R

2013-06-03 Thread Terry Therneau

You need to be more explicit about what you are doing.
For this problem:
y = (x1 + x2)/2
where x1 and x2 are chi-square random variables, you want to use the pchisqsum() routine 
found in the survey package.  This is not a trivial computation.
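For instance, a sketch for the tail probability of y = (x1 + x2)/2 with x1, x2 each 
chi-squared on 1 df (the cutoff q is hypothetical):

library(survey)
q <- 3
pchisqsum(q, df=c(1,1), a=c(0.5,0.5), lower.tail=FALSE)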


For the alternate problem where y is a random choice of either x1 or x2
y = ifelse(z, x1, x2)
z is binomial and x1, x2 are chisq, then the suggestion by Peter Dalgaard is 
correct.
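For the original poster's 50:50 mixture of square roots, the exact tail probability 
needs no simulation at all, since P(sqrt(chisq_k) >= x) = P(chisq_k >= x^2).  A sketch, 
with a hypothetical cutoff x:

x <- 5.2
0.5*pchisq(x^2, df=1, lower.tail=FALSE) + 0.5*pchisq(x^2, df=2, lower.tail=FALSE)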

Which of these two are you trying to solve?

Terry Therneau


On 06/02/2013 05:00 AM, r-help-requ...@r-project.org wrote:

On 01-06-2013 05:26, Tiago V. Pereira wrote:

  Hello, R users!

  I am struggling with the following problem:

  I need to compute a P-value for a mixture of two chi-squared
  distributions. My P-value is given by:

  P = 0.5*Pr(sqrt(chi2(1)) >= x) + 0.5*Pr(sqrt(chi2(2)) >= x)

  In words, I need to compute the p-value for a 50:50 mixture of the square
  root of a chi-squared random variable with 1 degree of freedom and the
  square root of a chi-squared with two degrees of freedom.

  Although I can quickly simulate data, the P-values I am looking for are at
  the tail of the distribution, that is, alpha levels below 10^-7. Hence,
  simulation is not efficient.

  Are you aware of smart approach?


  All the best,

  Tiago



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] counting process in survival

2013-05-31 Thread Terry Therneau

It is hard to know exactly what you mean with such a generic question.
If you mean "treat survival as a counting process", then the answer is yes.  The survival 
package in S (which is the direct ancestor of the Splus package, which is the direct 
ancestor of the R package) was the very first to do this.  I created the feature in 1984.


Terry Therneau

On 05/31/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hi, I have a question: is there a package to do counting process survival 
analysis with R?


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] tt function and coxph

2013-05-21 Thread Terry Therneau



On 05/18/2013 05:00 AM, r-help-requ...@r-project.org wrote:

hi all
this command used tt function for all variables.
How can i define a different function for each variable?
exCox.all <- coxph(Surv(SURVT,STATUS) ~
RX+LOGWBC+SEX+tt(RX)+tt(LOGWBC)+tt(SEX),
data=rem.data, tt=function(x,t,...) log(t)*x)
thank you


There is currently no way to do what you ask.  Could you give me an example of 
why it is
necessary?  I'd never thought of adding such a feature.

Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R help - bootstrap with survival analysis

2013-04-30 Thread Terry Therneau

This comes up regularly.  Type ?print.survfit and look at the comments there under 
"value".
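A sketch of the extraction, assuming a hypothetical data set mydata; recent versions of 
summary.survfit accept an rmean argument:

fit <- survfit(Surv(time, status) ~ 1, data=mydata)
tab <- summary(fit, rmean="common")$table
tab["*rmean"]    # the restricted mean as a plain number, suitable for a bootstrap loop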

Terry T.

-  begin included message 

Hi,

I'm not sure if this is the proper way to ask questions, sorry if not.  But
here's my problem:

I'm trying to do a bootstrap estimate of the mean for some survival data.
Is there a way to specifically call upon the rmean value, in order to store
it in an object? I've used print(...,print.rmean=T) to print the summary of
survfit, but I'm not sure how to access only rmean because it does not show
up under attributes for survfit.

Thanks for any help in advance!

Fayaaz Khatri

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing two different 'survival' events for the same subject using survdiff?

2013-04-30 Thread Terry Therneau

-Original Message-

I have a dataset which for the sake of simplicity has two endpoints. We would like to test 
if two different end-points have the same eventual meaning. To try and take an example 
that people might understand better:


Lets assume we had a group of subjects who all received a treatment. The could stop 
treatment for any reason (side effects, treatment stops working etc). Getting that data is 
very easy. Measuring if treatment stops working is very hard to capture... so we would 
like to test if duration on treatment (easy) is the same as time to treatment failure (hard).


--- End 

The problem you describe is known as "surrogate endpoints" and addressing it is harder 
than you think. You will need to look in the literature to gain an understanding of the 
issues before you proceed. Your question is an important one and lots of folks have 
thought about it more deeply than I.


Cohn JN (2004). Introduction to Surrogate Markers. Circulation (American Heart 
Association) 109 (25 Suppl 1): IV20–1. doi:10.1161/01.CIR.133441.05780.1d


Fleming T, David D. Surrogate End Points in Clinical Trials: Are We Being Misled? Ann 
Intern Med. 1996 Oct 1;125(7):605-13


Terry T.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trouble Computing Type III SS in a Cox Regression

2013-04-25 Thread Terry Therneau
You've missed the point of my earlier post, which is that type III is not an answerable 
question.


   1. There are lots of ways to compare Cox models, LRT is normally considered the most 
reliable by serious authors.  There is usually not much difference between score, Wald, 
and LRT tests though, and the other two are more convenient in many situations.


   2. Type III is a question that can't be addressed. SAS prints something out with 
that label, but since they don't document what it is, and people with in-depth knowledge 
of Cox models (like me) cannot figure out what a sensible definition could actually be, 
there is nowhere to go.  How to do this in R can't be answered.  (It has nothing to do 
with interactions.)


  3. If you have customers who think that the earth is flat, global warming is a 
conspiracy, or that type III has special meaning this is a re-education issue, and I can't 
much help with that.


Terry T.

On 04/25/2013 07:59 AM, Paul Miller wrote:

Hi Dr. Therneau,

Thanks for your reply to my question. I'm aware that many on the list do not 
like type III SS. I'm not particularly attached to the idea of using them but 
often produce output for others who see value in type III SS.

You mention the problems with type III SS when testing interactions. I don't 
think we'll be doing that here though. So my type III SS could just as easily 
be called type II SS I think. If the SS I'm calculating are essentially type II 
SS, is that still problematic for a Cox model?

People using type III SS generally want a measure of whether or not a variable 
is contributing something to their model or if it could just as easily be 
discarded. Is there a better way of addressing this question than by using type 
III (or perhaps type II) SS?

A series of model comparisons using a LRT might be the answer. If it is, is 
there an efficient way of implementing this approach when there are many 
predictors? Another approach might be to run models through step or stepAIC in 
order to determine which predictors are useful and to discard the rest. Is that 
likely to be any good?

Thanks,

Paul


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Trouble Computing Type III SS in a Cox Regression

2013-04-24 Thread Terry Therneau
I should hope that there is trouble, since type III is an undefined concept for a Cox 
model.  Since SAS Inc fostered the cult of type III they have recently added it as an 
option for phreg, but I am not able to find any hints in the phreg documentation of what 
exactly they are doing when you invoke it.  If you can unearth this information, then I 
will be happy to tell you whether

   a. using the test (whatever it is) makes any sense at all for your data set
   b. if a is true, how to get it out of R

I use the word "cult" on purpose -- an entire generation of users who believe in the 
efficacy of this incantation without having any idea what it actually does.  In many 
particular instances the SAS type III corresponds to a survey sampling question, i.e., 
reweight the data so that it is balanced wrt factor A and then test factor B in the new 
sample.  The three biggest problems with type III are that
1: the particular test has been hyped as "better" when in fact it sometimes is sensible 
and sometimes not, 2: SAS implemented it as a computational algorithm which unfortunately 
often works even when the underlying rationale does not hold and
3: they explain it using a notation that completely obscures the actual question.  This 
last leads to the nonsense phrase "test for main effects in the presence of interactions".


There is a survey reweighted approach for Cox models, very closely related to the work 
on causal inference (marginal structural models), but I'd bet dollars to donuts that 
this is not what SAS is doing.


(Per 2 -- type III was a particular order of operations of the sweep algorithm for linear 
models, and for backward compatibility that remains the core definition even as 
computational algorithms have left sweep behind.  But Cox models can't be computed using 
the sweep algorithm).


Terry Therneau


On 04/24/2013 12:41 PM, r-help-requ...@r-project.org wrote:

Hello All,

Am having some trouble computing Type III SS in a Cox Regression using either 
drop1 or Anova from the car package. Am hoping that people will take a look to 
see if they can tell what's going on.

Here is my R code:

cox3grp <- subset(survData,
                  Treatment %in% c("DC", "DA", "DO"),
                  c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2))
cox3grp <- droplevels(cox3grp)
str(cox3grp)

coxCV <- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp,
               method = "efron")
coxCV

drop1(coxCV, test="Chisq")

require(car)
Anova(coxCV, type="III")

And here are my results:

 cox3grp <- subset(survData,
+                  Treatment %in% c("DC", "DA", "DO"),
+                  c(PTNO, Treatment, PFS_CENSORED, PFS_MONTHS, AGE, PS2))

  cox3grp <- droplevels(cox3grp)
  str(cox3grp)

'data.frame':   227 obs. of  6 variables:
  $ PTNO        : int  1195997 104625 106646 1277507 220506 525343 789119 817160 824224 82632 ...
  $ Treatment   : Factor w/ 3 levels "DC","DA","DO": 1 1 1 1 1 1 1 1 1 1 ...
  $ PFS_CENSORED: int  1 1 1 0 1 1 1 1 0 1 ...
  $ PFS_MONTHS  : num  1.12 8.16 6.08 1.35 9.54 ...
  $ AGE         : num  72 71 80 65 72 60 63 61 71 70 ...
  $ PS2         : Ord.factor w/ 2 levels "Yes"<"No": 2 2 2 2 2 2 2 2 2 2 ...

  coxCV <- coxph(Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2, data=cox3grp, method = "efron")

  coxCV

Call:
coxph(formula = Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2,
 data = cox3grp, method = efron)


   coef exp(coef) se(coef)  z p
AGE0.00492 1.005  0.00789  0.624 0.530
PS2.L -0.34523 0.708  0.14315 -2.412 0.016

Likelihood ratio test=5.66  on 2 df, p=0.0591  n= 227, number of events= 198
  
  drop1(coxCV, test="Chisq")

Single term deletions

Model:
Surv(PFS_MONTHS, PFS_CENSORED == 1) ~ AGE + PS2
        Df    AIC    LRT Pr(Chi)
<none>     1755.2
AGE      1 1753.6 0.3915 0.53151
PS2      1 1758.4 5.2364 0.02212 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  
  require(car)

  Anova(coxCV, type="III")

Analysis of Deviance Table (Type III tests)
    LR Chisq Df Pr(>Chisq)
AGE   0.3915  1    0.53151
PS2   5.2364  1    0.02212 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  

Both drop1 and Anova give me a different p-value than I get from coxph for both 
my two-level ps2 variable and for age. This is not what I would expect based on 
experience using SAS to conduct similar analyses. Indeed SAS consistently 
produces the same p-values. Namely the ones I get from coxph.

My sense is that I'm probably misusing R in some way but I'm not sure what I'm 
likely to be doing wrong. SAS produces Wald Chi-Square results for its type III 
tests. Maybe that has something to do with it. Ideally, I'd like to get type 
III values that match those from coxph. If anyone could help me understand 
better, that would be greatly appreciated.

Thanks,

Paul


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Weighted Kaplan-Meier estimates with R

2013-03-26 Thread Terry Therneau
There are two ways to view weights.  One is to treat them as case weights, i.e., a weight 
of 3 means that there were actually three identical observations in the primary data, 
which were collapsed to a single observation in the data frame to save space.  This is the 
assumption of survfit.  (Most readers of this list will be too young to remember when 
computer memory was so small that we had to use tricks like this.)  The second assumption 
is that the weights are sampling weights and a Horvitz-Thompson like estimator should be 
used for variance. This is the assumption of the svykm program in the survey package.  It 
appears you want the second behavior.
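A sketch of the sampling-weight version, assuming hypothetical columns time, status and 
w (the weights) in mydata:

library(survey)
des  <- svydesign(ids = ~1, weights = ~w, data = mydata)
sfit <- svykm(Surv(time, status) ~ 1, design = des)
plot(sfit)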


Terry Therneau


On 03/26/2013 06:00 AM, r-help-requ...@r-project.org wrote:

As part of a research paper, I would like to draw both weighted and
unweighted Kaplan-Meier estimates, the weight being the 'importance' of
each project to the mass of projects whose survival I'm trying to estimate.

I know that the function survfit in the package survival accepts weights and
produces confidence intervals. However, I suspect that the confidence
intervals may not be correct. The reason why I suspect this is that
depending on how I define the weights, I get very different confidence
intervals, e.g.

require(survival)
s <- Surv(c(50,100), c(1,1))
sf <- survfit(s ~ 1, weights=c(1,2))
plot(sf)

vs.

require(survival)
s <- Surv(c(50,100), c(1,1))
sf <- survfit(s ~ 1, weights=c(100,200))
plot(sf)

Any suggestions would be more than welcome!



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] confidence interval for survfit

2013-03-15 Thread Terry Therneau

The first thing you are missing is the documentation -- try ?survfit.object.
 fit <- survfit(Surv(time, status) ~ 1, data)
fit$std.err will contain the standard error of the cumulative hazard or 
-log(survival).

The standard error of the survival curve is approximately S(t) * std(hazard), by the delta 
method.  This is what is printed by the summary function, because it is what users 
expect, but it has very poor performance for computing confidence intervals.  A much 
better one is exp(-1* confidence interval for the cumulative hazard), which is the 
default.  In fact there are lots of better ones whose relative ranking depends on the 
details of your simulation study.  About the only really consistent result is that 
anything thoughtful beats S(t) +- 1.96 se(S), easily.  The default in R is the one that 
was best in the most recent paper I had read at the time I set the default.  If I were to 
rank them today using an average over all the comparison papers it would be second or 
third, but the good methods are so close that in practical terms it hardly matters.
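Applied to the printout in the question below, the default interval can be reproduced by 
hand (a sketch):

S  <- 0.761                    # survival estimate at t = 10
se <- 0.0282 / S               # printed std.err is S * se(log S); recover se(log S)
S * exp(c(-1, 1) * qnorm(0.975) * se)   # 0.707 0.818, matching the output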


Terry Therneau

On 03/15/2013 06:00 AM, r-help-requ...@r-project.org wrote:

Hi, I am wondering how the confidence interval for Kaplan-Meier estimator is 
calculated by survfit(). For example:


  summary(survfit(Surv(time,status)~1,data),times=10)

Call: survfit(formula = Surv(rtime10, rstat10) ~ 1, data = mgi)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10    168      55    0.761  0.0282        0.707        0.818


I am trying to reproduce the upper and lower CI by using standard error. As far 
I understand, the default method for survfit() to calculate confidence interval 
is on the log survival scale, so:


upper CI = exp(log(0.761)+qnorm(0.975)*0.0282) = 0.804
lower CI = exp(log(0.761)-qnorm(0.975)*0.0282) = 0.720


they are not the same as the output from survfit().

Am I missing something?

Thanks

John


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Comparing Cox model with Competing Risk model

2013-03-08 Thread Terry Therneau

-- begin included message --
I have a competing risk data where a patient may die from either AIDS or
Cancer. I want to compare the cox model for each of the event of interest
with a competing risk model. In the competing risk model the cumulative
incidence function is used directly.

-end inclusion ---
 If you do want to pursue the Fine-Gray model I would suggest using software that already 
exists.  Find the Task Views tab on CRAN, and follow it to "Survival" and then look at 
the competing risks section.  There is a lot to offer.  I would trust it more than rolling 
your own function.


 As an aside, modeling the subdistribution function is ONE way of dealing with competing 
risks, but not everyone thinks that it is the best way to proceed.  The model corresponds 
to a biology that I find unlikely, though it makes for nice math.  Since the alternative 
is discussed in a vignette that I haven't-yet-quite-written we won't pursue that any 
further, however. :-)


Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survfit plot question

2013-03-05 Thread Terry Therneau



On 03/05/2013 05:00 AM, r-help-requ...@r-project.org wrote:
Hello, I create a plot from a coxph object called fit.ads4: plot(survfit(fit.ads4))
Questions:
1. What is the cross mark in the plot?
2. How does the cross mark in the plot relate to either the rmean or the median from survfit?

Try help(plot.survfit).  Read the description of mark.time.


3.  What is the meaning of the restricted mean?  The upper limit noted
in the output is the end of the observation period (i.e., it is always
the Stop value in the Censored observation)

help(print.survfit)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Results from clogit out of range?

2013-03-04 Thread Terry Therneau

I'm late to this discussion, but let me try to put it in another context.
  Assume that I wanted to know whether kids who live west of their school or east of 
their school are more likely to be early (some hypothesis about walking slower if the sun 
is in their eyes).  So I create a 0/1 variable east/west and get samples of 10 student 
arrival times at each of 100 different schools.  Fit the model


   lm(arrive ~ factor(school) + east.west)

where arrive is in some common scale like minutes since midnight.  Since different 
schools could have different starting times for their first class we need an intercept per 
school.


  Two questions:
 1. Incremental effect: the coefficient of east/west measures the incremental effect 
across all schools.  With n of 1000 it is likely estimated with high precision.
 2. Absolute: predict the average arrival time (on the clock) for students.

Conditional logistic is very like this.  We have a large number of strata (schools) with 
a small number of observations in each (often only 2 per strata).  One can ask incremental 
questions about variables common to all strata, but absolute prediction is pretty 
worthless.  a. You can only do it for schools (strata) that have already been seen and b. 
there are so few subjects in each of them that the estimates are very noisy.
  The default prediction from clogit is focused on questions of type 1.  The 
documentation doesn't even bother to mention predictions of type 2, which would be 
probabilities of events.  I can think of a way to extract such output from the routine 
(being the author gives some insight), but why would I want to?


Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cox model convergence

2013-02-18 Thread Terry Therneau



On 02/16/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Then I perform cox regression as follows
m2_1 <- coxph(Surv(X_t0, X_t, vlsupp) ~ nvp + as.factor(cd4pccat) +
              as.factor(vlcat) + as.factor(agecat) + as.factor(whostage) +
              as.factor(hfacat) + as.factor(wfacat) + as.factor(wfhcat) +
              as.factor(resistance) + as.factor(postrantb) +
              cluster(id), data=myimp$imputations[[1]], method="breslow", robust=TRUE)
summary(m2_1)
Then I get the following warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
   Ran out of iterations and did not converge


Failure to converge is highly unusual for coxph unless the model is near singular.  A good 
rule for Cox models is to have 10-20 events for each coefficient.   When models get below 
2 events/coef the results can be unreliable, both numerically and biologically, and some 
version of a shrinkage model is called for.  What are the counts for your data set?


A vector of initial values, if supplied, needs to be of the same length as the 
coefficients.  Make it the same length and in the same order as the printed coefficients 
from your run that did not converge.


Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NA/NaN/Inf in foreign function call (arg 6) error from coxph function

2013-02-14 Thread Terry Therneau

 The NaN/Inf message is almost certainly from a singular fit.
 I would be interested in a copy of the data set to test it: data sets that cause this 
are rare, and help me to tune the convergence/singularity criteria in the program.


 Note: it is much easier to use coxph(survobj ~ therapy + ReceptorA + ReceptorB, 
data=sample.data) than to put "sample.data$" in front of every variable name; and easier 
to read as well.


Terry Therneau (author of coxph function)

On 02/14/2013 05:00 AM, r-help-requ...@r-project.org wrote:

I am trying to fit a multivariate Cox proportional hazards model,
modelling survival outcome as a function of treatment and receptor
status. The data look like below:


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Avoid re-building a vignette

2013-02-11 Thread Terry Therneau

I once knew the answer to this, but can't find where I found it.

Say I have a vignette that I don't want R to automatically rebuild, perhaps it uses some 
data I can't share or takes too long to run or whatever.  Say it is 
mypackage/vignettes/charlie.Rnw


I can create the pdf from an R session that actually has the data, place that result in 
inst/doc/charlie.pdf, then run

 R CMD build --no-vignettes mypackage

So far so good.  But then when I do R CMD INSTALL mypackage.tar.gz the install process 
tries to run Sweave, resulting in a failure.   How do I get it to skip the Sweave call?


Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculating the cumulative incidence function

2013-02-08 Thread Terry Therneau

 You can do this directly with the survfit function (survival version 2.37 or 
greater).
If status is a factor variable with levels "censored", "relapse" and "death":
fit <- survfit(Surv(time, status) ~ sex, data=mydata)
plot(fit, xlab=...

The primary advantage of this over the cuminc package is that all of the usual options for 
survival plots carry forward.
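A fuller sketch with hypothetical column names; note that the first factor level must be 
the censoring code:

mydata$status <- factor(mydata$status, levels=c("censored", "relapse", "death"))
fit <- survfit(Surv(time, status) ~ sex, data=mydata)
plot(fit, col=c(1,1,2,2), lty=c(1,2,1,2), xlab="Time", ylab="Pr(state)")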

   Terry Therneau


On 02/05/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,
I have a problem regarding calculation of Cumulative Incidence Function.
The event of interest is failure of bone-marrow transplantation, which may
occur due to relapse or death in remission. The data set that I have
consists of a lifetime variable, two indicator variables (one for relapse and
one for death in remission), and the other variables are donor type (having
3 categories), disease type (having 3 categories), disease stage (having 3
categories) and karnofsky score (having 2 categories).

I want to create an R function for cumulative incidence function for the
event (failure of bone-marrow transplantation) due to relapse (cause-1) and
also for the event (failure of bone-marrow transplantation) due to death in
remission (cause-2). Can anyone help me please?
Tas.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] First R package, advice

2013-02-08 Thread Terry Therneau



On 02/07/2013 05:00 AM, r-help-requ...@r-project.org wrote:

I'd argue that there's an important distinction between documenting a
function (how to use it) vs. documenting an algorithm (how it works).
I think noweb can work well for describing how something works, but
it's not so good for describing how to use it
I'm a big fan of noweb for the source code, take a look at the coxme package for extensive 
use of this.  That contains the most complex mathematics of any code I have written, and 
it is a big help to have real equations and discussion mixed in with the code.  The sparse 
matrix algorithms aren't very transparent either so it is a plus on that front too.  A big 
help to me whenever I look back at something.
  That given, I agree with the above that documentation of the algorithm and 
documentation of usage are disjoint tasks.  The pdf file from my noweb source is of 
critical importance to someone changing the code, and of essentially no use to anyone 
else.  Use vignettes for the latter.


 Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] obtainl survival curves for single strata

2013-02-01 Thread Terry Therneau

Stephen,
   I can almost but not quite get my arms around what you are asking.  A bit more detail 
would help.  Like

 cph.approve = what kind of object, generated by function __

But, let me make a guess:
   cph is the result of coxph, and the model has both covariates and a strata
   You want predicted survival curves, more than 1, of the type covariates = a, b, c, 
strata=1;  covariates = d, e, f, strata=2, ... for arbitrary covariates and strata.


Now, the predicted survival curves for different strata are different lengths.
The survfit.coxph routine gets around this by being verbose: it expects you to give it 
covariate sets, and returns
all of the strata for each covariate. This allows it to give a compact result.   You can 
always do:

   newpred <- survfit(cox-model-fit, newdata=something)
   newpred[5,17]  # survival curve for the 5th strata, covariates from the 17th row of 
newdata


But, you want to put the results into a matrix, for some pre-specifed set of time points.  
Take it one step further.

mytimepoints <- seq(0, 5*365, by=30)  # every 30 days
z <- summary(newpred[5,17], times=mytimepoints, extend=TRUE)$surv

The summary.survfit function's times argument was originally written for people who only 
wanted to print certain time points, but it works as well for those who only want to 
extract certain ones.  It correctly handles the fact that the curve is a step function.


Terry Therneau


On 02/01/2013 05:00 AM, r-help-requ...@r-project.org wrote:

What is the syntax to obtain survival curves for single strata on many subjects?

I have a model based on Surv(time,response) object, so there is a single row 
per subject and no start,stop and no switching of strata.

The newdata has many subjects and each subject has a strata and the survival 
based on the subject risk and the subject strata is needed.

If I do

newpred <- survfit(cph.approve, new=newapp, se=F)

I get all strata for every subject.

Attempting


  newpred <- survfit(cph.approve, new=newapp, id=CertId, se=F)

Error in survfit.coxph(cph.approve, new = newapp, id = CertId, se = F) :
   The individual option is  only valid for start-stop data

 newpred <- survfit(cph.approve, new=newapp, indi=T, se=F)

Error in survfit.coxph(cph.approve, new = newapp, indi = T, se = F) :
   The individual option is  only valid for start-stop data

Please advise whether obtaining a single stratum for a basic (time, response) model is 
possible. Due to differing lengths of the surv for different strata this will not go in a 
wide data.frame without padding.

Thanks everybody and have a great day.

Stephen B



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predicted HR in coxph with psline

2013-01-25 Thread Terry Therneau
The normalization is the same as is found when you have type="terms" from a gam model: 
each term is centered so that mean(predicted) = 0.  For a simple linear term beta*age this 
implies that the predicted value will be 0 at the mean of age; for a polynomial or spline 
this does not translate to a defined centering point.
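A quick check of that centering, using the poster's test3 fit from the code below (a sketch):

ptest3 <- predict(test3, type="terms")
mean(ptest3)    # essentially zero by construction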


The key idea is that if you were to replace age with age + 10 the figure 
should not change.

By the way, thanks for including such clear example code for your question.

Terry T.

- begin included message --
I have some questions about the predicted HR in coxph function including
psline covariate.

If just fitting covariate as linear form, by default, the reference value
for each of predictions type (linear predictor, risk and terms) is the mean
covariate within strata.

If a pspline is specified for the covariate, what's the reference value
for the predictions?
I did some test code below:
###
par(mfrow = c(1,3))
test = coxph(Surv(time, status) ~ age, cancer)
test2 = coxph(Surv(time, status) ~ pspline(age, df=2), cancer)
test3 = coxph(Surv(time, status) ~ pspline(age, df=5), cancer)
pptest3 = predict(test3, type="lp")
pptest2 = predict(test2, type="lp")
pptest  = predict(test, type="lp")
plot(cancer$age, exp(pptest))
abline(v=mean(cancer$age))
abline(h=1)
plot(cancer$age, exp(pptest2))
abline(v=mean(cancer$age))
abline(h=1)
plot(cancer$age, exp(pptest3))
abline(v=mean(cancer$age))
abline(h=1)

In the first model, age is just fitted as linear, and in the plot, HR = 1
when age equals the mean of age. However, it's not the case when age
is fitted using pspline.

So what is the reference value of age when pspline is used? Any help or
idea will be very much appreciated!

Best,
Carrie--

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] nesting in CoxPH with survival package

2013-01-24 Thread Terry Therneau
In a Cox model, the baseline hazard takes the place of an intercept.
Try the following:
dummy <- rnorm(nrow(goodexp))
testfit <- lm(dummy ~ ExpTemp + Stability + Period, data=goodexp, x=TRUE)

Now look at testfit$x, which is the design matrix for the linear model with an 
intercept.  
Examination of it will first of all clarify for you exactly what dummy variable 
coding was 
used for the model.  If this X matrix is singular, the coxph fit will be 
singular in 
exactly the same way and at the same place.  If you regress the last column of 
testfit$x 
on the others you should get a perfect fit, given the output below.  Other than 
the 
intercept the X matrix within coxph is the same as that for lm; the intercept 
is still 
there in the form of a baseline hazard, it just cannot be summarized as a 
single 
coefficient.  (All R modeling functions use the same internal routine to 
generate the 
design matrix.)
   My guess is that the row sums of testfit$x are constant, but that's just a 
guess.
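A sketch of the checks described above:

X <- testfit$x
summary(lm(X[, ncol(X)] ~ X[, -ncol(X)]))$r.squared  # near 1 if the last column is redundant
table(rowSums(X))                                    # a single value if the row sums are constant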

Terry Therneau
PS -- use the spacebar more when showing an example.  It makes it a lot easier 
for the 
rest of us to read.

On 01/24/2013 05:00 AM, r-help-requ...@r-project.org wrote:
 Thank you for the suggestions.

 Just to clarify, my first question was more on what actual coding I
 should be using to indicate a nested variable when using the coxph()
 function.  I asked this after consulting several times with a local
 statistician, but unfortunately neither of us are very familiar with
 R.

 After further consultation, I have changed the design to a 2*2 design
 (2 levels of ExpTemp and Stability each) with blocking (Period).  I am
 still getting the x matrix deemed to be singular error.

   LOEmod3alt=coxph(LOE.fit~ExpTemp+Stability+Period,data=goodexp)
 Warning message:
 In coxph(LOE.fit ~ ExpTemp + Stability + Period, data = goodexp) :
X matrix deemed to be singular; variable 5
   summary(LOEmod3alt)
 Call:
 coxph(formula = LOE.fit ~ ExpTemp + Stability + Period, data = goodexp)

n= 184, number of events= 105

   coef exp(coef) se(coef)  z Pr(>|z|)
 ExpTemp -3.17825   0.04166  0.53105 -5.985 2.17e-09 ***
 StabilityStatic -0.84129   0.43115  0.20470 -4.110 3.96e-05 ***
 PeriodB  1.06794   2.90937  0.22859  4.672 2.98e-06 ***
 PeriodC  1.23853   3.45054  0.58457  2.119   0.0341 *
 PeriodD   NANA  0.0 NA   NA
 ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  exp(coef) exp(-coef) lower .95 upper .95
 ExpTemp   0.0416624.0047   0.01471 0.118
 StabilityStatic   0.43115 2.3194   0.28866 0.644
 PeriodB   2.90937 0.3437   1.85877 4.554
  PeriodC   3.45054     0.2898   1.09723    10.851
 PeriodDNA NANANA

 Concordance= 0.833  (se = 0.03 )
 Rsquare= 0.591   (max possible= 0.995 )
 Likelihood ratio test= 164.4  on 4 df,   p=0
 Wald test= 111.1  on 4 df,   p=0
 Score (logrank) test = 179.9  on 4 df,   p=0

   with(redo, table(LOEStatusfull, Period,ExpTemp))
 , , ExpTemp = FIVE

   Period
 LOEStatusfull  A  B  C  D
0 42  0 35  0
1  4   0 11  0

 , , ExpTemp = FOUR

   Period
 LOEStatusfull  A  B  C  D
0  0  0   0  2
1  0 46  0 44

 As best as I can tell, none of my variables are collinear.  Are there
 any other suggestions of how to deal with this error, or any more
 information I can provide to help understand why I would be getting
 this?

 Thank you for your time and your help,

 Katie


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] coxph with smooth survival

2013-01-21 Thread Terry Therneau
For your first question -- read the manual.  ?survfit.coxph will reveal the censor 
argument, which controls the inclusion of points where the curve does not drop.


For your second, "smooth" is in the eye of the beholder, literally.  If the reason for a 
smooth curve is to plot it, you need to decide how many x-values you want across the range 
of the plotting window; I find 50 to 100 to be enough.
You also need to decide how many degrees of freedom for the spline.  If you are using the 
interpolates for prediction then you need to think through what the statistical properties 
of the smoothed estimates might be.


Here's an example where the default df of smooth.spline doesn't work so well

testfit <- coxph(Surv(time, status) ~ age + sex, data=lung,
                 subset=(ph.ecog==2))
testcurve <- survfit(testfit, data.frame(age=65, sex=2), censor=FALSE)  # prediction for
                                                                        # a 65-year-old female
testspl <- smooth.spline(testcurve$time, testcurve$surv)
plot(testcurve)
xx <- seq(0, 800, length=100)
lines(predict(testspl, xx), col=2)
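The df can also be chosen by hand rather than left to the default; a sketch:

testspl2 <- smooth.spline(testcurve$time, testcurve$surv, df=5)
lines(predict(testspl2, xx), col=4)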

Terry T.

---  begin included message 
Hello users,

I would like to obtain a survival curve from a Cox model that is smooth and does not have 
zero differences due to no events for those particular days.

I have:
 sum((diff(surv))==0)
[1] 18

So you can see 18 days where the survival curve did not drop due to no events.

Is there a way to ask survfit to fit a nice spline for the survival?

Note: I tried survreg and it did not work, but maybe I did not do it properly?

Thank you very much.

Stephen B

--- end inclusion ---

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] update.packages problem

2013-01-09 Thread Terry Therneau
I've updated to R-devel on my development machine and have lots of packages.  The 
update.packages() script ends up with 33 failures, all due to out-of-order reloading.  
That is, if package abc depends on package xyz, then the reinstall of abc fails with the 
message "version of xyz is built before R 3.0.0: please re-install it."


So I ran it a second time, and got 32 failures.  There should be a better way.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plot survival analysis with time dependent variables

2013-01-02 Thread Terry Therneau


On 01/01/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Dear all,
Is there an implementation of the Simon & Makuch method of plotting the survival 
function with time-dependent variables? I'm only able to find event.chart in Hmisc 
for the purpose and I would prefer the Simon and Makuch method. I believe Stata has 
it implemented for this purpose, but I cannot find it on CRAN.


Simon R, Makuch RW. A non-parametric graphical representation of the 
relationship between survival and the occurrence of an event: application to 
responder versus non-responder bias.  Statistics in Medicine 1984; 3: 35-44.

Short answer: it's easy, but I don't think the result is a survival curve.

Long answer:
 1. Write out your data in (start, stop] form, exactly as you would for a time-dependent 
Cox model.  For example:

id   time1  time2  status  response
 1       0     24       1         0
 2       0     15       0         0
 2      15     43       1         1
 3       0     31       0         0
  ...

Subject 1 has died without response at time 24.  Subject 2 responded at time 15 and then 
died at time 43.  Subject 3 is censored at 31 and has as yet not responded, etc.


The time dependent Cox model is
 fit - coxph(Surv(time1, time2, status) ~ response, data=yourdata).
Everyone agrees that this is the right model, and if you read the score test line of 
summary(fit) you will have the test discussed in equation 1 and 2 of the Simon and Makuch 
paper.  (Well almost-  if there are 2 deaths tied on the same day there is an n-1 vs n-2 
disagreement between authors with respect to the variance formula, but the effect is trivial.)


2. Hazard functions make sense for time dependent covariates.  It is what the Cox model is 
founded on and we all agree.   Simon and Makuch now make the argument

   a. Hazard functions are nice, but my clients are used to survival curves
   b. When there aren't time-dependent covariates S(t) = exp(-H(t)) where S=survival and 
H=hazard

   c. Let's pretend that the formula in b still makes sense for this case, and 
apply it.

The R solution for c is trivial: replace coxph with survfit in the code above.  You 
will get their curves (with proper confidence intervals) and can make plots in the usual way.
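That is, using the example data layout above (a sketch):

sfit <- survfit(Surv(time1, time2, status) ~ response, data=yourdata)
plot(sfit, col=1:2)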


3. I don't agree with the argument.  On the positive side, if H1 = H2 (no impact of 
response) then the Cox model will say nothing here and the Simon curves will give a 
picture that agrees, as opposed to some of the more boneheaded (but common in the medical 
literature) curves that they show in their figures 2b and 3b.


The problem is that the result is not a survival curve --- I've never figured out exactly 
WHAT it is.  For something to be interpreted as a survival curve there needs to be a set 
of subjects for whom I could hold up the graph and say "this is your future".  Put another 
way, assume that you have the above curves in hand and there is a new patient sitting 
across the desk from you.  What is your answer when he/she says "Doc, which curve am I?"


I tend to view these curves as primum non nocere, if faced with a client who absolutely 
won't bend when told the truth that there is no good survival curve for time-dependent 
covariates.


Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help Digest, Vol 118, Issue 20

2012-12-20 Thread Terry Therneau
I don't know of one.  If building your own you could use rpart with the maxdepth=1 as 
the tool to find the best split at each node.
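A sketch, where node.data stands for the (hypothetical) subset of data reaching a given node:

library(rpart)
stump <- rpart(y ~ ., data=node.data, control=rpart.control(maxdepth=1))
stump$splits    # details of the chosen split (NULL if no split was found)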

Terry Therneau


On 12/20/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Hi,

I've searched R-help and haven't found an answer. I have a set of data from 
which I can create a classification tree using
rpart. However, what I'd like to do is predefine the blank structure of the 
binary tree (i.e., which nodes to include) and then use a package like rpart to 
fit for the optimal splitting criteria at each of the predefined nodes.

Does such a package exist?

Thanks,
Lee



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fitting a gamma frailty model (coxph)

2012-12-05 Thread Terry Therneau

I looked at your data:
 table(x, cluster)
   cluster
x     1  2  3  4  5  6
  0   0 48  0 48 48  0
  1  48  0 48  0  0 48

Your covariate x is perfectly predicted by the cluster variable.
If you fit a fixed effects model:
  coxph(Surv(time, event) ~ factor(cluster) + x)

then the x variable is declared redundant.  The same thing happens in the gamma frailty 
model when the variance of the random effect is sufficiently large.  Your model approaches 
this limit, and the solution fails.  As mentioned in the manual page, the coxme function 
is now preferred.


Last, your particular error message is caused by an invalid value for "sparse".  I'll add 
a check to the program.

You likely want sparse=10 to force non-sparse computation.
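A sketch of the corrected call, plus the preferred coxme equivalent; note that the 
confounding of x with cluster described above may still defeat either fit:

mod <- coxph(Surv(time, event) ~ x +
             frailty.gamma(cluster, eps=1e-10, method="em", sparse=10),
             data=data)

library(coxme)
mod2 <- coxme(Surv(time, event) ~ x + (1 | cluster), data=data)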

Terry Therneau



On 12/04/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Dear all,

I have a data set
(http://yaap.it/paste/c11b9fdcfd68d02b#gIVtLrrme3MaiQd9hHy1zcTjRq7VsVQ8eAZ2fol1lUc=) with
6 clusters, each containing 48 (possibly censored, in which case
event = 0) survival times. The x column contains a binary explanatory
variable. I try to describe that data with a gamma frailty model as follows:

library(survival)

mod <- coxph(Surv(time, event) ~
             x + frailty.gamma(cluster, eps=1e-10, method="em", sparse=0),
             outer.max=1000, iter.max=1,
             data=data)

Here is the error message:

Error in if (history[2, 3] > (history[1, 3] + 1)) theta <-
mean(history[1:2,  :
   missing value where TRUE/FALSE needed

Does anyone have an idea on how to debug?

Yours sincerely,
Marco


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting and plotting a coxph with survfit, package(surv)

2012-11-28 Thread Terry Therneau

I answered a similar question yesterday:
The survfit routine will produce predicted survival curves for any requested combination 
of the covariates in the original model.  This is not the same thing as an adjusted 
survival curve.  Confusion on this is prevalent, however.  True adjustment requires a 
population average over the confounding factors and is closely related to the standardized 
incidence ratio concept found in epidemiology.


To do this you need to define a population of ages.  See chapter 10 of the book by Therneau 
and Grambsch for an explanation of the issues and examples of how to get the population 
value.  It's hard to distill 20 pages down into an email message.


Terry Therneau

-- begin included message -

I have a database with 18000 observations and 20 variables. I am running
cox regression on five variables and trying to use survfit to plot the
survival based on a specific variable without success.

Lets say I have the following coxph:
library(survival)
fit <- coxph(Surv(futime, fustat) ~ age + rx, data = ovarian)
fit
what I am trying to do is plot a survival comparing objects based on rx.
Using this
plot(survfit(fit, newdata=data.frame(rx=c(1:2), age=c(60))),
     xscale=365.25, xlab = "Years", ylab="Survival")
I get the survival for patients at 60, but is there an option to get a
survfit for the patients regardless of the value in variable age?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plotting an adjusted survival curve

2012-11-26 Thread Terry Therneau
First a statistical issue: The survfit routine will produce predicted survival curves for 
any requested combination of the covariates in the original model.  This is not the same 
thing as an adjusted survival curve.  Confusion on this is prevalent, however.  True 
adjustment requires a population average over the confounding factors and is closely 
related to the standardized incidence ratio concept found in epidemiology.


To answer your technical question:
   fit <- coxph(Surv(....
   mysurv <- survfit(fit, newdata=mydata)
This will give a set of predicted curves, one for each observation in mydata.  If we 
assume 2 treatments and 4 ethnicities, this means that there are 8 possible predicted 
curves.  You can certainly take the curves for "trt=1, white" and "trt=2, white", plot 
them together on one graph, and call this your adjusted survival curves; the mydata data 
set would have two observations.  This is not a correct label but is certainly common.
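A sketch of the population-average idea, with hypothetical names (trt is the treatment 
variable in mydata):

d1 <- mydata; d1$trt <- 1        # everyone assigned to treatment 1
d0 <- mydata; d0$trt <- 0        # everyone assigned to treatment 0
s1 <- survfit(fit, newdata=d1)
s0 <- survfit(fit, newdata=d0)
adj1 <- rowMeans(s1$surv)        # average the n predicted curves at each time point
adj0 <- rowMeans(s0$surv)
matplot(s1$time, cbind(adj1, adj0), type="s",
        xlab="Time", ylab="Adjusted survival")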


Terry Therneau

On 11/26/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Dear R-users
I am trying to make an adjusted Kaplan-Meier curve (using the Survival package) 
but I am having difficulty with
plotting it so that the plot only shows the curves for the adjusted results.
My data come from a randomised controlled trial, and I would like the adjusted 
Kaplan-Meier
curve to only show two curves for the adjusted survival: one for those on 
treatment (Treatment==1)
and another curve for those on placebo (Treatment==0).

My problem is that when I plot the survfit of my coxph, I think it displays a 
curve for
every single individual factor in my coxph, whereas I would like it to only 
display the
adjusted curves for when Treatment==1 and Treatment==0.  How can I do this?

A simplified example of my code with only one effect-modifier is:

simple.cox.ethnicity <- coxph(Surv(whenfailed, failed) ~ factor(Treatment) + 
factor(ethnicity))  # my data are attached already
survfit.simple.cox.ethnicity <- survfit(simple.cox.ethnicity, survmat)  # survmat 
is a data.frame that contains Treatment and ethnicity
plot(survfit.simple.cox.ethnicity, col=c("red","black"), main="survfit.simple.cox", 
xlab="survival time", ylab="proportion surviving")

Thank you so much for your help.
Yours gratefully,
Brent Caldwell


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survfit number of variables != number of variable names

2012-11-19 Thread Terry Therneau

I can't reproduce the problem.

Tell us what version of R and what version of the survival package.
Create a reproducible example.  I don't know if some variables are numeric and some are 
factors, how/where the surv object was defined, etc.


Terry Therneau



On 11/17/2012 05:00 AM, r-help-requ...@r-project.org wrote:

This works ok:


  cox = coxph(surv ~ bucket*(today + accor + both) + activity, data = data)
  fit = survfit(cox, newdata=data[1:100,])

but using strata leads to problems:


  cox.s = coxph(surv ~  bucket*(today + accor + both) + strata(activity),
  data = data)
  fit.s = survfit(cox.s, newdata=data[1:100,])

Error in model.frame.default(data = data[1:100, ], formula = ~bucket +  :
   number of variables != number of variable names

Note that the following give rise to the same error:


  fit.s = survfit(cox.s, newdata=data)

Error in model.frame.default(data = data, formula = ~bucket + today +  :
   number of variables != number of variable names

but if I use data implicitly, all is working fine:

  fit.s = survfit(cox.s)

Any idea on how I could solve this?

Best, and thank you,

ge


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survreg gompertz

2012-11-16 Thread Terry Therneau

The short answer is "sort of".
Medium is that survreg implements the model framework found in Kalbfleisch and Prentice, 
The Statistical Analysis of Failure Time Data, chapter 2.2.  The time variable T has

   f(time) ~ X' beta +  sigma * W

where W is an error term from some distribution and f is a fixed function (usually 
identity or log).  Any distribution that can be shoehorned into this framework can be fit 
with survreg.


A longer and more detailed discussion is found in Division of Biostatistics Technical 
Report #53, Mayo Clinic, "A package for survival analysis in S".  One day I need to 
update this and make it a vignette for the package:


   http://mayoresearch.mayo.edu/mayo/research/biostat/techreports.cfm


  Terry Therneau

 begin included message ---
Hi all,

Sorry if this has been answered already, but I couldn't find it in the
archives or general internet.

Is it possible to implement the Gompertz distribution as a
survreg.distribution to use with survreg of the survival library?
I haven't found anything and recent attempts from my side weren't
successful so far.

I know that other packages like 'eha' and 'flexsurv' offer functions
similar to survreg with Gompertz support. However, due to the run-time
environment where this needs to be running in the end, I can't use these
packages :(

Same question for the Gompertz-Makeham distribution.

Many thanks!

Matthias

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] FW: replace repeated id in a pedigree list

2012-10-31 Thread Terry Therneau

 I use kinship package of R and it doesn't create a kinship matrix with 
repeated id.

The kinship package is out of date, use the kinship2 and coxme packages (it was split into 
two parts).  Then the problem you describe no longer exists -- the kinship function no 
longer requires each subject id to be unique across all pedigrees.


pedlist <- with(mydata, pedigree(id, fa_id, mo_id, sex, famid=famid))
kmat <- kinship(pedlist)
fit <- coxme(Surv(time, status) ~ x1 + x2 + ... + (1 | famid/id), varlist=kmat)

The pedlist object contains all the families, plot(pedlist[4]) would pull out and plot the 
4th family.

The column labels of kmat will be of the form famid/id


To solve your original problem (though you don't need to) create a single 
unified id.
uid <- paste(famid, id, sep='/')
idlist <- unique(uid)

newid <- match(uid, idlist)
newmom <- match(paste(famid, mo_id, sep='/'), idlist)
newdad <- match(paste(famid, fa_id, sep='/'), idlist)


Terry Therneau
author of kinship and coxme, but not :-) the maintainer of kinship2



Re: [R] Kaplan Meier Post Hoc?

2012-10-25 Thread Terry Therneau
No, you didn't miss anything in the survival package.  I've never found
post-hoc tests interesting, so have little motivation to add such (and a very
long to-do list of things I would like to add).


If you simply must have them, why not do all pairwise tests?

chisq <- matrix(0., 4, 4)
for (i in 1:4) {
    for (j in (1:4)[-i]) {
        temp <- survdiff(Surv(time, status) ~ group, data=mydata,
                         subset=(group %in% (unique(group))[c(i,j)]))
        chisq[i,j] <- temp$chisq
    }
}
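
A follow-on sketch (not in the original reply): each pairwise statistic here
has 1 df, so p-values can be extracted and adjusted for the multiple
comparisons.

    pval <- pchisq(chisq, df=1, lower.tail=FALSE)   # pairwise p-values
    p.adjust(pval[upper.tri(pval)], method='holm')  # multiplicity adjustment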

Terry Therneau

On 10/25/2012 05:00 AM, r-help-requ...@r-project.org wrote:

This is more of a general question without data.  After doing 'survdiff',
from the 'survival' package, on strata including four groups (so 4 curves
on a Kaplan Meier curve) you get a chi squared p-value whether to reject
the null hypothesis or not.  Is there a method to followup with pairwise
testing on the respective groups?  I have searched the library but have
come up with nothing.  Perhaps I am mistaken in something here.




Re: [R] Expected number of events, Andersen-Gill model fit via coxph in package survival

2012-10-22 Thread Terry Therneau

Therneau doesn't know the answer either.
The predictions are positively correlated since they all depend on the same beta-hat from 
the original model.  The same will be true for any model: logistic, poisson, linear, ...


Terry T

On 10/20/2012 06:06 PM, Omar De la Cruz C. wrote:

I have a follow-up question (for either Dr. Therneau, or anybody who
might know).

sum(zz) (see below) estimates the number of events for the cohort.
Now, how can I compute a confidence interval for sum(zz)? Or a
standard error?

My obvious choice, square root of the sum of the squares of the
standard errors for zz provided by predict.coxph, turns out to be too
small.

Thank you for any suggestions.

Omar.





Re: [R] Looping survdiff

2012-10-19 Thread Terry Therneau

The number of recent questions from umn.edu makes me wonder if there's homework 
involved

Simpler for your example is to use get and subset:

dat <- structure(...)   # as found below
var.to.test <- names(dat)[4:6]   # variables of interest
nvar <- length(var.to.test)
chisq <- double(nvar)
for (i in 1:nvar) {
    tfit <- survdiff(Surv(time, completion==2) ~ get(var.to.test[i]),
                     data=dat, subset=(group==3))
    chisq[i] <- tfit$chisq
}
write.csv(data.frame(var.to.test, chisq))

On 10/19/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,

I am trying to set up a loop that can run the survdiff function with the
ultimate goal to generate a csv file with the p-values reported.  However,
whenever I try a loop I get an error such as invalid type (list) for
variable 'survival_data_variables[i].

This is a subset of my data:

structure(list(
    time = c(1.516667, 72, 72, 25.78333, 72, 72, 72, 72, 72, 72,
             1.18, 0.883, 1.15, 0.867, 72, 1.03, 72, 1.05, 72, 22.76667),
    group = c(2L, 1L, 3L, 3L, 3L, 4L, 4L, 1L, 3L, 3L, 3L, 3L, 4L,
              3L, 3L, 4L, 3L, 4L, 3L, 4L),
    completion = structure(c(2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
        2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
        .Label = c("1", "2"), class = "factor"),
    var1 = structure(c(2L, 2L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 3L,
        2L, 2L, 4L, 3L, 2L, 4L, 2L, 4L, 2L, 4L),
        .Label = c("1", "2", "3", "4"), class = "factor"),
    var2 = structure(c(3L, 3L, 1L, 1L, 2L, 4L, 3L, 3L, 2L, 4L,
        2L, 1L, 2L, 1L, 2L, 2L, 4L, 4L, 2L, 3L),
        .Label = c("1", "2", "3", "4"), class = "factor"),
    var3 = structure(c(4L, 2L, 3L, 1L, 3L, 4L, 4L, 2L, 2L, 4L,
        2L, 2L, 1L, 2L, 2L, 2L, 1L, 3L, 4L, 1L),
        .Label = c("1", "2", "3", "4"), class = "factor")),
    .Names = c("time", "group", "completion", "var1", "var2", "var3"),
    row.names = c(NA, 20L), class = "data.frame")


The loop I have been trying for just group 3 is:

d = data.frame()
for (i in 4:6) {
    a = assign(paste("p-value", i, sep=""),
               survdiff(Surv(time, completion==2) ~ dat[i],
                        data=dat[group==3,],
                        rho=0))
    b = as.matrix(a$chisq)
    d = rbind(d, b)
    write.csv(d, file="C:/.../junk.csv", quote=FALSE)
}

Perhaps I am making this more difficult than it needs to be.  Thanks for
any help,

Charles




Re: [R] Kaplan-Meier plotting quirks

2012-10-18 Thread Terry Therneau
Better would be to use interval censored data.  Create your data set so that
you have (time1, time2) pairs, each of which describes the interval of time
over which the tag was lost.  So an animal first captured at time 10 sans tag
would be (0,10); with tag at 5 and without at 20 would be (5,20); and last
seen with tag at 30 would be (30, NA).
Then survfit(Surv(time1, time2, type='interval2') ~ 1, data=yourdata) will
give a curve that accounts for interval censoring.
   As a prior poster suggested, if the times are very sparse then you may be
better off assuming a smooth curve.  Use the survreg function with the same
equation as above; see help(predict.survreg) for an example of how to draw
the resulting survival curve.
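
A minimal made-up illustration of that coding (three animals: tag already
gone at first capture at day 10, lost somewhere in (5, 20), still present at
day 30):

    library(survival)
    tags <- data.frame(time1 = c(0, 5, 30), time2 = c(10, 20, NA))
    ifit <- survfit(Surv(time1, time2, type='interval2') ~ 1, data=tags)
    summary(ifit)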

Terry Therneau

On 10/18/2012 05:00 AM, r-help-requ...@r-project.org wrote:
 -Original Message-
 From: Michael Rentz [mailto:rent0...@umn.edu]
 Sent: Tuesday, October 16, 2012 12:36 PM
 To:r-help@r-project.org
 Subject: [R] R Kaplan-Meier plotting quirks?

 Hello. I apologize in advance for the VERY lengthy e-mail. I endeavor to 
 include enough detail.

 I have a question about survival curves I have been battling off and on for a 
 few months. No one local seems to be able to help, so I turn here. The issue 
 seems to either be how R calculates Kaplan-Meier Plots, or something with the 
 underlying statistic itself that I am misunderstanding. Basically, longer 
 survival times are yielding steeper drops in survival than a set of shorter 
 survival times but with the same number of loss and retention events.

 As a minor part of my research I have been comparing tag survival in marked
 wild rodents. I am comparing a standard ear tag with a relatively new
 technique. The newer tag clearly "wins" using survival tests, but the
 resultant Kaplan-Meier plot does not seem to make sense. Since I am dealing
 with a wild animal and only trapped a few days out of a month the data is
 fairly messy, with gaps in capture history that require assumptions of tag
 survival. An animal that is tagged and recaptured 2 days later with a tag and
 30 days later without one could have an assumed tag retention of 2 days
 (minimum confirmed) or 30 days (maximum possible).

 Both are significant with a survtest, but the K-M plots differ. A plot of
 minimum confirmed (overall harsher data, lots of 0 days and 1 or 2 days)
 yields a curve with a steep initial drop in "survival", but then a leveling
 off and straight line thereafter at about 80% survival. Plotting the maximum
 possible dates (same number of losses/retention, but retention times are
 longer, the length to the next capture without a tag, typically
 25-30 days or more) does not show as steep of a drop in the first few days,
 but at about the point the minimum estimate levels off this one begins
 dropping steeply. 400 days out the plot with minimum possible estimates has
 tag survival of about 80%, whereas the plot with the same loss rate but
 longer assumed survival times shows only a 20% assumed survival at 400 days.
 Complicating this of course is the fact that the great majority of the
 animals die before the tag is lost; survival of the rodents is on the order
 of months.

 I really am not sure what is going on, unless somehow the high number of 
 events in the first few days followed by few events thereafter leads to the 
 assumption that after the initial few days survival of the tag is high. The 
 plotting of maximum lengths has a more even distribution of events, rather 
 than a clumping in the first few days, so I guess the model assumes 
 relatively constant hazards? As an aside, a plot of the mean between the 
 minimum and maximum almost mirrors the maximum plot. Adding five days to the 
 minimum when the minimum plus 5 is less than the maximum returns a plot with 
 a steeper initial drop, but then constant thereafter, mimicking the minimum 
 plot, but at a lower final survival rate.

 Basically, I am at a loss why surviving longer would *decrease* the survival
 rate???

 My co-author wants to drop the K-M graph given the confusion, but I think it 
 would be odd to publish a survival paper without one. I am not sure which 
 graph to use? They say very different things, while the actual statistics do 
 not differ that greatly.

 I am more than happy to provide the data and code for anyone who would like 
 to help if the above is not explanation enough. Thank you in advance.

 Mike.


 --
 Michael S. Rentz
 PhD Candidate, Conservation Biology
 University of Minnesota
 5122 Idlewild Street
 Duluth, MN 55804
 (218) 525-3299
 rent0...@umn.edu



Re: [R] Problems with coxph and survfit in a stratified model with interactions

2012-10-15 Thread Terry Therneau
Your df object (newdata) has to have ALL the variables in it.  Normally you wouldn't need 
the strata variable, but in this case cov1 is also a covariate.  Look at the example I 
gave earlier.


Terry T.

On 10/15/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Many thanks for your very quick reply!  I tried replacing '~ strata(cov1) +
cov1:cov2' with '~ strata(cov1):cov2'. As a result, I get an error
message saying that "object 'cov1' not found". I guess R refers to the data
frame. Any idea how to fix the error? Complete code follows.

Best regards,
Roland



require(survival)
data(lung)
#
lung$cov1 <- as.factor(lung$ph.ecog)
lung$cov2 <- as.factor(lung$sex)
levels(lung$cov1)[levels(lung$cov1)=="0"] <- "zero"
levels(lung$cov1)[levels(lung$cov1)=="1"] <- "one"
levels(lung$cov1)[levels(lung$cov1)=="2"] <- "two"
levels(lung$cov1)[levels(lung$cov1)=="3"] <- "three"
levels(lung$cov2)[levels(lung$cov2)=="1"] <- "male"
levels(lung$cov2)[levels(lung$cov2)=="2"] <- "female"
#
df <- data.frame(
   cov2=factor("female", levels = levels(lung$cov2))
)
sCox <- coxph(Surv(time, status) ~ strata(cov1):cov2, data=lung)
sfCox <- survfit(sCox, newdata=df)





Re: [R] Problems with coxph and survfit in a stratified model, with interactions

2012-10-14 Thread Terry Therneau
First, here is your message as it appears on R-help.

On 10/14/2012 05:00 AM, r-help-requ...@r-project.org wrote:
 I'm trying to set up a proportional hazards model that is stratified with
 respect to covariate 1 and has an interaction between covariate 1 and
 another variable, covariate 2. Both variables are categorical. In the
 following, I try to illustrate the two problems that I've encountered, using
 the lung dataset.

 The first problem is the warning:

 To me, it seems that there are too many dummies generated.

 The second problem is the error:

Please try to fix this in the future (Nabble issue?)

As to the problems: handling strata by covariate interactions turns out to be
a bit of a pain in the posterior in the survival code.  It would have worked,
however, if you had done the following:
     fit <- coxph(Surv(time, status) ~ strata(cov1) * cov2, data=...)
or   ~ strata(cov1):cov2
or   ~ strata(cov1):cov2 + cov2

But by using
     ~ strata(cov1) + cov1:cov2

you fooled the program into thinking that there was no strata by covariate
interaction, and so it did not follow the special logic necessary for that
case.

  Second issue: The model.matrix function of R, common to nearly all the
modeling functions (including coxph), tries to guess which dummy variables
will be redundant, and thus can be removed from the X matrix before the fit.
Such an approach is doomed to failure.  I'm actually surprised at how often R
guesses correctly, because until a matrix decomposition is actually performed
the only thing possible is an informed guess.  Your particular case gives
rise to a larger than usual number of NA coefs (redundant columns), but short
of building your own X matrix by hand there isn't anything to be done about
it.  Just ignore them.

Terry Therneau




Re: [R] Question on survival

2012-10-12 Thread Terry Therneau

It is easy to get a cumulative hazard curve.
First, decide what values of age, a, and b you want curves for:

  tdata <- data.frame(age=55, a=2, b=6)

Get the curves; there will be one for each stratum in the output:

  sfit <- survfit(coxPhMod, newdata=tdata)

Plot them:

  plot(sfit, fun='cumhaz', col=1:4, xlab= etc etc)

Hazard functions are something else again; estimating these rigorously is
akin to density estimation.  A quick and dirty method is to use smooth.spline:

   temp <- sfit[1]   # grab the first curve
   tfit <- smooth.spline(temp$time, -log(temp$surv), df=5)  # smooth the cum haz
   plot(predict(tfit, deriv=1))

That value of df=5 is made up -- you need to decide for yourself how much
smoothing to do.  I make no claims that this is statistically well grounded;
it's just a good way to get a quick idea.

PS: There is no such thing as THE baseline hazard function; predictions are
always for some particular value of the covariates.  In a book it is sometimes
useful to pick a particular set of x values as a default in order to simplify
notation, often x=0, and label that as a baseline.  But in actual computation
all zeros is usually crazy (age=0, weight=0, blood pressure=0, etc.).

Terry Therneau



Hi,
I'm going crazy trying to plot a quite simple graph.
I need to plot the estimated hazard rate from a Cox model.
Supposing the model is like this:
coxPhMod = coxph(Surv(TIME, EV) ~ AGE + A + B + strata(C), data=data)
with 4 levels for C.
How can I obtain a graph with 4 estimated (better, smoothed) hazard curves
(base-line hazard + 3 proportional) to highlight the effect of C?
thanks!!
laudan



Re: [R] Survival prediction

2012-10-08 Thread Terry Therneau



Dear All,

I have built a survival cox-model, which includes a covariate * time 
interaction. (non-proportionality detected)
I am now wondering how could I most easily get survival predictions from my 
model.

My model was specified:
coxph(formula = Surv(event_time_mod, event_indicator_mod) ~ Sex +
 ageC + HHcat_alt + Main_Branch + Acute_seizure + TreatmentType_binary +
 ICH + IVH_dummy + IVH_dummy:log(event_time_mod)

And now I was hoping to get a prediction using survfit and providing new.data 
for the combination of variables
I am doing the predictions:
  survfit(cox, new.data=new)


  Some comments:
1. Even though it is in the SAS manual and some literature, I have myself
never used X * log(time) as a fix for lack of proportionality.  Is it really
true that when you use

   fit <- coxph(Surv(event_time_mod, event_indicator_mod) ~ Sex +
                ageC + HHcat_alt + Main_Branch + Acute_seizure +
                TreatmentType_binary + ICH + IVH_dummy)
   zfit <- cox.zph(fit, transform=log)
   plot(zfit[8])

that the estimated function is linear?  I have not yet seen such a simple
time effect and would find it interesting.

2. The code you wrote does not fit the time dependent model that you suppose;
it treats event_time_mod as a fixed covariate.  To fit the model see the
relevant vignette for the survival package.  Essentially the program has to
build a large (start, stop) data set behind the scenes.  (SAS does the same
thing.)  Defining proper residuals for said data set is hard and the R code
does not yet do this.  (Last I checked, SAS did the same thing.)


3. The survival curve for a time dependent covariate is something that is not
easily defined.  Read chapter 10.2.4 of the Therneau and Grambsch book for a
discussion of this (largely informed by the many mistakes I've myself made).


Terry Therneau



Re: [R] Expected number of events, Andersen-Gill model fit via coxph in package survival

2012-10-08 Thread Terry Therneau



I am interested in producing the expected number of events, in a
recurring events setting. I am using the Andersen-Gill model, as fit
by the function coxph in the package survival.

I need to produce expected numbers of events for a cohort,
cumulatively, at several fixed times. My ultimate goal is: To fit an
AG model to a reference sample, then use that fitted model to generate
expected numbers of events for a new cohort; then, comparing the
expected vs. the observed numbers of events would give us some idea of
whether the new cohort differs from the reference one.


From my reading of the documentation and the text by Therneau and

Grambsch, it seems that the function survexp is what I need. But
using it I am not able to obtain expected numbers of events that match
reasonably well the observed numbers *even for the same reference
population.* So, I think I am misunderstanding something quite badly.



 You've hit a common confusion.  Observed versus expected event computations
are done on a cumulative hazard scale H, not the survival scale S; S = exp(-H).
Relating this back to simple Poisson models, H(t) would be the expected number
of events by time t and S(t) the probability of no events before time t.
G. Berry (Biometrics 1983) has a classic and readable article on this
(especially if you ignore the proofs).


  Using your example:

 cphfit <- coxph(Surv(start, stop, event) ~ rx + number + size + cluster(id),
                 data=bladder2)
 zz <- predict(cphfit, type='expected')
 c(sum(zz), sum(bladder2$event))
[1] 112 112

 tdata <- bladder2[1:10, ]   # new data set (lazy way)
 predict(cphfit, type='expected', newdata=tdata)
 [1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665
 [8] 0.8864808 0.2932915 0.5190647

 You can also do this using survexp and the cohort=FALSE argument, which
would return S(t) for each subject; we would then use -log(result) to get H.
This is how it was done when I wrote the book, but the newer predict function
is easier.


Terry Therneau



Re: [R] variances of random effects in coxme

2012-10-08 Thread Terry Therneau
You are right, those look suspicious.  What version of R and of the coxme
package are you running?  Later versions of coxme use multiple starting
estimates due to precisely this kind of problem.
  Also, when the true MLE is variance=0 the program purposely never quite gets
there, in order to avoid log(0).  Compare the log-lik to a fixed effects model
with those covariates.

   I can't do more than guess without a reproducible example.

Terry Therneau


On 10/08/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Dear R users,

I'm using the function coxme of the package coxme in order to build Cox
models with complex random effects. Unfortunately, I sometimes get
surprising estimations of the variances of the random effects.

I ran models with different fixed covariates but always with the same 3
random effects defined by the argument
varlist=coxmeMlist(list(mat1,mat2,mat3), rescale = F, pdcheck = F,
positive=F). I get a few times exactly the same estimations of the
parameters of the random effects whereas the fixed effects of the models
are different:

Random effects
  Group Variable Std DevVariance
  idp   Vmat.1   0.1000 0.0100
Vmat.2   0.02236068 0.0005
Vmat.3   0.02449490 0.0006

The variances are round figures, so I have the feeling that the algorithm
didn't succeed in fitting the model.

Has anyone ever faced to this problem?

Thanks,

Hugo




Re: [R] problems with printing and plotting aareg

2012-10-05 Thread Terry Therneau

It's a bug in summary.aareg which no one found until now.

What's wrong:
If dfbeta=TRUE then there is a second estimate of variance calculated, labeled as 
test.var2. If maxtime is set, then both estimates of variance need to be recalculated by 
the summary routine.  An incorrect if-then-else flow led it to look for test.var2 when it 
wasn't relevant.  My test cases with maxtime also happened to have dfbeta=TRUE.


Short term solution: set dfbeta=TRUE.  A bit more computation time though.
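
A sketch of that workaround on a standard data set (the lung data here stands
in for the poster's variables):

    library(survival)
    afit <- aareg(Surv(time, status) ~ age + sex, data=lung, dfbeta=TRUE)
    summary(afit, maxtime=400)   # works because dfbeta=TRUE supplies test.var2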
Long term: I'll fix it, and a new version of survival will one day appear.  (With 200+ 
packages that depend on survival, new releases now require a lot of checking.  No 
overnight fixes).


Terry T

On 10/05/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Hi all,

I've ventured into the world of nonparametric survival and I would like to use the 
maxtime option for printing and plotting my aareg fit.

However, my fit does not have test.var2 and this stops the print and plot 
when adding a maxtime.

My code is as follows:

Response <- Surv(Time, Event)
Model <- aareg(Response ~ Factor1*Factor2)
Model2 <- aareg(Response ~ Factor1+Factor2)  # just to see if the
# interaction term had anything to do with it

Model, print(Model), summary(Model), and plot(Model) seem to work fine, but as 
soon as I try summary/print/plot(Model, maxtime=400) it tells me that test.var2 
is not found and when I look at  summary(Model), there is indeed a NULL under 
test.var2.

Anyone know why it doesn't include test.var2? Is this a compatibility problem? 
I'm using R version 2.13 (I know it's quite old, but updating is a pain when 
you don't have admin rights to your computer) and just updated the survival 
package (no warning messages).

Any input would be much appreciated.

Cheers,

Freya




[R] erasing a polygon

2012-09-27 Thread Terry Therneau
I'm updating some (very) old code, and one particular option of its plot method depends on 
a once-was-true trick

polygon(x, y, border=TRUE, col=0)
polygon(x, y, border=TRUE, density=0)

would draw the polygon AND erase whatever was underneath it back to background 
color.

Is there a reliable way to do this with the current R (standard graphics)?

Terry Therneau

PS For the inquiring, the routine is text.rpart with the fancy=T option, and the original 
target was the postscript driver on Splus 3.4.  (I said it was old.)  The plot.rpart 
routine draws the branches, and text.rpart then wants to lay down some ellipses, erasing 
what is underneath them.




Re: [R] erasing a polygon

2012-09-27 Thread Terry Therneau


I was able to puzzle it out with the help of the book R Graphics (Murrell).
When par("bg") is "transparent" one needs to use col="white"; otherwise the
old code col=0 works correctly.

The default for pdf and x11, the two I use, is transparent.
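
A minimal sketch of the repaired call (made-up coordinates):

    plot(1:10, 1:10)
    # on a device with bg='transparent', col=0 no longer erases; use white
    polygon(c(3,7,7,3), c(4,4,6,6), border=TRUE, col="white")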

Terry Therneau



On 09/27/2012 08:48 AM, PIKAL Petr wrote:

Hi

It seems that it still works.

x <- c(3,7,7,3)
y <- c(4,4,6,6)
par(bg="pink")
plot(1:10, 1:10)
polygon(x, y, border=TRUE, col=0)


Sent: Thursday, September 27, 2012 2:37 PM
To: r-help@r-project.org
Subject: [R] erasing a polygon

I'm updating some (very) old code, and one particular option of its
plot method depends on a once-was-true trick
  polygon(x, y, border=TRUE, col=0)
  polygon(x, y, border=TRUE, density=0)

would draw the polygon AND erase whatever was underneath it back to
background color.

Is there a reliable way to do this with the current R (standard
graphics)?

Terry Therneau

PS For the inquiring, the routine is text.rpart with the fancy=T
option, and the original target was the postscript driver on Splus 3.4.
(I said it was old.)  The plot.rpart routine draws the branches, and
text.rpart then wants to lay down some ellipses, erasing what is
underneath them.



Re: [R] Extrapolating Cox predicted risk

2012-09-26 Thread Terry Therneau



On 09/26/2012 05:00 AM, r-help-requ...@r-project.org wrote:

I generated the predicted risk of death for each subject in the study by
means of a Cox proportional hazards model at 8 years of follow-up, a time
point at which follow-up was more than 90% complete. Is it possible to
extrapolate the predicted risk of each subject to 10 years by assuming
an exponential distribution?
Any help would be greatly appreciated.
Thanks for your consideration.



The ability to extrapolate over long distances is exactly why industrial reliability work 
uses parametric models (weibull or log-normal, see survreg) instead of the Cox model.


If any of your data goes out to 10 years, then the predictions for coxph will go out that 
far, just like they would for a Kaplan-Meier.  But, just like the KM, if there are only a 
handful of people out that far the extrapolated curve will be very noisy.
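
For what it's worth, a hedged sketch of that parametric route, with the
ovarian data as a stand-in (the age value and the 10-year horizon are made
up):

    library(survival)
    wfit <- survreg(Surv(futime, fustat) ~ age, data=ovarian)  # Weibull default
    lp <- predict(wfit, newdata=data.frame(age=60), type='lp')
    # survreg's Weibull form has S(t) = exp(-(t/exp(lp))^(1/scale))
    pweibull(10*365.25, shape=1/wfit$scale, scale=exp(lp), lower.tail=FALSE)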


Terry Therneau



Re: [R] Correlation between random effects in the package coxme

2012-09-17 Thread Terry Therneau



On 09/14/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,

Why is the correlation between the random effects negative?

library(coxme)
rats1 <- coxme(Surv(time, status) ~ (1|litter), rats)
random.effects(rats1)[[1]]   # one value for each of the 50 litters
print(rats1)

rats2 <- lmekin(time ~ (1|litter), rats)
fixed.effects(rats2)
random.effects(rats2)[[1]]   # one value for each of the 50 litters
print(rats2)

cor(random.effects(rats1)[[1]], random.effects(rats2)[[1]])

Thanks


Because coxph models the death rate, so large coefficients indicate bad news,
while lme is modeling the survival time, where large coefficients are good
news.


Terry T



Re: [R] p value from lmekin()

2012-09-06 Thread Terry Therneau



On 09/06/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Hi, R experts

   I am currently using lmekin() function in coxme package to fit a
mixed effect model for family based genetic data. How can I extract the p
value from a lmekin object?  When I print the object in R console, I can
see the p value and Z value are just over there. But I can not extract them
by the coef() function. kinfit$coefficient$fixed (kinfit is the name of the
lmekin object) just include the intercept and the value of fixed effects.
Where are p and Z values?

  Thank you!



Similar to lme and lmer, lmekin and coxme have
fixef(fit)   returns the coefficients of the fixed effects
vcov(fit)    returns the variance-covariance matrix of the fixed effects

so fixef(fit) / sqrt(diag(vcov(fit)))
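
That ratio gives the Wald z statistics; two-sided p values follow directly
(a short sketch, assuming a fit object named fit):

    zstat <- fixef(fit) / sqrt(diag(vcov(fit)))
    2 * pnorm(abs(zstat), lower.tail=FALSE)   # two-sided p values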



Re: [R] Coxph not converging with continuous variable

2012-09-03 Thread Terry Therneau



On 09/03/2012 05:00 AM, r-help-requ...@r-project.org wrote:

The coxph function in R is not working for me when I use a continuous
predictor in the model. Specifically, it fails to converge, even when bumping
up the number of max iterations or setting reasonable initial values. The
estimated hazard ratio from the model is incorrect (verified by an AFT
model). I've isolated it to the x1 variable in the example below, which is
log-normally distributed. The x1 here has extreme values, but I've been able
to reproduce the problem intermittently with less extreme values. It seemed
odd that I couldn't find this question asked anywhere, so I'm wondering if
I'm just not seeing a mistake I've made.

 
Alex Keil
UNC Chapel Hill


Congratulations -- it's been a long time since someone managed to break
the iteration code in coxph.

I used your example, but simplified to using n=1000 and a 1 variable
model.  The quantiles of your x1 variable are

 round(quantile(xx, c(0, 5:10/10)), 2)
   0%   50%   60%   70%   80%   90%  100%
 0.06  2.67  3.75  5.74  8.93 15.04 98.38

For a coefficient of 1 (close to the real solution) you have one subject
with a risk of death that is 999 times the average risk (he should die
before his next heartbeat) and another with relative risk of 1.99e-40
(an immortal).  The key components of a Cox model iteration are, it
turns out, weighted means and variances.  For this data 99.99929% of
the weight falls on a single observation, i.e., at beta=1 you have an
effective sample size of 1.  The optimal coefficient is the one that best
predicts that single subject's death time.

  Due to the computational round off error that results, the routine
takes a step of size 1.7 from a starting estimate of 1.0 when it should
take a step of size about .05, then falls into step halving to
overcome the mistake.  Rinse and repeat.


I could possibly make coxph resistant to this data set, but at the cost 
of a major rewrite and significantly slower performance.  Can you 
convince me that this data set has any relevance?


Terry Therneau



Re: [R] basehaz() in package survival and warnings with coxph

2012-08-10 Thread Terry Therneau
Since fit3.1 and fit2 are based on different data sets, why would I
expect the same number of events?
Also, when you have a large number of variables, are observations being
deleted due to missing values?

And to echo David W's comments -- it is hard for me to imagine a data
set where this many variables can be looked at simultaneously and a
meaningful result obtained.

Terry Therneau


On 08/09/2012 07:52 PM, Nasib Ahmed wrote:
 My sessionInfo is as follows:

 R version 2.15.1 (2012-06-22)
 Platform: x86_64-unknown-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_GB.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_GB.UTF-8LC_COLLATE=en_GB.UTF-8
  [5] LC_MONETARY=en_GB.UTF-8LC_MESSAGES=en_GB.UTF-8
  [7] LC_PAPER=C LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

 attached base packages:
 [1] splines   stats graphics  grDevices utils datasets  methods
 [8] base

 other attached packages:
  [1] mi_0.09-16   arm_1.5-05   foreign_0.8-50   abind_1.4-0
  [5] R2WinBUGS_2.1-18 coda_0.15-2  lme4_0.99-0  Matrix_1.0-6
  [9] lattice_0.20-6   car_2.0-12   nnet_7.3-4   MASS_7.3-20
 [13] MuMIn_1.7.11 survival_2.36-14

 loaded via a namespace (and not attached):
 [1] grid_2.15.1   nlme_3.1-104  stats4_2.15.1
 


 It will be difficult to reproduce an example here as the data set I am 
 using in very large. I can give you an example:

 fit3.1 <- coxph(formula = y ~ sex + ns(ageyrs, df = 2) + AdmissionSource +
     X1 + X2 + X3 + X5 + X6 + X7 + X11 + X12 + X13 + X14 + X15 +
     X16 + X17 + X18 + X19 + X20 + X22 + X24 + X25 + X26 + X27 +
     X28 + X29 + X32 + X33 + X35 + X38 + X39 + X40 + X41 + X42 +
     X43 + X44 + X47 + X49 + X53 + X54 + X55 + X58 + X59 + X62 +
     X68 + X69 + X78 + X80 + X81 + X84 + X85 + X86 + X93 + X95 +
     X98 + X100 + X101 + X102 + X105 + X107 + X108 + X109 + X110 +
     X112 + X113 + X114 + X115 + X116 + X117 + X121 + X122 + X125 +
     X127 + X128 + X129 + X131 + X132 + X133 + X134 + X138 + X140 +
     X143 + X145 + X146 + X148 + X150 + X151 + X153 + X157 + X158 +
     X159 + X164 + X197 + X200 + X202 + X203 + X204 + X205 + X211 +
     X214 + X217 + X224 + X228 + X233 + X237 + X244 + X249 + X254 +
     X258 + X259 + X260 + CharlsonIndex + ethnic + day + season +
     ln, data = dat2)

 haz <- basehaz(fit3.1)   # gives 507 unique haz$time time points

 fit2 <- coxph(y ~ ns(ageyrs, df=2) + day + ln + sex + AdmissionSource +
     season + CharlsonIndex, data = dat1)

 haz <- basehaz(fit2)   # gives 611 unique haz$time time points


 I get the following warnings() with fit3.1:
 Warning message:
 In fitter(X, Y, strats, offset, init, control, weights = weights,  :
   Loglik converged before variable   ; beta may be infinite.

 Also the coefficients of the variables that the error occurs for are
 very high. The Wald test suggests dropping these terms whereas the
 LRT suggests keeping them. What should I do in terms of model selection?








 On Thu, Aug 9, 2012 at 2:00 PM, Terry Therneau thern...@mayo.edu wrote:

 I've never seen this, and have no idea how to reproduce it.
 For resloution you are going to have to give me a working example
 of the failure.

 Also, per the posting guide, what is your sessionInfo()?

 Terry Therneau

 On 08/09/2012 04:11 AM, r-help-requ...@r-project.org wrote:

 I have a couple of questions with regards to fitting a coxph
 model to a data
 set in R:

 I have a very large dataset and wanted to get the baseline
 hazard using the
 basehaz() function in the package : 'survival'.
 If I use all the covariates then the output from basehaz(fit),
 where fit is
 a model fit using coxph(), gives 507 unique values for the
 time and the
 corresponding cumulative hazard function. However if I use a
 subset of the
 varaibles, basehaz() gives 611 values for the time and
 cumulative hazard.





Re: [R] basehaz() in package survival and warnings with coxph

2012-08-09 Thread Terry Therneau

I've never seen this, and have no idea how to reproduce it.
For resolution you are going to have to give me a working example of the
failure.


Also, per the posting guide, what is your sessionInfo()?

Terry Therneau

On 08/09/2012 04:11 AM, r-help-requ...@r-project.org wrote:

I have a couple of questions with regards to fitting a coxph model to a data
set in R:

I have a very large dataset and wanted to get the baseline hazard using the
basehaz() function in the package : 'survival'.
If I use all the covariates then the output from basehaz(fit), where fit is
a model fit using coxph(), gives 507 unique values for the time and the
corresponding cumulative hazard function. However if I use a subset of the
varaibles, basehaz() gives 611 values for the time and cumulative hazard.




[R] Building a web risk calculator based on Cox, PH--definitive method for calculating probability?

2012-07-18 Thread Terry Therneau

Here is an example of how to do it.
 library(survival)
 vfit <- coxph(Surv(time, status) ~ celltype + trt, data=veteran)

 userinput <- data.frame(celltype="smallcell", trt=1)
 usercurve <- survfit(vfit, newdata=userinput)   # the entire predicted
survival curve

 user2 <- summary(usercurve, time=2*365.25)      # 2 year time point
 user2$surv
[1] 0.0007438084

Some comments:
 1. The summary function for survival curves was written so that people 
could print out shorter summaries, but it works nicely for picking off a 
particular time point.  Since the curve is a step function and there 
isn't likely to be a step at exactly x years, this is a bit more work 
to do yourself.  You might want to include the confidence limits in your 
web report as well.
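
A small follow-on sketch using the user2 object above:

    c(estimate = user2$surv, lower = user2$lower, upper = user2$upper)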


 2. Put the whole formula into your coxph call.  I have never ever
understood why people use the construct

     tempvar <- with(data, Surv(time, status))
     coxph(tempvar ~ age + sex + ...

It leaves you with harder to read code, poorer documentation (the
printout from coxph no longer shows the actual response variable), leads
to hard-to-diagnose failures for certain uses of predict, ... the list
goes on.  I have not yet thought of a single good reason for doing it
other than "because you can".


 3. Make the user data the same as the original.  In the veteran cancer
data set trt is a numeric 0/1 variable; you had it as a factor in the
new data set.

 4. You should get your keyboard fixed -- it appears that the spacebar
is disabled when writing code  :-)

 5. If you plot the survival curve for the veterans cancer data set it
only reaches to about 2 1/2 years, so the summary for 5 years will
return NULL.


Terry Therneau

On 07/18/2012 05:00 AM, r-help-requ...@r-project.org wrote:

I am a medical student and as a capstone for my summer research project I am
going to create a simple online web calculator for users to input their
relevant data, and a probability of relapse within 5 years will be computed
and returned based on the Cox PH model I have developed.

The issue I'm having is finding a definitive method/function to feed the
user's newdata and return the probability of relapse within 5 years.  I
have googled this and the answers seem to be inconsistent; I have variously
seen people recommend survest(), survfit(), and predict.coxph().  Terry had
a useful answer here:
http://r.789695.n4.nabble.com/how-to-calculate-predicted-probability-of-Cox-model-td4515265.html
but I didn't quite understand what he meant in his last sentence.

Here is some code for you to quickly illustrate what you suggest.

library(rms)
library(survival)
library(Hmisc)
data(veteran)
dd=datadist(veteran)
options(datadist='dd')
options(digits=4)
obj = with(veteran, Surv(time, status))
vetcoxph = coxph(obj ~ celltype + trt, data=veteran)
# I will fit models from both the survival and rms packages so you can
# use what you like
vetcph = cph(obj ~ celltype + trt, data=veteran, surv=TRUE, time.inc=5*365,
             x=T, y=T)
# let's say the user inputted that their cell type was smallcell and their
# treatment was 1.
userinput = data.frame(celltype='smallcell', trt=factor(1))

I really appreciate your recommendations

Best,
Jahan




Re: [R] Power analysis for Cox regression with a time-varying covariate

2012-07-18 Thread Terry Therneau
Marc gave the reference for Schoenfeld's article.  It's actually quite
simple.

Sample size for a Cox model has two parts:
 1. Easy part: how many deaths do I need?

      d = (za + zb)^2 / [var(x) * coef^2]

      za = cutoff for your alpha, usually 1.96 (.05 two-sided)
      zb = cutoff for power, often 0.84 = qnorm(.8) = 80% power
      var(x) = variance of the covariate you are testing.  For a yes/no
variable like treatment this would be p(1-p) where p = fraction on the
first arm
      coef = the target coefficient in your Cox model.  For an
increase in survival of 50% we need exp(coef) = 1.5, or coef = .405

All leading to the value I've memorized by now of (1.96 + 0.84)^2 / (.25 *
.405^2) = 191 deaths for a balanced two-arm study to detect a 50%
increase in survival.
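
The same arithmetic wrapped as a function (a sketch; the name ndeaths and
its defaults are illustrative, not from the original post):

    ndeaths <- function(alpha=0.05, power=0.80, hr=1.5, p=0.5) {
        za <- qnorm(1 - alpha/2)   # two-sided alpha cutoff
        zb <- qnorm(power)         # power cutoff
        (za + zb)^2 / (p*(1-p) * log(hr)^2)
    }
    ndeaths()   # about 191 deaths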


 2. Hard part: how many patients will I need to recruit, over what
interval of time, and with how much total follow-up, to achieve this
number of events?
   I never use the canned procedures for sample size because this
second part is so study specific.  And frankly, it's always a
guesstimate.  Death rates for a condition will usually drop by 1/3 as
soon as you start enrolling subjects.


Terry T.



Re: [R] R-help Digest, Vol 113, Issue 13

2012-07-10 Thread Terry Therneau

http://www.ncbi.nlm.nih.gov/pubmed/21418051  for the full reference.
I don't have an electronic copy, but I do have that issue of Biometrics 
in my office.  I'll have a copy sent over.


Terry



Re: [R] differences between survival models between STATA and R

2012-07-09 Thread Terry Therneau
Without more information, we can only guess what you did, or what you
are seeing on the page that is different.

I'll make a random guess though.  There are about 5 ways to parameterize
the Weibull distribution.  The standard packages that I know, however,
tend to use the one found in the Kalbfleisch and Prentice book The
Statistical Analysis of Failure Time Data.  This includes the survreg
function in R and lifereg in SAS, and likely Stata too, though I don't
know that package.  The aftreg function in the eha package uses something
different.

About 1/2 the Weibull questions I see are due to a change in parameters.
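
A hedged illustration of one such mismatch, mapping survreg's location-scale
output back to the textbook (shape, scale) Weibull form (intercept-only fit
on the ovarian data as a stand-in):

    library(survival)
    sfit <- survreg(Surv(futime, fustat) ~ 1, data=ovarian)   # K&P form
    c(shape = 1/sfit$scale, scale = unname(exp(coef(sfit)[1])))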

Terry T.

 begin included message -



Dear Community,

I have been using two types of survival programs to analyse a data set.

The first one is an R function called aftreg. The second one an STATA
function called streg.

Both of them include the same analysis with a weibull distribution. Yet,
results are very different.

Shouldn't the results be the same?

Kind regards,
J



Re: [R] How to specify newdata in a Cox-Modell with a time dependent interaction term?

2012-06-26 Thread Terry Therneau
I'm finally back from vacation and looking at your email.

1. The primary mistake is in your call, where you say
     fit <- survfit(mod.allison.5, newdata.1, id="Id")

This will use the character string "Id" as the value of the identifier,
not the data.  The effect is exactly the same as the difference between
print(x) and print('x').

2. In reply to John's comment that all the id values are the same.  It 
is correct.  Normally the survfit routine is used to produce multiple 
curves, one curve per line of the input data, for time-independent 
variables.   The presence of an id argument is used to tell it that 
there are multiple lines per subject in the data, e.g. time-dependent 
covariates.  So even though there is only one curve being produced we 
need an id statement to trigger the behavior.
   If you only want one curve for one individual, then individual=TRUE 
is an alternate, as John pointed out.

3. "It's very important to specify the Surv object and the formula
directly in the coxph function ..."
Yes, I agree.  I always use your suggested form because it gives better
documentation -- variable names are directly visible in the coxph call.
I don't understand the attraction of the other form, but lots of people
use it.
Why did it go wrong?  Because the survfit function was evaluating
     Surv(Rossi.2$start, Rossi.2$stop, Rossi.2$arrest.time) ~ fin +
age + age:stop + prio, data=newdata.1

The length of the variables will be different.  The error message comes
from the R internals, not my program.

Terry Therneau


On 06/16/2012 08:04 AM, Jürgen Biedermann wrote:

 Dear Mr. Therneau, Mr. Fox, or to whoever, who has some time...

 I don't find a solution to use the survfit function (package:
 survival)  for a defined pattern of covariates with a Cox-Model
 including a time dependent interaction term. Somehow the definition of
 my newdata argument seems to be erroneous.
 I already googled the problem, found many persons having the same or a
 similar problem, but still no solution.
 I want to stress that my time-dependent covariate does not depend on the
 failure of an individual (in this case it doesn't seem sensible to
 predict a survivor function for an individual). Rather one of my effects
 declines with time (time-dependent coefficient).

 For illustration, I use the example of John Fox's paper Cox
 Proportional - Hazards Regression for Survival Data.
 http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf

 Do you know any help? See code below

 Thanks very much in advance
 Jürgen Biedermann

 #
 #Code

 Rossi <- read.table("http://cran.r-project.org/doc/contrib/Fox-Companion/Rossi.txt",
                     header=TRUE)

 Rossi.2 <- fold(Rossi, time='week',
                 event='arrest', cov=11:62, cov.names='employed')

 # see below for the fold function from John Fox

 # modeling an interaction with time (Page 14)

 mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~
                        fin + age + age:stop + prio,
                        data=Rossi.2)
 mod.allison.5

 # Attempt to get the survivor function of a person with age=30, fin=0
 and prio=5

 newdata.1 <-
 data.frame(unique(Rossi.2[c("start","stop")]), fin=0, age=30, prio=5,
            Id=1, arrest.time=0)
 fit <- survfit(mod.allison.5, newdata.1, id="Id")

 Error message:

 Fehler in model.frame.default(data = newdata.1, id = Id, formula =
 Surv(start, ... :
   Variablenlängen sind unterschiedlich (gefunden für '(id)')

 -- i.e., variable lengths differ (found for '(id)').

 #-
 fold <- function(data, time, event, cov,
                  cov.names=paste('covariate', '.', 1:ncovs, sep=''),
                  suffix='.time', cov.times=0:ncov, common.times=TRUE, lag=0){
     vlag <- function(x, lag) c(rep(NA, lag), x[1:(length(x)-lag)])
     xlag <- function(x, lag) apply(as.matrix(x), 2, vlag, lag=lag)
     all.cov <- unlist(cov)
     if (!is.list(cov)) cov <- list(cov)
     ncovs <- length(cov)
     nrow <- nrow(data)
     ncol <- ncol(data)
     ncov <- length(cov[[1]])
     nobs <- nrow*ncov
     if (length(unique(c(sapply(cov, length), length(cov.times)-1))) > 1)
         stop(paste(
             'all elements of cov must be of the same length and\n',
             'cov.times must have one more entry than each element of cov.'))
     var.names <- names(data)
     subjects <- rownames(data)
     omit.cols <- if (!common.times) c(all.cov, cov.times) else all.cov
     keep.cols <- (1:ncol)[-omit.cols]
     nkeep <- length(keep.cols)
     if (is.numeric(event)) event <- var.names[event]
     times <- if (common.times) matrix(cov.times, nrow, ncov+1, byrow=T)
              else data[, cov.times]
     new.data <- matrix(Inf, nobs, 3 + ncovs + nkeep)
     rownames <- rep('', nobs)
     colnames(new.data) <- c('start', 'stop', paste(event, suffix, sep=''),
                             var.names[-omit.cols], cov.names)
     end.row <- 0
     for (i in 1:nrow){
         start.row <- end.row + 1
         end.row <- end.row + ncov
         start <- times

Re: [R] How to specify newdata in a Cox-Modell with a time dependent interaction term?

2012-06-18 Thread Terry Therneau
I've been out for a week, with one more to go.  I'll look at this in 
earnest when I return.
  Terry T

On 06/17/2012 04:07 AM, Jürgen Biedermann wrote:
 Dear John,

 Thank you very much for your help! It was very important for getting 
 along further.

 I found out some additional things which I want to give back to you as 
 feedback (maybe other persons will have the same problems).

 It's also possible to leave away the individual=T argument.
 Instead you can use:

 newdata.1 <-
 data.frame(unique(Rossi.2[c("start","stop")]), fin=0, age=30, prio=5,
            Id=1, arrest.time=0)
 fit <- survfit(mod.allison.5, newdata.1, id=Id)   *but not*   fit <-
 survfit(mod.allison.5, newdata.1, id="Id")

 so you have to leave away the quotation marks in specifying the id
 variable -- maybe the help file could be more precise here.

 Additionally I found out some more things while dealing with my 
 original data.

 It's very important to specify the Surv object and the formula 
 directly in the coxph function, *so don't do this:*

 SurvO <- Surv(Rossi.2$start, Rossi.2$stop, Rossi.2$arrest.time)
 form <- as.formula(SurvO ~ fin + age + age:stop + prio)
 mod.allison.5 <- coxph(form,
                        data=Rossi.2)
 mod.allison.5

 newdata.1 <-
 data.frame(unique(Rossi.2[c("start","stop")]), fin=0, age=30, prio=5,
            arrest.time=0)
 fit <- survfit(mod.allison.5, newdata.1, individual=T)
  Fehler in model.frame.default(data = newdata.1, formula = SurvO ~ fin + ... :
    Variablenlängen sind unterschiedlich (gefunden für 'fin')
 -- i.e., variable lengths differ (found for 'fin')

 I think the error message is not optimal here, anyway.

 The problem obviously consists in the survfit function not recognizing
 the names of the variables anymore.
 It seems completely logical, but anyway, it took me many hours.

 Thanks again,
 Jürgen



 - Korrespondenz --

 Betreff: Re: How to specify newdata in a Cox-Modell with a time 
 dependent interaction term?
 Von: John Fox j...@mcmaster.ca
 An: J?rgen Biedermann juergen.biederm...@googlemail.com
 Datum: 16.06.2012 17:22
 Dear Jurgen,

 fit <- survfit(mod.allison.5, newdata.1, individual=TRUE)
 fit
 Call: survfit(formula = mod.allison.5, newdata = newdata.1, individual = 
 TRUE)

 records   n.max n.start  events  median 0.95LCL 0.95UCL
19809 432 432 114  NA  NA  NA

 plot(fit)  # shows one survival curve

 See ?survfit.coxph.

 Best,
   John


 On Sat, 16 Jun 2012 16:23:41 +0200
   Jürgen Biedermann juergen.biederm...@googlemail.com wrote:
 Dear John,

 Thank you very much for the quick answer. I will use the mentioned citation 
 of your book in the future.

 However:
 The Id argument is necessary for the function to know that the different 
 rows belong to one individual.

 Or how the Help-File says:

 id:
 optional variable name of subject identifiers. If this is present, then 
 each group
 of rows with the same subject id represents the covariate path through time 
 of
 a single subject, and the result will contain one curve per subject. If the 
 coxph
 fit had strata then that must also be specified in newdata. If missing, 
 then each
 individual row of newdata is presumed to represent a distinct subject and 
 there
 will be nrow(newdata) times the number of strata curves in the result (one 
 for
 each strata/subject combination).

 I tried your proposal anyway, but, as expected, fifty different survival 
 curves appeared.

 So, there is still no solution.

 Best regards
 Jürgen

 -- 
 ---
 Jürgen Biedermann
 Bergmannstraße 3
 10961 Berlin-Kreuzberg
 Mobil: +49 176 247 54 354
 Home: +49 30 940 53 044
 e-mail: juergen.biederm...@gmail.com


 -- Correspondence --

 Subject: Re: How to specify newdata in a Cox-Modell with a time dependent 
 interaction term?
 From: John Fox j...@mcmaster.ca
 To: Jürgen Biedermann juergen.biederm...@googlemail.com
 Date: 16.06.2012 15:55
 Dear Jürgen,

 All the values of your Id variable are equal to 1; how can that make 
 sense? Removing the argument Id=id may get you what you want.

 By the way, the document to which you refer was an appendix to a 2002 book 
 that has been superseded by Fox and Weisberg, An R Companion to Applied 
 Regression, Second Edition (Sage, 2011). Appendices for that book are 
 available 
 at http://socserv.socsci.mcmaster.ca/jfox/Books/Companion/appendix.html.

 I hope this helps,
John

 
 John Fox
 Sen. William McMaster Prof. of Social Statistics
 Department of Sociology
 McMaster University
 Hamilton, Ontario, Canada
 http://socserv.mcmaster.ca/jfox/

 On Sat, 16 Jun 2012 15:04:48 +0200
    Jürgen Biedermann <juergen.biederm...@googlemail.com> wrote:
 Dear Mr. Therneau, Mr. Fox, or whoever has some time...

 I can't find a way to use the survfit function (package: survival) 
  for a defined pattern of covariates with a Cox model including a time 
 dependent interaction term. Somehow the definition of my newdata 
 argument seems to be erroneous.
 I 

Re: [R] (bug?) in survfit in R-2-15.0

2012-05-17 Thread Terry Therneau
The predict methods are designed to work with a dataframe.  In the case 
of survfit(cox model) the code has, for backwards compatibility 
purposes, the ability to use a vector as the newdata argument.  This 
feature is not documented in the help file, on purpose, and is expected 
to go away one day.  This backwards compatibility works in all my 
test cases, but not in the one you present.  This may accelerate my 
plans for removal.


  The best solution for you is to use a dataframe for both the coxph 
fit and the predict call.  Avoid using objects that are in the global 
data or are attached.  Below I reprise your example in safer code.  The 
coxph/survfit calls both work from a temporary data frame called 
dummy.  Your use of the coxph/survfit pair to get a survival curve for 
glmnet is clever, by the way.  I like it.


library(glmnet)
library(survival)
load(system.file("doc", "VignetteExample.rdata", package="glmnet"))

fit <- with(patient.data, glmnet(x[-1,], Surv(time[-1], status[-1]),
                                 family="cox", alpha=.05, maxit=1))
max.dev.index <- which.max(fit$dev.ratio)
optimal.beta  <- fit$beta[, max.dev.index]
nonzero.coef  <- (optimal.beta != 0)

dummy <- with(patient.data, data.frame(time=time, status=status, 
                                       x=x[,nonzero.coef]))

coxfit <- coxph(Surv(time, status) ~ ., data=dummy, subset= -1,
                iter=0, init=optimal.beta[nonzero.coef])
sfit <- survfit(coxfit, newdata=dummy[1,])

Terry Therneau

 begin included message ---

Dear all,

I am using glmnet + survival and due to the latest
release of glmnet 1.7.4 I was forced to use the latest version of R 2.15.0.
My previous version of R was 2.10.1. I changed glmnet version and R
version and when I started to get weird results I was not sure where the 
bug was.


After putting glmnet back to previous version, I have
found that the bug or weird behaviour happens in survival survfit. My 
script is

below in email and it uses glmnet 1.7.3. I get two completely different
answers in different versions of R and in my opinion the older version of
survival package returns correct value.

in R 2-10-1 I get

cat(dim(sfit$surv))

cat(length(sfit$surv))
32

in R-2-15-0 I get

cat(dim(sfit$surv))
62 99

cat(length(sfit$surv))
6138

Kind regards,
DK


library(survival)

library(glmnet)

load(system.file("doc", "VignetteExample.rdata", package="glmnet"))

attach(patient.data)

trainX <- x[-1,]

trainTime <- time[-1]

trainStatus <- status[-1]

fit <- glmnet(trainX, Surv(trainTime, trainStatus), family="cox",
              alpha=0.5, maxit=1)

max.dev.index <- which.max(fit$dev.ratio)

optimal.lambda <- fit$lambda[max.dev.index]
optimal.beta <- fit$beta[, max.dev.index]
nonzero.coef <- abs(optimal.beta) > 0
selectedBeta <- optimal.beta[nonzero.coef]

selectedTrainX <- trainX[, nonzero.coef]

coxph.model <- coxph(Surv(trainTime, trainStatus) ~ selectedTrainX,
                     init=selectedBeta, iter=0)

selectedTestX <- x[1, nonzero.coef]

sfit <- survfit(coxph.model, newdata=selectedTestX)

cat("\ndim(sfit$surv)")
cat(dim(sfit$surv))

cat("\nlength(sfit$surv)")
cat(length(sfit$surv))

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lmekin bug

2012-05-15 Thread Terry Therneau
Version 2.2-3 of the coxme package has been posted to CRAN. It should 
propagate to the mirrors over the next 1-2 days.


Version 2.2-2 had a serious bug in the lmekin function.  If the variance 
matrix of the random effects was not diagonal the answers were wrong: at 
one spot an upper triangular matrix U was being used where its transpose 
t(U) was called for.  This affects genetic models in particular.  A new 
test case with known solution has now been added to the package as well.


 There was no issue with any of the other functions, e.g., coxme.

This was pointed out to me by Paola Sebastiani just after 2.2-2 was posted 
and just before I left on a trip.  I really appreciate the bug report, but 
the timing was clearly Murphy's law at work.


Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Estimating survival times with glmnet and coxph

2012-05-07 Thread Terry Therneau
 What you are doing is correct.  However, because it is not informed about
the penalty parameters used in creating the fit, the coxph command is not
able to correctly reconstruct the variance matrix; this is the source of all
of the "singular" messages.  The confidence bands for the survival curves
will be incorrect (likely too wide), but the curves themselves are fine.

Terry Therneau


 begin included message 

Dear all,

I am using glmnet (Coxnet) for building a Cox Model and
to make actual prediction, i.e. to estimate the survival function S(t,Xn)
for a
new subject Xn. If I am not mistaken, glmnet (coxnet) returns beta, beta*X
and
exp(beta*X), which on its own cannot generate S(t,Xn). We are missing the
baseline survival function S0(t).

Below is my code which takes beta coefficients from
glmnet and creates coxph object (iter=0, init=beta) in order to later
calculate
survival estimates with survfit.

I would be grateful if someone could confirm that my
solution with coxph object (iter=0, init=beta) is correct and that the
warning
I get 'X matrix deemed to be singular' when creating a coxph object is of no
concern. Below is my code, which uses the example from "Coxnet: Regularized Cox
Regression".

Thanks in advance.
DK
...

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] need help with avg.surv (Direct Adjusted Survival Curve)

2012-04-30 Thread Terry Therneau
 Well, I would suggest using the code already in place in the survival 
package.  Here is my code for your problem.
I'm using a copy of the larynx data as found from the web resources for 
the Klein and Moeschberger book.


larynx <- read.table("larynx.dat", skip=12,
                     col.names=c("stage", "time", "age", "year", "death"))
larynx$stage <- factor(larynx$stage)
larynx$age.grp <- cut(larynx$age, c(0, 55, 70, 90),
                      labels=c("40-55", "55-70", "70-90"))

fit1 <- coxph(Surv(time, death) ~ age.grp + stage + I(year <= 74),
              larynx)   # fit a Cox model

km <- survfit(Surv(time, death) ~ stage, data=larynx)   # KM
plot(km)

direct <- survexp(~stage, data=larynx, ratetable=fit1)   # direct adjusted survival

lines(direct, col=2)

---
 A few further comments
  1. It would help those of us who advise if you would simplify your 
code.  I didn't need to see 20+ lines of data set manipulation and curve 
drawing.  Give the essence of the problem.
  2. The direct estimate above was first created for expected 
survival curves based on population rate tables and published by Ederer 
in 1961.  The same idea was rediscovered in the context of the Cox model 
and renamed the "direct adjusted" estimate; I like to give credit to the 
original.

  3. I did not try to debug your function.


Terry Therneau


---  begin included message ---
Hello R users,

I am trying to obtain a direct adjusted survival curve. I am sending my 
whole code (see below). It's basically the larynx cancer data with Stage 
1-4. I am fitting the Cox model with the coxph option; see the fit3 coxph call. 
When I use the avg.surv option on fit3, I get the following error: 
fits <- avg.surv(fit3, var.name="stage.fac", var.values=c(1,2,3,4), 
                 data=larynx)

Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]) :
  contrasts apply only to factors

If I try using the var.values = c("1","2","3","4") option, I get the 
following error:

Error in xmat %*% cfit$coef : non-conformable arguments
In addition: Warning message:
In model.matrix.default(Terms, mf, contrasts = contrast.arg) :
  variable 'stage.fac' converted to a factor

Please advise me on what I am doing wrong!

Regards,
Manish Dalwani
University of Colorado

library(survival)
library(ggplot2)

setwd("/Users/manishdalwani/surv_6646/")
#file.show("Larynx.txt")

#Stage of disease (1=stage 1, 2=stage2, 3=stage 3, 4=stage 4)
#Time to death or on-study time, months
#Age at diagnosis of larynx cancer
#Year of diagnosis of larynx cancer
#Death indicator (0=alive, 1=dead)

larynx <- read.table(file="larynx.dat",
                     col.names=c("stage","time","age","year","death"),
                     colClasses=c("factor","numeric","numeric","numeric","numeric"))

#the death indicator needs to be numeric for the survfit function to work

# splitting age into three intervals: <55, 55-70, >70
age.larynx <- rep(0, length(larynx[,1]))
age.larynx[larynx$age < 55] <- 1
age.larynx[larynx$age >= 55 & larynx$age <= 70] <- 2
age.larynx[larynx$age > 70] <- 3
larynx <- cbind(larynx, age.larynx)
larynx$age.larynx <- factor(larynx$age.larynx, labels=c("1", "2", "3"))
larynx$stage.fac <- as.factor(larynx$stage)

year.larynx <- rep(0, length(larynx[,1]))
year.larynx[larynx$year <= 74] <- 1
year.larynx[larynx$year > 74] <- 2
larynx <- cbind(larynx, year.larynx)

head(larynx)
summary(larynx)

kapmeier.fit <- survfit(Surv(time, death) ~ stage, data=larynx,
                        type="kaplan-meier")

sumkap <- summary(kapmeier.fit)
attributes(kapmeier.fit)
attributes(sumkap)
plot(kapmeier.fit, lty=2:3, fun="cumhaz",
     xlab="Months", ylab="Cumulative Hazard of death")
legend(4, .4, c("stage1","stage2","stage3","stage4"), lty=1:2)

plot(kapmeier.fit, lty=2:3,
     xlab="Months", ylab="Survival Function")
#construct a data frame for the plots
plotdata <- data.frame(time = sumkap$time, surv = sumkap$surv,
                       strata = sumkap$strata)

fact.stage <- factor(larynx$stage)
fit1 <- coxph(Surv(time, death) ~ fact.stage + age.larynx + 
              factor(year <= 74), data=larynx, method=c("efron"))

summary(fit1)
avg.surv <- function(cfit, var.name, var.values, data, weights)
{
    if(missing(data)) {
        if(!is.null(cfit$model))
            mframe <- cfit$model
        else mframe <- model.frame(cfit, sys.parent())
    } else mframe <- model.frame(cfit, data)
    var.num <- match(var.name, names(mframe))
    data.patterns <- apply(data.matrix(mframe[, -c(1, var.num)]), 1,
                           paste, collapse=",")
    data.patterns <- factor(data.patterns, levels=unique(data.patterns))
    if(missing(weights))
        weights <- table(data.patterns)
    else weights <- tapply(weights, data.patterns, sum)
    kp <- !duplicated(data.patterns)
    mframe <- mframe[kp,]
    obs.var <- mframe[, var.num]
    lps <- (cfit$linear.predictor)[kp]
    tframe <- mframe[rep(1, length(var.values)),]
    tframe[, var.num] <- var.values
    xmat <- model.matrix(cfit, data=tframe)[, -1]
    tlps <- as.vector(xmat %*% cfit$coef)
    names(tlps) <- var.values
    obs.off <- tlps[as.character(obs.var)]
    explp.off <- exp(outer(lps - obs.off, tlps, "+"))
    bfit <- survfit.coxph(cfit, se.fit=FALSE)
    fits <- outer(bfit$surv, explp.off, function(x, y) x^y)
    avg.fits <-
       apply(sweep(fits, 2, weights

Re: [R] Survreg

2012-04-23 Thread Terry Therneau



On 04/22/2012 05:00 AM, r-help-requ...@r-project.org wrote:

I am trying to run a Weibull PH model in R.

Assume in the data set I have x1, a continuous variable, and x2, a
categorical variable with two classes (0 = sick and 1 = healthy).  I fit the
model in the following way.

Test = survreg(Surv(time, cens) ~ x1 + x2, dist="weibull")

My questions are

1. Is it a Weibull PH model or a Weibull AFT model?
Call:
survreg(formula = Surv(time, delta) ~ x1 + x2, data = nn,
     dist = "weibull")
               Value  Std. Error      z          p
(Intercept)  5.652155    3.54e-02  159.8   0.00e+00
x1           0.492592    1.92e-02   25.7  3.65e-145
x2          -0.000212    5.64e-06  -37.6   0.00e+00
Log(scale)  -0.269219    1.57e-02  -17.1   1.24e-65

Scale= 0.764
The Weibull model can be viewed as either.  The cumulative hazard for a 
Weibull is t^p: viewed as an AFT model we have (at)^p [multiply time], 
viewed as PH we have a(t^p) [multiply the hazard].  The survreg routine 
uses the AFT parameterization found in Chapter 2 of Kalbfleisch and 
Prentice, The Statistical Analysis of Failure Time Data.


 For the routine our multiplier a above is exp(X beta), for the usual 
reason that negative multipliers should be avoided -- it would 
correspond to time running backwards.  In the above x1 makes time run 
faster, x2 time run slower.
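
A minimal sketch of the parameter mapping on simulated data (the variable
names are illustrative, and the PH conversion uses the standard Weibull
identity beta_PH = -beta_AFT/scale rather than anything from this thread):

library(survival)
set.seed(1)
x1   <- rnorm(500)
time <- rweibull(500, shape=1.5, scale=2*exp(0.5*x1))
fit  <- survreg(Surv(time, rep(1, 500)) ~ x1, dist="weibull")

# From help('survreg'): rweibull shape = 1/fit$scale, and the rweibull
# scale is exp(linear predictor).
shape.hat <- 1/fit$scale                  # should be near 1.5
scale.hat <- exp(coef(fit)[1])            # baseline scale at x1 = 0, near 2
beta.ph   <- -coef(fit)["x1"]/fit$scale   # log hazard ratio for x1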

  Terry T

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R-help Digest, Vol 110, Issue 23

2012-04-23 Thread Terry Therneau
Yes, the (start, stop] formalism is the easiest way to deal with time 
dependent data.


Each individual only needs to have sufficient data to describe them.  So 
if id number 4 is in house 1, their housemate #1 was eaten at time 
2, and they were eaten at time 10, the following is sufficient data for 
that subject:

 id  house  time1  time2  status  discovered
  4      1      0      2       0  false
  4      1      2     10       1  true

We don't need observations for each intermediate time, only that from 
0-2 they were not yet discovered and that from 2-10 they were.  The 
status variable tells whether an interval ended in disaster.  Use 
Surv(time1, time2, status) on the left side of the equation.
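
A minimal sketch of building such (start, stop] data with the survival
package's tmerge() helper (a later addition to the package; the input
layout and variable names here are hypothetical):

library(survival)
# One row per subject: when (if ever) they were eaten, and when their
# house was first discovered by the predator.
base <- data.frame(id=4, house=1, eaten.time=10, eaten=1, disc.time=2)

df <- tmerge(base, base, id=id, status=event(eaten.time, eaten),
             discovered=tdc(disc.time))
df[, c("id", "house", "tstart", "tstop", "status", "discovered")]
#   id house tstart tstop status discovered
#    4     1      0     2      0          0
#    4     1      2    10      1          1
# The fit is then coxph(Surv(tstart, tstop, status) ~ discovered, data=df)
# (with more than one subject, of course).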


Since the time scale is discrete you should technically use 
method='exact' in a Cox model, but the default Efron approximation will 
be very close.


Interval censoring isn't necessary.  You will have a model of time to 
discovery instead of time to eaten, but with a fixed examination 
schedule such as you have there is no information in the data to help 
you move from one to the other.  The standard interval approach would 
just assume deaths happened at the midpoint between examinations.


Terry T.

On 04/21/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Dear R users,

I fear this is terribly trivial but I'm struggling to get my head around it.

First of all, I'm using the survival package in R 2.12.2 on Windows Vista with the 
RExcel plugin. You probably only need to know that I'm using survival for this.

I have data collected from 180 or so individuals that were checked 7 times 
throughout a trial with set start and end times. Once the event happens (death 
by predator) there are no more checks for that individual. This means that I 
check on each individual up to 7 times with either an event recorded or the 
final time being censored.

At the moment, I have a data sheet with one observation per individual; that is 
either the event time (the observation time when the individual had had an 
event) or the censored time. However, I'd like to add a time dependent factor 
and I also wonder if this data should be treated as interval censored.

The time dependent factor is like this. The individuals are grouped in houses and once one individual in a group has 
an event, it makes biological sense that the rest of them should be at greater risk, as the predator is likely to have discovered 
the others in the house as well (the predator is able to consume many individuals). At the moment I'm coding this as 
a normal two level factor (discovered) where all individuals alive after the first event in that house are TRUE and 
the first individuals in a house to be eaten are FALSE. All individuals in houses that were not discovered at all are 
also FALSE. Obviously, all individuals that were eaten, were first discovered, then eaten. However, the first 
individuals in a house to be eaten, had not been previously discovered by the predator (not observably so, anyway).

Should I write up this data set with a start and stop time for every check I 
made so each individual has up to 7 records, one for each time I checked?

Is there a quick and easy way to do this in R or would I have to go through the 
data set manually?

Does coding the discovered factor the way I have make statistical sense?

Should I worry about proportional hazards of the discovered factor? It seems 
to me that it would often turn out not proportional because of its nature.

Sorry, lots of stats questions. I don't mind if you don't answer all of these. 
Just knowing how to best feed this data into R would help me no end. The rest I 
can probably glean from the millions of survival analysis books I have lying 
about.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test-Predict R survival analysis

2012-04-18 Thread Terry Therneau

On 04/18/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Hi,

I'm trying to use R survival analysis on a Windows 7 system.
The input data format is described at the end of this mail.

1/ I tried to perform a survival analysis including stratified variables
using the following formula.
cox.xtab_miR = coxph(Surv(time, status) ~ miR + strata(sex, nbligne, age),
                     data=matrix)
and obtain the following error message
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
Ran out of iterations and did not converge

Is this due to the model (error in formula) or is the number of
stratified variables fixed?
The Cox model compares the deaths to the non-deaths, separately within 
each stratum, then adds up the result.


Your data set and model combination puts each subject into their own 
stratum, so there is no one to compare them to.  The fit has no data to 
use and so must fail.  (I admit the error message is misleading, but I 
hadn't ever seen someone make this particular mistake before.)


The following model works much better

 coxph(Surv(time, status) ~ miR + age + nbligne + strata(sex))

             coef  exp(coef)  se(coef)      z       p
miR      2.75e-05       1.00  9.35e-06  2.941  0.0033
age      3.39e-03       1.00  1.01e-02  0.334  0.7400
nbligne  7.14e-02       1.07  1.32e-01  0.542  0.5900

Likelihood ratio test=5.87  on 3 df, p=0.118  n= 70, number of events= 59
   (1 observation deleted due to missingness)

Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Censoring data

2012-04-16 Thread Terry Therneau

On 04/14/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Hello,
I want to estimate Weibull parameters with 30% censored data. I have below the 
code for the censoring, but how it must be put into the likelihood equation to 
obtain the desired estimates is where I have a problem; can somebody help?
My likelihood equation is for a random type-I censoring where the time for the 
censored units is different for each unit.

n=50; r=35
p=0.8; b=1.5
t <- rweibull(50, shape=p, scale=b)
meantrue <- gamma(1+(1/p))*b
meantrue
d <- meantrue/0.30

cen <- runif(50, min=0, max=d)
cen
s <- ifelse(t <= cen, 1, 0)
s
Use the survreg function in the 'survival' package to get the 
estimates.  It uses a location-scale parameterization; read the examples 
in the manual page, help('survreg'), to see the transformation to and 
from the form used by rweibull.
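
A minimal sketch of that fit, continuing the simulation above (the
back-transformation follows the examples in help('survreg')):

library(survival)
obs.time <- pmin(t, cen)      # observed time: event or censoring, whichever first
fit <- survreg(Surv(obs.time, s) ~ 1, dist="weibull")
p.hat <- 1/fit$scale          # estimate of the rweibull shape p
b.hat <- exp(coef(fit))       # estimate of the rweibull scale b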


Terry T.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS

2012-04-16 Thread Terry Therneau

On 04/14/2012 05:00 AM, r-help-requ...@r-project.org wrote:

Am replicating in R an analysis I did earlier using SAS. See this as a test of 
whether I'm ready to start using R in my day-to-day work.

Just finished replicating a Kaplan Meier analysis. Everything seems to work out 
fine except for one thing. The 95% CI around my estimate for the median is 
substantially larger in R than in SAS. For example, in SAS I have a median of 
3.29 with a 95% CI of [1.15, 5.29]. In R, I get a median of 3.29 with a 95% CI 
of [1.35, 13.35].

Can anyone tell me why I get this difference?



The confidence interval for the median is based on the confidence 
intervals for the curves.  There are several methods for computing 
confidence intervals for the curves: plain, log, log-log, or logit 
scale.  There are opinions on which is best, and it is a close race, 
except for the first of these.  The type "plain" intervals are awful; 
it's like putting me in one lane of a championship 100 meter dash.


Until about version 9 the only option in SAS was "plain", then for a 
time it was still the default.  By 9.2 they finally went to "loglog".
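
A minimal sketch of comparing the interval types in R (using the built-in
lung data as a stand-in for the poster's data):

library(survival)
fit.plain <- survfit(Surv(time, status) ~ 1, data=lung, conf.type="plain")
fit.log   <- survfit(Surv(time, status) ~ 1, data=lung, conf.type="log")
fit.llog  <- survfit(Surv(time, status) ~ 1, data=lung, conf.type="log-log")
quantile(fit.llog, probs=0.5)   # median and its confidence interval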


Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Survreg output - interpretation

2012-04-12 Thread Terry Therneau

--- begin included message ---
Hello R users,
I am analyzing survival data (mostly uncensored) and want to extract the
most out of it.
Since I have more than one factor, I've read that the survival regression
can help to test the interactions between factors, and then decide how to do
the comparisons using the Log-rank test (survdiff).
1- if I choose the Weibull distribution, does the output inform the goodness
of fit to it? perhaps in this part of the output...

Weibull distribution
Loglik(model)= -1302.8   Loglik(intercept only)= -1311
Chisq= 16.49 on 11 degrees of freedom, p= 0.12
Number of Newton-Raphson Iterations: 7
n= 873

2- one of my factors is gender (2 levels). With survreg, it appears as
significant, but if I compare the levels with the log-rank test it turns out
not significant. Are they comparing different things? or is it a test power issue?

--- end inclusion ---

1. To understand goodness of fit you need to look at the residuals in 
multiple ways.  (The same answer applies to ordinary linear regression.)


2. You have not given us enough information to answer the questions.  If 
the data is p=.049 vs p=.051, then the answers are in agreement even 
though the artificial label of "significant" changes.  The logrank test 
and survreg are not the same model.  If the data is p=.02 vs p=.8, then 
you have an error in the code.


Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to get the confidence interval of area under the time dependent roc curve

2012-04-09 Thread Terry Therneau

On 04/07/2012 05:00 AM, r-help-requ...@r-project.org wrote:

  It is possible to calculate the c-index for time dependent outcomes (such as 
disease) using the survivalROC package in R. My question is: is it possible to 
produce a p-value for the c-index that is calculated (at a specific point in 
time)? How can one get the confidence interval of the area under the time 
dependent ROC curve (or by bootstrap)?
The current version of survival computes the c-index and its standard 
error for time-dependent covariates.

Use summary(cox model fit) or the survConcordance function.
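
A minimal sketch (using the built-in lung data as a stand-in):

library(survival)
fit <- coxph(Surv(time, status) ~ age + sex, data=lung)
summary(fit)$concordance       # c-index and its standard error
survConcordance(Surv(time, status) ~ predict(fit), data=lung)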

Terry T.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to calculate the Deviance for test data based on a Cox model

2012-03-30 Thread Terry Therneau

kevinchiang...@hotmail.com wrote:


 Dear List,

 If I have a Cox model fit on a training set, how should I 
calculate the Cox log partial likelihood for the test data?
 Actually I am trying to calculate the deviance on the test dataset to 
evaluate the performance of the prediction model; the equation is as 
follows: D = -2{L(test)[beta_train] - L(test)[0]}. It means using the 
beta coefficients obtained from the training set to calculate the likelihood of 
the test data. I know I can get the log likelihood for the training model, but 
how do I get it for the test data?


--
  If you are attending the ENAR meeting next week, go find Cindy 
Crowson's poster which has a very nice discussion of how to evaluate a 
prior Cox model on new data.  Given the timeline for statistics 
journals, her draft paper (sitting on my desk) might yet be years away 
from print even if reviewers like it.
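
A minimal sketch of the computation the question asks for (hypothetical
train/test data frames; the trick is a zero-iteration coxph fit, which
evaluates the supplied coefficients without updating them):

library(survival)
fit.train <- coxph(Surv(time, status) ~ x, data=train)
fit.test  <- coxph(Surv(time, status) ~ x, data=test,
                   init=coef(fit.train), iter=0)
# loglik[1] is the partial log-likelihood at beta=0, loglik[2] at init
D <- -2 * (fit.test$loglik[2] - fit.test$loglik[1])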


Terry T.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to calculate predicted probability of Cox model

2012-03-30 Thread Terry Therneau

-- begin included message ---
Because the Cox proportional hazards model doesn't give the baseline hazard 
function, how does one calculate the predicted probability for each test 
sample at a specific time point, such as 5 years or 10 years?
In the survival package, the predict.coxph() function gives three different 
types of predicted value. For the type "expected", does it mean the 
expected number of events for a particular sample during the total 
follow-up time? What's the relationship between this value and the 
predicted probability?

-- end inclusion ---

The default printout for coxph does not include the baseline hazard (too 
lengthy), but it is lurking behind the scenes.


Yes, the type "expected" is the expected number of events (= value of 
the cumulative hazard) for a given set of covariates at a particular 
time.  Prob(no event by time t) = exp(-cumulative hazard at time t), so 
Prob(event by time t) = 1 - exp(-cumulative hazard at time t).
To get a prediction of type = "expected" your data set has to have both 
covariate AND survival time values for each subject.
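
A minimal sketch of the time-point probability (built-in lung data as a
stand-in; the 2-year point is arbitrary):

library(survival)
fit <- coxph(Surv(time, status) ~ age + sex, data=lung)
new <- data.frame(age=60, sex=1)
sf  <- survfit(fit, newdata=new)       # predicted curve for one covariate set
s2  <- summary(sf, times=730)$surv     # S(t) at 730 days
1 - s2                                 # Prob(event by 2 years)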


Terry T.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   3   4   5   >