[R] scatterplot function - double check: dashed lines

2010-06-08 Thread K F Pearce
Hello everyone,

This is just a quick double check.  It concerns the 'scatterplot function' in 
R. 

 I have 6 curves and I wish to represent each of them by a different kind of 
line (their colour must be black).

The curves are derived from the cuminc function...the coordinates of which are 
in 'xx'.

Upon reading the documentation in R, it looks like I can use the 'on/off' 
command for lty and I can merely run:

plot(xx,color=black,lty=c(11,13,1343,73,2262,35))

according to the documentation, 13 (for example) means 1 'on' and 3 'off' .

Does the above look OK ?

Say, in another study, I wish to draw my 6 lines all in different colours 
(solid lines), I suppose that I could type:

plot(xx, color=c(red,black,purple,yellow,brown,violet), lty=1)

Thanks so much for your help,
All the Best,
Kim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] FW: Pec Function in R - syntax

2009-08-25 Thread K F Pearce
 
Hello everyone - sorry, I forgot to add the syntax for my program in the last 
message:

My syntax is as follows:

library(Hmisc)
 library(survival)
 library(pec)
 AL-read.table(PredErr2.txt,header=TRUE)

 Models-list(Kaplan.Meier=survfit(Surv(OVS,dead)~1,data=AL),EBMT+SN
 Pgroup=coxph(Surv(OVS,dead)~GroupMod,method=efron,data=AL),EBMTgro
 up=coxph(Surv(OVS,dead)~GroupGrat,
 method=efron,data=AL),EBMToriginal=coxph(Surv(OVS,dead)~Gratwohl,
 method=efron,data=AL),EBMT+SNPoriginal=
 coxph(Surv(OVS,dead)~Gratwohl+PIL1rAnyC+PIL4AnyT,
 method=efron,data=AL))

 PredError.632plus-pec(object=Models,formula=Surv(OVS,dead)~1,data=AL,
 exact=TRUE,cens.model=marginal,replan=boot632plus,B=100,verbose=TR
 UE)

Thankyou so much.
Kim


Dr Kim Pearce CStat
Industrial Statistics Research Unit (ISRU) School of Mathematics and Statistics 
Herschel Building University of Newcastle Newcastle upon Tyne United Kingdom
NE1 7RU

Tel.   0044 (0)191 222 6244 (direct)
Fax.   0044 (0)191 222 8020
Email: k.f.pea...@ncl.ac.uk
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Pec function in R

2009-08-25 Thread K F Pearce
Hello everyone,

These are some questions about the 'pec' function in R.  These questions deal 
with prediction error curves and their derivation.  Prediction error curves are 
documented in, for example, Efron-type measures of prediction error for 
survival analysis by Gerds and Schumacher.

I have detailed some syntax that I have used at the bottom of this email.  The 
associated data is available upon request.

In the 'pec' function I have

formula=Surv(OVS,dead)~1

...apparently for right censored data, the RH side of the formula is used to 
specify conditional censoring models when there are no covariates (as in 
our case)...I understand that we are assuming that censoring occurs totally at 
random (for all models). 

Say we have a data set containing potential predictor variables X1,X2Xp.  
Our survival time variable (time) is measured in months and our status 
variable (status) has 0=alive and 1=dead.

Firstly, I would like to ask about how to derive conditional censoring models. 
From past conversations it seems that to establish the form of our censoring 
model(s), we would use status=event=alive =1  and status=event=death=0.  Is 
this correct?.. 

Now, say we are assuming that the model for censoring is of the cox regression 
type, do we assess the model for censoring using the usual variable selection 
procedures where candidate variables for variable selection are X1,X2Xp and 
we code *alive=1 and dead=0*?

Say we found that X1 and X5 should be included in our cox regression model for 
censoring, then we would enter:

formula=Surv(time,status)~X1+X5 and cens.model=cox  in function pec.

Am I thinking about this correctly? I think it could be more difficult than 
this, so I'd appreciate some guidance.

Replan is the method for estimating prediction error curves. I understand that 
replan=none would be used when we don't cross validate i.e. we would create a 
model using a specific bootstrap sample and then evaluate its performance on 
the same sample. 

If we are looking at bootstrap crossvalidation (OutOfBag) method...say we 
have 500 patients.it says in the paper - Gerds and Schumacher - that 
bootstrap samples Q*_1Q*_B each of size n are drawn with replacement from 
the original data (we have chosen 100 bootstrap samples  i.e. B=100 in our 
work).  

So here I assume that n is equal to the total number of patients i.e. 500 in 
this example?

When we decide to sample with replacement, as here,...I am assuming that, for 
each of the bootstrap samples, there are 500 patients but some of these 
patients could occur in this bootstrap sample more than once.

The documentation says M is the size of the bootstrap samples for sampling 
*without* replacement.

Hence am I correct in thinking that for sampling *with* replacement, M is equal 
to n ?  i.e. M=n by default.  However, if we choose Mn  then each of our 
bootstrap samples will have 'sampling without replacement' i.e. the training 
set would be comprised of n-M patients.  Am I correct?

[Each of the bootstrap samples acts as a 'training data set' to generate a 
modelthe model is then validated using the patients which weren't in the 
bootstrap sample. ]

In a study I did using a 286 case data set, I noticed that my prediction error 
curves seem to terminate at around 35 months when the actual last survival time 
was 248 months.  I checked the survival times and corresponding alive/dead 
for this data set and noticed that the number of 'deaths' gets very sparse 
after month 35but I'm still a bit puzzled as to why the curves end at 
around 35 months.  Perhaps the small number of cases in the original data set 
leads to small training data sets and hence to a termination of the curves at 
low times?  Has anybody else encountered this?

In our study, we wanted to develop a model on a specific bootstrap sample  and 
then test it out using observations which aren't in the bootstrap sample (this 
would be repeated B times)...we chose the 0.632+ bootstrap estimator.   There 
are many types of  estimators for prediction error curves.  How should we 
decide which is 'best'?

Thanks for any advice on these questions,
Kindest Regards,
Kim

Dr Kim Pearce CStat
Industrial Statistics Research Unit (ISRU)
School of Mathematics and Statistics
Herschel Building
University of Newcastle
Newcastle upon Tyne
United Kingdom
NE1 7RU

Tel.   0044 (0)191 222 6244 (direct)
Fax.   0044 (0)191 222 8020
Email: k.f.pea...@ncl.ac.uk
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Function Surv and interpretation

2009-05-15 Thread K F Pearce


Dear everyone,

My question involves the use of the survival object.

We can have 

Surv(time,time2,event, type=, origin = 0)   
(1)

As detailed on p.65 of:

http://cran.r-project.org/web/packages/survival/survival.pdf


My data (used in my study) is 'right censored'  i.e. my variable corresponding 
to 'event' indicates whether a person is alive (0) or dead (1) at date last 
seen  and my 'time'  indicates time from transplant to date of last contact 
(where this is time from transplant to death if person has died or time from 
transplant to date last seen if person is still alive).

Now I am using function, rcorr.cens

http://lib.stat.cmu.edu/S/Harrell/help/Hmisc/html/rcorr.cens.html

This function involves use of Surv.

Now here is a section of my syntax:


 time-data$ovsrecod
 x1-data$RMY.GROUPS
 death-data$death
 rcorr.cens(x1,Surv(time,death),outx=FALSE)
 (2)

As you can see, I have entered Surv(time,death)...this works (and complies with 
the example given in R for rcorr.cens) and all seems to be well...however, 
bearing in mind that in (1) we have:

Surv(time,time2,event, type=, origin = 0)

...how does R know that 'death' in *my* syntax (2) is the 'event'...i.e. how 
does it know that time2 is skipped in my analysis?  I am a bit perplexed!  The 
R documentation for Surv says that  Surv(time,event) is a 'typical usage' as is 
Surv(time,time2,event, type=, origin = 0)...but how does it know when we are 
using the former and not the latter?  I have tried entering:  
rcorr.cens(x1,Surv(time,event=death),outx=FALSE) but it does not like it saying 
that Error in Surv(time, event = death) : argument time2 is missing, with no 
default

I hope that this makes sense!

Thank you so much for your advice on this ...it's much appreciated, Kim
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plotting question re. cuminc

2009-05-15 Thread K F Pearce
Hello everyone,

(This is my second question posted today on the R list).

I am carrying out a competing risks analysis using the cuminc function...this 
takes the form:

cuminc(ftime,fstatus,group)

In my study, fstatus has 3 different causes of failure (1,2,3) there are also 
censored cases (0).  group has two levels (0 and 1).

I therefore have 6 different cumulative incidence curves:

cause 1, group=0; cause 1 group=1
cause 2, group=0; cause 2 group=1
cause 3, group=0; cause 3 group=1

If I type the following commands:

 xx-cuminc(ftime,fstatus,group)
 plot(xx,lty=1,color=1:6)

I end up with the 6 curves plotted on the same graph.

Is there a way that I can plot a selection of these curves? (say only curves 
for cause 1, group=0 and cause 1 group=1).

Thank you so much,
Kind Regards,
Kim


Dr Kim Pearce CStat
Industrial Statistics Research Unit (ISRU)
School of Mathematics and Statistics
Herschel Building
University of Newcastle
Newcastle upon Tyne
United Kingdom
NE1 7RU

Tel.   0044 (0)191 222 6244 (direct)
Fax.   0044 (0)191 222 8020
Email: k.f.pea...@ncl.ac.uk
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Cumulative Incidence : Gray's test

2008-12-15 Thread K F Pearce
Hello everyone, 
 
I am a very new user of R and I have a query about the cuminc function
in the package cmprsk. In particular I would like to verify that I am
interpreting the output correctly when we have a stratification
variable.
 
Hypothetical example:
 
group : fair hair, dark hair
fstatus: 1=Relapse, 2=TRM, 0=censored
strata: sex (M or F)
 
Our data would be split into:
 
Fair, male, relapse
Dark,male, relapse
Fair, female, relapse
Dark, female, relapse
 
Fair, male, TRM
Dark,male, TRM
Fair, female, TRM
Dark, female, TRM
 
Fair, male, censored
Dark,male, censored
Fair, female, censored
Dark, female, censored
 
Am I correct in thinking that the 2 (Gray's) Tests which will be
printed by R tell us (i) if there are significant differences between
those with fair hair and those with dark hair as regards cumulative
incidence of relapse [taking into account sex differences] (ii) if there
are significant differences between those with fair hair and those with
dark hair as regards cumulative incidence of TRM [taking into account
sex differences] ? The 'est'and 'var'values are the same regardless of
whether we include a stratification variable or not.
 
If we do not include a stratification variable the '(Gray's)tests'
results will be different to those when a stratification variable is
included and they test (i) if there are significant differences between
those with fair hair and those with dark hair as regards cumulative
incidence of relapse (ii) if there are significant differences between
those with fair hair and those with dark hair as regards cumulative
incidence of TRM.

Can I ask.what happens when the group variable has say 3 levels?
I guess that the (Gray's) Tests output in R for each 'cause' (in our
case TRM and Relapse) would tell us the significance *overall* three-way
comparison of the groups?  
What if I wanted to determine the significance of  pairwise comparisons
i.e.  so that we were comparing groups (i) 12,  (ii) 23 (iii) 13?
 

Many thanks for your help on this matter,
Kind Regards,


Kim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Concordance Index - interpretation

2008-12-12 Thread K F Pearce
Hello everyone.
 
This is a question regarding generation of the concordance index (c
index) in R using the function rcorr.cens.  In particular about
interpretation of its direction and form of the 'predictor'.
 
One of the arguments is a numeric predictor variable ( presumably
this is just a *single* predictor variable).  Say this variable takes
numeric values  Am I correct in thinking that if the c index is 
0.5 (with Somers D positive) then  this tells us that the higher the
numeric values of the 'predictor', the  greater the survival probability
and similarly if the c index is 0.5 (with Somers D negative) then  this
tells us that the higher the numeric values of the 'predictor' the
lower  the survival probability ?
 
The c index  estimates the probability of concordance between predicted
and observed responsesHarrel et al (1996) says in predicting time
until death, concordance is calculated by considering all possible pairs
of patients, at least one of whom has died.  If the *predicted* survival
time (probability) is larger for the patient who (actually) lived
longer, the predictions for that pair are said to be concordant with the
(actual) outcomes.  .  I have read that the c index is defined by the
proportion of all usable patients in which the predictions and outcomes
are concordant.
 
Now, secondly, I'd like to ask what form the predictor can take.
Presumably if the predictor was a continuous-type variable e.g. 'age'
then predicted survival probability (calculated internally via Cox
regression?) would be compared with actual survival time for each
specific age to get the c index?  Now, if the predictor was an *ordinal
categorical variable* where 1=worst group and 5=best group - I presume
that the c index would be calculated similarly but this time there would
be many ties in the predictor (as regards predicted survival
probability) - hence  if I wanted to count all ties in such a case I
would keep the default argument outx=FALSE? 

Does anyone have a clear reference which gives the formula used to
generate the concordance index (with worked examples)? 
 
Many thanks for your help on these interpretations
Kind Regards,
Kim
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.