Re: [R] linear mixed model using lmer

2022-03-04 Thread array chip via R-help
 Thanks Jeff for reminding me that the attachment was removed. I put the data in my 
Google Drive if anyone wants to test it 
(https://drive.google.com/file/d/1lgVZVLHeecp9a_sFxEPeg6353O-qXZhM/view?usp=sharing).
I'll try the mixed-models mailing list as well.
John
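
For readers without the file, a minimal sketch with made-up data (a hypothetical 
stand-in for datx.txt, not the posted data) shows the same behavior: with only 2 
batch levels, the between-batch variance is estimated from essentially a single 
contrast, and REML frequently returns 0, i.e. a boundary (singular) fit.

library(lme4)
set.seed(1)
## hypothetical stand-in for the posted datx.txt: 2 batches x 12 observations
daty <- data.frame(batch = rep(c("A", "B"), each = 12),
                   yield = rnorm(24, mean = 5.8, sd = 2.8))
## with only 2 batch levels the variance estimate often collapses to 0
lmer(yield ~ 1 + (1 | batch), data = daty)
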
On Friday, March 4, 2022, 04:56:20 PM PST, Jeff Newmiller 
 wrote:  
 
 a) There is a mailing list for that: 
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

b) Read the Posting Guide, as most attachment types are removed to avoid 
propagating worms/viruses. (None seen upon receipt of this email.)

On March 4, 2022 4:41:57 PM PST, array chip via R-help  
wrote:
>Dear all, I have this simple dataset to measure the yield of a crop collected 
>in 2 batches (attached). When I ran a simple linear mixed model using lmer to 
>estimate within-batch and between-batch variability, the between-batch 
>variability was 0 and the fit was reported as singular. Does anyone know why 
>the fit is singular and what the reason for the 0 variability is? Is it because 
>the dataset only has 2 batches?
>> daty<-read.table("datx.txt",sep='\t',header=T,row.names=NULL)
>> library(lme4)
>> lmer(yield~1+(1|batch),daty)
>boundary (singular) fit: see ?isSingular
>Linear mixed model fit by REML ['lmerMod']
>Formula: yield ~ 1 + (1 | batch)
>   Data: daty
>REML criterion at convergence: 115.6358
>Random effects:
> Groups   Name        Std.Dev.
> batch    (Intercept) 0.000   
> Residual             2.789   
>Number of obs: 24, groups:  batch, 2
>Fixed Effects:
>(Intercept)  
>      5.788  
>
>Thanks!
>John
-- 
Sent from my phone. Please excuse my brevity.
  


Re: [R] 2 KM curves on the same plot

2020-09-30 Thread array chip via R-help
Hi Jim,

I found out why clip() does not work with lines(survfit.object)!

If you look at the code of the function survival:::lines.survfit, in the middle 
of the code:

    do.clip <- getOption("plot.survfit")
    if (!is.null(xx <- do.clip$plotclip)) 
        clip(xx[1], xx[2], xx[3], xx[4])

This resets the clipping to the default plot region!

So I just commented out the last 2 of the above 3 lines and created a 
customized lines2 function. Now it works!

It's fun to learn clip().

Thanks,

John
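
Judging from that snippet, an alternative that avoids patching the function 
would be to hand lines.survfit its clipping rectangle through the very option 
it reads; an untested sketch, assuming the quoted code path runs before drawing 
(fit2 as in the ovarian example quoted below):

xylim <- par("usr")
## set the option that survival:::lines.survfit consults for its clip region
options(plot.survfit = list(plotclip = c(450, xylim[2], xylim[3], xylim[4])))
lines(fit2, col = 3:4, lty = 2)
options(plot.survfit = NULL)   # clear the option when done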


On Wednesday, September 30, 2020, 01:47:55 AM PDT, Jim Lemon 
 wrote: 


Hi John,
Hmmm, this works:

plot(1:10)
xylim<-par("usr")
clip(5,xylim[2],xylim[3],xylim[4])
lines(10:1)

so I suspect that there is a "lines" method that resets the clipping
region out of sight. Fortunately Marc Schwartz provided a way to get
your plot so I will give the wall against which I have been banging my
head a break.

Jim

On Wed, Sep 30, 2020 at 1:57 PM array chip  wrote:
>
> Jim,
>
> I tried a few things and found that clip() works for regular graphing tasks. 
> But as soon as I run lines(fit) where "fit" is a survfit object, the clipping 
> is reset to the default plot region. See the ovarian 
> example below:
>
> library(survival)
> ovarian1<-ovarian
> ovarian1$fustat[ovarian$futime>450]<-0
> ovarian1$futime[ovarian$futime>450]<-450
> ovarian2<-subset(ovarian,futime>450)
>
> fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
> fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)
>
> plot(fit1, xlim=c(0,1200), col = 1:2)
> abline(v=450)
> xylim<-par("usr")
> points(-1,-1)
> clip(450,xylim[2],xylim[3],xylim[4])
> abline(h=0.5,col=2)      ### YES, clipping() works!
>
> lines(fit2, col = 3:4,lty=2)  ### clipping does not work! reset to default 
> plot region
> abline(h=0.4,col=2)      ### NO, clipping() does not work!
>
> So disappointing; otherwise this would be such a simple way to do 
> what I want.
>
> Thanks,
>
> John
>
> On Tuesday, September 29, 2020, 07:58:53 PM PDT, Jim Lemon 
>  wrote:
>
> Hi John,
> I should have remembered this. For some reason, the clip() function
> doesn't operate until you have issued a graphics command. Try:
>
> points(-1,-1)
>
> before calling lines()
>
> Jim
>
> On Wed, Sep 30, 2020 at 12:26 PM array chip  wrote:
> >
> > Hi Jim,
> >
> > I tried the clip() function below, but surprisingly it did not work! I read 
> > the R help file and feel your script should work. For a workable example, I 
> > used the ovarian dataset from the survival package:
> >
> > ovarian1<-ovarian
> > ovarian1$fustat[ovarian$futime>450]<-0
> > ovarian1$futime[ovarian$futime>450]<-450
> >
> > ovarian2<-subset(ovarian,futime>450)
> >
> > fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
> > fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)
> >
> > plot(fit1, xlim=c(0,1200), col = 1:2)
> > abline(v=450)
> > xylim<-par("usr")
> > clip(450,xylim[2],xylim[3],xylim[4])
> > lines(fit2, col = 3:4,lty=2)
> >
> > I can still see the extra horizontal line at the top.
> >
> > Do you or anyone else have any suggestions about what went wrong?
> >
> > Thanks,
> >
> > John
> >
> >
> > On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon 
> >  wrote:
> >
> >
> >
> >
> >
> > Hi John,
> > Perhaps the most direct way would be:
> >
> > plot(fit1, col=1:2)
> > xylim<-par("usr")
> > clip(4,xylim[2],xylim[3],xylim[4])
> > lines(fit2,col=1:2)
> >
> > Remember that the new clipping rectangle will persist until you or
> > something else resets it.
> >
> > Jim
> >
> > On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help
> >  wrote:
> > >
> > > Hello,
> > >
> > > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 
> > > survfit objects, just like this one:
> > >
> > > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing
> > >
> > > Suppose I have 2 survfit objects: fit1 is for the curve on the left 
> > > (survtime has been truncated to the cutoff line: year 5), fit2 is for the 
> > > curve on the right (minimum survival time is at the cutoff line: year 5), 
> > > but if I do the following:
> > >
> > > plot(fit1, col=1:2)
> > > lines(fit2,col=1:2)
> > >
> > > Then I will have a horizontal line at the top connecting from 0 to 4 
> > > years, which I do not want to be drawn.

Re: [R] 2 KM curves on the same plot

2020-09-30 Thread array chip via R-help
Thank you Jim for helping! Yes, I will try Marc's method.

John


On Wednesday, September 30, 2020, 01:47:55 AM PDT, Jim Lemon 
 wrote: 

Hi John,
Hmmm, this works:

plot(1:10)
xylim<-par("usr")
clip(5,xylim[2],xylim[3],xylim[4])
lines(10:1)

so I suspect that there is a "lines" method that resets the clipping
region out of sight. Fortunately Marc Schwartz provided a way to get
your plot so I will give the wall against which I have been banging my
head a break.

Jim

On Wed, Sep 30, 2020 at 1:57 PM array chip  wrote:
>
> Jim,
>
> I tried a few things and found that clip() works for regular graphing tasks. 
> But as soon as I run lines(fit) where "fit" is a survfit object, the clipping 
> is reset to the default plot region. See the ovarian 
> example below:
>
> library(survival)
> ovarian1<-ovarian
> ovarian1$fustat[ovarian$futime>450]<-0
> ovarian1$futime[ovarian$futime>450]<-450
> ovarian2<-subset(ovarian,futime>450)
>
> fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
> fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)
>
> plot(fit1, xlim=c(0,1200), col = 1:2)
> abline(v=450)
> xylim<-par("usr")
> points(-1,-1)
> clip(450,xylim[2],xylim[3],xylim[4])
> abline(h=0.5,col=2)      ### YES, clipping() works!
>
> lines(fit2, col = 3:4,lty=2)  ### clipping does not work! reset to default 
> plot region
> abline(h=0.4,col=2)      ### NO, clipping() does not work!
>
> So disappointing; otherwise this would be such a simple way to do 
> what I want.
>
> Thanks,
>
> John
>
> On Tuesday, September 29, 2020, 07:58:53 PM PDT, Jim Lemon 
>  wrote:
>
> Hi John,
> I should have remembered this. For some reason, the clip() function
> doesn't operate until you have issued a graphics command. Try:
>
> points(-1,-1)
>
> before calling lines()
>
> Jim
>
> On Wed, Sep 30, 2020 at 12:26 PM array chip  wrote:
> >
> > Hi Jim,
> >
> > I tried the clip() function below, but surprisingly it did not work! I read 
> > the R help file and feel your script should work. For a workable example, I 
> > used the ovarian dataset from the survival package:
> >
> > ovarian1<-ovarian
> > ovarian1$fustat[ovarian$futime>450]<-0
> > ovarian1$futime[ovarian$futime>450]<-450
> >
> > ovarian2<-subset(ovarian,futime>450)
> >
> > fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
> > fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)
> >
> > plot(fit1, xlim=c(0,1200), col = 1:2)
> > abline(v=450)
> > xylim<-par("usr")
> > clip(450,xylim[2],xylim[3],xylim[4])
> > lines(fit2, col = 3:4,lty=2)
> >
> > I can still see the extra horizontal line at the top.
> >
> > Do you or anyone else have any suggestions about what went wrong?
> >
> > Thanks,
> >
> > John
> >
> >
> > On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon 
> >  wrote:
> >
> >
> >
> >
> >
> > Hi John,
> > Perhaps the most direct way would be:
> >
> > plot(fit1, col=1:2)
> > xylim<-par("usr")
> > clip(4,xylim[2],xylim[3],xylim[4])
> > lines(fit2,col=1:2)
> >
> > Remember that the new clipping rectangle will persist until you or
> > something else resets it.
> >
> > Jim
> >
> > On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help
> >  wrote:
> > >
> > > Hello,
> > >
> > > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 
> > > survfit objects, just like this one:
> > >
> > > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing
> > >
> > > Suppose I have 2 survfit objects: fit1 is for the curve on the left 
> > > (survtime has been truncated to the cutoff line: year 5), fit2 is for the 
> > > curve on the right (minimum survival time is at the cutoff line: year 5), 
> > > but if I do the following:
> > >
> > > plot(fit1, col=1:2)
> > > lines(fit2,col=1:2)
> > >
> > > Then I will have a horizontal line at the top connecting from 0 to 4 
> > > years, which I do not want to be drawn (see blue arrow below):
> > >
> > > https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing
> > >
> > > Does anyone have a strategy to make this kind of plot happen?
> > >
> > > Thanks,
> > >
> > > John
> >
> > >
> >



Re: [R] 2 KM curves on the same plot

2020-09-29 Thread array chip via R-help
Jim,

I tried a few things and found that clip() works for regular graphing tasks. 
But as soon as I run lines(fit) where "fit" is a survfit object, the clipping 
is reset to the default plot region. See the ovarian example below:

library(survival)
ovarian1<-ovarian
ovarian1$fustat[ovarian$futime>450]<-0
ovarian1$futime[ovarian$futime>450]<-450
ovarian2<-subset(ovarian,futime>450)

fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)

plot(fit1, xlim=c(0,1200), col = 1:2)
abline(v=450)
xylim<-par("usr")
points(-1,-1)
clip(450,xylim[2],xylim[3],xylim[4])
abline(h=0.5,col=2)      ### YES, clipping() works!

lines(fit2, col = 3:4,lty=2)  ### clipping does not work! reset to default plot 
region
abline(h=0.4,col=2)       ### NO, clipping() does not work!

So disappointing; otherwise this would be such a simple way to do 
what I want.

Thanks,

John

On Tuesday, September 29, 2020, 07:58:53 PM PDT, Jim Lemon 
 wrote: 

Hi John,
I should have remembered this. For some reason, the clip() function
doesn't operate until you have issued a graphics command. Try:

points(-1,-1)

before calling lines()

Jim

On Wed, Sep 30, 2020 at 12:26 PM array chip  wrote:
>
> Hi Jim,
>
> I tried the clip() function below, but surprisingly it did not work! I read the R 
> help file and feel your script should work. For a workable example, I 
> used the ovarian dataset from the survival package:
>
> ovarian1<-ovarian
> ovarian1$fustat[ovarian$futime>450]<-0
> ovarian1$futime[ovarian$futime>450]<-450
>
> ovarian2<-subset(ovarian,futime>450)
>
> fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
> fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)
>
> plot(fit1, xlim=c(0,1200), col = 1:2)
> abline(v=450)
> xylim<-par("usr")
> clip(450,xylim[2],xylim[3],xylim[4])
> lines(fit2, col = 3:4,lty=2)
>
> I can still see the extra horizontal line at the top.
>
> Do you or anyone else have any suggestions about what went wrong?
>
> Thanks,
>
> John
>
>
> On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon 
>  wrote:
>
>
>
>
>
> Hi John,
> Perhaps the most direct way would be:
>
> plot(fit1, col=1:2)
> xylim<-par("usr")
> clip(4,xylim[2],xylim[3],xylim[4])
> lines(fit2,col=1:2)
>
> Remember that the new clipping rectangle will persist until you or
> something else resets it.
>
> Jim
>
> On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help
>  wrote:
> >
> > Hello,
> >
> > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 
> > survfit objects, just like this one:
> >
> > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing
> >
> > Suppose I have 2 survfit objects: fit1 is for the curve on the left 
> > (survtime has been truncated to the cutoff line: year 5), fit2 is for the 
> > curve on the right (minimum survival time is at the cutoff line: year 5), 
> > but if I do the following:
> >
> > plot(fit1, col=1:2)
> > lines(fit2,col=1:2)
> >
> > Then I will have a horizontal line at the top connecting from 0 to 4 
> > years, which I do not want to be drawn (see blue arrow below):
> >
> > https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing
> >
> > Does anyone have a strategy to make this kind of plot happen?
> >
> > Thanks,
> >
> > John
>
> >
>



Re: [R] 2 KM curves on the same plot

2020-09-29 Thread array chip via R-help
Hi Jim,

I tried points(-1,-1) before lines() and before clip(), but either way it 
still draws everything. :-(

It's interesting that the hist() examples in the R help page for clip() 
work nicely.

I also tried the simple linear regression plot below; clip() works there, too.

dat<-data.frame(x=1:10,y=1:10)
fit<-lm(y~x,dat)
plot(1:10)
abline(fit)
xylim<-par("usr")
clip(6,xylim[2],xylim[3],xylim[4])
abline(fit,col=2)    ## yes, it only shows the fit line from 6 to the right
lines(c(2,8),c(5,5))    ## yes, it only shows the line from 6 to the right

So it's puzzling that clip() fails only when lines() is used with a survfit 
object (ovarian example below).

John




On Tuesday, September 29, 2020, 07:58:53 PM PDT, Jim Lemon 
 wrote: 

Hi John,
I should have remembered this. For some reason, the clip() function
doesn't operate until you have issued a graphics command. Try:

points(-1,-1)

before calling lines()

Jim

On Wed, Sep 30, 2020 at 12:26 PM array chip  wrote:
>
> Hi Jim,
>
> I tried the clip() function below, but surprisingly it did not work! I read the R 
> help file and feel your script should work. For a workable example, I 
> used the ovarian dataset from the survival package:
>
> ovarian1<-ovarian
> ovarian1$fustat[ovarian$futime>450]<-0
> ovarian1$futime[ovarian$futime>450]<-450
>
> ovarian2<-subset(ovarian,futime>450)
>
> fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
> fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)
>
> plot(fit1, xlim=c(0,1200), col = 1:2)
> abline(v=450)
> xylim<-par("usr")
> clip(450,xylim[2],xylim[3],xylim[4])
> lines(fit2, col = 3:4,lty=2)
>
> I can still see the extra horizontal line at the top.
>
> Do you or anyone else have any suggestions about what went wrong?
>
> Thanks,
>
> John
>
>
> On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon 
>  wrote:
>
>
>
>
>
> Hi John,
> Perhaps the most direct way would be:
>
> plot(fit1, col=1:2)
> xylim<-par("usr")
> clip(4,xylim[2],xylim[3],xylim[4])
> lines(fit2,col=1:2)
>
> Remember that the new clipping rectangle will persist until you or
> something else resets it.
>
> Jim
>
> On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help
>  wrote:
> >
> > Hello,
> >
> > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 
> > survfit objects, just like this one:
> >
> > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing
> >
> > Suppose I have 2 survfit objects: fit1 is for the curve on the left 
> > (survtime has been truncated to the cutoff line: year 5), fit2 is for the 
> > curve on the right (minimum survival time is at the cutoff line: year 5), 
> > but if I do the following:
> >
> > plot(fit1, col=1:2)
> > lines(fit2,col=1:2)
> >
> > Then I will have a horizontal line at the top connecting from 0 to 4 
> > years, which I do not want to be drawn (see blue arrow below):
> >
> > https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing
> >
> > Does anyone have a strategy to make this kind of plot happen?
> >
> > Thanks,
> >
> > John
>
> >
>



Re: [R] 2 KM curves on the same plot

2020-09-29 Thread array chip via R-help
Hi Jim,

I tried the clip() function below, but surprisingly it did not work! I read the R 
help file and feel your script should work. For a workable example, I used 
the ovarian dataset from the survival package:

ovarian1<-ovarian
ovarian1$fustat[ovarian$futime>450]<-0
ovarian1$futime[ovarian$futime>450]<-450

ovarian2<-subset(ovarian,futime>450)

fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)

plot(fit1, xlim=c(0,1200), col = 1:2)
abline(v=450)
xylim<-par("usr")
clip(450,xylim[2],xylim[3],xylim[4])
lines(fit2, col = 3:4,lty=2)

I can still see the extra horizontal line at the top. 

Do you or anyone else have any suggestions about what went wrong?

Thanks,

John


On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon 
 wrote: 





Hi John,
Perhaps the most direct way would be:

plot(fit1, col=1:2)
xylim<-par("usr")
clip(4,xylim[2],xylim[3],xylim[4])
lines(fit2,col=1:2)

Remember that the new clipping rectangle will persist until you or
something else resets it.

Jim

On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help
 wrote:
>
> Hello,
>
> Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 
> survfit objects, just like this one:
>
> https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing
>
> Suppose I have 2 survfit objects: fit1 is for the curve on the left (survtime 
> has been truncated to the cutoff line: year 5), fit2 is for the curve on the 
> right (minimum survival time is at the cutoff line: year 5), but if I do the 
> following:
>
> plot(fit1, col=1:2)
> lines(fit2,col=1:2)
>
> Then I will have a horizontal line at the top connecting from 0 to 4 
> years, which I do not want to be drawn (see blue arrow below):
>
> https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing
>
> Does anyone have a strategy to make this kind of plot happen?
>
> Thanks,
>
> John

>



Re: [R] 2 KM curves on the same plot

2020-09-29 Thread array chip via R-help
Thank you Marc as well! I'll try both ways! Yes, this is an oncology study with 
the landmark time set at 5 years.

John


On Tuesday, September 29, 2020, 07:08:39 AM PDT, Marc Schwartz 
 wrote: 





Hi John,

From the looks of the first plot, it would appear that perhaps you are engaged 
in a landmark analysis in an oncology setting, with the landmark time set at 5 
years. On the off chance that you are not familiar with the pros and cons of 
that methodology, Google "landmark analysis", which should yield a number of 
references.

With respect to the plot itself, here is one approach, in addition to the one 
that Jim suggested. This will use a 1 x 2 matrix of plot regions, and adjust 
the various axes accordingly.

library(survival)

## Create two models with the same data for this use, the second adding 150
## to the event times to mimic a landmark time at 150 weeks

fit <- survfit(Surv(time, status) ~ x, data = aml)
fit2 <- survfit(Surv(time, status) ~ x, data = aml)
fit2$time <- fit2$time + 150

## Create the 1 x 2 plot matrix
par(mfrow = c(1, 2))

## Set the plot region margins so there is no space to the right
par(mar = c(4, 4, 4, 0))

## Fix the plot limits for consistency
## xaxs = "i" removes the default 4% extensions to the plot region limits
plot(fit, axes = FALSE, xlim = c(0, 150), ylim = c(0, 1), xaxs = "i")

axis(1, at = seq(0, 150, 50), line = -1)
axis(2, las = 1)

## Set the plot region margins so there is no space to the left
par(mar = c(4, 0, 4, 4))

## Set the plot limits for the second time interval
plot(fit2, axes = FALSE, xlim = c(150, 300), ylim = c(0, 1), xaxs = "i")

axis(1, at = seq(150, 300, 50), line = -1)
axis(4, las = 1)

## Draw the vertical line at 150 weeks
axis(2, at = seq(0, 1, 0.2), labels = FALSE, lty = "dashed")


Regards,

Marc Schwartz


> On Sep 28, 2020, at 8:33 PM, array chip via R-help  
> wrote:
> 
> Hello,
> 
> Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 
> survfit objects, just like this one: 
> 
> https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing
> 
> Suppose I have 2 survfit objects: fit1 is for the curve on the left (survtime 
> has been truncated to the cutoff line: year 5), fit2 is for the curve on the 
> right (minimum survival time is at the cutoff line: year 5), but if I do the 
> following:
> 
> plot(fit1, col=1:2)
> lines(fit2,col=1:2)
> 
> > Then I will have a horizontal line at the top connecting from 0 to 4 
> > years, which I do not want to be drawn (see blue arrow below):
> 
> https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing
> 
> > Does anyone have a strategy to make this kind of plot happen?
> 
> Thanks,
> 
> John

> 



Re: [R] 2 KM curves on the same plot

2020-09-29 Thread array chip via R-help
Thank you very much Jim, this is great!!

John


On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon 
 wrote: 





Hi John,
Perhaps the most direct way would be:

plot(fit1, col=1:2)
xylim<-par("usr")
clip(4,xylim[2],xylim[3],xylim[4])
lines(fit2,col=1:2)

Remember that the new clipping rectangle will persist until you or
something else resets it.

Jim

On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help
 wrote:
>
> Hello,
>
> Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 
> survfit objects, just like this one:
>
> https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing
>
> Suppose I have 2 survfit objects: fit1 is for the curve on the left (survtime 
> has been truncated to the cutoff line: year 5), fit2 is for the curve on the 
> right (minimum survival time is at the cutoff line: year 5), but if I do the 
> following:
>
> plot(fit1, col=1:2)
> lines(fit2,col=1:2)
>
> Then I will have a horizontal line at the top connecting from 0 to 4 
> years, which I do not want to be drawn (see blue arrow below):
>
> https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing
>
> Does anyone have a strategy to make this kind of plot happen?
>
> Thanks,
>
> John
>



[R] 2 KM curves on the same plot

2020-09-28 Thread array chip via R-help
Hello,

Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 survfit 
objects, just like this one: 

https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing

Suppose I have 2 survfit objects: fit1 is for the curve on the left (survtime 
has been truncated to the cutoff line: year 5), fit2 is for the curve on the 
right (minimum survival time is at the cutoff line: year 5), but if I do the 
following:

plot(fit1, col=1:2)
lines(fit2,col=1:2)

Then I will have a horizontal line at the top connecting from 0 to 4 years, 
which I do not want to be drawn (see blue arrow below):

https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing

Does anyone have a strategy to make this kind of plot happen?

Thanks,

John



Re: [R] [External] R rounding problem?

2020-09-03 Thread array chip via R-help
Thanks Richard. Got it now...


On Thursday, September 3, 2020, 10:12:36 PM PDT, Richard M. Heiberger 
 wrote: 


FAQ 7.31
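
In brief, the FAQ's point is that values like 6.9 and 6.3 have no exact binary 
representation, so differences should be compared with a tolerance rather than 
exactly; a minimal sketch:

(6.9 - 6.3) > 0.6                   # TRUE: the difference is 0.6000000000000005
isTRUE(all.equal(6.9 - 6.3, 0.6))   # TRUE: equal within the default tolerance
abs((5.6 - 5.5) - 0.1) < .Machine$double.eps^0.5   # TRUE: treat as equal to 0.1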

On Fri, Sep 4, 2020 at 12:47 AM array chip via R-help
 wrote:
>
> Hello,
>
> I made a mistake today doing simple arithmetic in R that almost got me into 
> trouble. After trying multiple times, I finally figured out that it's a 
> rounding issue in R.
>
> For example, when I simply type:
>
> > (6.9-6.3) > 0.6
> [1] TRUE
>
> 6.9-6.3 should be 0.6 exactly, but R thinks that it's greater than 0.6!!
>
> Similarly, R thinks 5.6-5.5 is smaller than 0.1:
>
> > (5.6-5.5) < 0.1
> [1] TRUE
>
> Why is the above happening? This rounding issue seems small, but it 
> could cause serious problems in the real world.
>
> Can anyone shed some light on how to avoid the issue?
>
> Thanks,
>
> Yi

>



[R] R rounding problem?

2020-09-03 Thread array chip via R-help
Hello,

I made a mistake today doing simple arithmetic in R that almost got me into 
trouble. After trying multiple times, I finally figured out that it's a 
rounding issue in R.

For example, when I simply type:

> (6.9-6.3) > 0.6
[1] TRUE

6.9-6.3 should be 0.6 exactly, but R thinks that it's greater than 0.6!!

Similarly, R thinks 5.6-5.5 is smaller than 0.1:

> (5.6-5.5) < 0.1
[1] TRUE

Why is the above happening? This rounding issue seems small, but it 
could cause serious problems in the real world.

Can anyone shed some light on how to avoid the issue?

Thanks,

Yi



[R] how to generate this kind of graph

2020-07-22 Thread array chip via R-help


Hello everyone,

I saw this scatterplot in a paper and thought it looked very nice:

https://drive.google.com/file/d/1V7F1gq-J_GIFDOrJs00hwGyXUqCZ_xwa/view?usp=sharing

It is similar to stripchart() with the 'jitter' method, but it has a special 
pattern of aligning the points that makes it look nicer than a standard 
stripchart().

Does anyone know if there is a package in R that can produce this kind of plot?

Thanks,

John
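
The aligned-jitter pattern described above matches what a "beeswarm" layout 
produces; a minimal sketch assuming the beeswarm package (ggbeeswarm is a 
ggplot2 analogue), though the paper's exact tool is unknown:

library(beeswarm)
## points are offset sideways just enough to avoid overlap, giving the
## orderly point alignment visible in the linked figure
beeswarm(len ~ dose, data = ToothGrowth, pch = 16)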




Re: [R] R 4.0.2 is released

2020-06-23 Thread array chip via R-help
My current version of R is 3.6.3, and there is no problem installing packages 
there. I also have no problem installing Bioconductor packages with 3.6.3. 

But installing packages from Bioconductor failed as well on 4.0.2 (I also tried 
4.0.0, which failed, too).


On Tuesday, June 23, 2020, 08:56:45 PM PDT, array chip via R-help 
 wrote: 

Hi,

I downloaded R 4.0.2 and installed it successfully without any error. However, 
when I opened an R session (using x64) and tried to install packages, I got 
the following error message:

> utils:::menuInstallPkgs()
Warning: failed to download mirrors file (internet routines cannot be loaded); 
using local file 'C:/PROGRA~1/R/R-40~1.2/doc/CRAN_mirrors.csv'
Warning: unable to access index for repository 
https://rweb.crmda.ku.edu/cran/src/contrib:
  internet routines cannot be loaded
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = 
type) : 
  argument "pkgs" is missing, with no default
In addition: Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
  unable to load shared object 
'C:/PROGRA~1/R/R-40~1.2/modules/x64/internet.dll':
  LoadLibrary failure:  Access is denied.

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)


Matrix products: default


locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
 
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                         
 
[5] LC_TIME=English_United States.1252    


attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     


loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2   


However, if I load up a session using the i386 version, then everything is OK - I 
can install packages.

Can anyone suggest what is going on?

Thanks,

John




On Monday, June 22, 2020, 01:24:20 AM PDT, Peter Dalgaard via R-help 
 wrote: 





The build system rolled up R-4.0.2.tar.gz (codename "Taking Off Again") this 
morning.

The list below details the changes in this release.

You can get the source code from

http://cran.r-project.org/src/base/R-4/R-4.0.2.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you.

Binaries for various platforms will appear in due course.


For the R Core Team,

Peter Dalgaard

These are the checksums (md5 and SHA-256) for the freshly created files, in 
case you wish
to check that they are uncorrupted:

MD5 (AUTHORS) = b9c44f9f78cab3184ad9898bebc854b4
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 4afa171cd982aaa60f0ba92e2e7bc5d6
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = 566a6bb3642e28e6bf01cf98db31137c
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = eb78c4d053ec9c32b815cf0c2ebea801
MD5 (NEWS.2) = 496062c138e2def06cebccddfb814ac6
MD5 (NEWS.3) = 012e7f4a80cc8ec947bf3f0ff6117ec8
MD5 (R-latest.tar.gz) = 1eac7293d5fe313a56ddabfda02b437e
MD5 (README) = f468f281c919665e276a1b691decbbe6
MD5 (RESOURCES) = 529223fd3ffef95731d0a87353108435
MD5 (THANKS) = 251d20510bfc3cc93b82c5a99f7efcc6
MD5 (VERSION-INFO.dcf) = 62496d3a0fd8cc2ed644ea518c052371
MD5 (R-4/R-4.0.2.tar.gz) = 1eac7293d5fe313a56ddabfda02b437e

2cde824a7b18958e5f06b391c801c8288be0f84fa8934b7ddefef23c67e60c09  AUTHORS
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
eddf87b12197c7b3b19cbc9b11c1beab95b14e3dcd715bf37d2f6a8b2a72c2a1  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
ec05bba338358410fae6b34fed061605989ab3601aba1b3fcb45a610d5dd2eb9  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
12b30c724117b1b2b11484673906a6dcd48a361f69fc420b36194f9218692d01  NEWS.1
e80de410c77f05ff2012fa70051b89119845f734a7fa5c55857e61e4ed7d5f6e  NEWS.2
7201d139947afa52b5e09d26dc01445edf444506264355b2185122bc1ed3dce0  NEWS.3
d3bceab364da0876625e4097808b42512395fdf41292f4915ab1fd257c1bbe75  
R-latest.tar.gz
2fdd3e90f23f32692d4b3a0c0452f2c219a10882033d1774f8cadf25886c3ddc  README
408737572ecc6e1135fdb2cf7a9dbb1a6cb27967c757f1771b8c39d1fd2f1ab9  RESOURCES
c9c7cb32308b4e560a22c858819ade9de524a602abd4e92d1c328c89f8037d73  THANKS
10cc5f566a4a5ce49147e7dcfbe9180dba09ccb9efb17298b067309eb799e92e  
VERSION-INFO.dcf
d3bceab364da0876625e4097808b42512395fdf41292f4915ab1fd257c1bbe75  
R-4/R-4.0.2.tar.gz

This is the relevant part of the NEWS file

CHANGES IN R 4.0.2:

  UTILITIES:

    * R CMD check skips vignette re-building (with a warning) if the
      VignetteBuilder package(s) are not available.

  BUG FIXES:

    * Paths with non-ASCII characters caused problems for package
      loading on Windows PR#17833.

    * Using tcltk widgets no longer crashes R on Windows.

    * source(*, echo=TRUE) no longer fails in some cases with empty
      lines; reported by Bill Dunlap in PR#17769.

Re: [R] R 4.0.2 is released

2020-06-23 Thread array chip via R-help
Hi,

I downloaded R 4.0.2 and installed it successfully without any error. However, 
when I opened an R session (using x64) and tried to install packages, I got 
the following error message:

> utils:::menuInstallPkgs()
Warning: failed to download mirrors file (internet routines cannot be loaded); 
using local file 'C:/PROGRA~1/R/R-40~1.2/doc/CRAN_mirrors.csv'
Warning: unable to access index for repository 
https://rweb.crmda.ku.edu/cran/src/contrib:
  internet routines cannot be loaded
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = 
type) : 
  argument "pkgs" is missing, with no default
In addition: Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
  unable to load shared object 
'C:/PROGRA~1/R/R-40~1.2/modules/x64/internet.dll':
  LoadLibrary failure:  Access is denied.

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)


Matrix products: default


locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
 
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                         
 
[5] LC_TIME=English_United States.1252    


attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     


loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2   


However, if I load up a session using the i386 version, then everything is OK - I 
can install packages.

Can anyone suggest what is going on?

Thanks,

John




On Monday, June 22, 2020, 01:24:20 AM PDT, Peter Dalgaard via R-help 
 wrote: 





The build system rolled up R-4.0.2.tar.gz (codename "Taking Off Again") this 
morning.

The list below details the changes in this release.

You can get the source code from

http://cran.r-project.org/src/base/R-4/R-4.0.2.tar.gz

or wait for it to be mirrored at a CRAN site nearer to you.

Binaries for various platforms will appear in due course.


For the R Core Team,

Peter Dalgaard

These are the checksums (md5 and SHA-256) for the freshly created files, in 
case you wish
to check that they are uncorrupted:

MD5 (AUTHORS) = b9c44f9f78cab3184ad9898bebc854b4
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 4afa171cd982aaa60f0ba92e2e7bc5d6
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = 566a6bb3642e28e6bf01cf98db31137c
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = eb78c4d053ec9c32b815cf0c2ebea801
MD5 (NEWS.2) = 496062c138e2def06cebccddfb814ac6
MD5 (NEWS.3) = 012e7f4a80cc8ec947bf3f0ff6117ec8
MD5 (R-latest.tar.gz) = 1eac7293d5fe313a56ddabfda02b437e
MD5 (README) = f468f281c919665e276a1b691decbbe6
MD5 (RESOURCES) = 529223fd3ffef95731d0a87353108435
MD5 (THANKS) = 251d20510bfc3cc93b82c5a99f7efcc6
MD5 (VERSION-INFO.dcf) = 62496d3a0fd8cc2ed644ea518c052371
MD5 (R-4/R-4.0.2.tar.gz) = 1eac7293d5fe313a56ddabfda02b437e

2cde824a7b18958e5f06b391c801c8288be0f84fa8934b7ddefef23c67e60c09  AUTHORS
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
eddf87b12197c7b3b19cbc9b11c1beab95b14e3dcd715bf37d2f6a8b2a72c2a1  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
ec05bba338358410fae6b34fed061605989ab3601aba1b3fcb45a610d5dd2eb9  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
12b30c724117b1b2b11484673906a6dcd48a361f69fc420b36194f9218692d01  NEWS.1
e80de410c77f05ff2012fa70051b89119845f734a7fa5c55857e61e4ed7d5f6e  NEWS.2
7201d139947afa52b5e09d26dc01445edf444506264355b2185122bc1ed3dce0  NEWS.3
d3bceab364da0876625e4097808b42512395fdf41292f4915ab1fd257c1bbe75  
R-latest.tar.gz
2fdd3e90f23f32692d4b3a0c0452f2c219a10882033d1774f8cadf25886c3ddc  README
408737572ecc6e1135fdb2cf7a9dbb1a6cb27967c757f1771b8c39d1fd2f1ab9  RESOURCES
c9c7cb32308b4e560a22c858819ade9de524a602abd4e92d1c328c89f8037d73  THANKS
10cc5f566a4a5ce49147e7dcfbe9180dba09ccb9efb17298b067309eb799e92e  
VERSION-INFO.dcf
d3bceab364da0876625e4097808b42512395fdf41292f4915ab1fd257c1bbe75  
R-4/R-4.0.2.tar.gz

This is the relevant part of the NEWS file

CHANGES IN R 4.0.2:

  UTILITIES:

    * R CMD check skips vignette re-building (with a warning) if the
      VignetteBuilder package(s) are not available.

  BUG FIXES:

    * Paths with non-ASCII characters caused problems for package
      loading on Windows PR#17833.

    * Using tcltk widgets no longer crashes R on Windows.

    * source(*, echo=TRUE) no longer fails in some cases with empty
      lines; reported by Bill Dunlap in PR#17769.

    * on.exit() now correctly matches named arguments, thanks to
      PR#17815 (including patch) by Brodie Gaslam.

    * regexpr(*, perl=TRUE) no longer returns incorrect positions into
      text containing characters outside of the Unicode Basic
      Multilingual Plane on Windows.

-- 
Peter Dalgaard, Professor,
Center for 

Re: [R] weighed Fleming-Harrington log rank test

2018-02-16 Thread array chip via R-help
 Thank you Terry. Right now I can use comp() from the survMisc package to do the 
2-parameter version of F-H weighting. I think both SAS and Stata offer the 
2-parameter version, so I just thought it would be nice if survdiff() could 
offer that option, given that it's in the standard survival package in R. 
Thanks!
John
On Friday, February 16, 2018, 7:08:46 AM PST, Therneau, Terry M., Ph.D. 
 wrote:  
 
  
You are correct that the survdiff routine only supports 'rho' of the 
Fleming-Harrington G-rho tests. This is a function of age -- I wrote the 
original code back when I was working with Tom (Fleming), and he was only using 
1 parameter. Later he and David expanded the test to two parameters. This 
might be only the second request for the feature in the 30+ years since that 
date.

Terry Therneau
   


Re: [R] Fleming-Harrington weighted log rank test

2018-02-15 Thread array chip via R-help
 Thank you David!
On Wednesday, February 14, 2018, 6:05:46 PM PST, David Winsemius 
<dwinsem...@comcast.net> wrote:  
 
 
> On Feb 14, 2018, at 5:26 PM, David Winsemius <dwinsem...@comcast.net> wrote:
> 
>> 
>> On Feb 13, 2018, at 4:02 PM, array chip via R-help <r-help@r-project.org> 
>> wrote:
>> 
>> Hi all, 
>> 
>> The survdiff() from the survival package has an argument "rho" that implements 
>> the Fleming-Harrington weighted log-rank test. 
>> 
>> But according to several sources including "survminer" package 
>> (https://cran.r-project.org/web/packages/survminer/vignettes/Specifiying_weights_in_log-rank_comparisons.html),
>>  Fleming-Harrington weighted log-rank test should have 2 parameters "p" and 
>> "q" to control the weighting for earlier vs later times in the follow-up.
>> 
>> For example, setting rho=1 in survdiff() uses the Peto-Peto modification of 
>> Gehan-Wilcoxon weights, which I can confirm by setting p=1 & q=0 in comp() 
>> from the survminer package. Similarly, rho=0 is equivalent to p=0 & q=0.
>> 
>> I am interested in putting more weights on survival difference in later 
>> follow-up time. According to comp() from survminer package, that would set 
>> p=0 & q=1 for Fleming-Harrington weights. 
>> 
>> My question is: how can I do the same by setting certain values for "rho" in 
>> the regular survdiff() function?
> 
> I think that survdiff uses a different version than what you have found. The 
> G-rho family weights are:
> 
> w_j = [S(t_j)]^rho
> 
> So rather than two parameters on S(t) and (1-S(t)) as in the p,q version, you 
> only have one parameter applied to S(t). This class handout says that the 
> G-rho,gamma weighting scheme is not available in survdiff.
> 

Forgot to paste the link:

http://www.ics.uci.edu/~dgillen/STAT255/Handouts/lecture4.pdf

> -- 
> David Winsemius
> Alameda, CA, USA
> 
> 'Any technology distinguishable from magic is insufficiently advanced.'  
> -Gehm's Corollary to Clarke's Third Law

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'  
-Gehm's Corollary to Clarke's Third Law
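
To make the correspondence concrete: in survdiff()'s one-parameter family, 
rho = 0 is the ordinary log-rank test (p=0, q=0) and rho = 1 the Peto-Peto 
test (p=1, q=0); a minimal sketch with the package's ovarian data:

library(survival)
survdiff(Surv(futime, fustat) ~ rx, data = ovarian, rho = 0)   # log-rank
survdiff(Surv(futime, fustat) ~ rx, data = ovarian, rho = 1)   # Peto-Peto
## a q > 0 (late-difference) weighting has no rho equivalent in survdiff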




  


[R] Fleming-Harrington weighted log rank test

2018-02-13 Thread array chip via R-help
Hi all, 

The survdiff() from the survival package has an argument "rho" that implements 
the Fleming-Harrington weighted log-rank test. 

But according to several sources including "survminer" package 
(https://cran.r-project.org/web/packages/survminer/vignettes/Specifiying_weights_in_log-rank_comparisons.html),
 Fleming-Harrington weighted log-rank test should have 2 parameters "p" and "q" 
to control the weighting for earlier vs later times in the follow-up.

For example, setting rho=1 in survdiff() uses the Peto-Peto modification of 
Gehan-Wilcoxon weights, which I can confirm by setting p=1 & q=0 in comp() from 
the survminer package. Similarly, rho=0 is equivalent to p=0 & q=0.

I am interested in putting more weights on survival difference in later 
follow-up time. According to comp() from survminer package, that would set p=0 
& q=1 for Fleming-Harrington weights. 

My question is: how can I do the same by setting certain values for "rho" in the 
regular survdiff() function?

Thank you,

John



[R] SAMseq errors

2017-11-29 Thread array chip via R-help
Sorry forgot to use plain text format, hope this time it works:

Hi, I am trying to use SAMseq() to analyze my RNA-seq experiment (2 genes 
x 550 samples) with a survival endpoint. It quickly gives the following error:

> library(samr)
Loading required package: impute
Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians

Warning messages:
1: package ‘samr’ was built under R version 3.3.3 
2: package ‘matrixStats’ was built under R version 3.3.3

> samfit<-SAMseq(data, PFI.time,censoring.status=PFI.status, 
> resp.type="Survival")

Estimating sequencing depths...
Error in quantile.default(prop, c(0.25, 0.75)) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE
In addition: Warning message:
In sum(x) : integer overflow - use sum(as.numeric(.))
Error during wrapup: cannot open the connection

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252   
 

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] samr_2.0             matrixStats_0.52.2   impute_1.48.0        
BiocInstaller_1.24.0 rcom_3.1-3           rscproxy_2.1-1      

loaded via a namespace (and not attached):
[1] tools_3.3.2


I checked: my data matrix and y variables have no missing values. Does anyone 
have suggestions about what's going on?

Thank you!

John
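
One hedged reading of the warning: the sequencing-depth step sums the count 
matrix in integer arithmetic, and with 550 samples the totals can exceed 
.Machine$integer.max, producing the NAs that quantile() then rejects. A sketch 
of a check and one possible workaround, assuming 'data' is the integer count 
matrix passed to SAMseq:

## compute the sums in double precision to see whether they overflow integers
max(colSums(as.matrix(data) + 0))   # compare against .Machine$integer.max
.Machine$integer.max                # 2147483647
storage.mode(data) <- "double"      # store counts as doubles before SAMseq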



Re: [R] gsDesign Pocock & OBF boundary

2017-09-24 Thread array chip via R-help
Sorry it didn't work again. I am on Yahoo Mail, and just found a switch to 
change from Rich text to Plain text, so here it goes again:


I am learning to use the gsDesign package. 

I have a question about the Pocock and OBF boundaries. As far as I understand, 
these 2 boundaries require equal spacing between interim analyses (maybe this 
is not correct?). But it looks like I can still use gsDesign to run an analysis 
based on unequal spacing: 

> library(gsDesign)
> gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock') 


Symmetric two-sided group sequential design with 
90 % power and 5 % Type I Error. 
Spending computations assume trial stops 
if a bound is crossed. 

             Sample
              Size
  Analysis   Ratio*     Z  Nominal p  Spend
         1    0.796  1.82     0.0346 0.0346
         2    1.061  1.82     0.0346 0.0154
     Total                           0.0500

++ alpha spending: 
Pocock boundary. 
* Sample size ratio compared to fixed design with no interim 
  

Can anyone shed some light on whether the above analysis is still valid? Or, for 
unequal spacing, do I have to use the Lan-DeMets error spending function 
approximations? Thank you,
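
For reference, the spending-function route mentioned above does accommodate 
unequal analysis timing; a minimal sketch, assuming gsDesign's built-in 
Lan-DeMets spending functions:

library(gsDesign)
## Lan-DeMets spending-function approximations, valid for unequal spacing
gsDesign(k = 2, test.type = 2, timing = c(0.75, 1), alpha = 0.05,
         sfu = sfLDPocock)   # approximates Pocock bounds
gsDesign(k = 2, test.type = 2, timing = c(0.75, 1), alpha = 0.05,
         sfu = sfLDOF)       # approximates O'Brien-Fleming bounds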





From: Jeff Newmiller <jdnew...@dcn.davis.ca.us>

oject.org>; Berend Hasselman <b...@xs4all.nl> 
Cc: R-help Mailing List <r-help@r-project.org>
Sent: Sunday, September 24, 2017 12:41 AM
Subject: Re: [R] gsDesign Pocock & OBF boundary



Still failed. 

The first secret is in your email program settings, to use Plain Text format 
(at least for emails you send to this mailing list).

The second secret tool to use is the reprex package to let you verify that your 
code example will do on our computers what it is doing on your computer before 
you send it to us. That will also involve giving us some sample data or 
referencing some data already available to us in a relevant package. See [1], 
[2] and [3] for more discussion of how to succeed at communicating on the 
Internet regarding R. 

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

[2] http://adv-r.had.co.nz/Reproducibility.html

[3] https://cran.r-project.org/web/packages/reprex/index.html (read the 
vignette) 
-- 
Sent from my phone. Please excuse my brevity.


On September 23, 2017 9:53:05 PM PDT, array chip via R-help 
<r-help@r-project.org> wrote:
>Sorry for messed up text. Here it goes again:
>I am learning to use the gsDesign package.
>I have a question about Pocock and OBF boundary. As far as I can
>understand, these 2 boundaries require equal spacing between interim
>analyses (maybe this is not correct?). But looks like I can still use
>gsDesign to run an analysis based on unequal spacing: 
>> gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock')
>Symmetric two-sided group sequential design with 90 % power and 5 % Type I
>Error. Spending computations assume trial stops if a bound is crossed.
>
>             Sample
>              Size
>  Analysis   Ratio*     Z  Nominal p  Spend
>         1    0.796  1.82     0.0346 0.0346
>         2    1.061  1.82     0.0346 0.0154
>     Total                           0.0500
>
>++ alpha spending: Pocock boundary.
>* Sample size ratio compared to fixed design with no interim
>Can anyone shed some light on whether the above analysis is still valid?
>Or, for unequal spacing, do I have to use the Lan-DeMets error spending
>function approximations? Thank you,
>
>
>
>  From: Berend Hasselman <b...@xs4all.nl>

>Cc: R-help Mailing List <r-help@r-project.org>
> Sent: Friday, September 22, 2017 11:46 PM
> Subject: Re: [R] gsDesign Pocock & OBF boundary
>  
>
>> On 23 Sep 2017, at 01:32, array chip via R-help
><r-help@r-project.org> wrote:
>> 
>> Hi, 
>> 
>> I am learning to use your gsDesign package! I have a question about
>> Pocock and OBF boundary. As far as I understand, these 2 boundaries
>> require equal spacing between interim analyses (maybe this is not
>> correct?). But I can still use gsDesign to run an analysis based on
>> unequal spacing:
>>
>> gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock')
>>
>> Symmetric two-sided group sequential design with 90 % power and 5 % Type I
>> Error. Spending computations assume trial stops if a bound is crossed.
>>
>>              Sample
>>               Size
>>   Analysis   Ratio*     Z  Nominal p  Spend
>>          1    0.796  1.82     0.0346 0.0346
>>          2    1.061  1.82     0.0346 0.0154
>>      Total                           0.0500
>>
>> ++ alpha spending: Pocock boundary.
>> * Sample size ratio compared to fixed design with no interim
>>
>> Can anyone shed some light on whether the above analysis is still valid?
>> Or, for unequal spacing, do I have to use the Lan-DeMets error spending
>> function approximations? Thank you,
>> 
>
>Your example code is a complete mess.
>Do NOT post in html. This is a plain text mailing list.

Re: [R] gsDesign Pocock & OBF boundary

2017-09-23 Thread array chip via R-help
Sorry for the messed-up text. Here it goes again:
I am learning to use the gsDesign package.

I have a question about the Pocock and OBF boundaries. As far as I understand, 
these 2 boundaries require equal spacing between interim analyses (maybe this 
is not correct?). But it looks like I can still use gsDesign to run an analysis 
based on unequal spacing: 
> gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock')
Symmetric two-sided group sequential design with 90 % power and 5 % Type I 
Error. Spending computations assume trial stops if a bound is crossed.

             Sample
              Size
  Analysis   Ratio*     Z  Nominal p  Spend
         1    0.796  1.82     0.0346 0.0346
         2    1.061  1.82     0.0346 0.0154
     Total                           0.0500

++ alpha spending: Pocock boundary.
* Sample size ratio compared to fixed design with no interim
Can anyone shed some light on whether the above analysis is still valid? Or, for 
unequal spacing, do I have to use the Lan-DeMets error spending function 
approximations? Thank you,



  From: Berend Hasselman <b...@xs4all.nl>
 To: array chip <arrayprof...@yahoo.com> 
Cc: R-help Mailing List <r-help@r-project.org>
 Sent: Friday, September 22, 2017 11:46 PM
 Subject: Re: [R] gsDesign Pocock & OBF boundary
   

> On 23 Sep 2017, at 01:32, array chip via R-help <r-help@r-project.org> wrote:
> 
> Hi, 
> 
> I am learning to use your gsDesign package! I have a question about Pocock 
> and OBF boundary. As far as I understand, these 2 boundaries require equal 
> spacing between interim analyses (maybe this is not correct?). But I can still 
> use gsDesign to run an analysis based on unequal spacing: 
>
> gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock')
>
> Symmetric two-sided group sequential design with 90 % power and 5 % Type I 
> Error. Spending computations assume trial stops if a bound is crossed.
>
>              Sample
>               Size
>   Analysis   Ratio*     Z  Nominal p  Spend
>          1    0.796  1.82     0.0346 0.0346
>          2    1.061  1.82     0.0346 0.0154
>      Total                           0.0500
>
> ++ alpha spending: Pocock boundary.
> * Sample size ratio compared to fixed design with no interim
>
> Can anyone shed some light on whether the above analysis is still valid? Or, 
> for unequal spacing, do I have to use the Lan-DeMets error spending function 
> approximations? Thank you,
>     [[alternative HTML version deleted]]
> 

Your example code is a complete mess.
Do NOT post in html. This is a plain text mailing list.
Read the Posting Guide (https://www.r-project.org/posting-guide.html).

Berend Hasselman

> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


   
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] gsDesign Pocock & OBF boundary

2017-09-23 Thread array chip via R-help
Hi, 

I am learning to use your gsDesign package! I have a question about Pocock and 
OBF boundary. As far as I understand, these 2 boundaries require equal spacing 
between interim analyses (maybe this is not correct?). But I can still use 
gsDesign to run an analysis based on unequal spacing: 
gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock')
Symmetric two-sided group sequential design with 90 % power and 5 % Type I Error.
Spending computations assume trial stops if a bound is crossed.

           Sample
            Size
  Analysis Ratio*    Z Nominal p  Spend
         1  0.796 1.82    0.0346 0.0346
         2  1.061 1.82    0.0346 0.0154
     Total                       0.0500

++ alpha spending: Pocock boundary.
* Sample size ratio compared to fixed design with no interim

Can anyone shed some light on whether the above analysis is still valid? Or for 
unequal spacing, do I have to use the Lan-DeMets error spending function 
approximations? Thank you,
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] deviance in GLM vs. summary.glm

2017-05-31 Thread array chip via R-help
Hi, I am running a logistic regression on a simple dataset (attached) using glm:
> dat<-read.table("dat.txt",sep='\t',header=T)
If I use summary() on a logistic model:
> summary(glm(y~x1*x2,dat,family='binomial'))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    19.57    5377.01   0.004    0.997
x1            -18.59    5377.01  -0.003    0.997
x2B           -19.57    5377.01  -0.004    0.997
x1:x2B         38.15    7604.24   0.005    0.996
As you can see, the interaction term is very insignificant (p = 0.996)!
But if I use a anova() to compare a full vs reduced model to evaluate the 
interaction term:
> anova(glm(y~x1+x2,dat,family='binomial'), glm(y~x1*x2,dat,family='binomial'))
Analysis of Deviance Table

Model 1: y ~ x1 + x2
Model 2: y ~ x1 * x2
  Resid. Df Resid. Dev Df Deviance
1        22     27.067            
2        21     21.209  1   5.8579
This follows a chi-square distribution with 1 df, so the corresponding p value 
is:
> 1-pchisq(5.8679,1)
[1] 0.01541944
So I get very different p value on the interaction term, can someone share 
what's going wrong here?
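
Side note: the huge standard errors (~5377) suggest the Wald z-tests in 
summary() have broken down (near-separation in the data), so the deviance 
comparison is the more trustworthy one. A one-call version of the same 
likelihood-ratio test, as a sketch assuming the same dat:

anova(glm(y~x1+x2,dat,family='binomial'),
      glm(y~x1*x2,dat,family='binomial'),
      test='Chisq')   ## prints the chi-square p value directly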
Thanks!
Yi

"y" "x1""x2"
"R" 1   "B"
"NR"1   "A"
"R" 0   "B"
"R" 1   "B"
"R" 1   "A"
"R" 1   "A"
"R" 0   "A"
"R" 1   "A"
"R" 0   "B"
"R" 0   "B"
"NR"0   "B"
"NR"0   "B"
"R" 1   "A"
"R" 1   "A"
"NR"1   "A"
"R" 0   "A"
"R" 0   "A"
"NR"0   "B"
"R" 0   "A"
"R" 1   "A"
"NR"1   "A"
"R" 1   "B"
"R" 1   "A"
"R" 1   "B"
"R" 1   "A"
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Lattice xyplot

2017-05-02 Thread array chip via R-help
Thanks all for the clarification!

  From: Jeff Newmiller <jdnew...@dcn.davis.ca.us>
 To: r-help@r-project.org; Bert Gunter <bgunter.4...@gmail.com>; array chip 
<arrayprof...@yahoo.com> 
Cc: "r-help@r-project.org" <r-help@r-project.org>
 Sent: Monday, May 1, 2017 10:53 AM
 Subject: Re: [R] Lattice xyplot
   
It is not a question of whether lattice "understands" the unsorted data... 
imagine trying to plot 4 points to form a square instead of a trend line... you 
would NOT want lattice to sort those points for you. That lattice leaves your 
data alone gives you more flexibility, even while it adds work for certain 
applications. 
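
To make the square example concrete (a minimal sketch):

library(lattice)
## four corners given in drawing order; type = "l" connects points in
## data order, so sorting by x first would destroy the intended shape
sq <- data.frame(x = c(0, 1, 1, 0, 0), y = c(0, 0, 1, 1, 0))
xyplot(y ~ x, data = sq, type = "l")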

-- 
Sent from my phone. Please excuse my brevity.

On May 1, 2017 7:34:09 AM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>Yes. type = "l" connects the points in the order given in the data, so
>if the x's are not already ordered, the plots will be different after
>ordering the x's.
>
>e.g.
>
>> x <- c(3,1,2,4,6,5)
>> y <- 11:16
>> xyplot(y~x, type = "l")
>
>
>As for why ... that's just the way it was designed. You can always
>order the data first, if you don't want this default.
>
>Cheers,
>Bert
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
>On Sun, Apr 30, 2017 at 6:07 PM, array chip via R-help
><r-help@r-project.org> wrote:
>> Dear all, I am new to lattice, so would appreciate anyone's help on
>the questions below. I am using xyplot to plot some trend in my
>dataset. Using the example dataset attached, I am trying to plot
>variable "y" over variable "time" for each subject "id":
>> dat<-read.table("dat.txt",sep='\t',header=T,row.names=NULL)
>> xyplot(y ~ time, data=dat, groups=id, aspect = "fill", type = c("p",
>"l"),  xlab = "Time", ylab = "Y")
>>
>> It appears that it just worked fine. But if I sort the "dat" first,
>the plot will look somewhat different!
>> dat<-dat[order(dat$id, dat$time),]
>> xyplot(y ~ time, data=dat,
>groups=id, aspect = "fill", type = c("p", "l"),  xlab = "Time", ylab =
>"Y")
>> Why is that? Do you need to sort the data first before using xyplot?
>Why xyplot can not understand the dataset unless it is sorted first?
>> Thanks,
>> John
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

   
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Lattice xyplot

2017-05-01 Thread array chip via R-help
Dear all, I am new to lattice, so would appreciate anyone's help on the 
questions below. I am using xyplot to plot some trend in my dataset. Using the 
example dataset attached, I am trying to plot variable "y" over variable "time" 
for each subject "id":
dat<-read.table("dat.txt",sep='\t',header=T,row.names=NULL)
xyplot(y ~ time, data=dat, groups=id, aspect = "fill", type = c("p", "l"),  
xlab = "Time", ylab = "Y")

It appears that it just worked fine. But if I sort the "dat" first, the plot 
will look somewhat different!
dat<-dat[order(dat$id, dat$time),]
xyplot(y ~ time, data=dat, groups=id, aspect = "fill", type = c("p", "l"),  
xlab = "Time", ylab = "Y")
Why is that? Do you need to sort the data first before using xyplot? Why can 
xyplot not understand the dataset unless it is sorted first?
Thanks,
John

id  time  y
107 0   6.5
107 1   5.1
107 2   4.2
112 0   5.6
107 3   6.9
112 1   4
112 2   1
119 0   7.6
112 3   4.9
119 1   5.7
120 0   7.1
203 0   7.4
120 2   2.1
203 1   6.3
123 0   6.8
203 2   3.8
119 3   6.1
123 1   3.9
123 2   3
120 3   6
203 3   7.6
207 0   5.8
207 1   3.1
123 3   5.7
209 3   3.6
208 0   4.4
130 0   5.5
131 0   6.9
133 0   5.7
134 0   5.1
209 0   4.9
128 2   2.9
128 1   4.5
130 2   5.9
131 1   6.9
133 2   2.6
133 1   5.7
403 2   3.1
403 0   4.5
128 0   4.5
134 2   2.3
207 3   4.8
130 3   4.9
207 2   2.3
130 1   3.8
131 2   3.9
133 3   3.1
134 1   2.8
209 1   4.2
208 3   2.3
208 1   5.3
208 2   0.8
128 3   4.4
131 3   6.2
209 2   5.7
134 3   4.1
403 3   2.5
119 2   2.3
120 1   3.8
403 1   3.2
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] standard error of survfit.coxph()

2014-07-21 Thread array chip
Dear Terry,

I was trying to use your explanation of the standard error estimate from 
survfit.coxph() to verify the standard error estimates for the method of 
log(-log(S)), but couldn't get the estimates correct. Here is an example using 
the lung dataset:

 fit <- coxph(Surv(time,status)~wt.loss, lung)
 surv <- survfit(fit, newdata=lung[2:5,])$surv

 logstd <- survfit(fit, newdata=lung[2:5,])$std.err
 logstd.time <- survfit(fit, newdata=lung[2:5,])$time


    ## std error of cumulative hazard at time=60

 logstd <- logstd[logstd.time==60,]
 logstd
[1] 0.01965858 0.01965858 0.01941871 0.01969248

    ## survival S at month 60

 surv <- surv[logstd.time==60,]
 surv
        2         3         4         5 
0.9249238 0.9249238 0.9253038 0.9263392

Since survfit()$std.err is for the cumulative hazard H = -log(S), based on your 
explanation below, the standard error for the log-log method is: (1/S^2)var(H)

 loglogstd <- sqrt(1/surv^2*(logstd^2))
 loglogstd

         2          3          4          5 
0.02125427 0.02125427 0.02098631 0.02125839

    ## the upper confidence interval should be
 exp(-exp(log(-log(surv))-qnorm(0.975)*loglogstd))
        2         3         4         5 
0.9278737 0.9278737 0.9282031 0.9292363

But this is different from the output using summary.survfit:

 summary(survfit(fit,newdata=lung[2:5,],conf.type='log-log'),times=60)$upper[1,]
        2         3         4         5 
0.9534814 0.9534814 0.9535646 0.9548478

Can you please suggest what I did wrong in the calculation?

Thanks very much!

John





To: Therneau, Terry M., Ph.D. thern...@mayo.edu; r-help@r-project.org 
r-help@r-project.org 
Sent: Monday, June 30, 2014 10:46 AM
Subject: Re:  standard error of survfit.coxph()



[[elided Yahoo spam]]

John




 From: Therneau, Terry M., Ph.D. thern...@mayo.edu

Sent: Monday, June 30, 2014 6:04 AM
Subject: Re:  standard error of survfit.coxph()


1. The computations behind the scenes produce the variance of the cumulative 
hazard. 
This is true for both an ordinary Kaplan-Meier and a Cox model.  
Transformations to other 
scales are done using simple Taylor series.

   H = cumulative hazard = -log(S);  S = survival
   var(H) = var(log(S))  = the starting point
   S = exp(-H), so  var(S) is approx [deriv of exp(x)]^2 * var(H) = 
S^2 var(H)
   var(log(-log(S))) is approx (1/S^2) var(H)

2. At the time it was written, summary.survfit was used only for
 printing out the survival 
curve at selected times, and the audience for the printout wanted std(S).   
True, that was 
20 years ago, but I don't recall anyone ever asking for summary to do anything 
else.  Your 
request is not a bad idea.
   Note however that the primary impact of using log(S) or S or log(log(S)) 
scale is is on 
the confidence intervals, and they do appear per request in the summary output.

Terry T.



Re: [R] standard error of survfit.coxph()

2014-07-21 Thread array chip
Terry, I figured out that variance of log(-log(S)) should be (1/H^2)var(H), not 
(1/S^2)var(H)!

Thanks

John
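
In code, the two delta-method relations side by side (a sketch; se.H is the 
standard error of the cumulative hazard H that survfit() reports):

se.S      <- function(S, se.H) S * se.H   ## var(S) ~ S^2 var(H)
se.loglog <- function(H, se.H) se.H / H   ## var(log(-log(S))) ~ (1/H^2) var(H)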







Re: [R] standard error of survfit.coxph()

2014-06-30 Thread array chip
Thank you Terry for the explanation!

John




 From: Therneau, Terry M., Ph.D. thern...@mayo.edu

Sent: Monday, June 30, 2014 6:04 AM
Subject: Re:  standard error of survfit.coxph()


1. The computations behind the scenes produce the variance of the cumulative 
hazard. 
This is true for both an ordinary Kaplan-Meier and a Cox model.  
Transformations to other 
scales are done using simple Taylor series.

   H = cumulative hazard = -log(S);  S = survival
   var(H) = var(log(S))  = the starting point
   S = exp(-H), so  var(S) is approx [deriv of exp(x)]^2 * var(H) = 
S^2 var(H)
   var(log(-log(S))) is approx (1/S^2) var(H)

2. At the time it was written, summary.survfit was used only for printing out 
the survival 
curve at selected times, and the audience for the printout wanted std(S).   
True, that was 
20 years ago, but I don't recall anyone ever asking for summary to do anything 
else.  Your 
request is not a bad idea.
   Note however that the primary impact of using log(S) or S or log(log(S)) 
scale is is on 
the confidence intervals, and they do appear per request in the summary output.

Terry T.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] standard error of survfit.coxph()

2014-06-27 Thread array chip
Hi, can anyone help me to understand the standard errors printed in the output 
of survfit.coxph()?

time <- sample(1:15, 100, replace=T)

status <- as.numeric(runif(100,0,1) < 0.2)
x <- rnorm(100, 10, 2)

fit <- coxph(Surv(time,status) ~ x)
    ### method 1

survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
conf.type='log')$std.err

             [,1]        [,2]        [,3]        [,4]       [,5]
 [1,] 0.000000000 0.000000000 0.000000000 0.000000000 0.00000000
 [2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
 [3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
 [4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
 [8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
 [9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394
[10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
[11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
[12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413
[13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433
[14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860
[15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111

survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
conf.type='log')$time
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

    ### method 2

summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], 
conf.type='log'),time=10)$std.err

              1          2          3          4          5
[1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532

By reading the help of ?survfit.object and ?summary.survfit, the standard error 
provided in the output of method 1 (survfit()) is for the cumulative hazard, 
-log(survival), while the standard error provided in the output of method 
2 (summary.survfit()) was for survival itself, regardless of how you choose the 
value for conf.type ('log', 'log-log' or 'plain'). This explains why the 
standard error output is different between method 1 (10th row) and method 2.

My question is how do I get standard error estimates for log(-log(survival))?

Thanks!

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] prediction based on conditional logistic regression clogit

2014-06-16 Thread array chip
Thank you Peter. Any other suggestions are absolutely welcome!!

John




 From: peter dalgaard pda...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Monday, June 16, 2014 2:22 AM
Subject: Re: [R] prediction based on conditional logistic regression clogit





 Hi, I am using clogit() from survival package to do conditional logistic 
 regression. I also need to make prediction on an independent dataset to 
 calculate predicted probability. Here is an example:
 
 
 dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), 
 x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
 dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), 
 x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
 fit <- clogit(status~x1+x2+strata(set), dat)
 predict(fit, newdata=dat.test, type='expected')
 Error in Surv(rep(1, 150L), status) : 
   Time and status are different lengths
 
 Can anyone suggest what's wrong here?
 


The direct cause is that clogit() works by using the fact that the likelihood 
is equivalent to a coxph() likelihood with stratification and all observation 
lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), 
status) and that goes belly-up if you apply the same formula to a data set of 
different length. 

However, notice that there is no such thing as predict.clogit(), so you are 
attempting predict.coxph() on a purely formal Cox model. It is unclear to what 
extent predicted values, in the sense of coxph() are compatible with 
predictions in conditional logit models.

I'm rusty on this, but I think what you want is something like

m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
pp <- exp(m %*% coef(fit))
pps <- ave(pp, dat.test$set, FUN=sum)
pp/pps

i.e. the conditional probability that an observation is a case given covariates 
and that there is one case in each set (in the data given, you have sets of 
three with one being a case, so all predicted probabilities are close to 0.33). 
For more general matched sets, I'm not really sure what one wants. Real experts 
are welcome to chime in.
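
A quick sanity check of that construction (a sketch, assuming m, pp and pps 
from above): within each matched set the conditional probabilities should sum 
to 1.

tapply(as.vector(pp/pps), dat.test$set, sum)   ## should be 1 for every set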

-pd




 Thanks!
 
 John
     [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] prediction based on conditional logistic regression clogit

2014-06-15 Thread array chip
Hi, I am using clogit() from survival package to do conditional logistic 
regression. I also need to make prediction on an independent dataset to 
calculate predicted probability. Here is an example:


 dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), 
 x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
 dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), 
 x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
 fit <- clogit(status~x1+x2+strata(set), dat)
 predict(fit, newdata=dat.test, type='expected')
Error in Surv(rep(1, 150L), status) : 
  Time and status are different lengths

Can anyone suggest what's wrong here?

Thanks!

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survival estimates for covariate values

2014-05-15 Thread array chip
Hi all, let's say we fit a Cox model with a numeric variable x as the 
independent variable. Then we can calculate, say, 10-year survival for any 
given value of x (0 to 10 in increments of 0.1 in the example below):


 fit <- coxph(Surv(time, event) ~ x, dat)
 surv10yr <-
summary(survfit(fit, newdata=data.frame(x=seq(0,10,0.1))), times=10)

I am wondering if anyone can show me how to replicate this in SAS?

very much appreciated!

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] median survival

2014-01-28 Thread array chip
Hi, if the 50% survival probability horizontal line in a Kaplan-Meier survival 
curve overlaps one of the step segments between 2 time points t1 and t2, 
survfit() from the survival package estimates the median survival as t2 (the 
larger time point). But I saw some articles (page 23: 
http://www.amstat.org/chapters/northeasternillinois/pastevents/presentations/summer05_Ibrahim_J.pdf)
 recommend the smallest time t such that S(t) <= 0.5. 

What is the convention for the definition of median survival?

Thanks

John

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] unique rows

2014-01-28 Thread array chip
Hi, I wanted to remove redundant rows (with same entry in columns) in a data 
frame. For example, with this data frame:

 dat <- cbind(x=c('a','a','b','b','c','c'), y=c('x','x','d','s','g','g'))
 dat
 x   y  
[1,] a x
[2,] a x
[3,] b d
[4,] b s
[5,] c g
[6,] c g

after removing the redundancy, the end results should be

 x   y  
[1,] a x
[2,] b d
[3,] b s
[4,] c g

what is the best way to do this?

Thanks

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] unique rows

2014-01-28 Thread array chip
sorry.. don't know unique().. such a great function
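
For the record, the one-liner (a sketch with the same toy data):

dat <- cbind(x=c('a','a','b','b','c','c'), y=c('x','x','d','s','g','g'))
unique(dat)   ## drops duplicated rows, keeping the first occurrence of each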





 From: Bert Gunter gunter.ber...@gene.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Tuesday, January 28, 2014 2:21 PM
Subject: Re: [R] unique rows


Inline.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
H. Gilbert Welch





 Hi, I wanted to remove redundant rows (with same entry in columns) in a data 
 frame. For example, with this data frame:

 dat <- cbind(x=c('a','a','b','b','c','c'), y=c('x','x','d','s','g','g'))

## this is not a data frame.

And would you kindly explain why you posted here instead of reading ?unique.

-- Bert


 dat
      x   y
 [1,] a x
 [2,] a x
 [3,] b d
 [4,] b s
 [5,] c g
 [6,] c g

 after removing the redundancy, the end results should be

      x   y
 [1,] a x
 [2,] b d
 [3,] b s
 [4,] c g

 what is the best way to do this?

 Thanks

 John
         [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] median survival

2014-01-28 Thread array chip

please ignore. actually the median survival from survfit() is the mean of the 2 
time points.






To: R help r-help@r-project.org 
Sent: Tuesday, January 28, 2014 11:27 AM
Subject: [R] median survival


Hi, if the 50% survival probability horizontal line in a Kaplan-Meier survival 
curve overlaps one of the step segments between 2 time points t1 and t2, 
survfit() from the survival package estimates the median survival as t2 (the 
larger time point). But I saw some articles (page 23: 
http://www.amstat.org/chapters/northeasternillinois/pastevents/presentations/summer05_Ibrahim_J.pdf)
 recommend the smallest time t such that S(t) <= 0.5. 

What is the convention for the definition of median survival?

Thanks

John

    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Installation folder

2014-01-08 Thread array chip
Hi, I noticed that when I install/update packages, the installation folder is 
C:/User/My Document/R, not C:/Program Files/R. R itself is still in the Program 
Files folder. Don't know how this has happened. It used to work ok.

Any clues or how to correct the problem is appreciated!

Thanks

John

Sent from Yahoo Mail for iPhone
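
This usually means install.packages() fell back to a personal library because 
the Program Files location was not writable. A quick way to check (a sketch):

.libPaths()   ## the first element is where install.packages() writes by default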
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] estimating survival function from Cox model

2013-12-17 Thread array chip
Hi, I have some questions on how to estimate the survival function from a Cox 
model. I know how to do this in R using survfit(). 


But let's say the model was fit in another software package, and I was only 
given the estimate of the baseline cumulative hazard A0(t=10) at the specified 
time t=10 (baseline cumulative hazard refers to when the covariate X=0) and the 
beta estimate b for the covariate X used in the Cox model. 


So the survival function at time 10 for a given covariate value x can be 
calculated as:

A(t=10|X=x) = exp(b*x)*A0(t=10) where A is cumulative hazard when X=x
S(t=10|X=x) = exp(-A(t=10|X=x)) where S is survival function to be calculated

Now I want to calculate confidence interval for S(t=10|X=x). I think I need to 
calculate the CI for cumulative hazard A(t=10|X=x) first and then exponentiate 
it to get CI for S, based on the relationship S = exp(-A).

To get CI for A, I need to calculate the estimate of standard error of A. I 
know the other software can give me the standard error of A0, the baseline 
cumulative hazard. Based on the relationship A = exp(b*x)*A0, I guess I'll need 
the standard error for b. But how do I calculate the standard error for A based 
on standard errors for A0 and b?
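
For what it's worth, a delta-method sketch of that propagation (it assumes the 
estimates of b and A0 are independent and so ignores their covariance, which a 
real Cox fit would need):

## A = exp(b*x)*A0, so dA/db = x*A and dA/dA0 = exp(b*x); hence
## var(A) ~ (x*A)^2 * var(b) + exp(b*x)^2 * var(A0)
se.A <- function(x, b, A0, se.b, se.A0) {
  A <- exp(b*x) * A0
  sqrt((x*A)^2 * se.b^2 + exp(b*x)^2 * se.A0^2)
}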

Any insights would be greatly appreciated!

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




[R] power analysis is applicable or not

2013-11-12 Thread array chip
Hi, this is a statistical question rather than a pure R question. I have got 
many help from R mailing list in the past, so would like to try here and 
appreciate any input:

I conducted a Mantel-Haenszel test to show that the performance of a diagnostic 
test did not show heterogeneity among 4 study sites, i.e. Mantel-Haenszel test 
p value > 0.05, so that I could conduct a meta-analysis by combining data of 
all 4 study sites. 

Now one of the reviewers for the manuscript did a power analysis for the 
Mantel-Haenszel test showing that with the sample sizes I have, the power of 
the Mantel-Haenszel test was only 50%. So he argued that I did not have enough 
power for the Mantel-Haenszel test.

My usage of Mantel-Haenszel was NOT to show a significant p value; instead a 
non-significant p value was what I was looking for, because a non-significant p 
value indicates NO heterogeneity among study sites. Power analysis in general 
shows whether you have enough sample size to ensure a statistically 
significant difference can be seen with a certain likelihood. But this is not 
how I used the Mantel-Haenszel test. So I think in my scenario the power 
analysis is NOT applicable, because I am simply using the test to demonstrate a 
non-significant p value.

Am I correct on this view?

Thanks and appreciate any thoughts.

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] power analysis is applicable or not

2013-11-12 Thread array chip
Hi Chris,

Thanks for sharing your thoughts.

The reviewer used the heterogeneity that I observed in my study for the power 
analysis. I understand what you have described. And I agree that with the sample 
size I have, I do not have enough power to detect the heterogeneity that I 
observed with significance.

But if let's say I have enough sample size as calculated by the power analysis, 
then I will have 80% power to detect the heterogeneity, would it be true that I 
will almost very unlikely to declare homogeneity among study sites, so that I 
would almost never be able to combine study sites? This goes to the general 
thinking that if you have a sample size large enough, you will always be able 
to make any difference statistically significant...

On the other hand, making a statistical inference using any statistical test 
(including the Mantel-Haenszel test) is, I thought, always valid regardless of 
sample size. For the heterogeneity test, I am doing just that -- making a 
statistical inference with the p value from the Mantel-Haenszel test.  I am not 
sure it is correct that it is mandatory to perform a power analysis before 
attempting a statistical test.


Please share your thoughts...

Thanks

John



 From: Christopher W. Ryan cr...@binghamton.edu

Sent: Tuesday, November 12, 2013 6:53 PM
Subject: Re: [R] power analysis is applicable or not


John--

Well, my simple-minded way of thinking about these issues goes something
like this:

You want to know if there is heterogeneity. You gather some data and do
your MH analysis. You never know *for sure* whether there is *really*
heterogeneity in your population; all you know is whether there is any
in your sample--you concluded there was not. Your reviewer calculated
that with the sample size you used, *even if there was heterogeneity in
your population* (unknowable by you or anyone else) then your sample
size only had a 50% probability of detecting it (a 50% probability of
coming up with a p < 0.05).  Meaning there *could have been*
heterogeneity there, at a 0.05 significance level, and you *would* have
seen it, *if* your sample size was larger.

It's when you come up with a non-significant result that the issue of
power is most relevant. If you already have a significant result, then
yes, your sample size was large enough to show a significant result.

An important question is: what *magnitude* of heterogeneity did your
reviewer assume he/she was looking for when he/she did the power
calculation?  And is that magnitude meaningful?

All this being said, power calculations are best done before recruiting
subjects or gathering data.
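
As a back-of-the-envelope version of the reviewer's calculation (a sketch 
using the pwr package; the effect size w = 0.2 and total N = 200 are 
hypothetical, and a heterogeneity test across 4 sites has df = 3):

library(pwr)
pwr.chisq.test(w = 0.2, N = 200, df = 3, sig.level = 0.05)  ## returns power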

--Chris Ryan
SUNY Upstate Medical University
Binghamton, NY

array chip wrote:
 Hi, this is a statistical question rather than a pure R question. I have got 
 many help from R mailing list in the past, so would like to try here and 
 appreciate any input:
 
 I conducted a Mantel-Haenszel test to show that the performance of a diagnostic 
 test did not show heterogeneity among 4 study sites, i.e. Mantel-Haenszel 
 test p value > 0.05, so that I could conduct a meta-analysis by combining 
 data of all 4 study sites. 
 
 Now one of the reviewers for the manuscript did a power analysis for the 
 Mantel-Haenszel test showing that with the sample sizes I have, the power of 
 the Mantel-Haenszel test was only 50%. So he argued that I did not have enough 
 power for the Mantel-Haenszel test.
 
 My usage of Mantel-Haenszel was NOT to show a significant p value; instead a 
 non-significant p value was what I was looking for, because a non-significant p 
 value indicates NO heterogeneity among study sites. Power analysis in 
 general shows whether you have enough sample size to ensure a 
 statistically significant difference can be seen with a certain likelihood. But 
 this is not how I used the Mantel-Haenszel test. So I think in my scenario the 
 power analysis is NOT applicable, because I am simply using the test to 
 demonstrate a non-significant p value.
 
 Am I correct on this view?
 
 Thanks and appreciate any thoughts.
 
 John
     [[alternative HTML version deleted]]
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sparse PCA using nsprcomp package

2013-09-11 Thread array chip
Hi Christian,

Thank you so much for sharing your thoughts, it was a real pleasure to read and 
learn! Approximately when do you expect the new release of the package?


Best,

John




 From: Christian Sigg r-h...@sigg-iten.ch

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Monday, September 9, 2013 8:06 AM
Subject: Re: [R] sparse PCA using nsprcomp package
 

Hi John

 1). Assume now I can calculate these adjusted standard deviations from 
 sparse PCA. Should the percent variation explained by each sparse PC be 
 calculated using the sum of all these adjusted variances (i.e. squares of the 
 adjusted standard deviations) as the denominator (then these percent 
 variations explained will always add up to 1 if all sparse PCs are counted), or 
 using the sum of the PC variances estimated by REGULAR PCA as the denominator 
 (then adding up all PCs may not equal 1)?

It depends on what you want to do with this percentage, but to me the second 
would be more meaningful. A sparse PCA will usually be truncated (fewer than 
all possible components are computed), and due to the additional constraints on 
the principal axes you will usually explain less variance than with standard 
PCA. I would want to know what I lose in a sparse PCA w.r.t. a standard PCA.

Note that you don't actually have to compute the standard PCA if you are only 
interested in the total variance of the data, i.e. the sum of all variances. 
The total variance 

1/(n-1)*sum(diag(t(X)%*%X)) 

for the zero-mean data matrix X is invariant to a rotation of the coordinate 
system and therefore identical to

Z <- X %*% W
1/(n-1)*sum(diag(t(Z)%*%Z)) 

so you can skip computing the PCA rotation matrix W. The fastest way to compute 
the total variance is probably

1/(n-1)*sum(X^2) 

because all expressions compute the squared Frobenius norm of X. 
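
A quick numerical check that the three forms agree (a sketch on simulated 
data):

set.seed(1)
X <- scale(matrix(rnorm(200), 40, 5), center = TRUE, scale = FALSE)
W <- prcomp(X)$rotation
Z <- X %*% W
c(sum(diag(t(X) %*% X)), sum(diag(t(Z) %*% Z)), sum(X^2)) / (nrow(X) - 1)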

If you want to compare variances of individual components, then compute a 
regular PCA.

I also had a look at how the spca function computes the percentage of explained 
variation. I don't yet entirely understand what is going on, but the results 
differ from using the asdev function I mentioned in my previous reply. Keep 
that in mind if you want to compare nsprcomp to spca.

 2). How do you choose the 2 important parameters in nsprcomp(), ncomp and k? 
 If for example, my regular PCA showed that I need 20 PCs to account for 80% 
 of the variation in my dataset, does it mean I should set ncomp=20? And then 
 what about any rules setting the value of k?

I don't have any hard answers for this question. 

There are a number of heuristics for choosing the number of components in 
regular PCA (e.g. the PCA book by Jolliffe presents several), and some of them 
should translate to sparse PCA. If you think that 20 PCs or 80% explained 
variance works well for regular PCA, I suggest also using 20 components in 
sparse PCA, then measure the explained variance and then increase the number of 
components (if necessary) to again achieve 80% explained variance.

Same for setting the cardinality parameter k. You could use a criterion such as 
BIC to optimize the trade-off between model fidelity and complexity, but I 
don't have any experience how well this works in practice. What I did so far 
was to check for loadings with small magnitudes. I set k such that all loadings 
have substantial magnitudes.

In the end what matters is what follows after running the algorithm. Are you 
directly interpreting the sparse PCA result, or is this an intermediate step in 
a complete data processing pipeline with a measurable goal? If the latter, 
choose the algorithm parameters that give you the best results at the end of 
the pipeline.

 3). Would you recommend nscumcomp() or nsprcomp() in general?

It depends whether you want to perform a sequential or cumulative analysis of 
your data. If you want maximum variance in the first (second, third, ...) PC, 
and specify the precise cardinality of each principal axis, then use nsprcomp. 
If instead you want to only specify the total cardinality of all loadings and 
leave the distribution of non-zero loadings to the algorithm, use nscumcomp.
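Schematically, with X your data matrix (placeholder values for ncomp and k):

nsprcomp(X, ncomp = 3, k = c(5, 5, 5))   # sequential: cardinality of each axis
nscumcomp(X, ncomp = 3, k = 15)          # cumulative: total cardinality budget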

There will be substantial improvements to nscumcomp in the next release, if you 
want to use it I suggest you wait until then.

Regards
Christian
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sparse PCA using nsprcomp package

2013-09-05 Thread array chip
Hi all, I am using nsprcomp() from nsprcomp package to run sparse PCA. The 
output is very much like regular PCA by prcomp() in that it provides sdev for 
standard deviation of principal components (PCs). 


For regular PCA by prcomp(), we can easily calculate the percent of total 
variance explained by the first k PCs by using cumsum(obj$sdev^2) because these 
PCs are independent of each other so you can simply add up the variance of 
these PCs. For sparse PCA, as far as I understand, the generated PCs are not 
independent of each other anymore, so you can not simply add up variances to 
calculate percentage of variance explained by the first k PCs. For example, in 
the package elasticnet, where spca() also performs sparse PCA, one of the 
outputs from spca() is "pev" for "percent explained variation", which is based on 
a so-called "adjusted variance" that adjusts for the fact that these variances 
of PCs are not independent anymore.

My question is for nsprcomp, how can I calculate percent explained variation by 
using sdev when I know these PCs are not independent of each other?
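To be concrete, for regular PCA I would compute, with X the (centered) data matrix:

obj <- prcomp(X)
cumsum(obj$sdev^2) / sum(obj$sdev^2)   # cumulative proportion, as in summary(obj)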

Thanks!

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sparse PCA using nsprcomp package

2013-09-05 Thread array chip
Hi Christian,

Thanks so much for the detailed explanation! I look forward to the new release 
of the nsprcomp package! In the meantime, I will use the function below to 
calculate the adjusted standard deviations. I have 2 more questions; I hope you 
can shed some light on them:

1). Assume now I can calculate these adjusted standard deviations from sparse 
PCA, should the percent variation explained by each sparse PC be calculated 
using the sum of all these adjusted variances (i.e. the square of the adjusted 
standard deviation) as the denominator (then these percent variation explained 
will always add up to 1 if all sparse PCs are counted), or using the sum of the 
PC variances estimated by REGULAR PCA as the denominator (then, adding up all 
PCs may not be equal to 1)?

2). How do you choose the 2 important parameters in nsprcomp(), ncomp and k? If 
for example, my regular PCA showed that I need 20 PCs to account for 80% of the 
variation in my dataset, does it mean I should set ncomp=20? And then what 
about any rules setting the value of k?

3). Would you recommend nscumcomp() or nsprcomp() in general?

Thanks so much again,

John



 From: Christian Sigg r-h...@sigg-iten.ch

Cc: r-help@r-project.org 
Sent: Thursday, September 5, 2013 2:43 PM
Subject: Re: [R] sparse PCA using nsprcomp package


Hi John

I am currently traveling and have sporadic net access, I therefore can only 
answer briefly. It's also quite late, I hope what follows still makes sense...

 For regular PCA by prcomp(), we can easily calculate the percent of total 
 variance explained by the first k PCs by using cumsum(obj$sdev^2) because 
 these PCs are independent of each other so you can simply add up the variance 
 of these PCs. For sparse PCA, as far as I understand, the generated PCs are 
 not independent of each other anymore, so you can not simply add up variances 
 to calculate percentage of variance explained by the first k PCs. For 
 example, in the package of elasticnet where spca() also performs sparse PCA, 
 one of the output from spca() is pev for percent explained variation which 
 is based on so-called adjusted variance that adjusted for the fact that 
 these variances of PCs are not independent anymore.

You are correct that measuring explained variance is more involved with sparse 
(or non-negative) PCA, because the principal axes no longer correspond to 
eigenvectors of the covariance matrix, and are usually not even orthogonal.

The next update for the 'nsprcomp' package is almost done, and one of the 
changes will concern the reported standard deviations. In the current version 
(0.3), the standard deviations are computed from the scores matrix X*W, where X 
is the data matrix and W is the (pseudo-)rotation matrix consisting of the 
sparse loadings. Computing variance this way has the advantage that 'sdev' is 
consistent with the scores matrix, but it has the disadvantage that some of the 
explained variance is counted more than once because of the non-orthogonality 
of the principal axes. One of the symptoms of this double counting is that the 
variance of a later component can actually exceed the variance of an earlier 
component, which is not possible in regular PCA.

In the new version of the package, 'sdev' will report the _additional_ standard 
deviation of each component, i.e. the variance not explained by the previous 
components. Given a basis of the space spanned by the previous PAs, the 
variance of the PC is computed after projecting the current PA to the 
ortho-complement space of the basis. This procedure reverts back to standard 
PCA if no sparsity or non-negativity constraints are enforced on the PAs.

 My question is for nsprcomp, how can I calculate percent explained variation 
 by using sdev when I know these PCs are not independent of each other?

The new version of the package will do it for you. Until then, you can use 
something like the following function

asdev <- function(X, W) {
    nc <- ncol(W)
    sdev <- numeric(nc)
    Q <- qr.Q(qr(W))    # orthonormal basis spanning the same space as the PAs
    Xp <- X
    for (cc in seq_len(nc)) {
        sdev[cc] <- sd(Xp %*% W[, cc])             # additional sd of component cc
        Xp <- Xp - Xp %*% Q[, cc] %*% t(Q[, cc])   # project out the cc-th basis vector
    }
    return(sdev)
}

to compute the additional variances for given X and W.

The package documentation will explain the above in some more detail, and I 
will also have a small blog post which compares the 'nsprcomp' and 'spca' 
routine from the 'elasticnet' package on the 'marty' data from the EMA package.

Best regards
Christian
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] meta-analysis for sensitivity

2013-04-08 Thread array chip
Hi all, I am new to meta-analysis. Is there any special package that can 
calculate summarized sensitivity with 95% confidence interval for a 
diagnostic test, based on sensitivities from several individual studies? 

Thanks for any suggestions.

John



 From: Jannis bt_jan...@yahoo.de
To: r-help@r-project.org r-help@r-project.org 
Sent: Monday, April 8, 2013 11:12 AM
Subject: [R] checkUsage from codetools shows errors when function uses 
functions from loaded packages

Dear list members,


I frequently program small scripts and wrap them into functions to be able to 
check them with checkUsage. In case these functions (loaded via source or copy 
pasted to the R console) use functions from other packages, I get this error:


no visible global function definition for ‘xxx’

For example:



test = function() {
  require(plotrix)
  color.legend()
}

library(codetools)
checkUsage(test)


Can I tell codetools somehow where to look for these functions without building 
a full blown package?



Cheers
Jannis

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] confidence intervals for survfit()

2013-03-14 Thread array chip
Hi, I am wondering how the confidence interval for Kaplan-Meier estimator is 
calculated by survfit(). For example, 

> summary(survfit(Surv(time,status)~1,data),times=10)
Call: survfit(formula = Surv(rtime10, rstat10) ~ 1, data = mgi)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10    168      55    0.761  0.0282        0.707        0.818


I am trying to reproduce the upper and lower CI using the standard error. As far 
as I understand, the default method for survfit() to calculate the confidence 
interval is on the log survival scale, so:


upper CI = exp(log(0.761)+qnorm(0.975)*0.0282) = 0.804
lower CI = exp(log(0.761)-qnorm(0.975)*0.0282) = 0.720


they are not the same as the output from survfit().
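Unless the std.err reported by summary() is on the survival scale, so that it 
should first be rescaled to the log scale as se(log S) = std.err / S before 
exponentiating (just a guess on my part, checked numerically):

S <- 0.761; se <- 0.0282
exp(log(S) + c(-1, 1) * qnorm(0.975) * se / S)
## 0.7077 0.8183, which matches the survfit() output up to rounding of the inputs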

Am I missing something?

Thanks

John

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] error installing KEGGSOAP

2013-01-16 Thread array chip
Hi, I am new to Bioconductor. I am trying to install the KEGGSOAP package, but got 
warnings() when installing, and an error message when trying to load the package. 
Can anyone suggest what went wrong?

many thanks

John



> source("http://bioconductor.org/biocLite.R")
Bioconductor version 2.11 (BiocInstaller 1.8.3), ?biocLite for help
> biocLite("KEGGSOAP")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'KEGGSOAP'
trying URL 
'http://bioconductor.org/packages/2.11/bioc/bin/windows/contrib/2.15/KEGGSOAP_1.32.0.zip'
Content type 'application/zip' length 69037 bytes (67 Kb)
opened URL
downloaded 67 Kb

package ‘KEGGSOAP’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\yzhang\AppData\Local\Temp\RtmpawAjwx\downloaded_packages
Warning message:
installed directory not writable, cannot update packages 'acepack', 'actuar', 
'ada', 'ade4', 'ade4TkGUI',
  'agricolae', 'akima', 'ape', 'aplpack', 'arules', 'bitops', 'boot', 'Cairo', 
'car', 'caTools', 'cba',
  'chron', 'class', 'clue', 'cluster', 'coda', 'colorspace', 'CompQuadForm', 
'corpcor', 'DAAG', 'date',
  'deldir', 'descr', 'deSolve', 'devtools', 'digest', 'diptest', 'doBy', 
'DoE.wrapper', 'e1071', 'effects',
  'ENmisc', 'epiR', 'eRm', 'evaluate', 'evd', 'FactoMineR', 'fArma', 
'fAssets', 'fBasics', 'fdrtool',
  'fExoticOptions', 'fExtremes', 'fGarch', 'fields', 'flexclust', 'fMultivar', 
'fNonlinear', 'fOptions',
  'forecast', 'foreign', 'fpc', 'fracdiff', 'fRegression', 'FrF2', 
'FrF2.catlg128', 'fTrading',
  'fUnitRoots', 'gamlss', 'gamlss.data', 'gamlss.dist', 'gclus', 'gdata', 
'geoR', 'GGally', 'ggm',
  'ggplot2', 'gmodels', 'gridBase', 'gWidgets', 'gWidgetstcltk', 'HH', 
'Hmisc', 'httr', 'igraph',
  'igraph0', 'inline', 'ipred', 'isa2', 'JavaGD', 'JGR', 'kernlab', 'KernSmoot 
[... truncated]


> library(KEGGSOAP)
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following object(s) are masked from ‘package:stats’:

    xtabs

The following object(s) are masked from ‘package:base’:

    anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, 
intersect, lapply, Map, mapply, mget, order, paste,
    pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, 
rownames, sapply, setdiff, table, tapply, union, unique

failed to load HTTP resource
Error : .onLoad failed in loadNamespace() for 'KEGGSOAP', details:
  call: NULL
  error: 1: failed to load HTTP resource

Error: package/namespace load failed for ‘KEGGSOAP’


> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C            
              
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   
base     

other attached packages:
[1] BiocGenerics_0.4.0  BiocInstaller_1.8.3

loaded via a namespace (and not attached):
[1] codetools_0.2-8 RCurl_1.91-1.1  SSOAP_0.8-0     tools_2.15.1    
XML_3.9-4.1     XMLSchema_0.7-2
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] weird merge()

2013-01-11 Thread array chip
Hi James,

I am trying to combine 11 data frames by column, not by row. My original 
message had 11 data text files attached; did they go through, so that you can 
try my code?

Thanks

John





 From: J Toll jct...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Friday, January 11, 2013 1:35 PM
Subject: Re: [R] weird merge()




Hi,

I have some protein array data, each array in a separate text file. So I read 
them in and try to combine them into a single data frame by using merge(). see 
code below (If you download the attached data files into a specific folder, 
the code below should work):


fls <- list.files("C:\\folder_of_download", full.names=T) ## get file names
prot <- list() ## a list to contain the individual files
ind <- 1
for (i in fls[c(1:11)]) {
    cat(ind, " ")
    tmp <- read.delim(i, header=T, row.names=NULL, na.string='null')
    colnames(tmp)[4] <- as.character(tmp$barcode[1])
    prot[[ind]] <- tmp[, -(1:2)]
    ind <- ind + 1
}

        ## try to merge them together
        ## not do this in a loop so I can see where the problem occurs
pro <- merge(prot[[1]], prot[[2]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[3]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[4]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[5]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[6]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[7]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[8]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[9]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[10]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[11]], by.x=1, by.y=1, all=T)


I noticed that starting with file #8, the merge became slower and slower, so that 
by file #11 the computer got stuck!  Originally I thought something was 
wrong with the later files, but when I changed the order of merging, the 
slow-down still happened at the 8th file to be merged.

Can anyone suggest what's going on with merging?





I'm not sure exactly what you're trying to do with all that code, but if you're 
just trying to get all eleven files into a data.frame, you could do this:

allFilesAsList <- lapply(1:11, function(i) read.delim(paste(p, i, ".txt", sep = "")))
oneBigDataFrame <- do.call(rbind, allFilesAsList)

You may need to fix the column names.  Is that anything like what you were 
trying to do?

James
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] weird merge()

2013-01-11 Thread array chip
Hi Dennis,

Actually, I am trying to combine them by COLUMN, so that's why I am using 
merge(). The first loop simply reads these protein data into R as 11 data 
frames, each 165 x 2. Then I use merge() to combine these data frames into 1 
big data frame by column with these individual merge() statements. I didn't do 
it in a loop because I wanted to see at what point merge() would start to fail. 
It turns out the merge of the first 7 data frames works fine. Starting from the 
8th, it becomes slower and slower, and at the 11th data frame it appeared stuck 
on my computer.

Thanks

John







 From: Dennis Murphy djmu...@gmail.com

Sent: Friday, January 11, 2013 1:25 PM
Subject: Re: [R] weird merge()

Hi John:

This doesn't look right. What are you trying to do? [BTW, the variable
names in the attachments have spaces, so most of R's read functions
should choke on them. At the very least, replace the spaces with
underscores.]

If all you are trying to do is row concatenate them (since the two or
three I looked at appear to have the same structure), then it's as
simple as

pro <- do.call(rbind, prot)

If this is what you want along with an indicator for each data file,
then the ldply() function in the plyr package might be useful as an
alternative to do.call. It should return an additional variable .id
whose value corresponds to the number (or name) of the list component.

library(plyr)
pro2 <- ldply(prot, rbind)

If you want something different, then be more explicit about what you
want, because your merge() code doesn't make a lot of sense to me.


Dennis

PS: Just a little hint: if you're using (almost) the same code
repeatedly, there's probably a more efficient way to do it in R. CS
types call this the DRY principle: Don't Repeat Yourself. I know you
know this, but a little reminder doesn't hurt :)



 Hi,

 I have some protein array data, each array in a separate text file. So I read 
 them in and try to combine them into a single data frame by using merge(). 
 see code below (If you download the attached data files into a specific 
 folder, the code below should work):


 fls <- list.files("C:\\folder_of_download", full.names=T) ## get file names
 prot <- list() ## a list to contain the individual files
 ind <- 1
 for (i in fls[c(1:11)]) {
     cat(ind, " ")

     tmp <- read.delim(i, header=T, row.names=NULL, na.string='null')
     colnames(tmp)[4] <- as.character(tmp$barcode[1])
     prot[[ind]] <- tmp[, -(1:2)]
     ind <- ind + 1
 }

         ## try to merge them together
         ## not do this in a loop so I can see where the problem occurs
 pro <- merge(prot[[1]], prot[[2]], by.x=1, by.y=1, all=T)
 pro <- merge(pro, prot[[3]], by.x=1, by.y=1, all=T)
 pro <- merge(pro, prot[[4]], by.x=1, by.y=1, all=T)
 pro <- merge(pro, prot[[5]], by.x=1, by.y=1, all=T)
 pro <- merge(pro, prot[[6]], by.x=1, by.y=1, all=T)
 pro <- merge(pro, prot[[7]], by.x=1, by.y=1, all=T)
 pro <- merge(pro, prot[[8]], by.x=1, by.y=1, all=T)
 pro <- merge(pro, prot[[9]], by.x=1, by.y=1, all=T)
 pro <- merge(pro, prot[[10]], by.x=1, by.y=1, all=T)
 pro <- merge(pro, prot[[11]], by.x=1, by.y=1, all=T)


 I noticed that starting with file #8, the merge became slower and slower, so that 
 by file #11 the computer got stuck!  Originally I thought something was 
 wrong with the later files, but when I changed the order of merging, the 
 slow-down still happened at the 8th file to be merged.

 Can anyone suggest what's going on with merging?

 Thanks

 John
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] weird merge()

2013-01-11 Thread array chip
I just figured out the reason: the column by which each merge is done (the 1st 
column "gene.name" in each data frame) has no unique values; some values are 
repeated, so with each merge the data frame gets bigger and bigger, 
exponentially.

Sorry to bother all.

John




 From: J Toll jct...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Friday, January 11, 2013 1:35 PM
Subject: Re: [R] weird merge()




Hi,

I have some protein array data, each array in a separate text file. So I read 
them in and try to combine them into a single data frame by using merge(). see 
code below (If you download the attached data files into a specific folder, 
the code below should work):


fls <- list.files("C:\\folder_of_download", full.names=T) ## get file names
prot <- list() ## a list to contain the individual files
ind <- 1
for (i in fls[c(1:11)]) {
    cat(ind, " ")
    tmp <- read.delim(i, header=T, row.names=NULL, na.string='null')
    colnames(tmp)[4] <- as.character(tmp$barcode[1])
    prot[[ind]] <- tmp[, -(1:2)]
    ind <- ind + 1
}

        ## try to merge them together
        ## not do this in a loop so I can see where the problem occurs
pro <- merge(prot[[1]], prot[[2]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[3]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[4]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[5]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[6]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[7]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[8]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[9]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[10]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[11]], by.x=1, by.y=1, all=T)


I noticed that starting with file #8, the merge became slower and slower, so that 
by file #11 the computer got stuck!  Originally I thought something was 
wrong with the later files, but when I changed the order of merging, the 
slow-down still happened at the 8th file to be merged.

Can anyone suggest what's going on with merging?





I'm not sure exactly what you're trying to do with all that code, but if you're 
just trying to get all eleven files into a data.frame, you could do this:

allFilesAsList <- lapply(1:11, function(i) read.delim(paste(p, i, ".txt", sep = "")))
oneBigDataFrame <- do.call(rbind, allFilesAsList)

You may need to fix the column names.  Is that anything like what you were 
trying to do?

James
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sort matrix based on a specific order

2013-01-10 Thread array chip
Hi I have a character matrix with 2 columns A and B, If I want to sort the 
matrix based on the column B, but based on a specific order of characters:

mat <- cbind(c('w','x','y','z'), c('a','b','c','d'))
ind <- c('c','b','d','a')

I want mat to be sorted by the sequence in ind:

     [,1] [,2]
[1,] "y"  "c" 
[2,] "x"  "b" 
[3,] "z"  "d" 
[4,] "w"  "a" 

Is there any simple function that can do this?

Thanks

John

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sort matrix based on a specific order

2013-01-10 Thread array chip
Thank you all for the suggestions, fantastic!





 From: Rui Barradas ruipbarra...@sapo.pt

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Thursday, January 10, 2013 10:32 AM
Subject: Re: [R] sort matrix based on a specific order


Hello,

Try the following. order() gives you a permutation of the vector
'ind' and to order that permutation gives its inverse.


mat <- cbind(c('w','x','y','z'), c('a','b','c','d'))
ind <- c('c','b','d','a')

ord <- order(ind)
mat[order(ord), ]
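An equivalent one-liner, if the values in 'ind' and in the second column of 'mat' 
match one-to-one:

mat[match(ind, mat[, 2]), ]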


Hope this helps,

Rui Barradas

On 10-01-2013 18:21, array chip wrote:

 Hi I have a character matrix with 2 columns A and B. I want to sort the 
 matrix based on column B, but according to a specific order of characters: 

 mat <- cbind(c('w','x','y','z'), c('a','b','c','d'))
 ind <- c('c','b','d','a')

 I want mat to be sorted by the sequence in ind:

      [,1] [,2]
 [1,] "y"  "c"
 [2,] "x"  "b"
 [3,] "z"  "d"
 [4,] "w"  "a"

 Is there any simple function that can do this? Thanks, John 
[[alternative HTML version deleted]] 


__ R-help@r-project.org mailing 
list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting 
guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NRI or IDI for survival data - Hmisc package

2012-11-27 Thread array chip
Hi, I am trying to calculate net reclassification improvement (NRI) and 
Integrated Discrimination Improvement (IDI) for a survival dataset to compare 2 
risk models. It seems that the improveProb() in Hmisc package does this only 
for binary outcome, while rcorrp.cens() does take survival object, but doesn't 
output NRI or IDI. Can anyone suggest any other packages that can calculate NRI 
and IDI for survival data?

The reference for NRI extending to survival data is:
Pencina MJ, D'Agostino RB, Steyerberg EW (2011): Extensions of net 
reclassification improvement calculations to measure usefulness of new 
biomarkers. Stat in Med 30:11-21

Thanks!

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] decimal points midline

2012-08-08 Thread array chip
Hi, does anyone know how to make decimal points midline when plotting? The 
journal to which we are going to submit a manuscript requires this particular 
formatting, and it gives directions for how to do this on a PC: "hold down the 
ALT key and type 0183 on the number pad".

Thanks

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] decimal points midline

2012-08-08 Thread array chip
Thanks Uwe, options(OutDec="\xB7") worked!



 From: Uwe Ligges lig...@statistik.tu-dortmund.de
To: John Kane jrkrid...@inbox.com 

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Wednesday, August 8, 2012 12:43 PM
Subject: Re: [R] decimal points midline



On 08.08.2012 21:22, John Kane wrote:
 Can you point to an example?  It sounds like the journal is still using 
 typewriters.


Anyway, set

options(OutDec="ALT0183")

and replace ALT0183 by the keystrokes you described below, or use

options(OutDec="\xB7")

Best,
Uwe Ligges




 John Kane
 Kingston ON Canada


 -Original Message-

 Sent: Wed, 8 Aug 2012 11:52:27 -0700 (PDT)
 To: r-help@r-project.org
 Subject: [R] decimal points midline

 Hi, does anyone know how to make decimal points midline when plotting?
 The journal to which we are going to submit a manuscript require this
 particular formatting, and it gives direction how to do this on PC: hold
 down ALT key and type 0183 on the number pad

 Thanks

 John
     [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a simple mixed model

2012-05-27 Thread array chip
Hi Peter, I might have been unclear in my description of the data. Each patient 
was measured for a response variable "y" at 3 time points; there was no drug or 
other treatment involved. The objective was to examine the repeatability of the 
measurements of the response variable "y". Since these are repeated measures, I 
thought it should be analyzed by a simple mixed model? When you suggested an MxK 
(K=3) design, what is M then?

Thanks very much,

John






 From: peter dalgaard pda...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Sunday, May 27, 2012 12:09 AM
Subject: Re: [R] a simple mixed model


On May 27, 2012, at 07:12 , array chip wrote:

 Hi, I was reviewing a manuscript where a linear mixed model was used. The 
 data is simple: a response variable y was measured for each subject over 3 
 time points (visit 1, 2 and 3) that were about a week apart between 2 visits. 
 The study is a non-drug study and one of the objectives was to evaluate the 
 repeatability of response variable y. 
 
 
 The author wanted to estimate the within-subject variance for that purpose. This 
 is what he wrote: "within-subject variance was generated from the SAS 'Proc Mixed' 
 procedure with study visit as fixed effect and subject as random effect". I 
 know that the study visit was a factor variable, not a numeric variable. 
 Because each subject has 3 repeated measurements from 3 visits, how can a 
 model including subject as random effect still use visit as fixed factor? If 
 I would do it in R, I would just use a simple model to get within-subject 
 variance:
 
 obj <- lmer(y~1+(1|subject), data=data)
 
 What does a model obj <- lmer(y~visit+(1|subject), data=data) mean?
 
[[elided Yahoo spam]]

Sounds like a pretty standard two-way ANOVA with random row effects. 

If the design is complete (M x K with K = 3 in this case), you look at the row 
and column means. An additive model is assumed and the residual (interaction) 
is used to estimate the error variance. 

The variation of the row means is compared to the residual variance. If tau is 
the variance between row levels, the variance of the row means is sigma^2/K + 
tau, and tau can be estimated by subtraction. 

The column averages can be tested for systematic differences between visits 
with the usual F test. A non-zero effect here indicates that visits 1, 2, 3 
have some _systematic_ difference across all individuals. 

For an incomplete design, the model is the same, but the calculations are less 
simple.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] a simple mixed model

2012-05-26 Thread array chip
Hi, I was reviewing a manuscript where a linear mixed model was used. The data 
is simple: a response variable y was measured for each subject over 3 time 
points (visit 1, 2 and 3) that were about a week apart between 2 visits. The 
study is a non-drug study and one of the objectives was to evaluate the 
repeatability of response variable y. 


The author wanted to estimate the within-subject variance for that purpose. This is 
what he wrote: "within-subject variance was generated from the SAS 'Proc Mixed' 
procedure with study visit as fixed effect and subject as random effect". I 
know that the study visit was a factor variable, not a numeric variable. 
Because each subject has 3 repeated measurements from 3 visits, how can a model 
including subject as random effect still use visit as fixed factor? If I would 
do it in R, I would just use a simple model to get within-subject variance:

obj <- lmer(y~1+(1|subject), data=data)

What does a model obj <- lmer(y~visit+(1|subject), data=data) mean?

appreciate any thoughts!

John

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] low R square value from ANCOVA model

2012-05-08 Thread array chip
Thank you Peter, so if I observe a significant coefficient, that significance 
still holds because the standard error of the coefficient has taken the 
residual error (which is large because large R square) into account, am I 
correct?

John
 


 From: peter dalgaard pda...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Monday, May 7, 2012 11:07 PM
Subject: Re: [R] low R square value from ANCOVA model
  

On May 8, 2012, at 05:10 , array chip wrote:

 Hi, what does a low R-square value from an ANCOVA model mean? For example, if 
 the R square from the model is about 0.2, does this mean the results should 
 NOT be trusted? I checked the residuals of the model, it looked fine...

It just means that your model has low predictive power (at the individual 
level). I.e. the noise (error) part of the model is large relative to the 
signal (systematic part). Statistical inferences are not compromised by that, 
except of course that large error variation is reflected in large standard 
errors of estimated regression coefficients. 

  
 Thanks for any suggestion.
  
 John
     [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] low R square value from ANCOVA model

2012-05-08 Thread array chip
Thanks again Peter. What about the argument that, because a low R square (e.g. 
R^2=0.2) indicates the model variance was not sufficiently explained by the 
factors in the model, there might be additional factors that should be 
identified and included in the model? And if these additional factors were 
indeed included, it might change the significance of the factor of interest 
that previously showed a significant coefficient. In other words, if R square is 
low, the significant coefficient observed is not trustworthy.

What's your opinion on this argument?

Many thanks!

John




 From: peter dalgaard pda...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Monday, May 7, 2012 11:43 PM
Subject: Re: [R] low R square value from ANCOVA model


On May 8, 2012, at 08:34 , array chip wrote:

 Thank you Peter, so if I observe a significant coefficient, that significance 
 still holds because the standard error of the coefficient has taken the 
 residual error (which is large because large R square) into account, am I 
 correct?

In essence, yes. One might quibble over the use of large because, but it's 
not important for the main point.

-pd

 John
 From: peter dalgaard pda...@gmail.com

 Cc: r-help@r-project.org r-help@r-project.org 
 Sent: Monday, May 7, 2012 11:07 PM
 Subject: Re: [R] low R square value from ANCOVA model
 
 
 On May 8, 2012, at 05:10 , array chip wrote:
 
  Hi, what does a low R-square value from an ANCOVA model mean? For example, 
  if the R square from the model is about 0.2, does this mean the results 
  should NOT be trusted? I checked the residuals of the model, it looked 
  fine...
 
 It just means that your model has low predictive power (at the individual 
 level). I.e. the noise (error) part of the model is large relative to the 
 signal (systematic part). Statistical inferences are not compromised by that, 
 except of course that large error variation is reflected in large standard 
 errors of estimated regression coefficients. 
 
   
  Thanks for any suggestion.
   
  John
      [[alternative HTML version deleted]]
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 -- 
 Peter Dalgaard, Professor,
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com
 
 
 
 
 
 
 
 
 
 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] low R square value from ANCOVA model

2012-05-08 Thread array chip
Hi Peter, searched old mail archive and found this topic had been discussed 
before. The previous discussion was around a situation where there was a very 
large sample size involved so even a small effect still showed up as 
significant even with low R square of the model.

In my case, the sample size is 72, the significance of group effect is due to 
large effect relative to its standard error:

obj <- lm(y ~ age + sex + school + group, dat)

summary(obj)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  169.8634    13.4678  12.613   <2e-16
age           -0.3737     0.2762  -1.353   0.1805
sexM           2.1137     8.6585   0.244   0.8079
schoolS2       4.1711     8.1811   0.510   0.6118
groupG2      -20.8944    10.2807  -2.032   0.0461

Residual standard error: 32.13 on 67 degrees of freedom
Multiple R-squared: 0.1732, Adjusted R-squared: 0.1238 
F-statistic: 3.509 on 4 and 67 DF,  p-value: 0.01163 

So R-squared is quite low (0.17). What's your opinion on the argument that the 
significant coefficient for group is not trustworthy because the model variance 
was not sufficiently accounted for, and that if additional factors could be 
identified and included in the model, that might change the effect of group 
from significant to insignificant?

Many thanks for sharing your thoughts.

John






To: peter dalgaard pda...@gmail.com 
Cc: r-help@r-project.org r-help@r-project.org 
Sent: Tuesday, May 8, 2012 1:45 PM
Subject: Re: [R] low R square value from ANCOVA model


Thanks again Peter. What about the argument that because low R square (e.g. 
R^2=0.2) indicated the model variance was not sufficiently explained by the 
factors in the model, there might be additional factors that should be 
identified and included in the model. And If these additional factors were 
indeed included, it might change the significance for the factor of interest 
that previously showed a significant coefficient. In other words, if R square is 
low, the significant coefficient observed is not trustworthy.

What's your opinion on this argument?

Many thanks!

John




 From: peter dalgaard pda...@gmail.com

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Monday, May 7, 2012 11:43 PM
Subject: Re: [R] low R square value from ANCOVA model


On May 8, 2012, at 08:34 , array chip wrote:

 Thank you Peter, so if I observe a significant coefficient, that significance 
 still holds because the standard error of the coefficient has taken the 
 residual error (which is large because large R square) into account, am I 
 correct?

In essence, yes. One might quibble over the use of large because, but it's 
not important for the main point.

-pd

 John
 From: peter dalgaard pda...@gmail.com

 Cc: r-help@r-project.org r-help@r-project.org 
 Sent: Monday, May 7, 2012
 11:07 PM
 Subject: Re: [R] low R square value from ANCOVA model
 
 
 On May 8, 2012, at 05:10 , array chip wrote:
 
  Hi, what does a low R-square value from an ANCOVA model mean? For example, 
  if the R square from the model is about 0.2, does this mean the results 
  should NOT be trusted? I checked the residuals of the model, it looked 
  fine...
 
 It just means that your model has low predictive power (at the individual 
 level). I.e. the noise (error) part of the model is large relative to the 
 signal (systematic part). Statistical inferences are not compromised by that, 
 except of course that large error variation is reflected in large standard 
 errors of estimated regression coefficients. 
 
   
  Thanks for any suggestion.
   
  John
      [[alternative HTML version deleted]]
  
 
 __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 -- 
 Peter Dalgaard, Professor,
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com
 
 
 
 
 
 
 
 
 
 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] low R square value from ANCOVA model

2012-05-08 Thread array chip
Paul, thanks for your thoughts. "Blunt"? Not at all.

If I understand correctly, it doesn't help to speculate about whether 
additional variables might exist or not. Given the current variables in the 
model, it's perfectly fine to draw conclusions based on significant 
coefficients, regardless of whether R-squared is high or low.

Gary King's article is interesting...

John




 From: Paul Johnson pauljoh...@gmail.com

Cc: peter dalgaard pda...@gmail.com; r-help@r-project.org 
r-help@r-project.org 
Sent: Tuesday, May 8, 2012 8:23 PM
Subject: Re: [R] low R square value from ANCOVA model


 Thanks again Peter. What about the argument that because low R square (e.g. 
 R^2=0.2) indicated the model variance was not sufficiently explained by the 
 factors in the model, there might be additional factors that should be 
 identified and included in the model. And If these additional factors were 
 indeed included, it might change the significance for the factor of interest 
 that previously showed a significant coefficient. In other words, if R square is 
 low, the significant coefficient observed is not trustworthy.

 What's your opinion on this argument?

I think that argument is silly. I'm sorry if that is too blunt. Its
just plain superficial.
It reflects a poor understanding of what the linear model is all
about. If you have
other variables that might belong in the model, run them and test.
The R-square,
either low or high, does not have anything direct to say about whether
those other
variables exist.

Here's my authority.

Arthur Goldberger (A Course in Econometrics, 1991, p.177)
“Nothing in the CR (Classical Regression) model requires that R2 be high. 
Hence,
a high R2 is not evidence in favor of the model, and a low R2 is not evidence
against it.”

I found that reference in Anders Skrondal and  Sophia Rabe-Hesketh,
Generalized Latend Variable Modeling: Multilevel, Longitudinal,
and Structural Equation Models, Boca Raton, FL: Chapman and Hall/CRC, 2004.

From Section 8.5.2:

Furthermore, how badly the baseline model fits the data depends greatly
on the magnitude of the parameters of the true model. For instance, consider
estimating a simple parallel measurement model. If the true model is a
congeneric measurement model (with considerable variation in factor loadings
and measurement error variances between items), the fit index could be high
simply because the null model fits very poorly, i.e. because the
reliabilities of
the items are high. However, if the true model is a parallel measurement model
with low reliabilities the fit index could be low although we are estimating the
correct model. Similarly, estimating a simple linear regression model can yield
a high R2 if the relationship is actually quadratic with a considerable linear
trend and a low R2 when the model is true but with a small slope (relative to
the overall variance).

For a detailed argument/explanation of the argument that the R-square is not
a way to decide if a model is good or bad see

King, Gary. (1986). How Not to Lie with Statistics: Avoiding Common Mistakes in
Quantitative Political Science. American Journal of Political Science,
30(3), 666–687. doi:10.2307/2111095

pj
-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] low R square value from ANCOVA model

2012-05-07 Thread array chip
Hi, what does a low R-square value from an ANCOVA model mean? For example, if 
the R square from the model is about 0.2, does this mean the results should NOT 
be trusted? I checked the residuals of the model, it looked fine...
 
Thanks for any suggestion.
 
John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Difference in Kaplan-Meier estimates plus CI

2012-04-05 Thread array chip
Hi, I have the same question as Jason on how to estimate the standard error and 
construct a CI around S_1(t) - S_2(t). From summary.survfit(obj), how can I 
combine the 2 survival estimates and the associated standard errors to get an 
estimate of the standard error for the difference, and then calculate the CI?

For example, my summary(obj) gave me:

    treatment=0 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10     69      28   0.7313  0.0438       0.6504       0.8223

    treatment=1 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10     86      10   0.9055  0.0285       0.8514       0.9631


So S_1(t=10) - S_2(t=10) = 0.9055 - 0.7313 = 0.1742. How do I calculate the 
standard error of this difference based on SE_1 = 0.0285 and SE_2 = 0.0438, and 
then the 95% CI for the difference?

Thanks

John




 From: Thomas Lumley tlum...@uw.edu
To: Jason Connor jcon...@alumni.cmu.edu 
Cc: r-help@r-project.org 
Sent: Wednesday, March 7, 2012 10:58 AM
Subject: Re: [R] Difference in Kaplan-Meier estimates plus CI

On Thu, Mar 8, 2012 at 4:50 AM, Jason Connor jcon...@alumni.cmu.edu wrote:
 I thought this would be trivial, but I can't find a package or function
 that does this.

 I'm hoping someone can guide me to one.

 Imagine a simple case with two survival curves (e.g. treatment  control).

 I just want to calculate the difference in KM estimates at a specific time
 point (e.g. 1 year) plus the estimate's 95% CI.  The former is
 straightforward, but the CI is not so much.

 I know methods exist such as Parzen, Wei, and Ying, but was surprised not
 to find a package that included this.

 Before I code it up, I thought I'd ask if I was just missing it somewhere.

summary.survfit() in the survival package will give you the point
estimate and standard error, and then combining these into a
difference and confidence interval for the difference is easy.

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] statistical contrasts on 3-way interaction

2012-03-14 Thread array chip
Hi all, 

I was trying to use glht() from the multcomp package to construct a contrast on an 
interaction term in a linear model to do some comparisons. I am a little uncertain 
about how to construct contrasts on a 3-way interaction containing a continuous 
variable, and hope someone can confirm whether what I did is correct or wrong:

The linear model has a continuous dependent variable “y”, a treatment factor 
“Trt” with values 0 and 1, a factor variable “A” with values 0 and 1, and a 
continuous variable “x”.
 
A simpler model is:
 
set.seed(10)
dat <- cbind(y=c(rnorm(10,3), rnorm(10,4), rnorm(10,3.1), rnorm(10,6)),
             x=runif(40,5,15), expand.grid(A=rep(factor(0:1),each=10), Trt=factor(0:1)))
 
fit <- lm(y ~ x + Trt * A, dat)
 
My purpose is to test whether the treatment effect is significant at
each level of factor A, so I used the contrasts:
 
library(multcomp)
 
K <- rbind(c(0,0,1,0,0), c(0,0,1,0,1))
rownames(K) <- c('Trt 1-0|A=0', 'Trt 1-0|A=1')
colnames(K) <- names(coef(fit))
K
            (Intercept) x Trt1 A1 Trt1:A1
Trt 1-0|A=0           0 0    1  0       0
Trt 1-0|A=1           0 0    1  0       1
 
(glht.fit <- summary(glht(fit, linfct = K), test=adjusted(type='none')))
Linear Hypotheses:
                 Estimate Std. Error t value Pr(>|t|)   
Trt 1-0|A=0 == 0  -0.2720     0.3616  -0.752  0.45701   
Trt 1-0|A=1 == 0   1.0690     0.3564   2.999  0.00496 **
 
Now I suspect the independent variable “x” may play a role in the treatment
effect at each level of A, so I would like to add in a 3-way interaction
between Trt, A and x:
 
fit <- lm(y ~ x * Trt * A, dat)
 
If my purpose is to test whether the treatment effect is significant at each level
of factor A and a certain value of the covariate “x”, for example when x=10,
would the following code give me what I wanted?
 
K <- rbind(c(0,0,1,0,10,0,0,0), c(0,0,1,0,10,0,1,10))
rownames(K) <- c('Trt 1-0|A=0 x=10', 'Trt 1-0|A=1 x=10')
colnames(K) <- names(coef(fit))
K
                 (Intercept) x Trt1 A1 x:Trt1 x:A1 Trt1:A1 x:Trt1:A1
Trt 1-0|A=0 x=10           0 0    1  0     10    0       0         0
Trt 1-0|A=1 x=10           0 0    1  0     10    0       1        10
 
(glht.fit <- summary(glht(fit, linfct = K), test=adjusted(type='none')))
Linear Hypotheses:
                      Estimate Std. Error t value Pr(>|t|)    
Trt 1-0|A=0 x=10 == 0  -0.3526     0.3254  -1.083 0.286731    
Trt 1-0|A=1 x=10 == 0   1.4621     0.3328   4.394 0.000115 ***
 
So the above test was testing whether the treatment effect is significant at each 
level of factor A when x=10, am I correct?

Appreciate if someone would confirm this?

Thanks

John 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] CV for log normal data

2012-02-21 Thread array chip
Hi, I have a microarray dataset from Agilent chips. The data are really log 
ratios between test samples and a universal reference RNA. Because of the nature 
of log ratios, the coefficient of variation (CV) doesn't really apply to this 
kind of data, due to the fact that the mean of a log ratio is very close to 0. 
What kind of measure would people use for the dispersion, so that I can compare 
across genes on the chip to find stably expressed genes? Something similar to CV 
that would be easily interpreted?

Thanks

John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CV for log normal data

2012-02-21 Thread array chip
Thank you Peter.

John




 From: Peter Langfelder peter.langfel...@gmail.com
To: Bert Gunter gunter.ber...@gene.com 

Sent: Tuesday, February 21, 2012 2:51 PM
Subject: Re: [R] CV for log normal data
 

 Good advice. But perhaps ?mad or some other perhaps robust plain old
 measure of spread?

The problem is not (lack of) robustness to outliers, the problem is to
find genes whose expression variation is small compared to (mean)
expression. Trouble is, Agilent throws the mean expression information
away, so you have to find heuristic workarounds. I have encountered
the same issue before and haven't really found a good solution.

Peter
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] prediction for linear mixed model

2012-02-20 Thread array chip
Hi, I am wondering if we can make predictions from a linear mixed model fitted by 
lmer() from the lme4 package? Specifically, I am fitting a very simple glmer() with 
a binomial family distribution, and want to see if I can get the predicted 
probability like that from regular logistic regression?
 
fit <- glmer(y ~ x + (1|id), dat, family=binomial)
 
where y is the response variable (0, 1), and x is a continuous variable.
 
I would like to get a predicted probability at a given value of x.
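(My naive guess would be to apply the inverse link to the fixed effects by hand, 
e.g. for a hypothetical value x = 1.5:

plogis(fixef(fit)["(Intercept)"] + fixef(fit)["x"] * 1.5)

but I am not sure whether this is the recommended way, or how the random effect 
should be handled.)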
 
Many thanks
 
John
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] standard error for lda()

2012-02-09 Thread array chip
Hi, I haven't heard any response yet, so I want to give it another try... I 
appreciate any suggestions.

John




To: r-help@r-project.org r-help@r-project.org 
Sent: Wednesday, February 8, 2012 12:11 PM
Subject: [R] standard error for lda()

Hi, I am wondering if it is possible to get an estimate of the standard error of 
the predicted posterior probability from LDA using lda() from MASS? Logistic 
regression using glm() would generate a standard error for the predicted 
probability with the se.fit=T argument in predict(), so would it make sense to get 
a standard error for the posterior probability from lda(), and how?

Another question about the standard error estimate from glm(): is it OK to 
calculate a 95% CI for the predicted probability using the standard error based 
on the normal approximation, i.e. predicted_probability +/- 1.96 * standard_error?

Thanks

John
    [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] standard error for lda()

2012-02-09 Thread array chip
David, thanks for your response, hope this stirs more...

OK, some simple code:

y <- as.factor(rnorm(100) > 0.5)
x1 <- rnorm(100)
x2 <- rnorm(100)
obj <- glm(y ~ x1 + x2, family=binomial)
predict(obj, type='response', se.fit=T)

predict(obj, ...) will give the predicted probabilities in the "fit" element and 
the associated estimated standard errors in the "se.fit" element (if I 
understand correctly). The predicted probability from logistic regression is 
ultimately a function of y, and thus a standard error for it should be 
computable. So one of my questions is whether we can use a normal approximation 
to construct a 95% CI for the predicted probabilities using these standard 
errors, because I am not sure the probabilities would follow a normal distribution?
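For instance, I could compute the interval on the link scale and then transform, 
which at least keeps it inside (0, 1):

lp <- predict(obj, type='link', se.fit=T)
cbind(lower = plogis(lp$fit - 1.96 * lp$se.fit),
      upper = plogis(lp$fit + 1.96 * lp$se.fit))

Is that preferable to the normal approximation on the probability scale?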

Now, if we try lda():

library(MASS)
obj2 <- lda(y ~ x1 + x2)

predict(obj2)

where predict(obj2) produces posterior probabilities, the predicted class, etc. 
My question is whether it's possible to produce standard errors for these 
posterior probabilities?
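
The only approach I could think of myself is bootstrapping, along these lines 
(a rough sketch on the simulated data above):

dat <- data.frame(y, x1, x2)
B <- 200
boot.post <- replicate(B, {
    idx <- sample(nrow(dat), replace = TRUE)          # resample cases
    fit.b <- lda(y ~ x1 + x2, data = dat[idx, ])      # refit LDA
    predict(fit.b, newdata = dat)$posterior[, 2]      # posterior P(y = TRUE) for the original cases
})
se.post <- apply(boot.post, 1, sd)                    # bootstrap SE per observation

But is there anything analytical?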

Thanks again.

John




 From: David Winsemius dwinsem...@comcast.net

Cc: r-help@r-project.org r-help@r-project.org 
Sent: Thursday, February 9, 2012 2:59 PM
Subject: Re: [R] standard error for lda()


On Feb 9, 2012, at 4:45 PM, array chip wrote:

 Hi, didn't hear any response yet. want to give it another try.. appreciate 
 any suggestions.
 

My problem after reading this the first time was that I didn't agree with the 
premise that logistic regression would provide a standard error for a 
probability. It provides a standard error around an estimated coefficient 
value. And then you provided no further details or code to create a simulation, 
and there didn't seem much point in trying to teach you statistical terminology 
that you were throwing around in a manner that seems rather cavalier, 
admittedly this being a very particular reaction from this non-expert audience 
of one.


 John
 
 
 
 
 To: r-help@r-project.org r-help@r-project.org
 Sent: Wednesday, February 8, 2012 12:11 PM
 Subject: [R] standard error for lda()
 
 Hi, I am wondering if it is possible to get an estimate of standard error of 
 the predicted posterior probability from LDA using lda() from MASS? Logistic 
 regression using glm() would generate a standard error for predicted 
 probability with se.fit=T argument in predict(), so would it make sense to 
 get standard error for posterior probability from lda() and how?
 
 Another question about standard error estimate from glm(): is it ok to 
 calculate 95% CI for the predicted probability using the standard error based 
 on normal approximation, i.e. predicted_probability +/- 1.96 * standard_error?
 
 Thanks
 
 John

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] standard error for lda()

2012-02-08 Thread array chip
Hi, I am wondering if it is possible to get an estimate of standard error of 
the predicted posterior probability from LDA using lda() from MASS? Logistic 
regression using glm() would generate a standard error for predicted 
probability with se.fit=T argument in predict(), so would it make sense to get 
standard error for posterior probability from lda() and how?

Another question about the standard error estimate from glm(): is it OK to 
calculate a 95% CI for the predicted probability using the standard error based 
on a normal approximation, i.e. predicted_probability +/- 1.96 * standard_error?

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] forest plots from rmeta package

2011-11-21 Thread array chip
Hi, I am trying forestplot() and metaplot() from the rmeta package to plot some 
hazard ratios. forestplot() can draw clipping with "<" or ">" arrows at the ends 
of the confidence line via its "clip" argument, but can't use multiple colors for 
different lines. On the other hand, metaplot() can use a different color for 
each line with the colors=meta.colors(lines=c("blue","black")) argument, but 
doesn't draw the "<" or ">" clipping symbols at the ends of the confidence line 
(it doesn't take a "clip" argument). So is there any other package/function that 
allows me to do both colors and clipping?
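
To make it concrete, here is roughly what I am doing (a sketch with made-up 
hazard ratios; the argument names are as I understand them from the rmeta docs):

library(rmeta)
logHR <- log(c(1.8, 0.6, 2.4))
lower <- log(c(1.1, 0.2, 0.8))
upper <- log(c(3.0, 1.5, 7.0))
# forestplot(): has clip, but only one colour set for all lines
forestplot(labeltext = c("A", "B", "C"), mean = logHR, lower = lower,
           upper = upper, clip = log(c(0.5, 4)), zero = 0)
# metaplot(): per-line colours, but no clip argument
metaplot(logHR, (upper - lower)/(2 * 1.96), labels = c("A", "B", "C"),
         colors = meta.colors(lines = c("blue", "black", "red")))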

Thanks

John


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why NA coefficients

2011-11-08 Thread array chip
Sure it does, but I am still struggling with what is going on...

Thanks

John



From: David Winsemius dwinsem...@comcast.net
To: David Winsemius dwinsem...@comcast.net
Cc: r-help@r-project.org
Sent: Monday, November 7, 2011 10:27 PM
Subject: Re: [R] why NA coefficients

But this output suggests there may be alligators in the swamp:

 predict(lmod, newdata=data.frame(treat=1, group=2))
         1
0.09133691
Warning message:
In predict.lm(lmod, newdata = data.frame(treat = 1, group = 2)) :
  prediction from a rank-deficient fit may be misleading

--David.
 
 --David.
 
 Thanks
 
 John
 
 
 From: David Winsemius dwinsem...@comcast.net

 Cc: r-help@r-project.org r-help@r-project.org
 Sent: Monday, November 7, 2011 5:13 PM
 Subject: Re: [R] why NA coefficients
 
 
 On Nov 7, 2011, at 7:33 PM, array chip wrote:
 
  Hi, I am trying to run ANOVA with an interaction term on 2 factors (treat 
  has 7 levels, group has 2 levels). I found the coefficient for the last 
  interaction term is always 0, see attached dataset and the code below:
 
  test <- read.table("test.txt", sep='\t', header=TRUE, row.names=NULL)
  lm(y ~ factor(treat) * factor(group), test)
 
  Call:
  lm(formula = y ~ factor(treat) * factor(group), data = test)
 
  Coefficients:
                    (Intercept)                 factor(treat)2                 factor(treat)3
                       0.429244                       0.499982                       0.352971
                 factor(treat)4                 factor(treat)5                 factor(treat)6
                      -0.204752                       0.142042                       0.044155
                 factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                      -0.007775                      -0.337907                      -0.208734
  factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                      -0.195138                       0.800029                       0.227514
  factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                       0.331548                             NA
 
 
  I guess this is due to model matrix being singular or collinearity among 
  the matrix columns? But I can't figure out how the matrix is singular in 
  this case? Can someone show me why this is the case?
 
 Because you have no cases in one of the crossed categories.
 
 --David Winsemius, MD
 West Hartford, CT
 
 
 
 
 David Winsemius, MD
 West Hartford, CT
 
 __
 R-help@r-project.orgmailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why NA coefficients

2011-11-08 Thread array chip
True, but why does it have to omit treat 7-group 2?

Thanks again



From: David Winsemius dwinsem...@comcast.net

Cc: r-help@r-project.org r-help@r-project.org
Sent: Monday, November 7, 2011 10:19 PM
Subject: Re: [R] why NA coefficients


On Nov 7, 2011, at 10:07 PM, array chip wrote:

 Thanks David. The only category that has no cases is treat 1-group 2:
 
  with(test,table(treat,group))
      group
 treat 1 2
     1 8 0
     2 1 5
     3 5 5
     4 7 3
     5 7 4
     6 3 3
     7 8 2
 
 But why the coefficient for treat 7-group 2 is not estimable?

Well, it had to omit one of them, didn't it?

(But I don't know why that level was chosen.)

--David.
 
 Thanks
 
 John
 
 
 From: David Winsemius dwinsem...@comcast.net

 Cc: r-help@r-project.org r-help@r-project.org
 Sent: Monday, November 7, 2011 5:13 PM
 Subject: Re: [R] why NA coefficients
 
 
 On Nov 7, 2011, at 7:33 PM, array chip wrote:
 
  Hi, I am trying to run ANOVA with an interaction term on 2 factors (treat 
  has 7 levels, group has 2 levels). I found the coefficient for the last 
  interaction term is always 0, see attached dataset and the code below:
 
  test <- read.table("test.txt", sep='\t', header=TRUE, row.names=NULL)
  lm(y ~ factor(treat) * factor(group), test)
 
  Call:
  lm(formula = y ~ factor(treat) * factor(group), data = test)
 
  Coefficients:
                    (Intercept)                 factor(treat)2                 factor(treat)3
                       0.429244                       0.499982                       0.352971
                 factor(treat)4                 factor(treat)5                 factor(treat)6
                      -0.204752                       0.142042                       0.044155
                 factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                      -0.007775                      -0.337907                      -0.208734
  factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                      -0.195138                       0.800029                       0.227514
  factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                       0.331548                             NA
 
 
  I guess this is due to model matrix being singular or collinearity among 
  the matrix columns? But I can't figure out how the matrix is singular in 
  this case? Can someone show me why this is the case?
 
 Because you have no cases in one of the crossed categories.
 
 --David Winsemius, MD
 West Hartford, CT
 
 
 

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why NA coefficients

2011-11-08 Thread array chip
 Sent: Monday, November 7, 2011 9:29 PM
 Subject: Re: [R] why NA coefficients

 Hi John:

 What is the estimate of the cell mean \mu_{12}? Which model effects
 involve that cell mean? With this data arrangement, the expected
 population marginal means of treatment 1 and group 2 are not estimable
 either, unless you're willing to assume a no-interaction model.

 Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data
 (vol. 1) cover this topic in some detail, but it assumes you're
 familiar with the matrix form of a linear statistical model. Both
 chapters cover the two-way model with interaction - Ch.13 from the
 cell means model approach and Ch. 14 from the model effects approach.
 Because this was written in the mid 80s and republished in the early
 90s, all the code used is in SAS.

 HTH,
 Dennis


 Thanks David. The only category that has no cases is treat 1-group 2:

 with(test,table(treat,group))
  group
 treat 1 2
     1 8 0
     2 1 5
     3 5 5
     4 7 3
     5 7 4
     6 3 3
     7 8 2

 But why the coefficient for treat 7-group 2 is not estimable?

 Thanks

 John




 
 From: David Winsemius dwinsem...@comcast.net

 Cc: r-help@r-project.org r-help@r-project.org
 Sent: Monday, November 7, 2011 5:13 PM
 Subject: Re: [R] why NA coefficients


 On Nov 7, 2011, at 7:33 PM, array chip wrote:

 Hi, I am trying to run ANOVA with an interaction term on 2 factors (treat
 has 7 levels, group has 2 levels). I found the coefficient for the last
 interaction term is always 0, see attached dataset and the code below:

 test <- read.table("test.txt", sep='\t', header=TRUE, row.names=NULL)
 lm(y ~ factor(treat) * factor(group), test)

 Call:
 lm(formula = y ~ factor(treat) * factor(group), data = test)

 Coefficients:
                   (Intercept)                 factor(treat)2                 factor(treat)3
                      0.429244                       0.499982                       0.352971
                factor(treat)4                 factor(treat)5                 factor(treat)6
                     -0.204752                       0.142042                       0.044155
                factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                     -0.007775                      -0.337907                      -0.208734
 factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                     -0.195138                       0.800029                       0.227514
 factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                      0.331548                             NA


 I guess this is due to model matrix being singular or collinearity among
 the matrix columns? But I can't figure out how the matrix is singular in
 this case? Can someone show me why this is the case?

 Because you have no cases in one of the crossed categories.

 --David Winsemius, MD
 West Hartford, CT


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] why NA coefficients

2011-11-07 Thread array chip
Hi, I am trying to run ANOVA with an interaction term on 2 factors (treat has 7 
levels, group has 2 levels). I found the coefficient for the last interaction 
term is always 0, see attached dataset and the code below:

> test <- read.table("test.txt", sep='\t', header=TRUE, row.names=NULL)
> lm(y ~ factor(treat) * factor(group), test)

Call:
lm(formula = y ~ factor(treat) * factor(group), data = test)

Coefficients:
                  (Intercept)                 factor(treat)2                 factor(treat)3
                     0.429244                       0.499982                       0.352971
               factor(treat)4                 factor(treat)5                 factor(treat)6
                    -0.204752                       0.142042                       0.044155
               factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                    -0.007775                      -0.337907                      -0.208734
factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                    -0.195138                       0.800029                       0.227514
factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                     0.331548                             NA


I guess this is due to model matrix being singular or collinearity among the 
matrix columns? But I can't figure out how the matrix is singular in this case? 
Can someone show me why this is the case?

Thanks

John

treat   group   y
3   1   0.659156868932769
2   2   0.531460140598938
5   2   0.111702043330297
3   2   0.531970057170838
3   2   0.190075619611889
6   2   0.0234142139088362
2   1   0.929225602652878
1   1   0.165439655771479
1   1   0.245642073685303
4   2   0.973135874373838
2   2   0.40159740881063
2   2   0.230675351573154
4   1   0.15201729699038
4   2   0.770770706702024
3   2   0.00170516152866185
7   1   0.475904347142205
7   1   0.477398047922179
2   2   0.57676896895282
2   2   0.17242363281548
1   1   0.263812917750329
1   1   0.332802928285673
7   1   0.999866250436753
7   2   0.0795505701098591
5   2   0.723311265930533
4   1   0.301724245771766
4   1   0.174829749623314
4   1   0.161948098335415
4   1   0.0022172573953867
3   1   0.815872102044523
3   1   0.953563597053289
7   1   0.687810138566419
7   1   0.305082862731069
1   1   0.742835792247206
1   1   0.643639815272763
4   1   0.751568211475387
6   1   0.844687921227887
4   2   0.315934179816395
5   1   0.884743785485625
5   1   0.316437228582799
3   2   0.00965709099546075
3   2   0.512446181615815
5   1   0.765263902954757
5   1   0.754491505213082
4   1   0.0271341009065509
7   1   0.242603822145611
7   1   0.12916071084328
6   1   0.531941789202392
6   1   0.043566432675
5   1   0.380349192768335
5   1   0.457562071271241
6   2   0.539795646909624
6   2   0.837911191163585
7   1   0.0539217721670866
1   1   0.910242940066382
1   1   0.129532726714388
5   1   0.440152890980244
3   1   0.945850990246981
3   1   0.53663158044219
5   2   0.866308335913345
5   2   0.142251220298931
7   2   0.0875730190891773
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why NA coefficients

2011-11-07 Thread array chip
Thanks David. The only category that has no cases is treat 1-group 2:

 with(test,table(treat,group))
 group
treat 1 2
    1 8 0
    2 1 5
    3 5 5
    4 7 3
    5 7 4
    6 3 3
    7 8 2

But why is the coefficient for treat 7-group 2 not estimable?

Thanks

John





From: David Winsemius dwinsem...@comcast.net

Cc: r-help@r-project.org r-help@r-project.org
Sent: Monday, November 7, 2011 5:13 PM
Subject: Re: [R] why NA coefficients


On Nov 7, 2011, at 7:33 PM, array chip wrote:

 Hi, I am trying to run ANOVA with an interaction term on 2 factors (treat has 
 7 levels, group has 2 levels). I found the coefficient for the last 
 interaction term is always 0, see attached dataset and the code below:
 
 test <- read.table("test.txt", sep='\t', header=TRUE, row.names=NULL)
 lm(y ~ factor(treat) * factor(group), test)
 
 Call:
 lm(formula = y ~ factor(treat) * factor(group), data = test)
 
 Coefficients:
                   (Intercept)                 factor(treat)2                 factor(treat)3
                      0.429244                       0.499982                       0.352971
                factor(treat)4                 factor(treat)5                 factor(treat)6
                     -0.204752                       0.142042                       0.044155
                factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                     -0.007775                      -0.337907                      -0.208734
 factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                     -0.195138                       0.800029                       0.227514
 factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                      0.331548                             NA
 
 
 I guess this is due to model matrix being singular or collinearity among the 
 matrix columns? But I can't figure out how the matrix is singular in this 
 case? Can someone show me why this is the case?

Because you have no cases in one of the crossed categories.

--David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] why NA coefficients

2011-11-07 Thread array chip
Hi Dennis,

The cell mean mu_12 from the model involves the intercept and the factor(group)2 coefficient:

Coefficients:
                  (Intercept)                 factor(treat)2                 factor(treat)3
                     0.429244                       0.499982                       0.352971
               factor(treat)4                 factor(treat)5                 factor(treat)6
                    -0.204752                       0.142042                       0.044155
               factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                    -0.007775                      -0.337907                      -0.208734
factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                    -0.195138                       0.800029                       0.227514
factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                     0.331548                             NA

So mu_12 = 0.429244-0.337907 = 0.091337. This can be verified by:

> predict(fit, data.frame(list(treat=1, group=2)))
         1 
0.09133691 
Warning message:
In predict.lm(fit, data.frame(list(treat = 1, group = 2))) :
  prediction from a rank-deficient fit may be misleading


But as you can see, it gave a warning about a rank-deficient fit... why is this a 
rank-deficient fit?

Because treat 1-group 2 has no cases; so why is its cell mean still estimable 
while, on the contrary, treat 7-group 2, which has 2 cases, is not?
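
I tried to check the rank deficiency myself; if I am using model.matrix() 
correctly (a sketch):

X <- model.matrix(~ factor(treat) * factor(group), data = test)
ncol(X)       # 14 columns
qr(X)$rank    # 13, so one column is linearly dependent

Since no treat 1 cases fall in group 2, the factor(group)2 column equals the 
sum of the six treat:group interaction columns, so one of them is redundant; 
it looks like lm() simply drops the last dependent column it encounters, which 
is treat7:group2.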

Thanks

John








From: Dennis Murphy djmu...@gmail.com

Sent: Monday, November 7, 2011 9:29 PM
Subject: Re: [R] why NA coefficients

Hi John:

What is the estimate of the cell mean \mu_{12}? Which model effects
involve that cell mean? With this data arrangement, the expected
population marginal means of treatment 1 and group 2 are not estimable
either, unless you're willing to assume a no-interaction model.

Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data
(vol. 1) cover this topic in some detail, but it assumes you're
familiar with the matrix form of a linear statistical model. Both
chapters cover the two-way model with interaction - Ch.13 from the
cell means model approach and Ch. 14 from the model effects approach.
Because this was written in the mid 80s and republished in the early
90s, all the code used is in SAS.

HTH,
Dennis


 Thanks David. The only category that has no cases is treat 1-group 2:

 with(test,table(treat,group))
  group
 treat 1 2
     1 8 0
     2 1 5
     3 5 5
     4 7 3
     5 7 4
     6 3 3
     7 8 2

 But why the coefficient for treat 7-group 2 is not estimable?

 Thanks

 John




 
 From: David Winsemius dwinsem...@comcast.net

 Cc: r-help@r-project.org r-help@r-project.org
 Sent: Monday, November 7, 2011 5:13 PM
 Subject: Re: [R] why NA coefficients


 On Nov 7, 2011, at 7:33 PM, array chip wrote:

 Hi, I am trying to run ANOVA with an interaction term on 2 factors (treat 
 has 7 levels, group has 2 levels). I found the coefficient for the last 
 interaction term is always 0, see attached dataset and the code below:

 test <- read.table("test.txt", sep='\t', header=TRUE, row.names=NULL)
 lm(y ~ factor(treat) * factor(group), test)

 Call:
 lm(formula = y ~ factor(treat) * factor(group), data = test)

 Coefficients:
                   (Intercept)                 factor(treat)2                 factor(treat)3
                      0.429244                       0.499982                       0.352971
                factor(treat)4                 factor(treat)5                 factor(treat)6
                     -0.204752                       0.142042                       0.044155
                factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                     -0.007775                      -0.337907                      -0.208734
 factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                     -0.195138                       0.800029                       0.227514
 factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                      0.331548                             NA


 I guess this is due to model matrix being singular or collinearity among the 
 matrix columns? But I can't figure out how the matrix is singular in this 
 case? Can someone show me why this is the case?

 Because you have no cases in one of the crossed categories.

 --David Winsemius, MD
 West Hartford, CT


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] compare proportions

2011-09-28 Thread array chip
Hi all, I asked this yesterday but haven't got any response yet. I understand 
this is not purely an R technical question but more of a statistical one. With the 
many statistical experts on the list, I would appreciate any suggestions...

Many thanks

John

--


Hi, I have a seemingly simple proportions test. Here is the question I am 
trying to answer:
 
There is a test running each day in the lab, the test comes out as either 
positive or negative. So at the end of each month, we can calculate a positive 
rate in that month as the proportion of positive test results. The data look 
like:
 
Month      # positive   # total tests   positive rate
January        24            205            11.7%
February       31            234            13.2%
March          26            227            11.5%
:
:
:
August         42            241            17.4%
 
The total # of positives before August is 182, and the total # of tests before 
August is 1526. So from January to July the monthly positive rate is between 
11% and 13%, while the rate in August is up around 17%. The question is whether 
the rise in August is statistically significant.
 
I can think of 3 ways to do this test:
 
1. Use binom.test(), set “p” as the average positive
rate between January and July (=182/1526):
 
> binom.test(42, 241, 182/1526)

    Exact binomial test

data:  42 and 241
number of successes = 42, number of trials = 241, p-value = 0.0125
alternative hypothesis: true probability of success is not equal to 0.1192661
95 percent confidence interval:
 0.1285821 0.2281769
sample estimates:
probability of success 
             0.1742739 
 
2. Use prop.test(), where I compare the average
positive rate between January & July with the positive rate in August:
 
> prop.test(c(182,42), c(1526,241))

    2-sample test for equality of proportions with continuity correction

data:  c(182, 42) out of c(1526, 241)
X-squared = 5.203, df = 1, p-value = 0.02255
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.107988625 -0.002026982
sample estimates:
   prop 1    prop 2 
0.1192661 0.1742739 

3. Use prop.test(), where I compare the average
MONTHLY positive rate between January & July with the positive rate in
August. The average monthly # of positives is 182/7=26, and the average monthly #
of total tests is 1526/7=218:
 
> prop.test(c(26,42), c(218,241))

    2-sample test for equality of proportions with continuity correction

data:  c(26, 42) out of c(218, 241)
X-squared = 2.3258, df = 1, p-value = 0.1272
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12375569  0.01374008
sample estimates:
   prop 1    prop 2 
0.1192661 0.1742739 
 
As you can see, method 3 gave an insignificant p value compared to methods 1 & 2. 
While I understand each method tests a different hypothesis, for the question I 
am trying to answer (does August have a higher positive rate compared to earlier 
months?), which method is more relevant? Or should I consider some regression 
technique, and if so, what type of regression is appropriate?
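
For the regression idea, would something like this be sensible (a sketch using 
the aggregated counts above)?

pos <- c(182, 42)                          # Jan-Jul combined, August
tot <- c(1526, 241)
month <- factor(c("JanJul", "Aug"), levels = c("JanJul", "Aug"))
fit <- glm(cbind(pos, tot - pos) ~ month, family = binomial)
summary(fit)                               # the monthAug coefficient tests the August increase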
 
Thanks for any suggestions,
 
John


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] compare proportions

2011-09-27 Thread array chip
Hi, I have a seemingly simple proportions test. Here is the question I am 
trying to answer:
 
There is a test running each day in the lab, the test comes out as
either positive or negative. So at the end of each month, we can calculate a
positive rate in that month as the proportion of positive test results. The
data look like:
 
Month      # positive   # total tests   positive rate
January        24            205            11.7%
February       31            234            13.2%
March          26            227            11.5%
:
:
:
August         42            241            17.4%
 
The total # of positives before August is 182, and the total # of tests
before August is 1526. So from January to July the monthly positive rate
is between 11% and 13%, while the rate in August is up around 17%. The question
is whether the rise in August is statistically significant.
 
I can think of 3 ways to do this test:
 
1. Use binom.test(), set “p” as the average positive
rate between January and July (=182/1526):
 
> binom.test(42, 241, 182/1526)

    Exact binomial test

data:  42 and 241
number of successes = 42, number of trials = 241, p-value = 0.0125
alternative hypothesis: true probability of success is not equal to 0.1192661
95 percent confidence interval:
 0.1285821 0.2281769
sample estimates:
probability of success 
             0.1742739 
 
2. Use prop.test(), where I compare the average
positive rate between January & July with the positive rate in August:
 
> prop.test(c(182,42), c(1526,241))

    2-sample test for equality of proportions with continuity correction

data:  c(182, 42) out of c(1526, 241)
X-squared = 5.203, df = 1, p-value = 0.02255
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.107988625 -0.002026982
sample estimates:
   prop 1    prop 2 
0.1192661 0.1742739 
3. Use prop.test(), where I compare the average
monthly positive rate between January & July with the positive rate in
August. The average monthly # of positives is 182/7=26, and the average monthly #
of total tests is 1526/7=218:
 
> prop.test(c(26,42), c(218,241))

    2-sample test for equality of proportions with continuity correction

data:  c(26, 42) out of c(218, 241)
X-squared = 2.3258, df = 1, p-value = 0.1272
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.12375569  0.01374008
sample estimates:
   prop 1    prop 2 
0.1192661 0.1742739 
 
As you can see, method 3 gave an insignificant p value compared to
methods 1 & 2. While I understand each method tests a different hypothesis,
for the question I am trying to answer (does August have a higher positive
rate compared to earlier months?), which method is more relevant?
 
Thanks for any suggestion,
 
John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] closed testing procedure

2011-09-11 Thread array chip
Hi, are the methods for multiple-testing p-value adjustment (Shaffer, Westfall, 
free) implemented in the adjusted() function in the multcomp package so-called 
closed testing procedures? What about the methods (holm, hochberg, hommel, BH, 
BY) implemented in p.adjust() in the stats package?
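
For reference, these are the calls I mean (a sketch):

p <- c(0.001, 0.01, 0.04, 0.20)
p.adjust(p, method = "hommel")              # from stats
# and, for a glht object g from multcomp:
# summary(g, test = adjusted("Westfall"))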
 
Thanks
 
John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] suggestion for proportions

2011-09-09 Thread array chip
Thanks all again for the suggestions. I agree with Wolfgang that mcnemar.test() 
is what I am looking for. The accuracy is the proportion of correct diagnoses 
compared to a gold standard, and I am interested in which diagnostic test is 
better, not particularly interested in assessing the agreement between the 2 
tests.

Thanks again.


John


- Original Message -
From: Viechtbauer Wolfgang (STAT) wolfgang.viechtba...@maastrichtuniversity.nl
To: r-help@r-project.org r-help@r-project.org
Cc: csrabak cesar.ra...@gmail.com; array chip arrayprof...@yahoo.com
Sent: Thursday, September 8, 2011 1:24 AM
Subject: RE: [R] suggestion for proportions

I assume you mean Cohen's kappa. This is not what the OP is asking about. The 
OP wants to know how to test for a difference in the proportions of 1's. 
Cohen's kappa will tell you what the level of agreement is between the two 
tests. This is something different.

Also, the OP has now clarified that the data are paired. Therefore, prop.test() 
and binom.test() are not appropriate. So, to answer the OPs question: yes, 
mcnemar.test() is what you should be using.

Wolfgang

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of csrabak
 Sent: Thursday, September 08, 2011 02:31
 To: array chip
 Cc: r-help@r-project.org
 Subject: Re: [R] suggestion for proportions
 
  On 7/9/2011 16:53, array chip wrote:
  Hi all, thanks very much for sharing your thoughts. and sorry for my
 describing the problem not clearly, my fault.
 
  My data is paired, that is 2 different diagnostic tests were performed
 on the same individuals. Each individual will have a test results from
 each of the 2 tests. Then in the end, 2 accuracy rates were calculated for
 the 2 tests. And I want to test if there is a significant difference in
 the accuracy (proportion) between the 2 tests. My understanding is that
 prop.test() is appropriate for 2 independent proportions,  whereas in my
 situation, the 2 proportions are not independent calculated from paired
 data, right?
 
  the data would look like:
 
  pid   test1    test2
  p1      1         0
  p2      1         1
  p3      0         1
  :
  :
 
  1=test is correct; 0=not correct
 
  from the data above, we can calculate accuracy for test1 and test2, then
 to compare
 
 
  So mcnemar.test() is good for that, right?
 
  Thanks
 
 John,
 
  From the above clarification, I suggest you consider the use of the kappa test. For
 a list of possible ways of doing it in R, try:
 RSiteSearch("kappa", restrict="functions")
 
 HTH
 
 --
 Cesar Rabak

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] suggestion for proportions

2011-09-07 Thread array chip
Hi, I am wondering if anyone can suggest how to test the equality of 2 
proportions. The caveat here is that the 2 proportions were calculated from the 
same number of samples using 2 different tests. So essentially we are comparing 
2 accuracy rates from the same, say 100, samples. I think this is like a paired 
test, but I don't know whether we really need to account for the paired nature 
of the data, and if so, how? Or should I just use prop.test() to compare the 2 
proportions?

Any suggestion would be greatly appreciated.

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] suggestion for proportions

2011-09-07 Thread array chip
Hi all, thanks very much for sharing your thoughts, and sorry for not describing 
the problem clearly; my fault.

My data are paired, that is, 2 different diagnostic tests were performed on the 
same individuals. Each individual has a test result from each of the 2 tests. 
In the end, 2 accuracy rates were calculated, one for each test. And I want to 
test if there is a significant difference in the accuracy (proportion) between 
the 2 tests. My understanding is that prop.test() is appropriate for 2 
independent proportions, whereas in my situation the 2 proportions are not 
independent, being calculated from paired data, right?

the data would look like:

pid   test1    test2
p1      1         0
p2      1         1
p3      0         1
:
:

1=test is correct; 0=not correct

from the data above, we can calculate the accuracy of test1 and test2, and then 
compare them


So mcnemar.test() is good for that, right?
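
i.e., something like this (a sketch on made-up paired results):

test1 <- c(1, 1, 0, 1, 0, 1)
test2 <- c(0, 1, 1, 1, 1, 1)
mcnemar.test(table(test1, test2))   # uses only the discordant pairs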

Thanks

John




- Original Message -
From: Viechtbauer Wolfgang (STAT) wolfgang.viechtba...@maastrichtuniversity.nl
To: r-help@r-project.org r-help@r-project.org
Cc: Bert Gunter gunter.ber...@gene.com
Sent: Wednesday, September 7, 2011 8:14 AM
Subject: Re: [R] suggestion for proportions

Indeed, the original post leaves some room for interpretation. In any case, I 
hope the OP has enough information now to figure out what approach is best for 
his data.

Best,

Wolfgang

 -Original Message-
 From: Bert Gunter [mailto:gunter.ber...@gene.com]
 Sent: Wednesday, September 07, 2011 16:47
 To: Viechtbauer Wolfgang (STAT)
 Cc: r-help@r-project.org; John Sorkin
 Subject: Re: [R] suggestion for proportions
 
 Wolfgang:
 
 On Wed, Sep 7, 2011 at 7:28 AM, Viechtbauer Wolfgang (STAT)
 wolfgang.viechtba...@maastrichtuniversity.nl wrote:
  Acutally,
 
  ?mcnemar.test
 
  since it is paired data.
 
 Actually, it is unclear to me from the OP's message whether this is the
 case.
 
 In one sentence the OP says that the _number_ of samples is the same,
 and in the next he says that essentially the samples are the same.
 So, as usual, imprecision in the problem description leads to
 imprecision in the solution.
 
 But your point is well taken, of course.
 
 -- Bert

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] weird apply() behavior

2011-08-29 Thread array chip
Hi, I had weird results from using apply(). Here is a simple example:

> y <- data.frame(list(a=c(1,NA), b=c('2k','0')))
> y

   a  b
1  1 2k
2 NA  0

> apply(y, 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && 
  !is.na(x[1]) && x[1]=='1') 1 else 0})

This should print 1 0 as output, as demonstrated by:

> apply(y[1,], 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && 
  !is.na(x[1]) && x[1]=='1') 1 else 0})
1 
1 
> apply(y[2,], 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && 
  !is.na(x[1]) && x[1]=='1') 1 else 0})
2 
0 


But it actually prints:

> apply(y, 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && 
  !is.na(x[1]) && x[1]=='1') 1 else 0})

[1] 0 0


Does anyone have any suggestions?
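
(For comparison, working directly on the data.frame columns gives what I expect:

ifelse(!is.na(y$a) & y$a == 1 & !is.na(y$b) & y$b == '2k', 1, 0)
# [1] 1 0

so something in apply() must be changing the values.)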

Thanks

John


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] weird apply() behavior

2011-08-29 Thread array chip
Thanks Bill and David!

John



- Original Message -
From: William Dunlap wdun...@tibco.com
To: array chip arrayprof...@yahoo.com; r-help@r-project.org 
r-help@r-project.org
Cc: 
Sent: Monday, August 29, 2011 5:21 PM
Subject: RE: [R] weird apply() behavior

apply() should come with a big warning that it was
written for matrices and can cause a lot of surprises
with data.frames.  apply(X, ...) converts X to a
matrix and that is the kernel of your problem: if any
columns of X are not numeric then as.matrix(X) is
a character matrix whose columns are made with calls
to format(), which means that all entries in the column
have the same number of characters in them.   This is
handy for printing but not for other purposes.

  > as.matrix(y)
    a    b   
  1 " 1" "2k"
  2 "NA" "0" 
  > as.matrix(y[1,])
    a   b   
  1 "1" "2k"
  > as.matrix(y[2,])
    a    b  
  2 "NA" "0"

Another way to debug this on your own is to simplify the
function you are giving to apply.  Then you would see
where things are going wrong (but perhaps not why):

  > apply(y, 1, function(x)unlist(x))
    [,1] [,2]
  a " 1" "NA"
  b "2k" "0" 
  > apply(y[1,], 1, function(x)unlist(x))
    1   
  a "1" 
  b "2k"
  > apply(y[2,], 1, function(x)unlist(x))
    2   
  a "NA"
  b "0" 


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf Of array chip
 Sent: Monday, August 29, 2011 5:03 PM
 To: r-help@r-project.org
 Subject: [R] weird apply() behavior
 
 Hi, I had a weird results from using apply(). Here is an simple example:
 
  y <- data.frame(list(a=c(1,NA), b=c('2k','0')))
  y
 
     a  b
  1  1 2k
  2 NA  0
 
  apply(y, 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && 
  !is.na(x[1]) && x[1]=='1') 1 else 0})
 
 This should print 1 0 as output, as demonstrated by:
 
  apply(y[1,], 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && 
  !is.na(x[1]) && x[1]=='1') 1 else 0})
 1
 1
  apply(y[2,], 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && 
  !is.na(x[1]) && x[1]=='1') 1 else 0})
 2
 0
 
 
 But it actually prints:
 
  apply(y, 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && 
  !is.na(x[1]) && x[1]=='1') 1 else 0})
 
 [1] 0 0
 
 
 Anyone has any suggestion?
 
 Thanks
 
 John
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survplot() for cph(): Design vs rms

2011-08-26 Thread array chip
Thanks Frank!



- Original Message -
From: Frank Harrell f.harr...@vanderbilt.edu
To: r-help@r-project.org
Cc: 
Sent: Thursday, August 25, 2011 4:33 PM
Subject: Re: [R] survplot() for cph(): Design vs rms

http://biostat.mc.vanderbilt.edu/Rrms shows differences between Design and
rms
Frank

array chip wrote:
 
 Thank you David. The plot from Design package will draw a plot of survival
 probability at 5 years (in my example) versus different age. I will look
 into Predict() in rms package to see how this can be done. 
 
 
 John
 
 
 
 
 
 
 - Original Message -
 From: David Winsemius <dwinsem...@comcast.net>
 To: array chip <arrayprof...@yahoo.com>
 Cc: r-h...@stat.math.ethz.ch <r-h...@stat.math.ethz.ch>
 Sent: Thursday, August 25, 2011 3:15 PM
 Subject: Re: [R] survplot() for cph(): Design vs rms
 
 
 On Aug 25, 2011, at 5:11 PM, array chip wrote:
 
 Hi, in the Design package, a plot of survival probability vs. a covariate can
 be generated by survplot() on a cph object using the following code:
 
 n <- 1000
 set.seed(731)
 age <- 50 + 12*rnorm(n)
 label(age) <- "Age"
 sex <- factor(sample(c('male','female'), n, TRUE))
 cens <- 15*runif(n)
 h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
 dt <- -log(runif(n))/h
 label(dt) <- 'Follow-up Time'
 e <- ifelse(dt <= cens, 1, 0)
 dt <- pmin(dt, cens)
 units(dt) <- "Year"
 dd <- datadist(age, sex)
 options(datadist='dd')
 S <- Surv(dt,e)
 
 
 library(Design)
 
 f <- cph(S ~ age, surv=TRUE, x=TRUE, y=TRUE)
 plot(f, age=NA, time=5)
 
 But the same code won't work if I use the rms package:
 
 detach(package:Design)
 library(rms)
 
 f <- cph(S ~ age, surv=TRUE, x=TRUE, y=TRUE)
 
 plot(f, age=NA, time=5)
 Error in xy.coords(x, y, xlabel, ylabel, log) :
   'x' and 'y' lengths differ
 
 
 Is there a way to plot the same graph using rms package.
 
 I don't remember what that would have done in Design and you have not
 explained what you expected. You should read the rms help page for cph and
 walk through the examples where the use of the Predict function is
 illustrated. The plotting support for Predict objects is excellent.
 
 
 I like to use Frank Harrell's new package rms and try to avoid using old
 Design package.
 
 Thanks
 
 John
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 David Winsemius, MD
 West Hartford, CT
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/survplot-for-cph-Design-vs-rms-tp3769430p3769683.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] baseline cumulative hazard by basehaz()

2011-08-26 Thread array chip
Hi, 


basehaz() in the survival package is said to estimate the baseline cumulative 
hazard from a coxph() fit, but it actually calculates the cumulative hazard as 
-log() of the survival curve from survfit(coxph.object). And survfit() calculates 
survival at the MEAN covariate values, not at covariate values of 0. I thought 
the baseline cumulative hazard means the hazard when the covariate values are 0. 
Did I misunderstand something?
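
I did notice the "centered" argument; is this the intended way to get the 
hazard at covariate values of 0 (a sketch)?

library(survival)
fit <- coxph(Surv(time, status) ~ age, data = lung)
basehaz(fit)                    # default centered=TRUE: at the mean covariate values
basehaz(fit, centered = FALSE)  # at covariate values of 0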

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] survplot() for cph(): Design vs rms

2011-08-25 Thread array chip
Hi, in the Design package, a plot of survival probability vs. a covariate can be 
generated by survplot() on a cph object using the following code:

n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- "Age"
sex <- factor(sample(c('male','female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens, 1, 0)
dt <- pmin(dt, cens)
units(dt) <- "Year"
dd <- datadist(age, sex)
options(datadist='dd')
S <- Surv(dt,e)


library(Design)

f <- cph(S ~ age, surv=TRUE, x=TRUE, y=TRUE)
plot(f, age=NA, time=5)

But the same code won't work if I use the rms package:

detach(package:Design)
library(rms)

f <- cph(S ~ age, surv=TRUE, x=TRUE, y=TRUE)

plot(f, age=NA, time=5)
Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ


Is there a way to plot the same graph using the rms package? I'd like to use Frank 
Harrell's new rms package and avoid using the old Design package.

Thanks

John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survplot() for cph(): Design vs rms

2011-08-25 Thread array chip
Thank you David. The plot from the Design package draws survival probability at 
5 years (in my example) versus age. I will look into Predict() in the rms 
package to see how this can be done. 
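
Presumably something like this (an untested sketch, if I read the rms docs right):

p <- Predict(f, age, time = 5)   # 5-year survival probability as a function of age
plot(p)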


John






- Original Message -
From: David Winsemius dwinsem...@comcast.net
To: array chip arrayprof...@yahoo.com
Cc: r-h...@stat.math.ethz.ch r-h...@stat.math.ethz.ch
Sent: Thursday, August 25, 2011 3:15 PM
Subject: Re: [R] survplot() for cph(): Design vs rms


On Aug 25, 2011, at 5:11 PM, array chip wrote:

 Hi, in Design package, a plot of survival probability vs. a covariate can be 
 generated by survplot() on a cph object using the folliowing code:
 
 n <- 1000
 set.seed(731)
 age <- 50 + 12*rnorm(n)
 label(age) <- "Age"
 sex <- factor(sample(c('male','female'), n, TRUE))
 cens <- 15*runif(n)
 h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
 dt <- -log(runif(n))/h
 label(dt) <- 'Follow-up Time'
 e <- ifelse(dt <= cens, 1, 0)
 dt <- pmin(dt, cens)
 units(dt) <- "Year"
 dd <- datadist(age, sex)
 options(datadist='dd')
 S <- Surv(dt,e)
 
 
 library(Design)
 
 f <- cph(S ~ age, surv=TRUE, x=TRUE, y=TRUE)
 plot(f, age=NA, time=5)
 
 But the same code won't work if I use the rms package:
 
 detach(package:Design)
 library(rms)
 
 f <- cph(S ~ age, surv=TRUE, x=TRUE, y=TRUE)
 
 plot(f, age=NA, time=5)
 Error in xy.coords(x, y, xlabel, ylabel, log) :
   'x' and 'y' lengths differ
 
 
 Is there a way to plot the same graph using rms package.

I don't remember what that would have done in Design and you have not explained 
what you expected. You should read the rms help page for cph and walk through 
the examples where the use of the Predict function is illustrated. The plotting 
support for Predict objects is excellent.


 I like to use Frank Harrell's new package rms and try to avoid using old 
 Design package.
 
 Thanks
 
 John
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Labelling all variables at once (using Hmisc label)

2011-08-17 Thread array chip
Hi Frank, the same is true of one of your replies to my previous post; it can 
only be seen in Nabble.



- Original Message -
From: David Winsemius dwinsem...@comcast.net
To: Frank Harrell f.harr...@vanderbilt.edu
Cc: r-help@r-project.org
Sent: Wednesday, August 17, 2011 3:08 PM
Subject: Re: [R] Labelling all variables at once (using Hmisc label)


On Aug 17, 2011, at 5:49 PM, Frank Harrell wrote:

 I'm puzzled.  I provided a solution that did not require looping.
 Frank

Hi Frank;

Do you realize that some portions of your Nabble postings are not being 
communicated to the ordinary mail-clients? This code did not appear in my copy 
from Nabble. I didn't cut anything. This appears in Nabble when you look there:

 d <- data.frame(a=1:2, b=3:4)
 label(d, self=FALSE) <- c('A','B')
 contents(d)
Data frame:d    2 observations and 2 variables    Maximum # NAs:0

  Labels Storage
a      A integer
b      B integer



I observed and noted that this was happening in a prior message. The missing 
formatted Nabble stuff also comes across without linefeeds when pasted, so I 
added some of those back in.

--David.



 
 Monsieur Do wrote:
 
 I did read the help page before posting, but didn't find the direct way...
 My function here works fine. But just for learning purposes, I'd like to
 be able to avoid the loop...
 
with.labels <- function(x, labels=NULL, csvfile=NULL) {
  if(!is.null(csvfile)) labels <- read.csv(csvfile, sep="\t", header=FALSE,
                                           stringsAsFactors=FALSE)[,1]
  for(i in 1:length(x)) label(x[,i]) <- labels[i]
  if(length(labels) != length(x)) cat("Warning: data and labels are not of same length\n")
  return(x)
}
 
 Thanks
 
 Message: 11
 Date: Tue, 16 Aug 2011 04:22:07 -0700 (PDT)
 From: Frank Harrell <f.harr...@vanderbilt.edu>
 To: r-help@r-project.org
 Subject: Re: [R] Labelling all variables at once (using Hmisc label)
 Message-ID: 1313493727519-3746928.p...@n4.nabble.com
 Content-Type: text/plain; charset=UTF-8
 
 Do require(Hmisc); ?label to see the help file for label.  It will show you
 how to do this:
 
 Monsieur Do wrote:
  I have a dataset and a list of labels. I simply want to apply the labels
  to the variables, all at once. The only way I was able to do it was
  using a loop:
 
   for (i in 1:length(data)) label(data[,i]) <- data.labels[i]
 
  I'd like to find the non-loop way to do it, using apply or the like...
  Any help appreciated.
 
 -----
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Labelling-all-variables-at-once-using-Hmisc-label-tp3745660p3751273.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calibration curve for cph()

2011-08-16 Thread array chip
Dear Frank,

Thanks for suggesting the val.surv() function from the rms package. It's exactly 
what I need. I tried the example on the help page and tweaked it toward my 
situation, but got an error message. Could you suggest what went wrong?

library(rms)
set.seed(123)  # so can reproduce results
n <- 1000
age <- 50 + 12*rnorm(n)
sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
units(t) <- 'Year'
label(t) <- 'Time to Event'
ev <- ifelse(t <= cens, 1, 0)
t <- pmin(t, cens)
S <- Surv(t, ev)



f <- cph(S ~ age + sex, x=TRUE, y=TRUE)
val.surv(f, newdata=data.frame(age,sex), S=S, u=5)  ## pretend the same 
training data as the independent dataset

Error in val.surv(f, newdata = data.frame(age, sex), S = S, u = 5) : 
  unused argument(s) (u = 5)


Many thanks

John










- Original Message -
From: Frank Harrell f.harr...@vanderbilt.edu
To: r-help@r-project.org
Cc: 
Sent: Tuesday, August 16, 2011 4:23 AM
Subject: Re: [R] calibration curve for cph()



David Winsemius wrote:
 
 A combination of Predict (your newdata), cut2, and the plotting function
 of your choice ought to suffice. But thought that cross-validation was an
 option. Not at console at the moment (just off airplane.)
 
 Sent from my iPhone
 
 On Aug 15, 2011, at 5:26 PM, array chip <arrayprof...@yahoo.com>
 wrote:
 
 is there a R function that produces calibration curve on an independetn
 data automatically, just like what calibrate() does on the training data
 itself?
 
 Thanks
 
 John
 
 From: Comcast <dwinsem...@comcast.net>
 To: array chip <arrayprof...@yahoo.com>
 Cc: r-help@r-project.org <r-help@r-project.org>
 Sent: Monday, August 15, 2011 2:04 PM
 Subject: Re: [R] calibration curve for cph()
 
 Build a prediction function using  'Function' that gets applied to set2.
 Calibrate and validate.
 
 -- 
 David
 
 Sent from my iPhone
 
 On Aug 15, 2011, at 11:31 AM, array chip <arrayprof...@yahoo.com>
 wrote:
 
  Hi, the calibrate.cph() function in rms package generate calibration
 curve for Cox model on the same dataset where the model was derived using
 bootstrapping or cross-validation. If I have the model built on dataset
 1, and now I want to produce a calibration curve for this model on an
 independent dataset 2, how can I do that?
  
  Thanks
  
  John
  
  
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/calibration-curve-for-cph-tp3745328p3746931.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] calibration curve for cph()

2011-08-16 Thread array chip
Oops, thanks for reminding me. I found that an in-house package interfered with 
the rms package, which caused the error. 

Thanks David!

John



- Original Message -
From: David Winsemius dwinsem...@comcast.net
To: array chip arrayprof...@yahoo.com
Cc: Frank Harrell f.harr...@vanderbilt.edu; r-help@r-project.org 
r-help@r-project.org
Sent: Tuesday, August 16, 2011 11:27 AM
Subject: Re: [R] calibration curve for cph()


On Aug 16, 2011, at 1:57 PM, array chip wrote:

 Dear Frank,
 
 Thanks for suggesting the val.surv() function from the rms package. It's exactly 
 what I need. I tried the example on the help page and tweaked it toward my 
 situation, but got an error message. Could you suggest what went wrong?

It is interesting that neither my mail client nor the archives show the code 
that was sent to nabble.

For posterity it was
require(rms)
?val.surv
 
 library(rms)
 set.seed(123)              # so can reproduce results
 n <- 1000
 age <- 50 + 12*rnorm(n)
 sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
 cens <- 15*runif(n)
 h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))  # hazard rises with age, higher for females
 t <- -log(runif(n))/h                          # exponential event times
 units(t) <- 'Year'
 label(t) <- 'Time to Event'
 ev <- ifelse(t <= cens, 1, 0)                  # 1 = event observed before censoring
 t <- pmin(t, cens)                             # observed follow-up time
 S <- Surv(t, ev)
 
 
 
 f <- cph(S ~ age + sex, x=TRUE, y=TRUE)
 val.surv(f, newdata=data.frame(age, sex), S=S, u=5)  ## pretending the training
 ## data is the independent dataset

Unable to reproduce; I get sensible output. You may need to post sessionInfo(). 
Do you have all the dependencies installed? I noticed that even after rms was 
loaded with messages about also loading Hmisc, survival, and splines, there was 
a further message saying polspline was being loaded.

 Error in val.surv(f, newdata = data.frame(age, sex), S = S, u = 5) :
   unused argument(s) (u = 5)
 
--david.

> sessionInfo()
R version 2.13.1 RC (2011-07-03 r56263)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] polspline_1.1.5 rms_3.3-1       Hmisc_3.8-3     survival_2.36-9
[5] TTR_0.20-3      xts_0.8-0       zoo_1.6-5       sos_1.3-0
[9] brew_1.0-6      lattice_0.19-30

loaded via a namespace (and not attached):
[1] cluster_1.14.0 grid_2.13.1    tools_2.13.1


 
 - Original Message -
 From: Frank Harrell f.harr...@vanderbilt.edu
 To: r-help@r-project.org
 Cc:
 Sent: Tuesday, August 16, 2011 4:23 AM
 Subject: Re: [R] calibration curve for cph()
 
 
 
 David Winsemius wrote:
 
 A combination of Predict (your newdata), cut2, and the plotting function
 of your choice ought to suffice. But thought that cross-validation was an
 option. Not at console at the moment (just off airplane.)
 
 Sent from my iPhone
 
 On Aug 15, 2011, at 5:26 PM, array chip <arrayprof...@yahoo.com>
 wrote:
 
 Is there an R function that produces a calibration curve on an independent
 dataset automatically, just like what calibrate() does on the training data
 itself?
 
 Thanks
 
 John
 
 From: Comcast <dwinsem...@comcast.net>
 To: array chip <arrayprof...@yahoo.com>
 Cc: r-help@r-project.org <r-help@r-project.org>
 Sent: Monday, August 15, 2011 2:04 PM
 Subject: Re: [R] calibration curve for cph()
 
 Build a prediction function using  'Function' that gets applied to set2.
 Calibrate and validate.
 
 --David
 
 Sent from my iPhone
 
 On Aug 15, 2011, at 11:31 AM, array chip <arrayprof...@yahoo.com>
 wrote:
 
 Hi, the calibrate.cph() function in the rms package generates a calibration
 curve for a Cox model on the same dataset from which the model was derived,
 using bootstrapping or cross-validation. If I have the model built on dataset
 1 and now want to produce a calibration curve for this model on an
 independent dataset 2, how can I do that?
 
 Thanks
 
 John

David Winsemius, MD
West Hartford, CT
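
A rough sketch of the Predict/cut2 idea David describes above. All names here are 
hypothetical: f is the Cox model fit on dataset 1, and set2 is the independent 
dataset with follow-up time and event columns:

library(rms)                                        # also loads Hmisc (cut2) and survival
pred5 <- survest(f, newdata = set2, times = 5)$surv # predicted 5-year survival on set2
grp   <- cut2(pred5, g = 10)                        # decile bins of predicted survival
km    <- summary(survfit(Surv(time, event) ~ grp, data = set2), times = 5)
plot(tapply(pred5, grp, mean), km$surv,             # assumes every bin has follow-up past 5 years
     xlab = "Mean predicted 5-year survival",
     ylab = "Observed 5-year survival (Kaplan-Meier)")
abline(0, 1)                                        # identity line = perfect calibration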



[R] cuminc() in cmprsk package for cumulative incidence

2011-08-16 Thread array chip
Hi, 


To use cuminc() from the cmprsk package: if a subject has 2 
events (both the event of interest and a competing-risk event), 
should I create 2 observations for this subject in the dataset, one for 
each event with a different fstatus (1 and 2), or just 1 observation with 
whichever event happened first? My analysis objective is to calculate the 
cumulative incidence of the event of interest.

Thanks!

John
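
cuminc() expects one row per subject, coded with whichever event occurred first 
(fstatus 0 = censored, 1 = event of interest, 2 = competing risk). A minimal 
sketch on simulated data (not from the thread):

library(cmprsk)
set.seed(42)
ftime   <- rexp(200)                        # follow-up times
fstatus <- sample(0:2, 200, replace = TRUE) # 0 censored, 1 event of interest, 2 competing
ci <- cuminc(ftime, fstatus, cencode = 0)   # cumulative incidence per cause
plot(ci)                                    # one curve per cause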



[R] calibration curve for cph()

2011-08-15 Thread array chip
Hi, the calibrate.cph() function in the rms package generates a calibration 
curve for a Cox model on the same dataset from which the model was derived, 
using bootstrapping or cross-validation. If I have the model built on dataset 1 
and now want to produce a calibration curve for this model on an independent 
dataset 2, how can I do that?

Thanks

John
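
The replies earlier in this archive converge on rms::val.surv() for validation on 
an independent dataset. A minimal sketch, assuming the model is fit on set1 and 
validated on set2 (all names hypothetical):

f <- cph(Surv(time, event) ~ age + sex, data = set1, x = TRUE, y = TRUE)
v <- val.surv(f, newdata = set2, S = with(set2, Surv(time, event)), u = 5)
plot(v)   # observed vs. predicted 5-year survival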


