Re: [R] linear mixed model using lmer
Thanks Jeff for reminding me that the attachment is removed. I put it in my Google Drive if anyone wants to test the data (https://drive.google.com/file/d/1lgVZVLHeecp9a_sFxEPeg6353O-qXZhM/view?usp=sharing). I'll try the mixed-models mailing list as well.

John

On Friday, March 4, 2022, 04:56:20 PM PST, Jeff Newmiller wrote:

a) There is a mailing list for that: https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
b) Read the Posting Guide, as most attachment types are removed to avoid propagating worms/viruses. (None seen upon receipt of this email.)

On March 4, 2022 4:41:57 PM PST, array chip via R-help wrote:

>Dear all, I have this simple dataset to measure the yield of a crop collected
>in 2 batches (attached). When I ran a simple linear mixed model using lmer to
>estimate within-batch and between-batch variability, the between-batch
>variability is 0, and the fit is reported as singular. Does anyone know why
>the fit is singular and what the reason for the 0 variability is? Is it
>because the dataset only has 2 batches?
>
>> daty<-read.table("datx.txt",sep='\t',header=T,row.names=NULL)
>> library(lme4)
>> lmer(yield~1+(1|batch),daty)
>boundary (singular) fit: see ?isSingular
>Linear mixed model fit by REML ['lmerMod']
>Formula: yield ~ 1 + (1 | batch)
>   Data: daty
>REML criterion at convergence: 115.6358
>Random effects:
> Groups   Name        Std.Dev.
> batch    (Intercept) 0.000
> Residual             2.789
>Number of obs: 24, groups:  batch, 2
>Fixed Effects:
>(Intercept)
>      5.788
>
>Thanks!
>John

-- Sent from my phone. Please excuse my brevity.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
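To illustrate why this happens: with only two batches, the between-batch variance is estimated from a single pair of group means, and whenever the spread between those two means is no larger than the residual noise predicts, REML pushes the variance component to its boundary at zero and lme4 flags the fit as singular. A minimal sketch with simulated data (not the poster's datx.txt; the batch sizes and parameters here are made up):

```r
## Simulated stand-in for the poster's data: 24 observations in 2 batches
## with no true batch effect, so the between-batch variance component
## typically collapses to zero ("boundary (singular) fit").
library(lme4)

set.seed(1)
daty <- data.frame(
  batch = factor(rep(c("A", "B"), each = 12)),
  yield = rnorm(24, mean = 5.8, sd = 2.8)
)

fit <- lmer(yield ~ 1 + (1 | batch), data = daty)
isSingular(fit)  # often TRUE here: the batch variance sits on the boundary
VarCorr(fit)     # batch (Intercept) Std.Dev. at or near 0
```

With two batches there is essentially one degree of freedom for estimating the between-batch variance, so a zero estimate usually means the data cannot distinguish batch variation from residual noise, not that anything is wrong with the data file.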
Re: [R] 2 KM curves on the same plot
Hi Jim,

I found out why clip() does not work with lines(survfit.object)! If you look at the code of the function survival:::lines.survfit, in the middle of the code there is:

do.clip <- getOption("plot.survfit")
if (!is.null(xx <- do.clip$plotclip))
    clip(xx[1], xx[2], xx[3], xx[4])

This resets the clipping to the default plot region! So I just commented out the last two of those three lines and created a customized lines2 function. Now it works! It's fun to learn clip().

Thanks,

John

On Wednesday, September 30, 2020, 01:47:55 AM PDT, Jim Lemon wrote:

Hi John, Hmmm, this works:

plot(1:10)
xylim<-par("usr")
clip(5,xylim[2],xylim[3],xylim[4])
lines(10:1)

so I suspect that there is a "lines" method that resets the clipping region out of sight. Fortunately Marc Schwartz provided a way to get your plot, so I will give the wall against which I have been banging my head a break.

Jim

On Wed, Sep 30, 2020 at 1:57 PM array chip wrote: > > Jim, > > I tried a few things, I found that clip() works if I just do some regular > graphing tasks. But as long as I run lines(fit) with "fit" object is a > survfit object, this would reset to default plot region. See the ovarian > example below: > > library(survival) > ovarian1<-ovarian > ovarian1$fustat[ovarian$futime>450]<-0 > ovarian1$futime[ovarian$futime>450]<-450 > ovarian2<-subset(ovarian,futime>450) > > fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1) > fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2) > > plot(fit1, xlim=c(0,1200), col = 1:2) > abline(v=450) > xylim<-par("usr") > points(-1,-1) > clip(450,xylim[2],xylim[3],xylim[4]) > abline(h=0.5,col=2) ### YES, clipping() works! > > lines(fit2, col = 3:4,lty=2) ### clipping does not work! reset to default > plot region > abline(h=0.4,col=2) ### NO, clipping() does not work! > > So disappointed with this, otherwise this would be such a simple method to do > what I want. 
> > Thanks, > > John > > On Tuesday, September 29, 2020, 07:58:53 PM PDT, Jim Lemon > wrote: > > Hi John, > I should have remembered this. For some reason, the clip() function > doesn't operate until you have issued a graphics command. Try: > > points(-1,-1) > > before calling lines() > > Jim > > On Wed, Sep 30, 2020 at 12:26 PM array chip wrote: > > > > Hi Jim, > > > > I tried the clip() function below, surprisingly it did not work! I read the > > R help file and feel your script should work. To have a workable example, I > > used the ovarian dataset in the survival package as an example: > > > > ovarian1<-ovarian > > ovarian1$fustat[ovarian$futime>450]<-0 > > ovarian1$futime[ovarian$futime>450]<-450 > > > > ovarian2<-subset(ovarian,futime>450) > > > > fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1) > > fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2) > > > > plot(fit1, xlim=c(0,1200), col = 1:2) > > abline(v=450) > > xylim<-par("usr") > > clip(450,xylim[2],xylim[3],xylim[4]) > > lines(fit2, col = 3:4,lty=2) > > > > I can still see that the extra horizontal line on the top. > > > > Can you or anyone have any suggestion what went wrong? > > > > Thanks, > > > > John > > > > > > On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon > > wrote: > > > > > > > > > > > > Hi John, > > Perhaps the most direct way would be: > > > > plot(fit1, col=1:2) > > xylim<-par("usr") > > clip(4,xylim[2],xylim[3],xylim[4]) > > lines(fit2,col=1:2) > > > > Remember that the new clipping rectangle will persist until you or > > something else resets it. 
> > > > Jim > > > > On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help > > wrote: > > > > > > Hello, > > > > > > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 > > > survfit objects, just like this one: > > > > > > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing > > > > > > Suppose I have 2 survfit objects: fit1 is for the curve on the left > > > (survtime has been truncated to the cutoff line: year 5), fit2 is for the > > > curve on the right (minimum survival time is at the cutoff line: year 5), > > > but if I do the following: > > > > > > plot(fit1, col=1:2) > > > lines(fit2,col=1:2) > > > > > > Then I will have an h
Re: [R] 2 KM curves on the same plot
Thank you Jim for helping! Yes, I will try Marc's method. John On Wednesday, September 30, 2020, 01:47:55 AM PDT, Jim Lemon wrote: Hi John, Hmmm, this works: plot(1:10) xylim<-par("usr") clip(5,xylim[2],xylim[3],xylim[4]) lines(10:1) so I suspect that there is a "lines" method that resets the clipping region out of sight. Fortunately Marc Schwartz provided a way to get your plot so I will give the wall against which I have been banging my head a break. Jim On Wed, Sep 30, 2020 at 1:57 PM array chip wrote: > > Jim, > > I tried a few things, I found that clip() works if I just do some regular > graphing tasks. But as long as I run lines(fit) with "fit" object is a > survfit object, this would reset to default plot region. See the ovarian > example below: > > library(survival) > ovarian1<-ovarian > ovarian1$fustat[ovarian$futime>450]<-0 > ovarian1$futime[ovarian$futime>450]<-450 > ovarian2<-subset(ovarian,futime>450) > > fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1) > fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2) > > plot(fit1, xlim=c(0,1200), col = 1:2) > abline(v=450) > xylim<-par("usr") > points(-1,-1) > clip(450,xylim[2],xylim[3],xylim[4]) > abline(h=0.5,col=2) ### YES, clipping() works! > > lines(fit2, col = 3:4,lty=2) ### clipping does not work! reset to default > plot region > abline(h=0.4,col=2) ### NO, clipping() does not work! > > So disappointed with this, otherwise this would be such a simple method to do > what I want. > > Thanks, > > John > > On Tuesday, September 29, 2020, 07:58:53 PM PDT, Jim Lemon > wrote: > > Hi John, > I should have remembered this. For some reason, the clip() function > doesn't operate until you have issued a graphics command. Try: > > points(-1,-1) > > before calling lines() > > Jim > > On Wed, Sep 30, 2020 at 12:26 PM array chip wrote: > > > > Hi Jim, > > > > I tried the clip() function below, surprisingly it did not work! I read the > > R help file and feel your script should work. 
To have a workable example, I > > used the ovarian dataset in the survival package as an example: > > > > ovarian1<-ovarian > > ovarian1$fustat[ovarian$futime>450]<-0 > > ovarian1$futime[ovarian$futime>450]<-450 > > > > ovarian2<-subset(ovarian,futime>450) > > > > fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1) > > fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2) > > > > plot(fit1, xlim=c(0,1200), col = 1:2) > > abline(v=450) > > xylim<-par("usr") > > clip(450,xylim[2],xylim[3],xylim[4]) > > lines(fit2, col = 3:4,lty=2) > > > > I can still see that the extra horizontal line on the top. > > > > Can you or anyone have any suggestion what went wrong? > > > > Thanks, > > > > John > > > > > > On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon > > wrote: > > > > > > > > > > > > Hi John, > > Perhaps the most direct way would be: > > > > plot(fit1, col=1:2) > > xylim<-par("usr") > > clip(4,xylim[2],xylim[3],xylim[4]) > > lines(fit2,col=1:2) > > > > Remember that the new clipping rectangle will persist until you or > > something else resets it. 
> > > > Jim > > > > On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help > > wrote: > > > > > > Hello, > > > > > > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 > > > survfit objects, just like this one: > > > > > > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing > > > > > > Suppose I have 2 survfit objects: fit1 is for the curve on the left > > > (survtime has been truncated to the cutoff line: year 5), fit2 is for the > > > curve on the right (minimum survival time is at the cutoff line: year 5), > > > but if I do the following: > > > > > > plot(fit1, col=1:2) > > > lines(fit2,col=1:2) > > > > > > Then I will have an horizontal line on the top that connect from 0 to 4 > > > years, which I do not want that to be drawn (see blue arrow below): > > > > > > https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing > > > > > > Can anyone have a strategy to make this kind of plot happen? > > > > > > Thanks, > > > > > > John > > > > > > > > __ > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2 KM curves on the same plot
Jim,

I tried a few things and found that clip() works if I just do some regular graphing tasks. But as soon as I run lines(fit) where "fit" is a survfit object, the clipping is reset to the default plot region. See the ovarian example below:

library(survival)
ovarian1<-ovarian
ovarian1$fustat[ovarian$futime>450]<-0
ovarian1$futime[ovarian$futime>450]<-450
ovarian2<-subset(ovarian,futime>450)

fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)

plot(fit1, xlim=c(0,1200), col = 1:2)
abline(v=450)
xylim<-par("usr")
points(-1,-1)
clip(450,xylim[2],xylim[3],xylim[4])
abline(h=0.5,col=2) ### YES, clipping works!

lines(fit2, col = 3:4,lty=2) ### clipping does not work! reset to default plot region
abline(h=0.4,col=2) ### NO, clipping does not work!

So disappointed with this; otherwise this would be such a simple method to do what I want.

Thanks,

John

On Tuesday, September 29, 2020, 07:58:53 PM PDT, Jim Lemon wrote:

Hi John, I should have remembered this. For some reason, the clip() function doesn't operate until you have issued a graphics command. Try:

points(-1,-1)

before calling lines()

Jim

On Wed, Sep 30, 2020 at 12:26 PM array chip wrote: > > Hi Jim, > > I tried the clip() function below, surprisingly it did not work! I read the R > help file and feel your script should work. To have a workable example, I > used the ovarian dataset in the survival package as an example: > > ovarian1<-ovarian > ovarian1$fustat[ovarian$futime>450]<-0 > ovarian1$futime[ovarian$futime>450]<-450 > > ovarian2<-subset(ovarian,futime>450) > > fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1) > fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2) > > plot(fit1, xlim=c(0,1200), col = 1:2) > abline(v=450) > xylim<-par("usr") > clip(450,xylim[2],xylim[3],xylim[4]) > lines(fit2, col = 3:4,lty=2) > > I can still see that the extra horizontal line on the top. > > Can you or anyone have any suggestion what went wrong? 
> > Thanks, > > John > > > On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon > wrote: > > > > > > Hi John, > Perhaps the most direct way would be: > > plot(fit1, col=1:2) > xylim<-par("usr") > clip(4,xylim[2],xylim[3],xylim[4]) > lines(fit2,col=1:2) > > Remember that the new clipping rectangle will persist until you or > something else resets it. > > Jim > > On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help > wrote: > > > > Hello, > > > > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 > > survfit objects, just like this one: > > > > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing > > > > Suppose I have 2 survfit objects: fit1 is for the curve on the left > > (survtime has been truncated to the cutoff line: year 5), fit2 is for the > > curve on the right (minimum survival time is at the cutoff line: year 5), > > but if I do the following: > > > > plot(fit1, col=1:2) > > lines(fit2,col=1:2) > > > > Then I will have an horizontal line on the top that connect from 0 to 4 > > years, which I do not want that to be drawn (see blue arrow below): > > > > https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing > > > > Can anyone have a strategy to make this kind of plot happen? > > > > Thanks, > > > > John > > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2 KM curves on the same plot
Hi Jim,

I tried points(-1,-1) before lines() and before clip(), but either way it still shows everything :-( It's interesting that the hist() examples in the R help page for clip() work nicely. I also tried the simple linear regression plot below, and clip() works there too.

dat<-data.frame(x=1:10,y=1:10)
fit<-lm(y~x,dat)
plot(1:10)
abline(fit)
xylim<-par("usr")
clip(6,xylim[2],xylim[3],xylim[4])
abline(fit,col=2) ## yes, it only shows the fit line from 6 to the right
lines(c(2,8),c(5,5)) ## yes, it only shows the line from 6 to the right

So it's puzzling that clip() fails only when using lines() with a survfit object (ovarian example below).

John

On Tuesday, September 29, 2020, 07:58:53 PM PDT, Jim Lemon wrote:

Hi John, I should have remembered this. For some reason, the clip() function doesn't operate until you have issued a graphics command. Try:

points(-1,-1)

before calling lines()

Jim

On Wed, Sep 30, 2020 at 12:26 PM array chip wrote: > > Hi Jim, > > I tried the clip() function below, surprisingly it did not work! I read the R > help file and feel your script should work. To have a workable example, I > used the ovarian dataset in the survival package as an example: > > ovarian1<-ovarian > ovarian1$fustat[ovarian$futime>450]<-0 > ovarian1$futime[ovarian$futime>450]<-450 > > ovarian2<-subset(ovarian,futime>450) > > fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1) > fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2) > > plot(fit1, xlim=c(0,1200), col = 1:2) > abline(v=450) > xylim<-par("usr") > clip(450,xylim[2],xylim[3],xylim[4]) > lines(fit2, col = 3:4,lty=2) > > I can still see that the extra horizontal line on the top. > > Can you or anyone have any suggestion what went wrong? 
> > Thanks, > > John > > > On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon > wrote: > > > > > > Hi John, > Perhaps the most direct way would be: > > plot(fit1, col=1:2) > xylim<-par("usr") > clip(4,xylim[2],xylim[3],xylim[4]) > lines(fit2,col=1:2) > > Remember that the new clipping rectangle will persist until you or > something else resets it. > > Jim > > On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help > wrote: > > > > Hello, > > > > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 > > survfit objects, just like this one: > > > > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing > > > > Suppose I have 2 survfit objects: fit1 is for the curve on the left > > (survtime has been truncated to the cutoff line: year 5), fit2 is for the > > curve on the right (minimum survival time is at the cutoff line: year 5), > > but if I do the following: > > > > plot(fit1, col=1:2) > > lines(fit2,col=1:2) > > > > Then I will have an horizontal line on the top that connect from 0 to 4 > > years, which I do not want that to be drawn (see blue arrow below): > > > > https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing > > > > Can anyone have a strategy to make this kind of plot happen? > > > > Thanks, > > > > John > > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2 KM curves on the same plot
Hi Jim,

I tried the clip() function below, but surprisingly it did not work! I read the R help file and feel your script should work. To have a workable example, I used the ovarian dataset in the survival package:

ovarian1<-ovarian
ovarian1$fustat[ovarian$futime>450]<-0
ovarian1$futime[ovarian$futime>450]<-450

ovarian2<-subset(ovarian,futime>450)

fit1 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian1)
fit2 <- survfit(Surv(futime, fustat) ~ rx, data = ovarian2)

plot(fit1, xlim=c(0,1200), col = 1:2)
abline(v=450)
xylim<-par("usr")
clip(450,xylim[2],xylim[3],xylim[4])
lines(fit2, col = 3:4,lty=2)

I can still see the extra horizontal line at the top. Does anyone have a suggestion about what went wrong?

Thanks,

John

On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon wrote:

Hi John, Perhaps the most direct way would be:

plot(fit1, col=1:2)
xylim<-par("usr")
clip(4,xylim[2],xylim[3],xylim[4])
lines(fit2,col=1:2)

Remember that the new clipping rectangle will persist until you or something else resets it.

Jim

On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help wrote: > > Hello, > > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 > survfit objects, just like this one: > > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing > > Suppose I have 2 survfit objects: fit1 is for the curve on the left (survtime > has been truncated to the cutoff line: year 5), fit2 is for the curve on the > right (minimum survival time is at the cutoff line: year 5), but if I do the > following: > > plot(fit1, col=1:2) > lines(fit2,col=1:2) > > Then I will have an horizontal line on the top that connect from 0 to 4 > years, which I do not want that to be drawn (see blue arrow below): > > https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing > > Can anyone have a strategy to make this kind of plot happen? 
> > Thanks, > > John > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2 KM curves on the same plot
Thank you Marc as well! I'll try both ways! Yes, this is an oncology study with the landmark time set at 5 years.

John

On Tuesday, September 29, 2020, 07:08:39 AM PDT, Marc Schwartz wrote:

Hi John,

From the looks of the first plot, it would appear that perhaps you are engaged in a landmark analysis in an oncology setting, with the landmark time set at 5 years. On the off chance that you are not familiar with the pros and cons of that methodology, Google "landmark analysis", which should yield a number of references.

With respect to the plot itself, here is one approach, in addition to the one that Jim suggested. This will use a 1 x 2 matrix of plot regions, and adjust the various axes accordingly.

library(survival)

## Create two models with the same data for this use, the second adding 150
## to the event times to mimic a landmark time at 150 weeks
fit <- survfit(Surv(time, status) ~ x, data = aml)
fit2 <- survfit(Surv(time, status) ~ x, data = aml)
fit2$time <- fit2$time + 150

## Create the 1 x 2 plot matrix
par(mfrow = c(1, 2))

## Set the plot region margins so there is no space to the right
par(mar = c(4, 4, 4, 0))

## Fix the plot limits for consistency
## xaxs = "i" removes the default 4% extensions to the plot region limits
plot(fit, axes = FALSE, xlim = c(0, 150), ylim = c(0, 1), xaxs = "i")
axis(1, at = seq(0, 150, 50), line = -1)
axis(2, las = 1)

## Set the plot region margins so there is no space to the left
par(mar = c(4, 0, 4, 4))

## Set the plot limits for the second time interval
plot(fit2, axes = FALSE, xlim = c(150, 300), ylim = c(0, 1), xaxs = "i")
axis(1, at = seq(150, 300, 50), line = -1)
axis(4, las = 1)

## Draw the vertical line at 150 weeks
axis(2, at = seq(0, 1, 0.2), labels = FALSE, lty = "dashed")

Regards,

Marc Schwartz

> On Sep 28, 2020, at 8:33 PM, array chip via R-help > wrote: > > Hello, > > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 > survfit objects, just like this one: > > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing > > Suppose I have 2 survfit objects: fit1 is for the curve on the left (survtime > has been truncated to the cutoff line: year 5), fit2 is for the curve on the > right (minimum survival time is at the cutoff line: year 5), but if I do the > following: > > plot(fit1, col=1:2) > lines(fit2,col=1:2) > > Then I will have a horizontal line on the top that connects from 0 to 4 > years, which I do not want to be drawn (see blue arrow below): > > https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing > > Can anyone have a strategy to make this kind of plot happen? > > Thanks, > > John > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 2 KM curves on the same plot
Thank you very much Jim, this is great!! John On Tuesday, September 29, 2020, 01:35:48 AM PDT, Jim Lemon wrote: Hi John, Perhaps the most direct way would be: plot(fit1, col=1:2) xylim<-par("usr") clip(4,xylim[2],xylim[3],xylim[4]) lines(fit2,col=1:2) Remember that the new clipping rectangle will persist until you or something else resets it. Jim On Tue, Sep 29, 2020 at 10:34 AM array chip via R-help wrote: > > Hello, > > Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 > survfit objects, just like this one: > > https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing > > Suppose I have 2 survfit objects: fit1 is for the curve on the left (survtime > has been truncated to the cutoff line: year 5), fit2 is for the curve on the > right (minimum survival time is at the cutoff line: year 5), but if I do the > following: > > plot(fit1, col=1:2) > lines(fit2,col=1:2) > > Then I will have an horizontal line on the top that connect from 0 to 4 > years, which I do not want that to be drawn (see blue arrow below): > > https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing > > Can anyone have a strategy to make this kind of plot happen? > > Thanks, > > John > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 2 KM curves on the same plot
Hello,

Can anyone suggest a simple way to generate a Kaplan-Meier plot with 2 survfit objects, just like this one:

https://drive.google.com/file/d/1fEcpdIdE2xYtA6LBQN9ck3JkL6-goabX/view?usp=sharing

Suppose I have 2 survfit objects: fit1 is for the curve on the left (survival time has been truncated at the cutoff line: year 5), fit2 is for the curve on the right (minimum survival time is at the cutoff line: year 5). But if I do the following:

plot(fit1, col=1:2)
lines(fit2,col=1:2)

then I get a horizontal line at the top that connects from 0 to 4 years, which I do not want to be drawn (see blue arrow below):

https://drive.google.com/file/d/178mQGlhnaOg9PA-oE-W_W5CtrGD03ljH/view?usp=sharing

Does anyone have a strategy to make this kind of plot?

Thanks,

John

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [External] R rounding problem?
Thanks Richard. Got it now... On Thursday, September 3, 2020, 10:12:36 PM PDT, Richard M. Heiberger wrote: FAQ 7.31 On Fri, Sep 4, 2020 at 12:47 AM array chip via R-help wrote: > > Hello, > > I made a mistake today on simple counting in R, that almost got me into > trouble. After trying multiple times, I finally figured out it's rounding > issue in R. > > For exmaple, when I just simply type: > > > (6.9-6.3) > 0.6 > [1] TRUE > > 6.9-6.3 should be 0.6 exactly, but R thinks that it's greater than 0.6!! > > Similarly, R thinks 5.6-5.5 is smaller than 0.1: > > > (5.6-5.5) < 0.1 > [1] TRUE > > Why is the above happening? This rounding issue seems to be small, but this > could cause serious problem in real world. > > Can anyone shed a light on how to avoid the issue? > > Thanks, > > Yi > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R rounding problem?
Hello,

I made a mistake today in simple arithmetic in R that almost got me into trouble. After trying multiple times, I finally figured out it's a rounding issue in R. For example, when I simply type:

> (6.9-6.3) > 0.6
[1] TRUE

6.9-6.3 should be exactly 0.6, but R thinks it's greater than 0.6!! Similarly, R thinks 5.6-5.5 is smaller than 0.1:

> (5.6-5.5) < 0.1
[1] TRUE

Why is this happening? This rounding issue seems small, but it could cause serious problems in the real world.

Can anyone shed some light on how to avoid the issue?

Thanks,

Yi

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
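The standard remedy (R FAQ 7.31) is to compare floating-point results with a tolerance rather than exactly, since most decimal fractions have no exact binary representation. A small sketch of the idiom:

```r
## 6.9 and 6.3 are stored as the nearest binary doubles, so their
## difference is not exactly 0.6:
print(6.9 - 6.3, digits = 17)  # shows the tiny excess beyond 0.6

## Compare with a tolerance rather than an exact threshold:
isTRUE(all.equal(6.9 - 6.3, 0.6))              # TRUE: equal within tolerance
(6.9 - 6.3) > 0.6 + sqrt(.Machine$double.eps)  # FALSE: excess is rounding noise
tol <- 1e-8
(5.6 - 5.5) < 0.1 - tol                        # FALSE with a sensible tolerance
```

all.equal() uses a default tolerance of about 1.5e-8 (sqrt(.Machine$double.eps)), which is far larger than the representation error of a single subtraction, so it treats 6.9 - 6.3 and 0.6 as equal.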
[R] how to generate this kind of graph
Hello everyone,

I saw this scatterplot in a paper and thought it looked very nice:

https://drive.google.com/file/d/1V7F1gq-J_GIFDOrJs00hwGyXUqCZ_xwa/view?usp=sharing

It is similar to stripchart() with the 'jitter' method, but it has a special pattern of aligning the points that makes it look nicer than a standard stripchart(). Does anyone know if there is an R package that can make this kind of plot?

Thanks,

John

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
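The described layout (jitter-like, but with the points packed in a regular, non-overlapping pattern) matches what is usually called a "beeswarm" plot. Assuming that is what the paper used (a guess, since the paper's code is not shown), the CRAN package beeswarm produces it with a stripchart-like interface:

```r
## Hedged guess at the paper's plot type: a beeswarm plot via the CRAN
## package "beeswarm" (install.packages("beeswarm") if not installed).
library(beeswarm)

set.seed(1)
dat <- data.frame(group = rep(c("A", "B"), each = 50),
                  value = c(rnorm(50, 0), rnorm(50, 1)))

## Like stripchart() with jitter, but points are arranged so they
## touch without overlapping, giving the regular alignment pattern
beeswarm(value ~ group, data = dat, pch = 16,
         col = c("steelblue", "tomato"))
```

For ggplot2-based graphics, the ggbeeswarm package provides an equivalent geom_beeswarm() layer.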
Re: [R] R 4.0.2 is released
My current version of R is 3.6.3, and there is no problem installing packages there. I also have no problem installing Bioconductor packages with 3.6.3. But installing packages from Bioconductor failed as well on 4.0.2 (I also tried 4.0.0, which failed too).

On Tuesday, June 23, 2020, 08:56:45 PM PDT, array chip via R-help wrote:

Hi, I downloaded R 4.0.2 and installed it successfully without any error. However, when I opened an R session (using x64) and tried to install packages, I got the following error message:

> utils:::menuInstallPkgs()
Warning: failed to download mirrors file (internet routines cannot be loaded); using local file 'C:/PROGRA~1/R/R-40~1.2/doc/CRAN_mirrors.csv'
Warning: unable to access index for repository https://rweb.crmda.ku.edu/cran/src/contrib: internet routines cannot be loaded
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
  argument "pkgs" is missing, with no default
In addition: Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
  unable to load shared object 'C:/PROGRA~1/R/R-40~1.2/modules/x64/internet.dll':
  LoadLibrary failure: Access is denied.

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2

However, if I load a session using the i386 version, then everything is OK - I can install packages. Can anyone suggest what is going on?

Thanks,

John

On Monday, June 22, 2020, 01:24:20 AM PDT, Peter Dalgaard via R-help wrote:

The build system rolled up R-4.0.2.tar.gz (codename "Taking Off Again") this morning. 
The list below details the changes in this release. You can get the source code from http://cran.r-project.org/src/base/R-4/R-4.0.2.tar.gz or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course. For the R Core Team, Peter Dalgaard

These are the checksums (md5 and SHA-256) for the freshly created files, in case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = b9c44f9f78cab3184ad9898bebc854b4
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = 4afa171cd982aaa60f0ba92e2e7bc5d6
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = 566a6bb3642e28e6bf01cf98db31137c
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = eb78c4d053ec9c32b815cf0c2ebea801
MD5 (NEWS.2) = 496062c138e2def06cebccddfb814ac6
MD5 (NEWS.3) = 012e7f4a80cc8ec947bf3f0ff6117ec8
MD5 (R-latest.tar.gz) = 1eac7293d5fe313a56ddabfda02b437e
MD5 (README) = f468f281c919665e276a1b691decbbe6
MD5 (RESOURCES) = 529223fd3ffef95731d0a87353108435
MD5 (THANKS) = 251d20510bfc3cc93b82c5a99f7efcc6
MD5 (VERSION-INFO.dcf) = 62496d3a0fd8cc2ed644ea518c052371
MD5 (R-4/R-4.0.2.tar.gz) = 1eac7293d5fe313a56ddabfda02b437e

2cde824a7b18958e5f06b391c801c8288be0f84fa8934b7ddefef23c67e60c09  AUTHORS
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
eddf87b12197c7b3b19cbc9b11c1beab95b14e3dcd715bf37d2f6a8b2a72c2a1  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
ec05bba338358410fae6b34fed061605989ab3601aba1b3fcb45a610d5dd2eb9  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
12b30c724117b1b2b11484673906a6dcd48a361f69fc420b36194f9218692d01  NEWS.1
e80de410c77f05ff2012fa70051b89119845f734a7fa5c55857e61e4ed7d5f6e  NEWS.2
7201d139947afa52b5e09d26dc01445edf444506264355b2185122bc1ed3dce0  NEWS.3
d3bceab364da0876625e4097808b42512395fdf41292f4915ab1fd257c1bbe75  R-latest.tar.gz
2fdd3e90f23f32692d4b3a0c0452f2c219a10882033d1774f8cadf25886c3ddc  README
408737572ecc6e1135fdb2cf7a9dbb1a6cb27967c757f1771b8c39d1fd2f1ab9  RESOURCES
c9c7cb32308b4e560a22c858819ade9de524a602abd4e92d1c328c89f8037d73  THANKS
10cc5f566a4a5ce49147e7dcfbe9180dba09ccb9efb17298b067309eb799e92e  VERSION-INFO.dcf
d3bceab364da0876625e4097808b42512395fdf41292f4915ab1fd257c1bbe75  R-4/R-4.0.2.tar.gz

This is the relevant part of the NEWS file

CHANGES IN R 4.0.2:

  UTILITIES:

    * R CMD check skips vignette re-building (with a warning) if the
      VignetteBuilder package(s) are not available.

  BUG FIXES:

    * Paths with non-ASCII characters caused problems for package
      loading on Windows PR#17833.

    * Using tcltk widgets no longer crashes R on Windows.

    * source(*, echo=TRUE) no longer fails in some cases with empty
      lines; reported by Bill Dunlap in PR#17769.
Re: [R] R 4.0.2 is released
Hi, I downloaded R 4.0.2 and installed it successfully without any error. However, when I opened an R session (using x64) and tried to install packages, I got the following error message:

> utils:::menuInstallPkgs()
Warning: failed to download mirrors file (internet routines cannot be loaded); using local file 'C:/PROGRA~1/R/R-40~1.2/doc/CRAN_mirrors.csv'
Warning: unable to access index for repository https://rweb.crmda.ku.edu/cran/src/contrib: internet routines cannot be loaded
Error in install.packages(lib = .libPaths()[1L], dependencies = NA, type = type) :
  argument "pkgs" is missing, with no default
In addition: Warning message:
In download.file(url, destfile = f, quiet = TRUE) :
  unable to load shared object 'C:/PROGRA~1/R/R-40~1.2/modules/x64/internet.dll':
  LoadLibrary failure: Access is denied.

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2

However, if I load a session using the i386 version, then everything is OK - I can install packages. Can anyone suggest what is going on? Thanks, John

On Monday, June 22, 2020, 01:24:20 AM PDT, Peter Dalgaard via R-help wrote: The build system rolled up R-4.0.2.tar.gz (codename "Taking Off Again") this morning. The list below details the changes in this release. You can get the source code from http://cran.r-project.org/src/base/R-4/R-4.0.2.tar.gz or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course.
This is the relevant part of the NEWS file

CHANGES IN R 4.0.2:

  UTILITIES:

    * R CMD check skips vignette re-building (with a warning) if the
      VignetteBuilder package(s) are not available.

  BUG FIXES:

    * Paths with non-ASCII characters caused problems for package
      loading on Windows PR#17833.

    * Using tcltk widgets no longer crashes R on Windows.

    * source(*, echo=TRUE) no longer fails in some cases with empty
      lines; reported by Bill Dunlap in PR#17769.

    * on.exit() now correctly matches named arguments, thanks to
      PR#17815 (including patch) by Brodie Gaslam.

    * regexpr(*, perl=TRUE) no longer returns incorrect positions into
      text containing characters outside of the Unicode Basic
      Multilingual Plane on Windows.

-- Peter Dalgaard, Professor, Center for
Re: [R] weighed Fleming-Harrington log rank test
Thank you Terry. Right now I can use comp() from the survMisc package to do the 2-parameter version of F-H weighting. I think both SAS and Stata offer the 2-parameter version, so I just thought it would be nice if survdiff() could have that option, given it is a standard package in R. Thanks! John

On Friday, February 16, 2018, 7:08:46 AM PST, Therneau, Terry M., Ph.D. wrote: You are correct that the survdiff routine only supports 'rho' of the Fleming-Harrington G-rho tests. This is a function of age -- I wrote the original code back when I was working with Tom (Fleming), and he was only using 1 parameter. Later he and David expanded the test to two parameters. This might be only the second request for the feature in the 30+ years since that date. Terry Therneau
Re: [R] Fleming-Harrington weighted log rank test
Thank you David! On Wednesday, February 14, 2018, 6:05:46 PM PST, David Winsemius <dwinsem...@comcast.net> wrote:

> On Feb 14, 2018, at 5:26 PM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Feb 13, 2018, at 4:02 PM, array chip via R-help <r-help@r-project.org> wrote:
>>
>> Hi all,
>>
>> The survdiff() from the survival package has an argument "rho" that implements the Fleming-Harrington weighted log rank test.
>>
>> But according to several sources including the "survminer" package (https://cran.r-project.org/web/packages/survminer/vignettes/Specifiying_weights_in_log-rank_comparisons.html), the Fleming-Harrington weighted log-rank test should have 2 parameters "p" and "q" to control the weighting for earlier vs. later times in the follow-up.
>>
>> For example, setting rho=1 in survdiff() uses the Peto-Peto modification of Gehan-Wilcoxon weights, which I can confirm by setting p=1 & q=0 in comp() from the survminer package. Similarly, rho=0 is equivalent to p=0 & q=0.
>>
>> I am interested in putting more weight on survival differences in later follow-up time. According to comp() from the survminer package, that would be p=0 & q=1 for Fleming-Harrington weights.
>>
>> My question is how I can do the same by setting certain values for "rho" in the regular survdiff() function?
>
> I think that survdiff uses a different version than what you have found. The G-rho family weights are:
>
> w_j = [S(t_j)]^rho
>
> So rather than two parameters on S(t) and (1-S(t)) as in the p,q version, you only have one parameter applied to S(t). This class handout says that the G-rho,gamma weighting scheme is not available in survdiff.

Forgot to paste the link: http://www.ics.uci.edu/~dgillen/STAT255/Handouts/lecture4.pdf

> --
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
> -Gehm's Corollary to Clarke's Third Law

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
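The distinction David describes, one parameter applied to S(t) versus the two-parameter p,q family, can be made concrete with survdiff(); the weight formulas in the comments restate the thread, and the aml data set is used only for illustration:

```r
library(survival)

# One-parameter G-rho family implemented by survdiff():
#   w(t) = S(t)^rho
# rho = 0 gives the ordinary log-rank test; rho = 1 gives the
# Peto-Peto modification of the Gehan-Wilcoxon test.
survdiff(Surv(time, status) ~ x, data = aml, rho = 0)
survdiff(Surv(time, status) ~ x, data = aml, rho = 1)

# The two-parameter Fleming-Harrington family discussed above,
#   w(t) = S(t)^p * (1 - S(t))^q,
# cannot be expressed through 'rho' alone when q > 0; p = 0, q = 1
# puts the weight on late differences, which is what the poster wants.
```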
[R] Fleming-Harrington weighted log rank test
Hi all, The survdiff() from the survival package has an argument "rho" that implements the Fleming-Harrington weighted log rank test. But according to several sources including the "survminer" package (https://cran.r-project.org/web/packages/survminer/vignettes/Specifiying_weights_in_log-rank_comparisons.html), the Fleming-Harrington weighted log-rank test should have 2 parameters "p" and "q" to control the weighting for earlier vs. later times in the follow-up. For example, setting rho=1 in survdiff() uses the Peto-Peto modification of Gehan-Wilcoxon weights, which I can confirm by setting p=1 & q=0 in comp() from the survminer package. Similarly, rho=0 is equivalent to p=0 & q=0. I am interested in putting more weight on survival differences in later follow-up time. According to comp() from the survminer package, that would be p=0 & q=1 for Fleming-Harrington weights. My question is how I can do the same by setting certain values for "rho" in the regular survdiff() function? Thank you, John
[R] SAMseq errors
Sorry, I forgot to use plain text format; hope this time it works: Hi, I am trying to use SAMseq() to analyze my RNA-seq experiment (2 genes x 550 samples) with a survival endpoint. It quickly gives the following error:

> library(samr)
Loading required package: impute
Loading required package: matrixStats

Attaching package: ‘matrixStats’

The following objects are masked from ‘package:Biobase’:
    anyMissing, rowMedians

Warning messages:
1: package ‘samr’ was built under R version 3.3.3
2: package ‘matrixStats’ was built under R version 3.3.3

> samfit<-SAMseq(data, PFI.time, censoring.status=PFI.status, resp.type="Survival")
Estimating sequencing depths...
Error in quantile.default(prop, c(0.25, 0.75)) :
  missing values and NaN's not allowed if 'na.rm' is FALSE
In addition: Warning message:
In sum(x) : integer overflow - use sum(as.numeric(.))
Error during wrapup: cannot open the connection

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C  LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] samr_2.0  matrixStats_0.52.2  impute_1.48.0  BiocInstaller_1.24.0  rcom_3.1-3  rscproxy_2.1-1

loaded via a namespace (and not attached):
[1] tools_3.3.2

I checked: my data matrix and y variables have no missing values. Does anyone have suggestions about what's going on? Thank you! John
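The "integer overflow - use sum(as.numeric(.))" warning hints that the count matrix is stored in integer mode and an internal sum exceeds .Machine$integer.max; converting the matrix to double before calling SAMseq() is one thing to try (an assumption, not a confirmed fix for this error):

```r
# Sketch: integer sums overflow past .Machine$integer.max (2^31 - 1),
# which can feed NA/NaN values into downstream quantile() calls.
counts <- matrix(5e6L, nrow = 2, ncol = 550)  # hypothetical large counts
sum(counts[1, ])                   # NA, with an integer overflow warning
storage.mode(counts) <- "double"   # store the counts as doubles instead
sum(counts[1, ])                   # 2.75e9, no overflow
```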
Re: [R] gsDesign Pocock & OBF boundary
Sorry, it didn't work again. I am on Yahoo mail, and just found a switch to change from Rich text to Plain text, so here it goes again: I am learning to use the gsDesign package. I have a question about the Pocock and OBF boundaries. As far as I can understand, these 2 boundaries require equal spacing between interim analyses (maybe this is not correct?). But it looks like I can still use gsDesign to run an analysis based on unequal spacing:

> library(gsDesign)
> gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock')
Symmetric two-sided group sequential design with 90 % power and 5 % Type I Error.
Spending computations assume trial stops if a bound is crossed.

          Sample Size
 Analysis      Ratio*    Z Nominal p  Spend
        1       0.796 1.82    0.0346 0.0346
        2       1.061 1.82    0.0346 0.0154
    Total                           0.0500

++ alpha spending: Pocock boundary.
* Sample size ratio compared to fixed design with no interim

Can anyone shed some light on whether the above analysis is still valid? Or, for unequal spacing, do I have to use the Lan-DeMets error spending function approximations? Thank you,

From: Jeff Newmiller <jdnew...@dcn.davis.ca.us>; Berend Hasselman <b...@xs4all.nl> Cc: R-help Mailing List <r-help@r-project.org> Sent: Sunday, September 24, 2017 12:41 AM Subject: Re: [R] gsDesign Pocock & OBF boundary

Still failed. The first secret is in your email program settings: use Plain Text format (at least for emails you send to this mailing list). The second secret tool is the reprex package, to let you verify that your code example will do on our computers what it is doing on your computer before you send it to us. That will also involve giving us some sample data or referencing some data already available to us in a relevant package. See [1], [2] and [3] for more discussion of how to succeed at communicating on the Internet regarding R.
[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example [2] http://adv-r.had.co.nz/Reproducibility.html [3] https://cran.r-project.org/web/packages/reprex/index.html (read the vignette) -- Sent from my phone. Please excuse my brevity. On September 23, 2017 9:53:05 PM PDT, array chip via R-help <r-help@r-project.org> wrote: >Sorry for messed up text. Here it goes again: >I am learning to use the gsDesign package. >I have a question about Pocock and OBF boundary. As far as I can >understand, these 2 boundaries require equal spacing between interim >analyses (maybe this is not correct?). But looks like I can still use >gsDesign to run an analysis based on unequal spacing: >> gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock') >Symmetric two-sided group sequential design with 90 %power and 5 % Type >I Error.Spending computations assume trial stops if a bound is >crossed. Sample Size Analysis Ratio* Z Nominal p >Spend1 0.796 1.820.0346 0.03462 1.061 1.82 >0.0346 0.0154Total 0.0500 ++alpha spending: >Pocock boundary.*Sample size ratio compared to fixed design with no >interim >Can anyone share some light whether the above analysis is still valid? >Or for unequal spacing, I have to use Lan-Demet’s error spending >function approximations? Thank you, > > > > From: Berend Hasselman <b...@xs4all.nl> >Cc: R-help Mailing List <r-help@r-project.org> > Sent: Friday, September 22, 2017 11:46 PM > Subject: Re: [R] gsDesign Pocock & OBF boundary > > >> On 23 Sep 2017, at 01:32, array chip via R-help ><r-help@r-project.org> wrote: >> >> Hi, >> >> I am learning to use your gsDesign package! I have a question about >Pocock and OBF boundary. As far as Iunderstand, these 2 boundaries >require equal spacing between interim analyses(maybe this is not >correct?). 
But I can still use gsDesign to run an analysisbased on >unequal spacing: >gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock')Symmetrictwo-sided >group sequential design with90 %power and 5 % Type I >Error.Spendingcomputations assume trial stops if a bound is crossed. > Sample Size AnalysisRatio* Z Nominal p Spend >1 0.796 1.820.0346 0.03462 1.061 1.820.0346 0.0154 >Total 0.0500 ++alpha >spending:Pocockboundary.*Sample size ratio compared to fixed design >with no interim Can anyone share some light whether the above analysis >is stillvalid? Or for unequal spacing, I have to use Lan-Demet’s error >spendingfunction approximations? Thank you, >> [[alternative HTML version deleted]] >> > >Your example code is a complete mess. >Do NOT post in html. This is a pl
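Following up on the Lan-DeMets question in the thread: gsDesign also accepts spending functions in sfu, which is the usual route when interim analyses are unequally (or unpredictably) spaced. A sketch; the sfLDPocock/sfLDOF names are taken from the gsDesign documentation, and comparing the resulting bounds is left to the reader:

```r
library(gsDesign)

# Classical Pocock bound requested at unequal information fractions
d1 <- gsDesign(k = 2, test.type = 2, timing = c(0.75, 1),
               alpha = 0.05, sfu = 'Pocock')

# Lan-DeMets error-spending approximations to the Pocock and
# O'Brien-Fleming bounds, designed to accommodate unequal spacing
d2 <- gsDesign(k = 2, test.type = 2, timing = c(0.75, 1),
               alpha = 0.05, sfu = sfLDPocock)
d3 <- gsDesign(k = 2, test.type = 2, timing = c(0.75, 1),
               alpha = 0.05, sfu = sfLDOF)

d1$upper$bound
d2$upper$bound
d3$upper$bound
```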
Re: [R] gsDesign Pocock & OBF boundary
Sorry for messed up text. Here it goes again: I am learning to use the gsDesign package. I have a question about Pocock and OBF boundary. As far as I can understand, these 2 boundaries require equal spacing between interim analyses (maybe this is not correct?). But looks like I can still use gsDesign to run an analysis based on unequal spacing: > gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock') Symmetric two-sided group sequential design with 90 %power and 5 % Type I Error.Spending computations assume trial stops if a bound is crossed. Sample Size Analysis Ratio* Z Nominal p Spend 1 0.796 1.82 0.0346 0.0346 2 1.061 1.82 0.0346 0.0154 Total 0.0500 ++alpha spending: Pocock boundary.*Sample size ratio compared to fixed design with no interim Can anyone share some light whether the above analysis is still valid? Or for unequal spacing, I have to use Lan-Demet’s error spending function approximations? Thank you, From: Berend Hasselman <b...@xs4all.nl> To: array chip <arrayprof...@yahoo.com> Cc: R-help Mailing List <r-help@r-project.org> Sent: Friday, September 22, 2017 11:46 PM Subject: Re: [R] gsDesign Pocock & OBF boundary > On 23 Sep 2017, at 01:32, array chip via R-help <r-help@r-project.org> wrote: > > Hi, > > I am learning to use your gsDesign package! I have a question about Pocock > and OBF boundary. As far as Iunderstand, these 2 boundaries require equal > spacing between interim analyses(maybe this is not correct?). But I can still > use gsDesign to run an analysisbased on unequal spacing: > gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock')Symmetrictwo-sided > group sequential design with90 %power and 5 % Type I > Error.Spendingcomputations assume trial stops if a bound is crossed. 
> Sample Size AnalysisRatio* Z Nominal p Spend 1 0.796 > 1.82 0.0346 0.0346 2 1.061 1.82 0.0346 0.0154 Total > 0.0500 ++alpha spending:Pocockboundary.*Sample size ratio > compared to fixed design with no interim Can anyone share some light whether > the above analysis is stillvalid? Or for unequal spacing, I have to use > Lan-Demet’s error spendingfunction approximations? Thank you, > [[alternative HTML version deleted]] > Your example code is a complete mess. Do NOT post in html. This is a plain text mailing list. Read the Posting Guide (https://www.r-project.org/posting-guide.html). Berend Hasselman
[R] gsDesign Pocock & OBF boundary
Hi, I am learning to use your gsDesign package! I have a question about Pocock and OBF boundary. As far as Iunderstand, these 2 boundaries require equal spacing between interim analyses(maybe this is not correct?). But I can still use gsDesign to run an analysisbased on unequal spacing: gsDesign(k=2,test.type=2,timing=c(0.75,1),alpha=0.05,sfu='Pocock')Symmetrictwo-sided group sequential design with90 %power and 5 % Type I Error.Spendingcomputations assume trial stops if a bound is crossed. Sample Size AnalysisRatio* Z Nominal p Spend1 0.796 1.82 0.0346 0.03462 1.061 1.82 0.0346 0.0154Total 0.0500 ++alpha spending:Pocockboundary.*Sample size ratio compared to fixed design with no interim Can anyone share some light whether the above analysis is stillvalid? Or for unequal spacing, I have to use Lan-Demet’s error spendingfunction approximations? Thank you, [[alternative HTML version deleted]]
[R] deviance in GLM vs. summary.glm
Hi, I am running a logistic regression on a simple dataset (attached) using glm:

> dat<-read.table("dat.txt",sep='\t',header=T)

If I use summary() on a logistic model:

> summary(glm(y~x1*x2,dat,family='binomial'))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)    19.57    5377.01   0.004    0.997
x1            -18.59    5377.01  -0.003    0.997
x2B           -19.57    5377.01  -0.004    0.997
x1:x2B         38.15    7604.24   0.005    0.996

As you can see, the interaction term is very insignificant (p = 0.996)! But if I use anova() to compare a full vs. reduced model to evaluate the interaction term:

> anova(glm(y~x1+x2,dat,family='binomial'), glm(y~x1*x2,dat,family='binomial'))
Analysis of Deviance Table
Model 1: y ~ x1 + x2
Model 2: y ~ x1 * x2
  Resid. Df Resid. Dev Df Deviance
1        22     27.067
2        21     21.209  1   5.8579

This follows a chi-square distribution with 1 df, so the corresponding p value is:

> 1-pchisq(5.8679,1)
[1] 0.01541944

So I get a very different p value on the interaction term; can someone share what's going wrong here? Thanks! Yi

"y"	"x1"	"x2"
"R"	1	"B"
"NR"	1	"A"
"R"	0	"B"
"R"	1	"B"
"R"	1	"A"
"R"	1	"A"
"R"	0	"A"
"R"	1	"A"
"R"	0	"B"
"R"	0	"B"
"NR"	0	"B"
"NR"	0	"B"
"R"	1	"A"
"R"	1	"A"
"NR"	1	"A"
"R"	0	"A"
"R"	0	"A"
"NR"	0	"B"
"R"	0	"A"
"R"	1	"A"
"NR"	1	"A"
"R"	1	"B"
"R"	1	"A"
"R"	1	"B"
"R"	1	"A"
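One likely explanation for the discrepancy (not stated in the thread, so treat it as a hypothesis): the enormous standard errors (~5377) suggest complete or quasi-complete separation in one cell, where the coefficient estimate drifts toward infinity and the Wald z-test loses power (the Hauck-Donner effect), while the likelihood-ratio test from anova() remains informative. A sketch with hypothetical data, not the attached dat.txt:

```r
# Hypothetical data with one all-"success" cell: the Wald test on the
# interaction breaks down (huge SE, p near 1) but the LRT does not.
set.seed(42)
x1 <- rep(c(0, 1, 0, 1), each = 10)
x2 <- rep(c("A", "A", "B", "B"), each = 10)
y  <- ifelse(x1 == 1 & x2 == "B", 1, rbinom(40, 1, 0.5))  # separated cell

m1 <- glm(y ~ x1 + x2, family = binomial)
m2 <- glm(y ~ x1 * x2, family = binomial)

summary(m2)$coefficients       # interaction: enormous SE, p close to 1
anova(m1, m2, test = "Chisq")  # LRT p-value computed in one step
```

Note that anova(..., test = "Chisq") avoids the manual 1 - pchisq(...) step.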
Re: [R] Lattice xyplot
Thanks all for the clarification!

From: Jeff Newmiller <jdnew...@dcn.davis.ca.us> To: r-help@r-project.org; Bert Gunter <bgunter.4...@gmail.com>; array chip <arrayprof...@yahoo.com> Cc: "r-help@r-project.org" <r-help@r-project.org> Sent: Monday, May 1, 2017 10:53 AM Subject: Re: [R] Lattice xyplot

It is not a question of whether lattice "understands" the unsorted data... imagine trying to plot 4 points to form a square instead of a trend line... you would NOT want lattice to sort those points for you. That lattice leaves your data alone gives you more flexibility, even while it adds work for certain applications. -- Sent from my phone. Please excuse my brevity.

On May 1, 2017 7:34:09 AM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>Yes. type = "l" connects the points in the order given in the data, so
>if the x's are not already ordered, the plots will be different after
>ordering the x's.
>
>e.g.
>
>> x <- c(3,1,2,4,6,5)
>> y <- 11:16
>> xyplot(y ~ x, type = "l")
>
>As for why ... that's just the way it was designed. You can always
>order the data first, if you don't want this default.
>
>Cheers,
>Bert
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>On Sun, Apr 30, 2017 at 6:07 PM, array chip via R-help
><r-help@r-project.org> wrote:
>> Dear all, I am new to lattice, so would appreciate anyone's help on
>the questions below. I am using xyplot to plot some trend in my
>dataset. Using the example dataset attached, I am trying to plot
>variable "y" over variable "time" for each subject "id":
>> dat<-read.table("dat.txt",sep='\t',header=T,row.names=NULL)
>> xyplot(y ~ time, data=dat, groups=id, aspect = "fill", type = c("p",
>"l"), xlab = "Time", ylab = "Y")
>>
>> It appears that it just worked fine. But if I sort the "dat" first,
>the plot will look somewhat different!
>> dat<-dat[order(dat$id, dat$time),]
>> xyplot(y ~ time, data=dat, groups=id, aspect = "fill", type = c("p", "l"), xlab = "Time", ylab = "Y")
>>
>> Why is that? Do you need to sort the data first before using xyplot?
>Why can xyplot not understand the dataset unless it is sorted first?
>> Thanks,
>> John
[R] Lattice xyplot
Dear all, I am new to lattice, so would appreciate anyone's help on the questions below. I am using xyplot to plot some trend in my dataset. Using the example dataset attached, I am trying to plot variable "y" over variable "time" for each subject "id":

dat<-read.table("dat.txt",sep='\t',header=T,row.names=NULL)
xyplot(y ~ time, data=dat, groups=id, aspect = "fill", type = c("p", "l"), xlab = "Time", ylab = "Y")

It appears that it just worked fine. But if I sort the "dat" first, the plot will look somewhat different!

dat<-dat[order(dat$id, dat$time),]
xyplot(y ~ time, data=dat, groups=id, aspect = "fill", type = c("p", "l"), xlab = "Time", ylab = "Y")

Why is that? Do you need to sort the data first before using xyplot? Why can xyplot not understand the dataset unless it is sorted first? Thanks, John

id	time	y
107	0	6.5
107	1	5.1
107	2	4.2
112	0	5.6
107	3	6.9
112	1	4
112	2	1
119	0	7.6
112	3	4.9
119	1	5.7
120	0	7.1
203	0	7.4
120	2	2.1
203	1	6.3
123	0	6.8
203	2	3.8
119	3	6.1
123	1	3.9
123	2	3
120	3	6
203	3	7.6
207	0	5.8
207	1	3.1
123	3	5.7
209	3	3.6
208	0	4.4
130	0	5.5
131	0	6.9
133	0	5.7
134	0	5.1
209	0	4.9
128	2	2.9
128	1	4.5
130	2	5.9
131	1	6.9
133	2	2.6
133	1	5.7
403	2	3.1
403	0	4.5
128	0	4.5
134	2	2.3
207	3	4.8
130	3	4.9
207	2	2.3
130	1	3.8
131	2	3.9
133	3	3.1
134	1	2.8
209	1	4.2
208	3	2.3
208	1	5.3
208	2	0.8
128	3	4.4
131	3	6.2
209	2	5.7
134	3	4.1
403	3	2.5
119	2	2.3
120	1	3.8
403	1	3.2
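As the replies in this thread explain, type = "l" connects points in the order they appear in the data, so ordering by id and time first gives the intended per-subject trend lines. A minimal sketch using a few rows of the attached data:

```r
library(lattice)

# With unsorted time values, the lines connect points in data order,
# producing back-and-forth zig-zags rather than a trend per subject.
dat <- data.frame(id   = rep(c(107, 112), each = 3),
                  time = c(2, 0, 1, 1, 0, 2),
                  y    = c(4.2, 6.5, 5.1, 4.0, 5.6, 1.0))

# Sort by subject and time before plotting.
dat <- dat[order(dat$id, dat$time), ]
xyplot(y ~ time, data = dat, groups = id, type = c("p", "l"),
       xlab = "Time", ylab = "Y")
```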
Re: [R] standard error of survfit.coxph()
Dear Terry, I was trying to use your explanation of the standard error estimate from survfit.coxph() to verify the standard error estimates for the log(-log(S)) method, but couldn't get the estimates to come out right. Here is an example using the lung dataset:

fit<-coxph(Surv(time,status)~wt.loss,lung)
surv<-survfit(fit,newdata=lung[2:5,])$surv
logstd<-survfit(fit,newdata=lung[2:5,])$std.err
logstd.time<-survfit(fit,newdata=lung[2:5,])$time

## std error of cumulative hazard at time=60
logstd<-logstd[logstd.time==60,]
logstd
[1] 0.01965858 0.01965858 0.01941871 0.01969248

## survival S at time=60
surv<-surv[logstd.time==60,]
surv
        2         3         4         5
0.9249238 0.9249238 0.9253038 0.9263392

Since survfit()$std.err is for the cumulative hazard H = -log(S), based on your explanation below the standard error for the log-log method is (1/S^2)*var(H):

loglogstd<-sqrt(1/surv^2*(logstd^2))
loglogstd
         2          3          4          5
0.02125427 0.02125427 0.02098631 0.02125839

## the upper confidence interval should be
exp(-exp(log(-log(surv))-qnorm(0.975)*loglogstd))
        2         3         4         5
0.9278737 0.9278737 0.9282031 0.9292363

But this is different from the output using summary.survfit:

summary(survfit(fit,newdata=lung[2:5,],conf.type='log-log'),times=60)$upper[1,]
        2         3         4         5
0.9534814 0.9534814 0.9535646 0.9548478

Can you please suggest what I did wrong in the calculation? Thanks very much! John

To: Therneau, Terry M., Ph.D. thern...@mayo.edu; r-help@r-project.org Sent: Monday, June 30, 2014 10:46 AM Subject: Re: standard error of survfit.coxph() [[elided Yahoo spam]] John

From: Therneau, Terry M., Ph.D. thern...@mayo.edu Sent: Monday, June 30, 2014 6:04 AM Subject: Re: standard error of survfit.coxph()

1. The computations behind the scenes produce the variance of the cumulative hazard. This is true for both an ordinary Kaplan-Meier and a Cox model. Transformations to other scales are done using simple Taylor series.
H = cumulative hazard = log(S); S=survival var(H) = var(log(S)) = the starting point S = exp(log(S)), so var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = S^2 var(H) var(log(log(S)) is approx (1/S^2) var(H) 2. At the time it was written, summary.survfit was used only for printing out the survival curve at selected times, and the audience for the printout wanted std(S). True, that was 20 years ago, but I don't recall anyone ever asking for summary to do anything else. Your request is not a bad idea. Note however that the primary impact of using log(S) or S or log(log(S)) scale is is on the confidence intervals, and they do appear per request in the summary output. Terry T. On 06/28/2014 05:00 AM, r-help-requ...@r-project.org wrote: Message: 9 Date: Fri, 27 Jun 2014 12:39:29 -0700 To:r-help@r-project.org r-help@r-project.org Subject: [R] standard error of survfit.coxph() Message-ID: 1403897969.91269.yahoomail...@web122906.mail.ne1.yahoo.com Content-Type: text/plain Hi, can anyone help me to understand the standard errors printed in the output of survfit.coxph()? time-sample(1:15,100,replace=T) status-as.numeric(runif(100,0,1)0.2) x-rnorm(100,10,2) fit-coxph(Surv(time,status)~x) ??? ### method 1 survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$std.err [,1]??? [,2]??? [,3]??? [,4]?? [,5] ?[1,] 0.0 0.0 0.0 0.0 0. 
?[2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 ?[3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819 ?[4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 ?[5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 ?[6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371 ?[7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 ?[8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161 ?[9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394 [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028 [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413 [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433 [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860 [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111 survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$time ?[1]? 1? 2? 3? 4? 5? 6? 7? 8? 9 10 11 12 13 14 15 ??? ### method 2 summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log'),time=10)$std.err ? 1? 2? 3? 4? 5 [1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532
Re: [R] standard error of survfit.coxph()
Terry, I figured out that the variance of log(-log(S)) should be (1/H^2)var(H), not (1/S^2)var(H)! Thanks, John
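The corrected delta method can be checked numerically against summary.survfit. This is a sketch, assuming (as Terry describes in the thread) that survfit()$std.err for a Cox model is on the cumulative-hazard scale; since var(log(H)) ≈ var(H)/H^2, the standard error of log(-log(S)) is se(H)/H:

```r
library(survival)

fit <- coxph(Surv(time, status) ~ wt.loss, data = lung)
sf  <- survfit(fit, newdata = lung[2:5, ])

i   <- sf$time == 60
S   <- sf$surv[i, ]      # survival at t = 60, one column per subject
seH <- sf$std.err[i, ]   # std error of the cumulative hazard H
H   <- -log(S)

## delta method: se(log(-log(S))) = se(log(H)) ~ se(H)/H, i.e. sqrt((1/H^2)var(H))
se.loglog <- seH / H

## upper 95% limit on the survival scale
upper <- exp(-exp(log(H) - qnorm(0.975) * se.loglog))

## should agree with the log-log interval reported by summary.survfit
summary(survfit(fit, newdata = lung[2:5, ], conf.type = "log-log"),
        times = 60)$upper[1, ]
```

The subtraction inside exp(-exp(...)) gives the upper survival bound because log(-log(S)) is a decreasing function of S.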
Re: [R] standard error of survfit.coxph()
Thank you Terry for the explanation! John
[R] standard error of survfit.coxph()
Hi, can anyone help me to understand the standard errors printed in the output of survfit.coxph()?

time <- sample(1:15, 100, replace=T)
status <- as.numeric(runif(100,0,1) > 0.2)
x <- rnorm(100,10,2)
fit <- coxph(Surv(time,status)~x)

### method 1
survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$std.err
            [,1]        [,2]        [,3]        [,4]       [,5]
 [1,] 0.0         0.0         0.0         0.0         0.0
 [2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
 [3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
 [4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
 [7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
 [8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
 [9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394
[10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
[11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
[12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413
[13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433
[14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860
[15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111

survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$time
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

### method 2
summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log'), time=10)$std.err
              1          2          3          4          5
[1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532

By reading the help for ?survfit.object and ?summary.survfit, the standard error provided in the output of method 1 (survfit()) is for the cumulative hazard, i.e. -log(survival), while the standard error provided in the output of method 2 (summary.survfit()) is for the survival itself, regardless of how you choose the value of conf.type ('log', 'log-log' or 'plain'). This explains why the standard error output is different between method 1 (10th row) and method 2. My question is: how do I get standard error estimates for log(-log(survival))? Thanks!

John
Re: [R] prediction based on conditional logistic regression clogit
Thank you Peter. Any other suggestions are absolutely welcome!! John

From: peter dalgaard pda...@gmail.com
Cc: r-help@r-project.org
Sent: Monday, June 16, 2014 2:22 AM
Subject: Re: [R] prediction based on conditional logistic regression clogit

> Hi, I am using clogit() from the survival package to do conditional logistic regression. I also need to make predictions on an independent dataset to calculate predicted probabilities. Here is an example:
>
> dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
> dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
> fit <- clogit(status~x1+x2+strata(set), dat)
> predict(fit, newdata=dat.test, type='expected')
> Error in Surv(rep(1, 150L), status) : Time and status are different lengths
>
> Can anyone suggest what's wrong here? Thanks!

The direct cause is that clogit() works by using the fact that the likelihood is equivalent to a coxph() likelihood with stratification and all observation lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), status), and that goes belly-up if you apply the same formula to a data set of a different length.

However, notice that there is no such thing as predict.clogit(), so you are attempting predict.coxph() on a purely formal Cox model. It is unclear to what extent predicted values, in the sense of coxph(), are compatible with predictions in conditional logit models. I'm rusty on this, but I think what you want is something like

m <- model.matrix(~ x1 + x2 - 1, data=dat.test)
pp <- exp(m %*% coef(fit))
pps <- ave(pp, dat.test$set, FUN=sum)
pp/pps

i.e. the conditional probability that an observation is a case, given covariates and that there is one case in each set (in the data given, you have sets of three with one being a case, so all predicted probabilities are close to 0.33). For more general matched sets, I'm not really sure what one wants. Real experts are welcome to chime in.

-pd
John

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501  Email: pd@cbs.dk  Priv: pda...@gmail.com
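Peter's suggestion, assembled here into a self-contained sketch for the archive. The within-set normalization is his proposal, not an official clogit API; the seed and data values are arbitrary:

```r
library(survival)
set.seed(1)

dat      <- data.frame(set = rep(1:50, each = 3), status = rep(c(1, 0, 0), 50),
                       x1 = rnorm(150, 5, 1), x2 = rnorm(150, 7, 1.5))
dat.test <- data.frame(set = rep(1:30, each = 3), status = rep(c(1, 0, 0), 30),
                       x1 = rnorm(90, 5, 1), x2 = rnorm(90, 7, 1.5))

fit <- clogit(status ~ x1 + x2 + strata(set), dat)

## linear predictor on the new data, then normalize within each matched set
m    <- model.matrix(~ x1 + x2 - 1, data = dat.test)
pp   <- drop(exp(m %*% coef(fit)))
prob <- pp / ave(pp, dat.test$set, FUN = sum)
```

Each set of three probabilities sums to 1, i.e. prob is the conditional probability of being the case within its matched set.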
[R] prediction based on conditional logistic regression clogit
Hi, I am using clogit() from the survival package to do conditional logistic regression. I also need to make predictions on an independent dataset to calculate predicted probabilities. Here is an example:

dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5))
dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5))
fit <- clogit(status~x1+x2+strata(set), dat)
predict(fit, newdata=dat.test, type='expected')
Error in Surv(rep(1, 150L), status) : Time and status are different lengths

Can anyone suggest what's wrong here? Thanks!

John
[R] survival estimates for covariate values
Hi all, let's say we fit a Cox model with a numeric variable x as the independent variable. Then we can calculate, say, 10-year survival for any given value of x (0 to 10 in increments of 0.1 in the example below):

fit <- coxph(Surv(time, event)~x, dat)
surv10yr <- summary(survfit(fit, newdata=data.frame(x=seq(0,10,0.1))), times=10)

I am wondering if anyone can show me how to replicate this in SAS? Very much appreciated!

John
[R] median survival
Hi, if the 50% survival probability horizontal line in a Kaplan-Meier plot overlaps one of the horizontal segments of the step function between two time points t1 and t2, survfit() from the survival package estimates the median survival as t2 (the larger time point). But I have seen some articles (page 23: http://www.amstat.org/chapters/northeasternillinois/pastevents/presentations/summer05_Ibrahim_J.pdf) recommend the smallest time t such that S(t) <= 0.5. What is the convention for the definition of median survival? Thanks

John
[R] unique rows
Hi, I wanted to remove redundant rows (rows with the same entries in all columns) in a data frame. For example, with this data:

dat <- cbind(x=c('a','a','b','b','c','c'), y=c('x','x','d','s','g','g'))
dat
     x y
[1,] a x
[2,] a x
[3,] b d
[4,] b s
[5,] c g
[6,] c g

after removing the redundancy, the end result should be

     x y
[1,] a x
[2,] b d
[3,] b s
[4,] c g

What is the best way to do this? Thanks

John
Re: [R] unique rows
sorry.. didn't know unique().. such a great function

From: Bert Gunter gunter.ber...@gene.com
Cc: r-help@r-project.org
Sent: Tuesday, January 28, 2014 2:21 PM
Subject: Re: [R] unique rows

Inline.
-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374
"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." -- H. Gilbert Welch

> Hi, I wanted to remove redundant rows (rows with the same entries in all columns) in a data frame. For example, with this data:
>
> dat <- cbind(x=c('a','a','b','b','c','c'), y=c('x','x','d','s','g','g'))

## this is not a data frame. And would you kindly explain why you posted here instead of reading ?unique. -- Bert

> dat
>      x y
> [1,] a x
> [2,] a x
> [3,] b d
> [4,] b s
> [5,] c g
> [6,] c g
>
> after removing the redundancy, the end result should be
>
>      x y
> [1,] a x
> [2,] b d
> [3,] b s
> [4,] c g
>
> What is the best way to do this? Thanks
>
> John
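For the archive, a sketch of the two standard idioms Bert is pointing at: unique() drops the repeated rows directly, and duplicated() gives the logical mask if you want to see which rows were removed. Both work on matrices and data frames alike:

```r
dat <- data.frame(x = c('a','a','b','b','c','c'),
                  y = c('x','x','d','s','g','g'))

unique(dat)               # keeps the first occurrence of each distinct row
dat[!duplicated(dat), ]   # equivalent, via the logical mask
```

Note the original cbind() of character vectors produces a matrix, not a data frame, which is why Bert objected; data.frame() is used here instead.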
Re: [R] median survival
please ignore -- actually the median survival from survfit() is the mean of the two time points.

To: R help r-help@r-project.org
Sent: Tuesday, January 28, 2014 11:27 AM
Subject: [R] median survival

> Hi, if the 50% survival probability horizontal line in a Kaplan-Meier plot overlaps one of the horizontal segments of the step function between two time points t1 and t2, survfit() from the survival package estimates the median survival as t2 (the larger time point). But I have seen some articles (page 23: http://www.amstat.org/chapters/northeasternillinois/pastevents/presentations/summer05_Ibrahim_J.pdf) recommend the smallest time t such that S(t) <= 0.5. What is the convention for the definition of median survival? Thanks
>
> John
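A small check of this behaviour (a sketch, assuming the survival package averages the two times when the curve sits exactly at 0.5 over an interval, as described above; the data values are made up):

```r
library(survival)

## four events, no censoring: S(t) is exactly 0.5 on the interval [2, 3)
fit <- survfit(Surv(c(1, 2, 3, 4), rep(1, 4)) ~ 1)
unname(summary(fit)$table["median"])
```

With the averaging convention this reports the midpoint of the interval rather than either endpoint.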
[R] Installation folder
Hi, I noticed that when I install/update packages, the installation folder is C:/User/My Document/R, not C:/Program Files/R. R itself is still in the Program Files folder. I don't know how this happened; it used to work OK. Any clues on how to correct the problem are appreciated!

Thanks

John

Sent from Yahoo Mail for iPhone
Re: [R] Different colours for LatticeExtra graphs
Sent from Yahoo Mail for iPhone
[R] estimating survival function from Cox model
Hi, I have some questions on how to estimate the survival function from a Cox model. I know how to do this in R using survfit(). But let's say the model was fit in another software package, and I was only given the estimate of the baseline cumulative hazard A0(t=10) at the specified time t=10 (baseline cumulative hazard refers to when the covariate X=0) and the beta estimate b for the covariate X used in the Cox model. So the survival function at time 10 for a given covariate value x can be calculated as:

A(t=10|X=x) = exp(b*x)*A0(t=10), where A is the cumulative hazard when X=x
S(t=10|X=x) = exp(-A(t=10|X=x)), where S is the survival function to be calculated

Now I want to calculate a confidence interval for S(t=10|X=x). I think I need to calculate the CI for the cumulative hazard A(t=10|X=x) first and then exponentiate it to get the CI for S, based on the relationship S = exp(-A). To get the CI for A, I need an estimate of the standard error of A. I know the other software can give me the standard error of A0, the baseline cumulative hazard. Based on the relationship A = exp(b*x)*A0, I guess I'll also need the standard error for b. But how do I calculate the standard error for A from the standard errors for A0 and b? Any insights would be greatly appreciated!

John
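One hedged sketch of the delta-method algebra (not from the thread): writing log A = b*x + log A0, a first-order Taylor expansion gives var(log A) ≈ x^2 var(b) + var(A0)/A0^2 + 2*x cov(b, A0)/A0. Note that the covariance between b and A0 is needed too; the two standard errors alone are not enough unless the fitting software also reports cov(b, A0) (or the estimates are treated as independent, which is only an approximation). The helpers below are hypothetical, just to show the arithmetic:

```r
## delta-method standard error of log A, where A = exp(b*x) * A0
## (hypothetical helper; cov.bA0 must come from the fitting software)
se.logA <- function(x, var.b, A0, var.A0, cov.bA0 = 0) {
  sqrt(x^2 * var.b + var.A0 / A0^2 + 2 * x * cov.bA0 / A0)
}

## 95% CI for S = exp(-A), transformed from the CI for log A
ci.surv <- function(x, b, A0, var.b, var.A0, cov.bA0 = 0) {
  logA <- b * x + log(A0)
  se   <- se.logA(x, var.b, A0, var.A0, cov.bA0)
  rev(exp(-exp(logA + c(-1.96, 1.96) * se)))  # larger log A -> smaller S
}
```

Working on the log A scale keeps the resulting interval for S inside (0, 1), which is why the CI is built there rather than directly on A.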
[R] estimating survival function from Cox model
Hi, I have some questions on how to estimate the survival function from a Cox model. I know how to do this in R using survfit(). But let's say the model was done is another software, and I was only given the estimate of baseline cumulative hazard A0(t=10) at the specified time t=10 (baseline cumulative hazard refers to when covariate X=0)and the beta estimate b for the covariate used in Cox model X. So the survival function at time 10 for a given covariate value x can be calculated as: A(t=10|X=x) = exp(b*x)*A0(t=10) where A is cumulative hazard when X=x S(t=10|X=x) = exp(-A(t=10|X=x)) where S is survival function to be calculated Now I want to calculate confidence interval for S(t=10|X=x). I think I need to calculate the CI for cumulative hazard A(t=10|X=x) first and then exponentiate it to get CI for S, based on the relationship S = exp(-A). To get CI for A, I need to calculate the estimate of standard error of A. I know the other software can give me the standard error of A0, the baseline cumulative hazard. Based on the relationship A = exp(b*x)*A0, I guess I'll need the standard error for b. But how do I calculate the standard error for A based on standard errors for A0 and b? Any insights would be greatly appreciated! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] power analysis is applicable or not
Hi, this is a statistical question rather than a pure R question. I have received much help from the R mailing list in the past, so I would like to try here and would appreciate any input: I conducted a Mantel-Haenszel test to show that the performance of a diagnostic test did not show heterogeneity among 4 study sites, i.e. the Mantel-Haenszel test p value was > 0.05, so that I could conduct a meta-analysis by combining data from all 4 study sites. Now one of the reviewers for the manuscript did a power analysis for the Mantel-Haenszel test showing that with the sample sizes I have, the power for the Mantel-Haenszel test was only 50%. So he argued that I did not have enough power for the Mantel-Haenszel test. My use of the Mantel-Haenszel test was NOT to show a significant p value; instead, a non-significant p value was what I was looking for, because a non-significant p value indicates NO heterogeneity among study sites. Power analysis in general is to show whether you have enough sample size to ensure a statistically significant difference can be seen with a certain likelihood. But this is not how I used the Mantel-Haenszel test. So I think in my scenario the power analysis is NOT applicable, because I am simply using the test to demonstrate a non-significant p value. Am I correct in this view? Thanks, and I appreciate any thoughts. John
Re: [R] power analysis is applicable or not
Hi Chris, Thanks for sharing your thoughts. The reviewer used the heterogeneity that I observed in my study for the power analysis. I understand what you have described, and I agree that with the sample size I have, I do not have enough power to detect the observed heterogeneity with significance. But suppose I had the sample size called for by the power analysis, so that I had 80% power to detect the heterogeneity -- would it then be true that I would almost never be able to declare homogeneity among study sites, so that I would almost never be able to combine them? This goes to the general point that with a large enough sample size, you can make any difference statistically significant... On the other hand, making a statistical inference using any statistical test (including the Mantel-Haenszel test) is, I thought, always valid regardless of sample size. For the heterogeneity test, I am doing just that -- making a statistical inference with the p value from the Mantel-Haenszel test. I am not sure it is correct that a power analysis is mandatory before attempting a statistical test. Please share your thoughts... Thanks John From: Christopher W. Ryan cr...@binghamton.edu Sent: Tuesday, November 12, 2013 6:53 PM Subject: Re: [R] power analysis is applicable or not John-- Well, my simple-minded way of thinking about these issues goes something like this: You want to know if there is heterogeneity. You gather some data and do your MH analysis. You never know *for sure* whether there is *really* heterogeneity in your population; all you know is whether there is any in your sample--you concluded there was not. Your reviewer calculated that with the sample size you used, *even if there was heterogeneity in your population* (unknowable by you or anyone else) your sample size only had a 50% probability of detecting it (a 50% probability of coming up with a p < 0.05). 
Meaning there *could have been* heterogeneity there, at a 0.05 significance level, and you *would* have seen it, *if* your sample size had been larger. It's when you come up with a non-significant result that the issue of power is most relevant. If you already have a significant result, then yes, your sample size was large enough to show a significant result. An important question is: what *magnitude* of heterogeneity did your reviewer assume he/she was looking for when he/she did the power calculation? And is that magnitude meaningful? All this being said, power calculations are best done before recruiting subjects or gathering data. --Chris Ryan SUNY Upstate Medical University Binghamton, NY array chip wrote: Hi, this is a statistical question rather than a pure R question. I have received much help from the R mailing list in the past, so I would like to try here and would appreciate any input: I conducted a Mantel-Haenszel test to show that the performance of a diagnostic test did not show heterogeneity among 4 study sites, i.e. the Mantel-Haenszel test p value was > 0.05, so that I could conduct a meta-analysis by combining data from all 4 study sites. Now one of the reviewers for the manuscript did a power analysis for the Mantel-Haenszel test showing that with the sample sizes I have, the power for the Mantel-Haenszel test was only 50%. So he argued that I did not have enough power for the Mantel-Haenszel test. My use of the Mantel-Haenszel test was NOT to show a significant p value; instead, a non-significant p value was what I was looking for, because a non-significant p value indicates NO heterogeneity among study sites. Power analysis in general is to show whether you have enough sample size to ensure a statistically significant difference can be seen with a certain likelihood. But this is not how I used the Mantel-Haenszel test. So I think in my scenario the power analysis is NOT applicable, because I am simply using the test to demonstrate a non-significant p value. Am I correct in this view? Thanks and appreciate any thoughts. John
Re: [R] sparse PCA using nsprcomp package
Hi Christian, Thank you so much for sharing your thoughts, it was a real pleasure to read and learn! Approximately when do you expect the new release of the package? Best, John From: Christian Sigg r-h...@sigg-iten.ch Cc: r-help@r-project.org r-help@r-project.org Sent: Monday, September 9, 2013 8:06 AM Subject: Re: [R] sparse PCA using nsprcomp package Hi John 1). Assume now I can calculate these adjusted standard deviations from sparse PCA; should the percent variation explained by each sparse PC be calculated using the sum of all these adjusted variances (i.e. squares of the adjusted standard deviations) as the denominator (then the percent variation explained will always add up to 1 if all sparse PCs are counted), or using the sum of the PC variances estimated by REGULAR PCA as the denominator (then adding up all PCs may not equal 1)? It depends on what you want to do with this percentage, but to me the second would be more meaningful. A sparse PCA will usually be truncated (fewer than all possible components are computed), and due to the additional constraints on the principal axes you will usually explain less variance than with standard PCA. I would want to know what I lose in a sparse PCA w.r.t. a standard PCA. Note that you don't actually have to compute the standard PCA if you are only interested in the total variance of the data, i.e. the sum of all variances. The total variance 1/(n-1)*sum(diag(t(X)%*%X)) for the zero-mean data matrix X is invariant to a rotation of the coordinate system and therefore identical to

Z <- X %*% W
1/(n-1)*sum(diag(t(Z)%*%Z))

so you can skip computing the PCA rotation matrix W. The fastest way to compute the total variance is probably 1/(n-1)*sum(X^2), because all expressions compute the squared Frobenius norm of X. If you want to compare variances of individual components, then compute a regular PCA. I also had a look at how the spca function computes the percentage explained variation. 
I don't yet entirely understand what is going on, but the results differ from using the asdev function I mentioned in my previous reply. Keep that in mind if you want to compare nsprcomp to spca. 2). How do you choose the 2 important parameters in nsprcomp(), ncomp and k? If for example, my regular PCA showed that I need 20 PCs to account for 80% of the variation in my dataset, does it mean I should set ncomp=20? And then what about any rules for setting the value of k? I don't have any hard answers for this question. There are a number of heuristics for choosing the number of components in regular PCA (e.g. the PCA book by Jolliffe presents several), and some of them should translate to sparse PCA. If you think that 20 PCs or 80% explained variance works well for regular PCA, I suggest also using 20 components in sparse PCA, then measure the explained variance and then increase the number of components (if necessary) to again achieve 80% explained variance. Same for setting the cardinality parameter k. You could use a criterion such as BIC to optimize the trade-off between model fidelity and complexity, but I don't have any experience with how well this works in practice. What I did so far was to check for loadings with small magnitudes. I set k such that all loadings have substantial magnitudes. In the end what matters is what follows after running the algorithm. Are you directly interpreting the sparse PCA result, or is this an intermediate step in a complete data processing pipeline with a measurable goal? If the latter, choose the algorithm parameters that give you the best results at the end of the pipeline. 3). Would you recommend nscumcomp() or nsprcomp() in general? It depends whether you want to perform a sequential or cumulative analysis of your data. If you want maximum variance in the first (second, third, ...) PC, and specify the precise cardinality of each principal axis, then use nsprcomp. 
If instead you want to only specify the total cardinality of all loadings and leave the distribution of non-zero loadings to the algorithm, use nscumcomp. There will be substantial improvements to nscumcomp in the next release; if you want to use it I suggest you wait until then. Regards Christian
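Christian's point that the total variance is rotation-invariant can be checked numerically. A small sketch with simulated data (any centered matrix X works; the dimensions here are arbitrary):

```r
set.seed(1)
X <- scale(matrix(rnorm(200), 20, 10), center = TRUE, scale = FALSE)  # zero-mean data
n <- nrow(X)
W <- prcomp(X)$rotation   # full PCA rotation matrix
Z <- X %*% W              # scores (rotated data)

tv_X <- 1/(n - 1) * sum(diag(t(X) %*% X))  # total variance of the data
tv_Z <- 1/(n - 1) * sum(diag(t(Z) %*% Z))  # same quantity after rotation
tv_F <- 1/(n - 1) * sum(X^2)               # squared Frobenius norm shortcut
```

All three values agree, so the PCA rotation indeed need not be computed if only the total variance is of interest.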
[R] sparse PCA using nsprcomp package
Hi all, I am using nsprcomp() from the nsprcomp package to run sparse PCA. The output is very much like regular PCA by prcomp() in that it provides sdev for the standard deviations of the principal components (PCs). For regular PCA by prcomp(), we can easily calculate the percent of total variance explained by the first k PCs by using cumsum(obj$sdev^2), because these PCs are independent of each other, so you can simply add up the variances of these PCs. For sparse PCA, as far as I understand, the generated PCs are not independent of each other anymore, so you cannot simply add up variances to calculate the percentage of variance explained by the first k PCs. For example, in the elasticnet package, where spca() also performs sparse PCA, one of the outputs from spca() is pev for percent explained variation, which is based on a so-called adjusted variance that adjusts for the fact that the variances of these PCs are not independent anymore. My question is: for nsprcomp, how can I calculate the percent explained variation using sdev when I know these PCs are not independent of each other? Thanks! John
Re: [R] sparse PCA using nsprcomp package
Hi Christian, Thanks so much for the detailed explanation! I look forward to the new release of the nsprcomp package! In the meantime, I will use the function below for the calculation of the adjusted standard deviations. I have a few more questions, hope you can shed some light on them: 1). Assume now I can calculate these adjusted standard deviations from sparse PCA; should the percent variation explained by each sparse PC be calculated using the sum of all these adjusted variances (i.e. squares of the adjusted standard deviations) as the denominator (then the percent variation explained will always add up to 1 if all sparse PCs are counted), or using the sum of the PC variances estimated by REGULAR PCA as the denominator (then adding up all PCs may not equal 1)? 2). How do you choose the 2 important parameters in nsprcomp(), ncomp and k? If for example, my regular PCA showed that I need 20 PCs to account for 80% of the variation in my dataset, does it mean I should set ncomp=20? And then what about any rules for setting the value of k? 3). Would you recommend nscumcomp() or nsprcomp() in general? Thanks so much again, John From: Christian Sigg r-h...@sigg-iten.ch Cc: r-help@r-project.org Sent: Thursday, September 5, 2013 2:43 PM Subject: Re: [R] sparse PCA using nsprcomp package Hi John I am currently traveling and have sporadic net access, I therefore can only answer briefly. It's also quite late, I hope what follows still makes sense... For regular PCA by prcomp(), we can easily calculate the percent of total variance explained by the first k PCs by using cumsum(obj$sdev^2) because these PCs are independent of each other so you can simply add up the variance of these PCs. For sparse PCA, as far as I understand, the generated PCs are not independent of each other anymore, so you cannot simply add up variances to calculate the percentage of variance explained by the first k PCs. 
For example, in the package of elasticnet where spca() also performs sparse PCA, one of the output from spca() is pev for percent explained variation which is based on so-called adjusted variance that adjusted for the fact that these variances of PCs are not independent anymore. You are correct that measuring explained variance is more involved with sparse (or non-negative) PCA, because the principal axes no longer correspond to eigenvectors of the covariance matrix, and are usually not even orthogonal. The next update for the 'nsprcomp' package is almost done, and one of the changes will concern the reported standard deviations. In the current version (0.3), the standard deviations are computed from the scores matrix X*W, where X is the data matrix and W is the (pseudo-)rotation matrix consisting of the sparse loadings. Computing variance this way has the advantage that 'sdev' is consistent with the scores matrix, but it has the disadvantage that some of the explained variance is counted more than once because of the non-orthogonality of the principal axes. One of the symptoms of this counting is that the variance of a later component can actually exceed the variance of an earlier component, which is not possible in regular PCA. In the new version of the package, 'sdev' will report the _additional_ standard deviation of each component, i.e. the variance not explained by the previous components. Given a basis of the space spanned by the previous PAs, the variance of the PC is computed after projecting the current PA to the ortho-complement space of the basis. This procedure reverts back to standard PCA if no sparsity or non-negativity constraints are enforced on the PAs. My question is for nsprcomp, how can I calculate percent explained variation by using sdev when I know these PCs are not independent of each other? The new version of the package will do it for you. 
Until then, you can use something like the following function

asdev <- function(X, W) {
    nc <- ncol(W)
    sdev <- numeric(nc)
    Q <- qr.Q(qr(W))
    Xp <- X
    for (cc in seq_len(nc)) {
        sdev[cc] <- sd(Xp %*% W[, cc])
        Xp <- Xp - Xp %*% Q[, cc] %*% t(Q[, cc])
    }
    return(sdev)
}

to compute the additional variances for given X and W. The package documentation will explain the above in some more detail, and I will also have a small blog post which compares 'nsprcomp' with the 'spca' routine from the 'elasticnet' package on the 'marty' data from the EMA package. Best regards Christian
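A quick sanity check of the asdev approach: with an ordinary orthogonal PCA rotation the sparsity constraints are inactive, so the additional standard deviations should coincide with prcomp's sdev. A sketch (the function body follows my reading of the email above; the test data is arbitrary):

```r
asdev <- function(X, W) {
  nc <- ncol(W)
  sdev <- numeric(nc)
  Q <- qr.Q(qr(W))         # orthonormal basis of the span of the principal axes
  Xp <- X
  for (cc in seq_len(nc)) {
    sdev[cc] <- sd(Xp %*% W[, cc])              # additional std. dev. of component cc
    Xp <- Xp - Xp %*% Q[, cc] %*% t(Q[, cc])    # deflate: project out direction cc
  }
  sdev
}

set.seed(1)
X <- scale(matrix(rnorm(300), 30, 10), center = TRUE, scale = FALSE)
p <- prcomp(X)
all.equal(asdev(X, p$rotation), p$sdev)  # should be TRUE for an orthogonal rotation
```

For a genuinely sparse W the deflation step is what prevents double-counting of shared variance.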
[R] meta-analysis for sensitivity
Hi all, I am new to meta-analysis. Is there any special package that can calculate a summarized sensitivity with 95% confidence interval for a diagnostic test, based on sensitivities from several individual studies? Thanks for any suggestions. John From: Jannis bt_jan...@yahoo.de To: r-help@r-project.org r-help@r-project.org Sent: Monday, April 8, 2013 11:12 AM Subject: [R] checkUsage from codetools shows errors when function uses functions from loaded packages Dear list members, I frequently program small scripts and wrap them into functions to be able to check them with checkUsage. In case these functions (loaded via source or copy-pasted to the R console) use functions from other packages, I get this error:

no visible global function definition for 'xxx'

For example:

test = function() {
  require(plotrix)
  color.legend()
}
library(codetools)
checkUsage(test)

Can I tell codetools somehow where to look for these functions without building a full-blown package? Cheers Jannis
[R] confidence intervals for survfit()
Hi, I am wondering how the confidence interval for the Kaplan-Meier estimator is calculated by survfit(). For example:

summary(survfit(Surv(time, status) ~ 1, data), times = 10)

Call: survfit(formula = Surv(rtime10, rstat10) ~ 1, data = mgi)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10    168      55    0.761  0.0282        0.707        0.818

I am trying to reproduce the upper and lower CI using the standard error. As far as I understand, the default method for survfit() to calculate the confidence interval is on the log survival scale, so:

upper CI = exp(log(0.761) + qnorm(0.975)*0.0282) = 0.804
lower CI = exp(log(0.761) - qnorm(0.975)*0.0282) = 0.720

These are not the same as the output from survfit(). Am I missing something? Thanks John
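One plausible reconstruction of the discrepancy, assuming the std.err column printed by summary.survfit is the standard error of S itself: the log-scale interval needs the standard error of log(S), which is then std.err/surv rather than std.err. Redoing the arithmetic with that conversion gets close to the printed interval from the rounded numbers above:

```r
surv <- 0.761; se <- 0.0282   # values printed by summary(survfit(...))
se_log <- se / surv           # delta method: SE(log S) = SE(S) / S
z <- qnorm(0.975)
lower <- exp(log(surv) - z * se_log)
upper <- exp(log(surv) + z * se_log)
round(c(lower, upper), 3)     # close to the printed 0.707 and 0.818
```

Small remaining differences come from rounding, since survfit works with the unrounded estimates internally.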
[R] error installing KEGGSOAP
Hi, I am new to Bioconductor, trying to install the KEGGSOAP package, but got warnings() when installing and an error message when trying to load the package. Can anyone suggest what went wrong? Many thanks, John

source("http://bioconductor.org/biocLite.R")
Bioconductor version 2.11 (BiocInstaller 1.8.3), ?biocLite for help
biocLite("KEGGSOAP")
BioC_mirror: http://bioconductor.org
Using Bioconductor version 2.11 (BiocInstaller 1.8.3), R version 2.15.
Installing package(s) 'KEGGSOAP'
trying URL 'http://bioconductor.org/packages/2.11/bioc/bin/windows/contrib/2.15/KEGGSOAP_1.32.0.zip'
Content type 'application/zip' length 69037 bytes (67 Kb)
opened URL
downloaded 67 Kb
package 'KEGGSOAP' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
        C:\Users\yzhang\AppData\Local\Temp\RtmpawAjwx\downloaded_packages
Warning message:
installed directory not writable, cannot update packages 'acepack', 'actuar', 'ada', 'ade4', 'ade4TkGUI', 'agricolae', 'akima', 'ape', 'aplpack', 'arules', 'bitops', 'boot', 'Cairo', 'car', 'caTools', 'cba', 'chron', 'class', 'clue', 'cluster', 'coda', 'colorspace', 'CompQuadForm', 'corpcor', 'DAAG', 'date', 'deldir', 'descr', 'deSolve', 'devtools', 'digest', 'diptest', 'doBy', 'DoE.wrapper', 'e1071', 'effects', 'ENmisc', 'epiR', 'eRm', 'evaluate', 'evd', 'FactoMineR', 'fArma', 'fAssets', 'fBasics', 'fdrtool', 'fExoticOptions', 'fExtremes', 'fGarch', 'fields', 'flexclust', 'fMultivar', 'fNonlinear', 'fOptions', 'forecast', 'foreign', 'fpc', 'fracdiff', 'fRegression', 'FrF2', 'FrF2.catlg128', 'fTrading', 'fUnitRoots', 'gamlss', 'gamlss.data', 'gamlss.dist', 'gclus', 'gdata', 'geoR', 'GGally', 'ggm', 'ggplot2', 'gmodels', 'gridBase', 'gWidgets', 'gWidgetstcltk', 'HH', 'Hmisc', 'httr', 'igraph', 'igraph0', 'inline', 'ipred', 'isa2', 'JavaGD', 'JGR', 'kernlab', 'KernSmoot [... 
truncated]

library(KEGGSOAP)
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following object(s) are masked from 'package:stats':
    xtabs
The following object(s) are masked from 'package:base':
    anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get, intersect, lapply, Map, mapply, mget, order, paste,
    pmax, pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int, rownames, sapply, setdiff, table, tapply, union, unique
failed to load HTTP resource
Error : .onLoad failed in loadNamespace() for 'KEGGSOAP', details:
  call: NULL
  error: 1: failed to load HTTP resource
Error: package/namespace load failed for 'KEGGSOAP'

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats    graphics grDevices datasets utils    methods  base
other attached packages:
[1] BiocGenerics_0.4.0 BiocInstaller_1.8.3
loaded via a namespace (and not attached):
[1] codetools_0.2-8 RCurl_1.91-1.1 SSOAP_0.8-0    tools_2.15.1   XML_3.9-4.1    XMLSchema_0.7-2
Re: [R] weird merge()
Hi James, I am trying to combine 11 data frames by column, not by row. My original message had 11 data text files attached, did they go through so you can try my code? Thanks John From: J Toll jct...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Sent: Friday, January 11, 2013 1:35 PM Subject: Re: [R] weird merge() Hi, I have some protein array data, each array in a separate text file. So I read them in and try to combine them into a single data frame by using merge(). See the code below (if you download the attached data files into a specific folder, the code below should work):

fls <- list.files("C:\\folder_of_download", full.names=T) ## get file names
prot <- list() ## a list to contain individual files
ind <- 1
for (i in fls[c(1:11)]) {
  cat(ind, " ")
  tmp <- read.delim(i, header=T, row.names=NULL, na.string='null')
  colnames(tmp)[4] <- as.character(tmp$barcode[1])
  prot[[ind]] <- tmp[, -(1:2)]
  ind <- ind + 1
}

## try to merge them together
## not done in a loop so I can see where the problem occurs
pro <- merge(prot[[1]], prot[[2]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[3]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[4]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[5]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[6]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[7]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[8]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[9]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[10]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[11]], by.x=1, by.y=1, all=T)

I noticed that starting with file #8 the merge became slower and slower, so that by file #11 the computer was stuck! Originally I thought something was wrong with the later files, but when I changed the order of merging, the slow-down still happened at the 8th file to be merged. Can anyone suggest what's going on with the merging? 
I'm not sure exactly what you're trying to do with all that code, but if you're just trying to get all eleven files into a data.frame, you could do this:

allFilesAsList <- lapply(1:11, function(i) read.delim(paste("p", i, ".txt", sep = "")))
oneBigDataFrame <- do.call(rbind, allFilesAsList)

You may need to fix the column names. Is that anything like what you were trying to do? James
Re: [R] weird merge()
Hi Dennis, Actually, I am trying to combine them by COLUMN, so that's why I am using merge(). The first loop simply reads these protein data into R as 11 data frames, each 165 x 2. Then I use merge() to combine these data frames into one big data frame by column, with the individual merge() statements. I didn't do it in a loop because I wanted to see at what point merge() would start to fail. And it turns out the merge of the first 7 data frames works fine. Starting from the 8th data frame, it becomes slower and slower, and at the 11th data frame it appeared stuck on my computer. Thanks John From: Dennis Murphy djmu...@gmail.com Sent: Friday, January 11, 2013 1:25 PM Subject: Re: [R] weird merge() Hi John: This doesn't look right. What are you trying to do? [BTW, the variable names in the attachments have spaces, so most of R's read functions should choke on them. At the very least, replace the spaces with underscores.] If all you are trying to do is row-concatenate them (since the two or three I looked at appear to have the same structure), then it's as simple as

pro <- do.call(rbind, prot)

If this is what you want along with an indicator for each data file, then the ldply() function in the plyr package might be useful as an alternative to do.call. It should return an additional variable .id whose value corresponds to the number (or name) of the list component.

library(plyr)
pro2 <- ldply(prot, rbind)

If you want something different, then be more explicit about what you want, because your merge() code doesn't make a lot of sense to me. Dennis PS: Just a little hint: if you're using (almost) the same code repeatedly, there's probably a more efficient way to do it in R. CS types call this the DRY principle: Don't Repeat Yourself. I know you know this, but a little reminder doesn't hurt :) Hi, I have some protein array data, each array in a separate text file. So I read them in and try to combine them into a single data frame by using merge(). 
See the code below (if you download the attached data files into a specific folder, the code below should work):

fls <- list.files("C:\\folder_of_download", full.names=T) ## get file names
prot <- list() ## a list to contain individual files
ind <- 1
for (i in fls[c(1:11)]) {
  cat(ind, " ")
  tmp <- read.delim(i, header=T, row.names=NULL, na.string='null')
  colnames(tmp)[4] <- as.character(tmp$barcode[1])
  prot[[ind]] <- tmp[, -(1:2)]
  ind <- ind + 1
}

## try to merge them together
## not done in a loop so I can see where the problem occurs
pro <- merge(prot[[1]], prot[[2]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[3]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[4]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[5]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[6]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[7]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[8]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[9]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[10]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[11]], by.x=1, by.y=1, all=T)

I noticed that starting with file #8 the merge became slower and slower, so that by file #11 the computer was stuck! Originally I thought something was wrong with the later files, but when I changed the order of merging, the slow-down still happened at the 8th file to be merged. Can anyone suggest what's going on with the merging? Thanks John
Re: [R] weird merge()
I just figured out the reason: the column by which the data frames are merged ("gene.name", the 1st column in each data frame) does not have unique values; some values are repeated, so with each merge the data frame gets bigger and bigger exponentially. Sorry to bother all. John From: J Toll jct...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Sent: Friday, January 11, 2013 1:35 PM Subject: Re: [R] weird merge() Hi, I have some protein array data, each array in a separate text file. So I read them in and try to combine them into a single data frame by using merge(). See the code below (if you download the attached data files into a specific folder, the code below should work):

fls <- list.files("C:\\folder_of_download", full.names=T) ## get file names
prot <- list() ## a list to contain individual files
ind <- 1
for (i in fls[c(1:11)]) {
  cat(ind, " ")
  tmp <- read.delim(i, header=T, row.names=NULL, na.string='null')
  colnames(tmp)[4] <- as.character(tmp$barcode[1])
  prot[[ind]] <- tmp[, -(1:2)]
  ind <- ind + 1
}

## try to merge them together
## not done in a loop so I can see where the problem occurs
pro <- merge(prot[[1]], prot[[2]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[3]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[4]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[5]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[6]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[7]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[8]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[9]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[10]], by.x=1, by.y=1, all=T)
pro <- merge(pro, prot[[11]], by.x=1, by.y=1, all=T)

I noticed that starting with file #8 the merge became slower and slower, so that by file #11 the computer was stuck! Originally I thought something was wrong with the later files, but when I changed the order of merging, the slow-down still happened at the 8th file to be merged. Can anyone suggest what's going on with the merging? 
I'm not sure exactly what you're trying to do with all that code, but if you're just trying to get all eleven files into a data.frame, you could do this:

allFilesAsList <- lapply(1:11, function(i) read.delim(paste(p, i, ".txt", sep = "")))
oneBigDataFrame <- do.call(rbind, allFilesAsList)

You may need to fix the column names. Is that anything like what you were trying to do? James [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
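The slow-down diagnosed in this thread can be reproduced on toy data: when the merge key contains duplicates, merge() matches every duplicated row pairwise, so repeated merges multiply the row count. A minimal sketch (the data frames `a` and `b` below are made up for illustration):

```r
## Duplicate keys make merge() produce the cross product of matching rows.
a <- data.frame(gene = c("g1", "g1", "g2"), x = 1:3)
b <- data.frame(gene = c("g1", "g1", "g2"), y = 4:6)
m <- merge(a, b, by = "gene", all = TRUE)
nrow(m)                  # 5 rows, not 3: the two "g1" rows match pairwise (2 x 2), plus "g2"
any(duplicated(a$gene))  # TRUE -- a quick check worth running before a long merge chain
```

Checking each file's key column with duplicated() before the merge loop would have flagged the problem at file #1, long before the growth became noticeable.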
[R] sort matrix based on a specific order
Hi, I have a character matrix with 2 columns, A and B. I want to sort the matrix based on column B, but according to a specific order of characters:

mat <- cbind(c('w','x','y','z'), c('a','b','c','d'))
ind <- c('c','b','d','a')

I want mat to be sorted by the sequence in ind:

     [,1] [,2]
[1,] "y"  "c"
[2,] "x"  "b"
[3,] "z"  "d"
[4,] "w"  "a"

Is there any simple function that can do this? Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sort matrix based on a specific order
Thank you all for the suggestions, fantastic! From: Rui Barradas ruipbarra...@sapo.pt Cc: r-help@r-project.org r-help@r-project.org Sent: Thursday, January 10, 2013 10:32 AM Subject: Re: [R] sort matrix based on a specific order Hello, Try the following. order() gives you a permutation of the vector 'ind', and ordering that permutation gives its inverse.

mat <- cbind(c('w','x','y','z'), c('a','b','c','d'))
ind <- c('c','b','d','a')
ord <- order(ind)
mat[order(ord), ]

Hope this helps, Rui Barradas On 10-01-2013 18:21, array chip wrote: [original message quoted above snipped] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
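An equivalent and arguably more direct route, which also works when column B is not itself already sorted, is match(). A small sketch using the same `mat` and `ind` from the thread:

```r
mat <- cbind(c('w','x','y','z'), c('a','b','c','d'))
ind <- c('c','b','d','a')
## match() returns, for each element of ind, the row of mat whose 2nd column equals it
mat[match(ind, mat[, 2]), ]
##      [,1] [,2]
## [1,] "y"  "c"
## [2,] "x"  "b"
## [3,] "z"  "d"
## [4,] "w"  "a"
```

Unlike order(order(ind)), which relies on mat[, 2] being sorted, match() reorders the rows by explicit lookup.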
[R] NRI or IDI for survival data - Hmisc package
Hi, I am trying to calculate the net reclassification improvement (NRI) and Integrated Discrimination Improvement (IDI) for a survival dataset to compare 2 risk models. It seems that improveProb() in the Hmisc package does this only for a binary outcome, while rcorrp.cens() does take a survival object but doesn't output NRI or IDI. Can anyone suggest any other packages that can calculate NRI and IDI for survival data? The reference for the extension of NRI to survival data is: Pencina MJ, D'Agostino RB, Steyerberg EW (2011): Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat in Med 30:11-21. Thanks! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] decimal points midline
Hi, does anyone know how to make decimal points midline when plotting? The journal to which we are going to submit a manuscript requires this particular formatting, and it gives directions for how to do this on a PC: hold down the ALT key and type 0183 on the number pad. Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] decimal points midline
Thanks Uwe, options(OutDec="\xB7") worked! From: Uwe Ligges lig...@statistik.tu-dortmund.de To: John Kane jrkrid...@inbox.com Cc: r-help@r-project.org Sent: Wednesday, August 8, 2012 12:43 PM Subject: Re: [R] decimal points midline On 08.08.2012 21:22, John Kane wrote: Can you point to an example? It sounds like the journal is still using typewriters. Anyway, set options(OutDec="ALT0183"), replacing ALT0183 by the keystrokes you described below, or use options(OutDec="\xB7"). Best, Uwe Ligges John Kane Kingston ON Canada -Original Message- Sent: Wed, 8 Aug 2012 11:52:27 -0700 (PDT) To: r-help@r-project.org Subject: [R] decimal points midline Hi, does anyone know how to make decimal points midline when plotting? The journal to which we are going to submit a manuscript requires this particular formatting, and it gives directions for how to do this on a PC: hold down the ALT key and type 0183 on the number pad. Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
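To make the fix concrete, here is a small hedged sketch; the middle dot is written as the Unicode escape "\u00b7", which is a safer spelling than "\xB7" in a UTF-8 locale:

```r
old <- options(OutDec = "\u00b7")  # use the midline dot as the decimal separator
format(3.14)                       # "3·14"
plot(1:10, (1:10) / 7)             # axis tick labels now show midline decimal points
options(old)                       # restore the default "."
```

OutDec affects all output conversions (print, format, axis labels), so restoring the old value afterwards keeps console output normal.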
Re: [R] a simple mixed model
Hi Peter, I might have been unclear in my description of the data. Each patient was measured for a response variable y at 3 time points; there is no drug or other treatment involved. The objective was to examine the repeatability of the measurements of the response variable y. Since this is a repeated measure, I thought it should be analyzed by a simple mixed model? When you suggested an MxK (K=3) design, what is M then? Thanks very much, John From: peter dalgaard pda...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Sent: Sunday, May 27, 2012 12:09 AM Subject: Re: [R] a simple mixed model On May 27, 2012, at 07:12 , array chip wrote: Hi, I was reviewing a manuscript where a linear mixed model was used. The data are simple: a response variable y was measured for each subject over 3 time points (visits 1, 2 and 3) that were about a week apart. The study is a non-drug study and one of the objectives was to evaluate the repeatability of the response variable y. The author wanted to estimate the within-subject variance for that purpose. This is what he wrote: "within-subject variance was generated from the SAS 'Proc Mixed' procedure with study visit as fixed effect and subject as random effect." I know that study visit was a factor variable, not a numeric variable. Because each subject has 3 repeated measurements from 3 visits, how can a model including subject as a random effect still use visit as a fixed factor? If I were to do it in R, I would just use a simple model to get the within-subject variance: obj <- lmer(y ~ 1 + (1|subject), data = data) What does the model obj <- lmer(y ~ visit + (1|subject), data = data) mean? [[elided Yahoo spam]] Sounds like a pretty standard two-way ANOVA with random row effects. If the design is complete (M x K with K = 3 in this case), you look at the row and column means. An additive model is assumed and the residual (interaction) is used to estimate the error variance. The variation of the row means is compared to the residual variance. 
If tau is the variance between row levels, the variance of the row means is sigma^2/K + tau, and tau can be estimated by subtraction. The column averages can be tested for systematic differences between visits with the usual F test. A non-zero effect here indicates that visits 1, 2, 3 have some _systematic_ difference across all individuals. For an incomplete design, the model is the same, but the calculations are less simple. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
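Peter's decomposition can be illustrated numerically. A hedged sketch with simulated data (the true between-subject SD of 1 and residual SD of 0.5 below are illustrative choices, not values from the thread):

```r
library(lme4)
set.seed(1)
M <- 20; K <- 3                                   # M subjects, K visits
subject <- factor(rep(1:M, each = K))
visit   <- factor(rep(1:K, times = M))
y <- rnorm(M)[subject] + c(0, 0.3, 0.5)[visit] + rnorm(M * K, sd = 0.5)
fit <- lmer(y ~ visit + (1 | subject))
VarCorr(fit)   # subject SD should be near 1, residual SD near 0.5
## In a complete design the visit effects shift every subject mean by the same
## constant, so var(subject means) estimates tau + sigma^2 / K, and tau follows
## by subtraction, exactly as Peter describes:
tau.hat <- var(tapply(y, subject, mean)) - sigma(fit)^2 / K
tau.hat
```

With only M = 2 "rows" (as in the lmer thread at the top of this digest), the same subtraction is extremely noisy and can easily hit zero, which is one way a singular fit arises.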
[R] a simple mixed model
Hi, I was reviewing a manuscript where a linear mixed model was used. The data are simple: a response variable y was measured for each subject over 3 time points (visits 1, 2 and 3) that were about a week apart. The study is a non-drug study, and one of the objectives was to evaluate the repeatability of the response variable y. The author wanted to estimate the within-subject variance for that purpose. This is what he wrote: "within-subject variance was generated from the SAS 'Proc Mixed' procedure with study visit as fixed effect and subject as random effect." I know that study visit was a factor variable, not a numeric variable. Because each subject has 3 repeated measurements from 3 visits, how can a model including subject as a random effect still use visit as a fixed factor? If I were to do it in R, I would just use a simple model to get the within-subject variance: obj <- lmer(y ~ 1 + (1|subject), data = data) What does the model obj <- lmer(y ~ visit + (1|subject), data = data) mean? I appreciate any thoughts! John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] low R square value from ANCOVA model
Thank you Peter, so if I observe a significant coefficient, that significance still holds because the standard error of the coefficient has taken the residual error (which is large because of the low R square) into account, am I correct? John From: peter dalgaard pda...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Sent: Monday, May 7, 2012 11:07 PM Subject: Re: [R] low R square value from ANCOVA model On May 8, 2012, at 05:10 , array chip wrote: Hi, what does a low R-square value from an ANCOVA model mean? For example, if the R square from the model is about 0.2, does this mean the results should NOT be trusted? I checked the residuals of the model, and they looked fine... It just means that your model has low predictive power (at the individual level). I.e., the noise (error) part of the model is large relative to the signal (systematic part). Statistical inferences are not compromised by that, except of course that large error variation is reflected in large standard errors of estimated regression coefficients. Thanks for any suggestion. John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] low R square value from ANCOVA model
Thanks again Peter. What about the argument that because a low R square (e.g. R^2 = 0.2) indicates the model variance was not sufficiently explained by the factors in the model, there might be additional factors that should be identified and included in the model? And if these additional factors were indeed included, it might change the significance of the factor of interest that previously showed a significant coefficient. In other words, if R square is low, the significant coefficient observed is not trustworthy. What's your opinion on this argument? Many thanks! John From: peter dalgaard pda...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Sent: Monday, May 7, 2012 11:43 PM Subject: Re: [R] low R square value from ANCOVA model On May 8, 2012, at 08:34 , array chip wrote: Thank you Peter, so if I observe a significant coefficient, that significance still holds because the standard error of the coefficient has taken the residual error (which is large because of the low R square) into account, am I correct? In essence, yes. One might quibble over the use of "large because", but it's not important for the main point. -pd [earlier messages quoted above snipped]
John -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School, Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] low R square value from ANCOVA model
Hi Peter, I searched the old mail archive and found this topic had been discussed before. The previous discussion was around a situation with a very large sample size, so even a small effect still showed up as significant despite the low R square of the model. In my case, the sample size is 72, and the significance of the group effect is due to a large effect relative to its standard error:

obj <- lm(y ~ age + sex + school + group, dat)
summary(obj)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  169.8634    13.4678  12.613   <2e-16
age           -0.3737     0.2762  -1.353   0.1805
sexM           2.1137     8.6585   0.244   0.8079
schoolS2       4.1711     8.1811   0.510   0.6118
groupG2      -20.8944    10.2807  -2.032   0.0461

Residual standard error: 32.13 on 67 degrees of freedom
Multiple R-squared: 0.1732, Adjusted R-squared: 0.1238
F-statistic: 3.509 on 4 and 67 DF, p-value: 0.01163

So the R-squared is quite low (0.17). What's your opinion on the argument that the significant coefficient for group is not trustworthy because the model variance was not sufficiently accounted for, and that if additional factors could be identified and included in the model, the effect of group might change from significant to insignificant? Many thanks for sharing your thoughts. John To: peter dalgaard pda...@gmail.com Cc: r-help@r-project.org r-help@r-project.org Sent: Tuesday, May 8, 2012 1:45 PM Subject: Re: [R] low R square value from ANCOVA model [earlier message quoted above snipped]
[earlier messages quoted above snipped] -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School, Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] low R square value from ANCOVA model
Paul, thanks for your thoughts. "Blunt"? Not at all. If I understand correctly, it doesn't help anything to speculate whether additional variables might exist or not. Given the current variables in the model, it's perfectly fine to draw conclusions based on significant coefficients regardless of whether R-squared is high or low. Gary King's article is interesting... John From: Paul Johnson pauljoh...@gmail.com Cc: peter dalgaard pda...@gmail.com; r-help@r-project.org r-help@r-project.org Sent: Tuesday, May 8, 2012 8:23 PM Subject: Re: [R] low R square value from ANCOVA model [earlier message quoted above snipped] I think that argument is silly. I'm sorry if that is too blunt. It's just plain superficial. It reflects a poor understanding of what the linear model is all about. If you have other variables that might belong in the model, run them and test. The R-square, either low or high, does not have anything direct to say about whether those other variables exist. Here's my authority: Arthur Goldberger (A Course in Econometrics, 1991, p. 177): "Nothing in the CR (Classical Regression) model requires that R2 be high. Hence, a high R2 is not evidence in favor of the model, and a low R2 is not evidence against it." I found that reference in Anders Skrondal and Sophia Rabe-Hesketh, Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models, Boca Raton, FL: Chapman and Hall/CRC, 2004. 
From Section 8.5.2: "Furthermore, how badly the baseline model fits the data depends greatly on the magnitude of the parameters of the true model. For instance, consider estimating a simple parallel measurement model. If the true model is a congeneric measurement model (with considerable variation in factor loadings and measurement error variances between items), the fit index could be high simply because the null model fits very poorly, i.e. because the reliabilities of the items are high. However, if the true model is a parallel measurement model with low reliabilities, the fit index could be low although we are estimating the correct model. Similarly, estimating a simple linear regression model can yield a high R2 if the relationship is actually quadratic with a considerable linear trend, and a low R2 when the model is true but with a small slope (relative to the overall variance)." For a detailed argument/explanation of why the R-square is not a way to decide if a model is good or bad, see King, Gary (1986). "How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science." American Journal of Political Science, 30(3), 666-687. doi:10.2307/2111095 pj -- Paul E. Johnson, Professor, Political Science; Assoc. Director, Center for Research Methods, 1541 Lilac Lane, Room 504, University of Kansas http://pj.freefaculty.org http://quant.ku.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
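Goldberger's and King's point is easy to demonstrate by simulation: a correctly specified model whose true slope is small relative to the error variance yields a low R-squared together with a clearly significant coefficient. A hedged sketch (the sample size, slope, and error SD below are illustrative choices, not numbers from the thread):

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 0.5 * x + rnorm(n, sd = 2)   # correct model, small signal relative to noise
fit <- lm(y ~ x)
summary(fit)$r.squared                        # low, roughly 0.06
summary(fit)$coefficients["x", "Pr(>|t|)"]    # yet the slope is clearly significant
```

The low R-squared here reflects the large error variance, not a misspecified or untrustworthy model, which is exactly the Goldberger quote in action.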
[R] low R square value from ANCOVA model
Hi, what does a low R-square value from an ANCOVA model mean? For example, if the R square from the model is about 0.2, does this mean the results should NOT be trusted? I checked the residuals of the model, and they looked fine... Thanks for any suggestion. John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Difference in Kaplan-Meier estimates plus CI
Hi, I have the same question as Jason on how to estimate the standard error and construct a CI around S_1(t) - S_2(t). From summary.survfit(obj), how can I combine the 2 survival estimates and the associated standard errors to get an estimate of the standard error for the difference, and then calculate the CI? For example, my summary(obj) gave me:

treatment=0
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10     69      28   0.7313  0.0438       0.6504       0.8223

treatment=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
   10     86      10   0.9055  0.0285       0.8514       0.9631

The difference at t = 10 is 0.9055 - 0.7313 = 0.1742. How do I calculate the standard error of this difference based on SE_1 = 0.0438 and SE_2 = 0.0285, and then the 95% CI for the difference? Thanks John From: Thomas Lumley tlum...@uw.edu To: Jason Connor jcon...@alumni.cmu.edu Cc: r-help@r-project.org Sent: Wednesday, March 7, 2012 10:58 AM Subject: Re: [R] Difference in Kaplan-Meier estimates plus CI On Thu, Mar 8, 2012 at 4:50 AM, Jason Connor jcon...@alumni.cmu.edu wrote: I thought this would be trivial, but I can't find a package or function that does this. I'm hoping someone can guide me to one. Imagine a simple case with two survival curves (e.g. treatment vs. control). I just want to calculate the difference in KM estimates at a specific time point (e.g. 1 year) plus the estimate's 95% CI. The former is straightforward, but the estimates not so much. I know methods exist such as Parzen, Wei, and Ying, but was surprised not to find a package that included this. Before I code it up, I thought I'd ask if I was just missing it somewhere. summary.survfit() in the survival package will give you the point estimate and standard error, and then combining these into a difference and confidence interval for the difference is easy. 
-thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
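Since the two treatment groups are independent, the standard errors combine in quadrature. A hedged sketch of the "easy" combination Thomas describes, using the numbers from the summary() output quoted in this thread:

```r
s1 <- 0.7313; se1 <- 0.0438   # treatment = 0 at t = 10
s2 <- 0.9055; se2 <- 0.0285   # treatment = 1 at t = 10
d    <- s2 - s1                         # 0.1742
se.d <- sqrt(se1^2 + se2^2)             # independent groups, so variances add
ci   <- d + c(-1, 1) * qnorm(0.975) * se.d
round(c(diff = d, se = se.d, lower = ci[1], upper = ci[2]), 4)
```

For survival probabilities near 0 or 1, a transformed scale (e.g. log(-log S)) usually gives better coverage than this plain normal approximation on the probability scale.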
[R] statistical contrasts on 3-way interaction
Hi all, I was trying to use glht() from the multcomp package to construct a contrast on an interaction term in a linear model to do some comparisons. I am a little uncertain how to construct contrasts on a 3-way interaction containing a continuous variable, and hope someone can confirm whether what I did is correct or wrong. The linear model has a continuous dependent variable 'y', a treatment factor 'Trt' with values 0 and 1, a factor variable 'A' with values 0 and 1, and a continuous variable 'x'. A simpler model is:

set.seed(10)
dat <- cbind(y = c(rnorm(10,3), rnorm(10,4), rnorm(10,3.1), rnorm(10,6)),
             x = runif(40,5,15),
             expand.grid(A = rep(factor(0:1), each = 10), Trt = factor(0:1)))
fit <- lm(y ~ x + Trt * A, dat)

My purpose is to test whether the treatment effect is significant at each level of factor A, so I used contrasts:

library(multcomp)
K <- rbind(c(0,0,1,0,0), c(0,0,1,0,1))
rownames(K) <- c('Trt 1-0|A=0', 'Trt 1-0|A=1')
colnames(K) <- names(coef(fit))
K
            (Intercept) x Trt1 A1 Trt1:A1
Trt 1-0|A=0           0 0    1  0       0
Trt 1-0|A=1           0 0    1  0       1

(glht.fit <- summary(glht(fit, linfct = K), test = adjusted(type = 'none')))

Linear Hypotheses:
                 Estimate Std. Error t value Pr(>|t|)
Trt 1-0|A=0 == 0  -0.2720     0.3616  -0.752  0.45701
Trt 1-0|A=1 == 0   1.0690     0.3564   2.999  0.00496 **

Now I suspect the independent variable 'x' may play a role in the treatment effect at each level of A, so I would like to add a 3-way interaction between Trt, A and x:

fit <- lm(y ~ x * Trt * A, dat)

If my purpose is to test whether treatment is significant at each level of factor A and at a certain value of the covariate 'x', for example when x = 10, would the following code give me what I want?
K <- rbind(c(0,0,1,0,10,0,0,0), c(0,0,1,0,10,0,1,10))
rownames(K) <- c('Trt 1-0|A=0 x=10', 'Trt 1-0|A=1 x=10')
colnames(K) <- names(coef(fit))
K
                 (Intercept) x Trt1 A1 x:Trt1 x:A1 Trt1:A1 x:Trt1:A1
Trt 1-0|A=0 x=10           0 0    1  0     10    0       0         0
Trt 1-0|A=1 x=10           0 0    1  0     10    0       1        10

(glht.fit <- summary(glht(fit, linfct = K), test = adjusted(type = 'none')))

Linear Hypotheses:
                      Estimate Std. Error t value Pr(>|t|)
Trt 1-0|A=0 x=10 == 0  -0.3526     0.3254  -1.083 0.286731
Trt 1-0|A=1 x=10 == 0   1.4621     0.3328   4.394 0.000115 ***

So the above was testing whether the treatment effect is significant at each level of factor A when x = 10; am I correct? I'd appreciate it if someone could confirm this. Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] CV for log normal data
Hi, I have a microarray dataset from Agilent chips. The data are really log ratios between test samples and a universal reference RNA. Because of the nature of log ratios, the coefficient of variation (CV) doesn't really apply to this kind of data, since the mean of a log ratio is very close to 0. What kind of measurement would people use for the dispersion, so that I can compare across genes on the chip to find stably expressed genes? Something similar to CV that would be easily interpreted? Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] CV for log normal data
Thank you Peter. John From: Peter Langfelder peter.langfel...@gmail.com To: Bert Gunter gunter.ber...@gene.com Sent: Tuesday, February 21, 2012 2:51 PM Subject: Re: [R] CV for log normal data Good advice. But perhaps ?mad or some other perhaps robust plain old measure of spread? The problem is not (lack of) robustness to outliers, the problem is to find genes whose expression variation is small compared to (mean) expression. Trouble is, Agilent throws the mean expression information away, so you have to find heuristic workarounds. I have encountered the same issue before and haven't really found a good solution. Peter [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] prediction for linear mixed model
Hi, I am wondering if we can make predictions from a linear mixed model fit by lmer() from the lme4 package? Specifically, I am fitting a very simple glmer() with a binomial family distribution, and want to see if I can get predicted probabilities like those from regular logistic regression:

fit <- glmer(y ~ x + (1 | id), dat, family = binomial)

where y is the response variable (0, 1) and x is a continuous variable. I would like to get a predicted probability at a given value of x. Many thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
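Current lme4 (>= 1.0, released after this post) provides a predict() method for merMod objects that answers this directly. A hedged sketch on simulated data (the data frame, effect sizes, and evaluation point below are made up for illustration):

```r
library(lme4)
set.seed(1)
dat <- data.frame(id = factor(rep(1:30, each = 5)), x = rnorm(150))
dat$y <- rbinom(150, 1, plogis(0.5 * dat$x + rep(rnorm(30), each = 5)))
fit <- glmer(y ~ x + (1 | id), data = dat, family = binomial)
## population-level probability at x = 2 (random effects set to zero via re.form = NA)
predict(fit, newdata = data.frame(x = 2), type = "response", re.form = NA)
## subject-specific probability for an existing id, including its random intercept
predict(fit, newdata = data.frame(x = 2, id = "1"), type = "response")
```

Note the distinction: re.form = NA gives the probability for a "typical" subject (random effect 0), while supplying an observed id conditions on that subject's estimated intercept.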
[R] standard error for lda()
Hi, I haven't heard any responses yet, so I want to give it another try... I'd appreciate any suggestions. John To: r-help@r-project.org r-help@r-project.org Sent: Wednesday, February 8, 2012 12:11 PM Subject: [R] standard error for lda() Hi, I am wondering if it is possible to get an estimate of the standard error of the predicted posterior probability from LDA using lda() from MASS? Logistic regression using glm() will generate a standard error for the predicted probability with the se.fit=TRUE argument in predict(), so would it make sense to get a standard error for the posterior probability from lda(), and how? Another question about the standard error estimate from glm(): is it OK to calculate a 95% CI for the predicted probability using the standard error based on a normal approximation, i.e. predicted_probability +/- 1.96 * standard_error? Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] standard error for lda()
David, thanks for your response; hope this stirs more discussion... OK, a simple example:

y <- as.factor(rnorm(100) > 0.5)
x1 <- rnorm(100)
x2 <- rnorm(100)
obj <- glm(y ~ x1 + x2, family = binomial)
predict(obj, type = 'response', se.fit = TRUE)

predict(obj, ...) will give the predicted probabilities in the fit element and the associated estimated standard errors in the se.fit element (if I understand correctly). The predicted probability from logistic regression is ultimately a function of y, so a standard error for it should be computable. So one of my questions is whether we can use a normal approximation to construct a 95% CI for the predicted probabilities from these standard errors, because I am not sure the probabilities would follow a normal distribution. Now, if we try lda():

library(MASS)
obj2 <- lda(y ~ x1 + x2)
predict(obj2)

where predict(obj2) produces the posterior probabilities, the predicted class, etc. My question is whether it's possible to produce standard errors for these posterior probabilities. Thanks again. John From: David Winsemius dwinsem...@comcast.net Cc: r-help@r-project.org Sent: Thursday, February 9, 2012 2:59 PM Subject: Re: [R] standard error for lda() On Feb 9, 2012, at 4:45 PM, array chip wrote: Hi, didn't hear any response yet. want to give it another try.. appreciate any suggestions. My problem after reading this the first time was that I didn't agree with the premise that logistic regression would provide a standard error for a probability. It provides a standard error around an estimated coefficient value. And then you provided no further details or code to create a simulation, and there didn't seem much point in trying to teach you statistical terminology that you were throwing around in a manner that seems rather cavalier, admittedly this being a very particular reaction from this non-expert audience of one. 
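On the normal-approximation question in this thread: the usual advice is to form the Wald interval on the link (logit) scale, where normality is a better approximation, and then back-transform, since predicted_probability +/- 1.96 * SE can fall outside [0, 1]. A sketch using the same simulated setup as the post:

```r
# Wald 95% CI for glm predicted probabilities, built on the logit scale
# and back-transformed with plogis(), so bounds stay inside [0, 1]
set.seed(42)
x1 <- rnorm(100); x2 <- rnorm(100)
y  <- as.factor(rnorm(100) > 0.5)
obj <- glm(y ~ x1 + x2, family = binomial)

pr <- predict(obj, type = "link", se.fit = TRUE)   # linear predictor + its SE
lo <- plogis(pr$fit - 1.96 * pr$se.fit)            # lower 95% bound
hi <- plogis(pr$fit + 1.96 * pr$se.fit)            # upper 95% bound
```

The back-transformed interval is asymmetric around the fitted probability, which is exactly why the symmetric interval on the probability scale is suspect.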
David Winsemius, MD West Hartford, CT
[R] standard error for lda()
Hi, I am wondering if it is possible to get an estimate of the standard error of the predicted posterior probability from LDA using lda() from MASS. Logistic regression via glm() will generate a standard error for the predicted probability with the se.fit=TRUE argument of predict(), so does it make sense to get a standard error for the posterior probability from lda(), and if so, how? Another question about the standard error estimate from glm(): is it OK to calculate a 95% CI for the predicted probability using the standard error based on a normal approximation, i.e. predicted_probability +/- 1.96 * standard_error? Thanks, John
[R] forest plots from rmeta package
Hi, I am trying forestplot() and metaplot() from the rmeta package to plot some hazard ratios. forestplot() can draw clipping arrows at the ends of the confidence lines via the clip argument, but it can't use multiple colors for different lines. On the other hand, metaplot() can use a different color for each line via the colors=meta.colors(lines=c("blue","black")) argument, but it doesn't draw clipping arrows at the ends of the confidence lines (it doesn't take a clip argument). So is there any other package/function that allows me to do both colors and clipping? Thanks, John
Re: [R] why NA coefficients
Sure it does, but I'm still struggling with what is going on... Thanks, John

From: David Winsemius dwinsem...@comcast.net To: r-help@r-project.org Sent: Monday, November 7, 2011 10:27 PM Subject: Re: [R] why NA coefficients

But this output suggests there may be alligators in the swamp:

predict(lmod, newdata = data.frame(treat = 1, group = 2))
         1
0.09133691
Warning message:
In predict.lm(lmod, newdata = data.frame(treat = 1, group = 2)) :
  prediction from a rank-deficient fit may be misleading

--David.
David Winsemius, MD West Hartford, CT
Re: [R] why NA coefficients
True, but why does it have to omit treat 7 / group 2? Thanks again.

From: David Winsemius dwinsem...@comcast.net Cc: r-help@r-project.org Sent: Monday, November 7, 2011 10:19 PM Subject: Re: [R] why NA coefficients On Nov 7, 2011, at 10:07 PM, array chip wrote: Thanks David. The only category that has no cases is treat 1 / group 2:

with(test, table(treat, group))
     group
treat 1 2
    1 8 0
    2 1 5
    3 5 5
    4 7 3
    5 7 4
    6 3 3
    7 8 2

But why is the coefficient for treat 7 / group 2 not estimable? Well, it had to omit one of them, didn't it? (But I don't know why that level was chosen.) --David.
--David Winsemius, MD West Hartford, CT
Re: [R] why NA coefficients
Sent: Monday, November 7, 2011 9:29 PM Subject: Re: [R] why NA coefficients Hi John: What is the estimate of the cell mean \mu_{12}? Which model effects involve that cell mean? With this data arrangement, the expected population marginal means of treatment 1 and group 2 are not estimable either, unless you're willing to assume a no-interaction model. Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data (vol. 1) cover this topic in some detail, but it assumes you're familiar with the matrix form of a linear statistical model. Both chapters cover the two-way model with interaction: Ch. 13 from the cell means model approach and Ch. 14 from the model effects approach. Because this was written in the mid 80s and republished in the early 90s, all the code used is in SAS. HTH, Dennis

Thanks David. The only category that has no cases is treat 1 / group 2:

with(test, table(treat, group))
     group
treat 1 2
    1 8 0
    2 1 5
    3 5 5
    4 7 3
    5 7 4
    6 3 3
    7 8 2

But why is the coefficient for treat 7 / group 2 not estimable? Thanks, John From: David Winsemius dwinsem...@comcast.net Cc: r-help@r-project.org Sent: Monday, November 7, 2011 5:13 PM Subject: Re: [R] why NA coefficients On Nov 7, 2011, at 7:33 PM, array chip wrote: Hi, I am trying to run an ANOVA with an interaction term on 2 factors (treat has 7 levels, group has 2 levels). 
Because you have no cases in one of the crossed categories. --David Winsemius, MD West Hartford, CT
[R] why NA coefficients
Hi, I am trying to run an ANOVA with an interaction term on 2 factors (treat has 7 levels, group has 2 levels). I found the coefficient for the last interaction term is always NA; see the attached dataset and the code below:

test <- read.table("test.txt", sep = '\t', header = TRUE, row.names = NULL)
lm(y ~ factor(treat) * factor(group), test)

Call:
lm(formula = y ~ factor(treat) * factor(group), data = test)

Coefficients:
                  (Intercept)                 factor(treat)2                 factor(treat)3
                     0.429244                       0.499982                       0.352971
               factor(treat)4                 factor(treat)5                 factor(treat)6
                    -0.204752                       0.142042                       0.044155
               factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                    -0.007775                      -0.337907                      -0.208734
factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                    -0.195138                       0.800029                       0.227514
factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                     0.331548                             NA

I guess this is due to the model matrix being singular, or collinearity among its columns? But I can't figure out how the matrix is singular in this case. Can someone show me why this is the case? 
Thanks, John

treat group y
3 1 0.659156868932769
2 2 0.531460140598938
5 2 0.111702043330297
3 2 0.531970057170838
3 2 0.190075619611889
6 2 0.0234142139088362
2 1 0.929225602652878
1 1 0.165439655771479
1 1 0.245642073685303
4 2 0.973135874373838
2 2 0.40159740881063
2 2 0.230675351573154
4 1 0.15201729699038
4 2 0.770770706702024
3 2 0.00170516152866185
7 1 0.475904347142205
7 1 0.477398047922179
2 2 0.57676896895282
2 2 0.17242363281548
1 1 0.263812917750329
1 1 0.332802928285673
7 1 0.999866250436753
7 2 0.0795505701098591
5 2 0.723311265930533
4 1 0.301724245771766
4 1 0.174829749623314
4 1 0.161948098335415
4 1 0.0022172573953867
3 1 0.815872102044523
3 1 0.953563597053289
7 1 0.687810138566419
7 1 0.305082862731069
1 1 0.742835792247206
1 1 0.643639815272763
4 1 0.751568211475387
6 1 0.844687921227887
4 2 0.315934179816395
5 1 0.884743785485625
5 1 0.316437228582799
3 2 0.00965709099546075
3 2 0.512446181615815
5 1 0.765263902954757
5 1 0.754491505213082
4 1 0.0271341009065509
7 1 0.242603822145611
7 1 0.12916071084328
6 1 0.531941789202392
6 1 0.043566432675
5 1 0.380349192768335
5 1 0.457562071271241
6 2 0.539795646909624
6 2 0.837911191163585
7 1 0.0539217721670866
1 1 0.910242940066382
1 1 0.129532726714388
5 1 0.440152890980244
3 1 0.945850990246981
3 1 0.53663158044219
5 2 0.866308335913345
5 2 0.142251220298931
7 2 0.0875730190891773
Re: [R] why NA coefficients
Thanks David. The only category that has no cases is treat 1 / group 2:

with(test, table(treat, group))
     group
treat 1 2
    1 8 0
    2 1 5
    3 5 5
    4 7 3
    5 7 4
    6 3 3
    7 8 2

But why is the coefficient for treat 7 / group 2 not estimable? Thanks, John From: David Winsemius dwinsem...@comcast.net Cc: r-help@r-project.org Sent: Monday, November 7, 2011 5:13 PM Subject: Re: [R] why NA coefficients On Nov 7, 2011, at 7:33 PM, array chip wrote: Hi, I am trying to run an ANOVA with an interaction term on 2 factors (treat has 7 levels, group has 2 levels). Because you have no cases in one of the crossed categories. --David Winsemius, MD West Hartford, CT
Re: [R] why NA coefficients
Hi Dennis, the cell mean mu_12 from the model involves the intercept and the factor(group)2 term:

Coefficients:
                  (Intercept)                 factor(treat)2                 factor(treat)3
                     0.429244                       0.499982                       0.352971
               factor(treat)4                 factor(treat)5                 factor(treat)6
                    -0.204752                       0.142042                       0.044155
               factor(treat)7                 factor(group)2  factor(treat)2:factor(group)2
                    -0.007775                      -0.337907                      -0.208734
factor(treat)3:factor(group)2  factor(treat)4:factor(group)2  factor(treat)5:factor(group)2
                    -0.195138                       0.800029                       0.227514
factor(treat)6:factor(group)2  factor(treat)7:factor(group)2
                     0.331548                             NA

So mu_12 = 0.429244 - 0.337907 = 0.091337. This can be verified by:

predict(fit, data.frame(list(treat = 1, group = 2)))
         1
0.09133691
Warning message:
In predict.lm(fit, data.frame(list(treat = 1, group = 2))) :
  prediction from a rank-deficient fit may be misleading

But as you can see, it gave a warning about a rank-deficient fit... Why is this a rank-deficient fit? And since treat 1 / group 2 has no cases, why is it still estimable while, on the contrary, treat 7 / group 2, which has 2 cases, is not? Thanks, John

From: Dennis Murphy djmu...@gmail.com Sent: Monday, November 7, 2011 9:29 PM Subject: Re: [R] why NA coefficients Hi John: What is the estimate of the cell mean \mu_{12}? Which model effects involve that cell mean? With this data arrangement, the expected population marginal means of treatment 1 and group 2 are not estimable either, unless you're willing to assume a no-interaction model. Chapters 13 and 14 of Milliken and Johnson's Analysis of Messy Data (vol. 1) cover this topic in some detail, but it assumes you're familiar with the matrix form of a linear statistical model. Both chapters cover the two-way model with interaction: Ch. 13 from the cell means model approach and Ch. 14 from the model effects approach. Because this was written in the mid 80s and republished in the early 90s, all the code used is in SAS. HTH, Dennis Thanks David. 
The only category that has no cases is treat 1 / group 2. But why is the coefficient for treat 7 / group 2 not estimable? Because you have no cases in one of the crossed categories. --David Winsemius, MD West Hartford, CT
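David's diagnosis (one empty cell makes the full interaction model rank-deficient, and R reports the aliased coefficient as NA) can be checked on a tiny simulated two-way layout. The data here are made up purely for illustration:

```r
# A 3 x 2 layout with the treat 1 / group 2 cell deliberately emptied
set.seed(1)
d <- expand.grid(treat = factor(1:3), group = factor(1:2))
d <- d[rep(seq_len(nrow(d)), 4), ]                  # 4 replicates per cell
d <- subset(d, !(treat == 1 & group == 2))          # empty one crossed cell
d$y <- rnorm(nrow(d))

fit <- lm(y ~ treat * group, data = d)
# The model has 6 coefficients but only 5 observed cell means, so
# exactly one interaction coefficient is aliased (NA)
sum(is.na(coef(fit)))   # 1
```

With the (1,2) cell empty, the group2 indicator equals the sum of the remaining interaction columns, so lm()'s pivoting drops the last collinear column it encounters, which is why it is the last interaction term that comes back NA rather than one naming the empty cell.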
[R] compare proportions
Hi all, I asked this yesterday but haven't gotten any response yet. I understand this is not purely an R technical question but more of a statistical one; with many statistical experts on the list, I would appreciate any suggestions... Many thanks, John -- Hi, I have a seemingly simple proportion test. Here is the question I am trying to answer: a test is run each day in the lab, and it comes out either positive or negative, so at the end of each month we can calculate that month's positive rate as the proportion of positive results. The data look like:

Month     # positive  # total tests  positive rate
January   24          205            11.7%
February  31          234            13.2%
March     26          227            11.5%
:         :           :              :
August    42          241            17.4%

The total # of positives before August is 182, and the total # of tests before August is 1526. From January to July the monthly positive rate stays between 11% and 13%; in August it is up to about 17%. So the question is whether the increase in August is statistically significant. I can think of 3 ways to do this test:

1. Use binom.test(), setting "p" to the average positive rate between January and July (= 182/1526):

binom.test(42, 241, 182/1526)
Exact binomial test
data: 42 and 241
number of successes = 42, number of trials = 241, p-value = 0.0125
alternative hypothesis: true probability of success is not equal to 0.1192661
95 percent confidence interval: 0.1285821 0.2281769
sample estimates: probability of success 0.1742739

2. Use prop.test(), comparing the average positive rate between January and July with the positive rate in August:

prop.test(c(182, 42), c(1526, 241))
2-sample test for equality of proportions with continuity correction
data: c(182, 42) out of c(1526, 241)
X-squared = 5.203, df = 1, p-value = 0.02255
alternative hypothesis: two.sided
95 percent confidence interval: -0.107988625 -0.002026982
sample estimates: prop 1 prop 2 0.1192661 0.1742739

3. 
Use prop.test(), comparing the average MONTHLY positive rate between January and July with the positive rate in August. The average monthly # of positives is 182/7 = 26, and the average monthly # of total tests is 1526/7 = 218:

prop.test(c(26, 42), c(218, 241))
2-sample test for equality of proportions with continuity correction
data: c(26, 42) out of c(218, 241)
X-squared = 2.3258, df = 1, p-value = 0.1272
alternative hypothesis: two.sided
95 percent confidence interval: -0.12375569 0.01374008
sample estimates: prop 1 prop 2 0.1192661 0.1742739

As you can see, method 3 gives a non-significant p-value compared with methods 1 and 2. I understand each method tests a different hypothesis, but for the question I am trying to answer (does August have a higher positive rate than the earlier months?), which method is more relevant? Or should I consider some regression technique, and if so, what type of regression is appropriate? Thanks for any suggestions, John
[R] compare proportions
Hi, I have a seemingly simple proportion test. Here is the question I am trying to answer: a test is run each day in the lab, and it comes out either positive or negative, so at the end of each month we can calculate that month's positive rate as the proportion of positive results. The data look like:

Month     # positive  # total tests  positive rate
January   24          205            11.7%
February  31          234            13.2%
March     26          227            11.5%
:         :           :              :
August    42          241            17.4%

The total # of positives before August is 182, and the total # of tests before August is 1526. From January to July the positive rate stays between 11% and 13%; in August it is up to about 17%. So the question is whether the increase in August is statistically significant. I can think of 3 ways to do this test:

1. Use binom.test(), setting "p" to the average positive rate between January and July (= 182/1526):

binom.test(42, 241, 182/1526)
Exact binomial test
data: 42 and 241
number of successes = 42, number of trials = 241, p-value = 0.0125
alternative hypothesis: true probability of success is not equal to 0.1192661
95 percent confidence interval: 0.1285821 0.2281769
sample estimates: probability of success 0.1742739

2. Use prop.test(), comparing the average positive rate between January and July with the positive rate in August:

prop.test(c(182, 42), c(1526, 241))
2-sample test for equality of proportions with continuity correction
data: c(182, 42) out of c(1526, 241)
X-squared = 5.203, df = 1, p-value = 0.02255
alternative hypothesis: two.sided
95 percent confidence interval: -0.107988625 -0.002026982
sample estimates: prop 1 prop 2 0.1192661 0.1742739

3. Use prop.test(), comparing the average monthly positive rate between January and July with the positive rate in August. 
The average monthly # of positives is 182/7 = 26, and the average monthly # of total tests is 1526/7 = 218:

prop.test(c(26, 42), c(218, 241))
2-sample test for equality of proportions with continuity correction
data: c(26, 42) out of c(218, 241)
X-squared = 2.3258, df = 1, p-value = 0.1272
alternative hypothesis: two.sided
95 percent confidence interval: -0.12375569 0.01374008
sample estimates: prop 1 prop 2 0.1192661 0.1742739

As you can see, method 3 gives a non-significant p-value compared with methods 1 and 2. I understand each method tests a different hypothesis, but for the question I am trying to answer (does August have a higher positive rate than the earlier months?), which method is more relevant? Thanks for any suggestion, John
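As for the regression route: the regression analogue of method 2 is a binomial GLM with an indicator for August, whose Wald (or likelihood-ratio) test addresses the same question as prop.test() without the arbitrary rescaling of method 3. A sketch using the counts from the post:

```r
# Counts from the post: Jan-Jul pooled vs August
pos   <- c(182, 42)
tot   <- c(1526, 241)
month <- factor(c("JanJul", "Aug"), levels = c("JanJul", "Aug"))

# Two-column (successes, failures) response form of a binomial GLM
fit <- glm(cbind(pos, tot - pos) ~ month, family = binomial)

# Wald test of the August effect; roughly in line with prop.test()'s 0.023
summary(fit)$coefficients["monthAug", "Pr(>|z|)"]
```

Fitting month as a full factor on all eight monthly counts (rather than pooling Jan-Jul) would additionally let one check that the earlier months really are homogeneous before treating August as the only contrast of interest.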
[R] closed testing procedure
Hi, are the methods for multiple-testing p-value adjustment (Shaffer, Westfall, free) implemented in the adjusted() function of the multcomp package so-called closed testing procedures? And what about the methods (holm, hochberg, hommel, BH, BY) implemented in p.adjust() in the stats package? Thanks, John
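For the p.adjust() half of the question: Holm's method is the closed testing procedure with Bonferroni local tests, and Hommel's is based on Simes local tests (BH and BY control the false discovery rate instead, so they are a different animal). The base-R function is easy to experiment with; the p-values below are made up:

```r
p <- c(0.001, 0.01, 0.02, 0.04, 0.20)   # hypothetical raw p-values

# Holm = step-down Bonferroni: sort, multiply the i-th smallest by
# (n - i + 1), then enforce monotonicity with a cumulative max
p.adjust(p, method = "holm")    # 0.005 0.040 0.060 0.080 0.200

# Hommel (Simes-based closure) is uniformly at least as powerful as
# Hochberg, which in turn dominates Holm
p.adjust(p, method = "hommel")
```

All three control the familywise error rate; the power gain of hochberg/hommel over holm comes at the price of assuming non-negative dependence among the tests.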
Re: [R] suggestion for proportions
Thanks all again for the suggestions. I agree with Wolfgang that mcnemar.test() is what I am looking for. The accuracy is the proportion of correct diagnoses compared to a gold standard, and I am interested in which diagnostic test is better; I am not particularly interested in assessing the agreement between the 2 tests. Thanks again. John - Original Message - From: Viechtbauer Wolfgang (STAT) wolfgang.viechtba...@maastrichtuniversity.nl To: r-help@r-project.org Cc: csrabak cesar.ra...@gmail.com; array chip arrayprof...@yahoo.com Sent: Thursday, September 8, 2011 1:24 AM Subject: RE: [R] suggestion for proportions I assume you mean Cohen's kappa. This is not what the OP is asking about. The OP wants to know how to test for a difference in the proportions of 1's. Cohen's kappa will tell you what the level of agreement is between the two tests, which is something different. Also, the OP has now clarified that the data are paired. Therefore, prop.test() and binom.test() are not appropriate. So, to answer the OP's question: yes, mcnemar.test() is what you should be using. Wolfgang -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of csrabak Sent: Thursday, September 08, 2011 02:31 To: array chip Cc: r-help@r-project.org Subject: Re: [R] suggestion for proportions On 9/7/2011 16:53, array chip wrote: Hi all, thanks very much for sharing your thoughts, and sorry for not describing the problem clearly, my fault. My data are paired: 2 different diagnostic tests were performed on the same individuals, so each individual has a result from each of the 2 tests. In the end, 2 accuracy rates were calculated for the 2 tests, and I want to test whether there is a significant difference in accuracy (proportion) between the 2 tests. 
My understanding is that prop.test() is appropriate for 2 independent proportions, whereas in my situation the 2 proportions are not independent, being calculated from paired data, right? The data would look like:

pid  test1  test2
p1   1      0
p2   1      1
p3   0      1
:    :      :

(1 = test is correct; 0 = not correct.) From the data above we can calculate the accuracy for test1 and test2, then compare. So mcnemar.test() is good for that, right? Thanks, John

From the above clarification, I suggest you consider the use of a kappa test. For a list of possible ways of doing it in R, try:

RSiteSearch("kappa", restrict = "functions")

HTH -- Cesar Rabak
[R] suggestion for proportions
Hi, I am wondering if anyone can suggest how to test the equality of 2 proportions. The caveat here is that the 2 proportions were calculated from the same number of samples using 2 different tests. So essentially we are comparing 2 accuracy rates from the same, say, 100 samples. I think this is like a paired test, but I don't know whether we really need to account for the paired nature of the data, and if so, how. Or should I just use prop.test() to compare the 2 proportions? Any suggestion would be greatly appreciated. Thanks, John
Re: [R] suggestion for proportions
Hi all, thanks very much for sharing your thoughts, and sorry for not describing the problem clearly, my fault. My data are paired: 2 different diagnostic tests were performed on the same individuals, so each individual has a result from each of the 2 tests. In the end, 2 accuracy rates were calculated for the 2 tests, and I want to test whether there is a significant difference in accuracy (proportion) between the 2 tests. My understanding is that prop.test() is appropriate for 2 independent proportions, whereas in my situation the 2 proportions are not independent, being calculated from paired data, right? The data would look like:

pid  test1  test2
p1   1      0
p2   1      1
p3   0      1
:    :      :

(1 = test is correct; 0 = not correct.) From the data above we can calculate the accuracy for test1 and test2, then compare. So mcnemar.test() is good for that, right? Thanks, John - Original Message - From: Viechtbauer Wolfgang (STAT) wolfgang.viechtba...@maastrichtuniversity.nl To: r-help@r-project.org Cc: Bert Gunter gunter.ber...@gene.com Sent: Wednesday, September 7, 2011 8:14 AM Subject: Re: [R] suggestion for proportions Indeed, the original post leaves some room for interpretation. In any case, I hope the OP has enough information now to figure out what approach is best for his data. Best, Wolfgang -Original Message- From: Bert Gunter [mailto:gunter.ber...@gene.com] Sent: Wednesday, September 07, 2011 16:47 To: Viechtbauer Wolfgang (STAT) Cc: r-help@r-project.org; John Sorkin Subject: Re: [R] suggestion for proportions Wolfgang: On Wed, Sep 7, 2011 at 7:28 AM, Viechtbauer Wolfgang (STAT) wolfgang.viechtba...@maastrichtuniversity.nl wrote: Actually, ?mcnemar.test since it is paired data. Actually, it is unclear to me from the OP's message whether this is the case. In one sentence the OP says that the _number_ of samples is the same, and in the next he says that essentially the samples are the same. 
So, as usual, imprecision in the problem description leads to imprecision in the solution. But your point is well taken, of course. -- Bert __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] weird apply() behavior
Hi, I got weird results from using apply(). Here is a simple example:

y <- data.frame(list(a=c(1,NA), b=c('2k','0')))
y
   a  b
1  1 2k
2 NA  0
apply(y, 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && !is.na(x[1]) && x[1]=='1') 1 else 0})

This should print 1 0 as output, as demonstrated by:

apply(y[1,], 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && !is.na(x[1]) && x[1]=='1') 1 else 0})
1
1
apply(y[2,], 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && !is.na(x[1]) && x[1]=='1') 1 else 0})
2
0

But it actually prints:

apply(y, 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && !is.na(x[1]) && x[1]=='1') 1 else 0})
[1] 0 0

Does anyone have any suggestion? Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] weird apply() behavior
Thanks Bill and David! John - Original Message - From: William Dunlap wdun...@tibco.com To: array chip arrayprof...@yahoo.com; r-help@r-project.org r-help@r-project.org Cc: Sent: Monday, August 29, 2011 5:21 PM Subject: RE: [R] weird apply() behavior

apply() should come with a big warning that it was written for matrices and can cause a lot of surprises with data.frames. apply(X, ...) converts X to a matrix, and that is the kernel of your problem: if any columns of X are not numeric then as.matrix(X) is a character matrix whose columns are made with calls to format(), which means that all entries in a column have the same number of characters in them. This is handy for printing but not for other purposes.

as.matrix(y)
     a  b
[1,] 1  2k
[2,] NA 0
as.matrix(y[1,])
  a b
1 1 2k
as.matrix(y[2,])
  a  b
2 NA 0

Another way to debug this on your own is to simplify the function you are giving to apply. Then you would see where things are going wrong (but perhaps not why):

apply(y, 1, function(x) unlist(x))
  [,1] [,2]
a 1    NA
b 2k   0
apply(y[1,], 1, function(x) unlist(x))
  1
a 1
b 2k
apply(y[2,], 1, function(x) unlist(x))
  2
a NA
b 0

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of array chip Sent: Monday, August 29, 2011 5:03 PM To: r-help@r-project.org Subject: [R] weird apply() behavior Hi, I got weird results from using apply(). 
Here is a simple example:

y <- data.frame(list(a=c(1,NA), b=c('2k','0')))
y
   a  b
1  1 2k
2 NA  0
apply(y, 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && !is.na(x[1]) && x[1]=='1') 1 else 0})

This should print 1 0 as output, as demonstrated by:

apply(y[1,], 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && !is.na(x[1]) && x[1]=='1') 1 else 0})
1
1
apply(y[2,], 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && !is.na(x[1]) && x[1]=='1') 1 else 0})
2
0

But it actually prints:

apply(y, 1, function(x){x <- unlist(x); if (!is.na(x[2]) && x[2]=='2k' && !is.na(x[1]) && x[1]=='1') 1 else 0})
[1] 0 0

Does anyone have any suggestion? Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
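Following Bill's point about as.matrix() coercion, a sketch of one way to keep each column's type is to walk the columns in parallel with mapply() instead of letting apply() convert the data.frame to a character matrix (names match the example above):

```r
y <- data.frame(a = c(1, NA), b = c("2k", "0"), stringsAsFactors = FALSE)
# mapply() passes y$a and y$b element-wise, so 'a' stays numeric and
# 'b' stays character; no format()-padded matrix conversion is involved
res <- mapply(function(a, b) {
  if (!is.na(a) && a == 1 && !is.na(b) && b == "2k") 1 else 0
}, y$a, y$b)
res  # 1 0
```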
Re: [R] survplot() for cph(): Design vs rms
Thanks Frank! - Original Message - From: Frank Harrell f.harr...@vanderbilt.edu To: r-help@r-project.org Cc: Sent: Thursday, August 25, 2011 4:33 PM Subject: Re: [R] survplot() for cph(): Design vs rms http://biostat.mc.vanderbilt.edu/Rrms shows differences between Design and rms Frank

array chip wrote: Thank you David. The plot from the Design package draws survival probability at 5 years (in my example) versus age. I will look into Predict() in the rms package to see how this can be done. John - Original Message - From: David Winsemius <dwinsem...@comcast.net> To: array chip <arrayprof...@yahoo.com> Cc: r-h...@stat.math.ethz.ch <r-h...@stat.math.ethz.ch> Sent: Thursday, August 25, 2011 3:15 PM Subject: Re: [R] survplot() for cph(): Design vs rms

On Aug 25, 2011, at 5:11 PM, array chip wrote: Hi, in the Design package, a plot of survival probability vs. a covariate can be generated by survplot() on a cph object using the following code:

n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- 'Age'
sex <- factor(sample(c('male','female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens, 1, 0)
dt <- pmin(dt, cens)
units(dt) <- 'Year'
dd <- datadist(age, sex)
options(datadist='dd')
S <- Surv(dt, e)
library(Design)
f <- cph(S ~ age, surv=TRUE, x=T, y=T)
plot(f, age=NA, time=5)

But the same code won't work if I use the rms package:

detach(package:Design)
library(rms)
f <- cph(S ~ age, surv=TRUE, x=T, y=T)
plot(f, age=NA, time=5)
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

Is there a way to plot the same graph using the rms package? I don't remember what that would have done in Design and you have not explained what you expected. You should read the rms help page for cph and walk through the examples where the use of the Predict function is illustrated. The plotting support for Predict objects is excellent. 
I like to use Frank Harrell's new package rms and try to avoid using the old Design package. Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT - Frank Harrell Department of Biostatistics, Vanderbilt University
[R] baseline cumulative hazard by basehaz()
Hi, basehaz() in the survival package is said to estimate the baseline cumulative hazard from coxph(), but it actually calculates the cumulative hazard as -log(survfit(coxph.object)). And survfit() calculates survival at the MEAN covariate values, not at covariate values of 0. I thought the baseline cumulative hazard means the hazard when the covariates are at 0. Did I misunderstand something? Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
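For reference, basehaz() has a centered argument that controls exactly this distinction. A sketch (the lung dataset is just a stand-in, not from the post):

```r
library(survival)
fit <- coxph(Surv(time, status) ~ age, data = lung)
# centered = TRUE (the default): cumulative hazard evaluated at the
# mean covariate values, i.e. -log S(t) from survfit(fit)
H.mean <- basehaz(fit, centered = TRUE)
# centered = FALSE: cumulative hazard at covariate value 0, the
# "baseline" in the textbook sense the question asks about
H.zero <- basehaz(fit, centered = FALSE)
head(H.zero)
```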
[R] survplot() for cph(): Design vs rms
Hi, in the Design package, a plot of survival probability vs. a covariate can be generated by survplot() on a cph object using the following code:

n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- 'Age'
sex <- factor(sample(c('male','female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens, 1, 0)
dt <- pmin(dt, cens)
units(dt) <- 'Year'
dd <- datadist(age, sex)
options(datadist='dd')
S <- Surv(dt, e)
library(Design)
f <- cph(S ~ age, surv=TRUE, x=T, y=T)
plot(f, age=NA, time=5)

But the same code won't work if I use the rms package:

detach(package:Design)
library(rms)
f <- cph(S ~ age, surv=TRUE, x=T, y=T)
plot(f, age=NA, time=5)
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

Is there a way to plot the same graph using the rms package? I like to use Frank Harrell's new package rms and try to avoid using the old Design package. Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
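For readers landing here from the archives: in rms the Design-style plot(f, age=NA, time=5) call is replaced by Predict() plus its plot method. A self-contained sketch, assuming a reasonably current rms, with the simulation pared down to one covariate:

```r
library(rms)
set.seed(731)
n <- 500
age <- 50 + 12 * rnorm(n)
h <- 0.02 * exp(0.04 * (age - 50))   # hazard increasing with age
dt <- -log(runif(n)) / h
e <- ifelse(dt <= 15, 1, 0)          # administrative censoring at 15 years
dt <- pmin(dt, 15)
dd <- datadist(age); options(datadist = "dd")
f <- cph(Surv(dt, e) ~ age, surv = TRUE, x = TRUE, y = TRUE)
# survival probability at time = 5 across the range of age
p <- Predict(f, age, time = 5)
plot(p)
```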
Re: [R] survplot() for cph(): Design vs rms
Thank you David. The plot from the Design package draws survival probability at 5 years (in my example) versus age. I will look into Predict() in the rms package to see how this can be done. John - Original Message - From: David Winsemius dwinsem...@comcast.net To: array chip arrayprof...@yahoo.com Cc: r-h...@stat.math.ethz.ch r-h...@stat.math.ethz.ch Sent: Thursday, August 25, 2011 3:15 PM Subject: Re: [R] survplot() for cph(): Design vs rms

On Aug 25, 2011, at 5:11 PM, array chip wrote: Hi, in the Design package, a plot of survival probability vs. a covariate can be generated by survplot() on a cph object using the following code:

n <- 1000
set.seed(731)
age <- 50 + 12*rnorm(n)
label(age) <- 'Age'
sex <- factor(sample(c('male','female'), n, TRUE))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt <- -log(runif(n))/h
label(dt) <- 'Follow-up Time'
e <- ifelse(dt <= cens, 1, 0)
dt <- pmin(dt, cens)
units(dt) <- 'Year'
dd <- datadist(age, sex)
options(datadist='dd')
S <- Surv(dt, e)
library(Design)
f <- cph(S ~ age, surv=TRUE, x=T, y=T)
plot(f, age=NA, time=5)

But the same code won't work if I use the rms package:

detach(package:Design)
library(rms)
f <- cph(S ~ age, surv=TRUE, x=T, y=T)
plot(f, age=NA, time=5)
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

Is there a way to plot the same graph using the rms package? I don't remember what that would have done in Design and you have not explained what you expected. You should read the rms help page for cph and walk through the examples where the use of the Predict function is illustrated. The plotting support for Predict objects is excellent. I like to use Frank Harrell's new package rms and try to avoid using the old Design package. Thanks John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Labelling all variables at once (using Hmisc label)
Hi Frank, the same is true of one of your replies to my previous post; it can also only be seen in Nabble. - Original Message - From: David Winsemius dwinsem...@comcast.net To: Frank Harrell f.harr...@vanderbilt.edu Cc: r-help@r-project.org Sent: Wednesday, August 17, 2011 3:08 PM Subject: Re: [R] Labelling all variables at once (using Hmisc label)

On Aug 17, 2011, at 5:49 PM, Frank Harrell wrote: I'm puzzled. I provided a solution that did not require looping. Frank

Hi Frank; Do you realize that some portions of your Nabble postings are not being communicated to ordinary mail clients? This code did not appear in my copy from Nabble. I didn't cut anything. This appears in Nabble when you look there:

d <- data.frame(a=1:2, b=3:4)
label(d, self=FALSE) <- c('A','B')
contents(d)
Data frame:d 2 observations and 2 variables Maximum # NAs:0
  Labels Storage
a A      integer
b B      integer

I observed and noted that was happening in a prior message. The missing formatted Nabble stuff also comes across without linefeeds when pasted, so I added some of those back in. --David.

Monsieur Do wrote: I did read the help page before posting, but didn't find the direct way... My function here works fine. But just for learning purposes, I'd like to be able to avoid the loop...

with.labels <- function(x, labels=NULL, csvfile=NULL) {
  if(!is.null(csvfile)) labels <- read.csv(csvfile, sep="\t", header=FALSE, stringsAsFactors=FALSE)[,1]
  for(i in 1:length(x)) label(x[,i]) <- labels[i]
  if(length(labels) != length(x)) cat("Warning: data and labels are not of same length\n")
  return(x)
}

Thanks Message: 11 Date: Tue, 16 Aug 2011 04:22:07 -0700 (PDT) From: Frank Harrell <f.harr...@vanderbilt.edu> To: r-help@r-project.org Subject: Re: [R] Labelling all variables at once (using Hmisc label) Message-ID: 1313493727519-3746928.p...@n4.nabble.com Content-Type: text/plain; charset=UTF-8 Do require(Hmisc); ?label to see the help file for label. It will show you how to do this: Monsieur Do wrote: I have a dataset and a list of labels. 
I simply want to apply the labels to the variables, all at once. The only way I was able to do it was using a loop: for (i in 1:length(data)) label(data[,i]) <- data.labels[i] I'd like to find the non-loop way to do it, using apply or the like... Any help appreciated. - Frank Harrell Department of Biostatistics, Vanderbilt University [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT
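The non-loop idiom Frank was pointing to (the snippet that only showed up in Nabble) can be written out as a small self-contained sketch:

```r
library(Hmisc)
d <- data.frame(a = 1:2, b = 3:4)
# self = FALSE labels the columns (variables) rather than the data
# frame itself, so one assignment sets every label at once
label(d, self = FALSE) <- c("A", "B")
label(d$a)  # "A"
label(d$b)  # "B"
```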
Re: [R] calibration curve for cph()
Dear Frank, Thanks for suggesting the val.surv() function from the rms package. It's exactly what I need. I tried the example on the help page and tweaked it towards my situation, but got an error message. Could you suggest what went wrong?

library(rms)
set.seed(123) # so can reproduce results
n <- 1000
age <- 50 + 12*rnorm(n)
sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
units(t) <- 'Year'
label(t) <- 'Time to Event'
ev <- ifelse(t <= cens, 1, 0)
t <- pmin(t, cens)
S <- Surv(t, ev)
f <- cph(S ~ age + sex, x=TRUE, y=TRUE)
val.surv(f, newdata=data.frame(age,sex), S=S, u=5) ## pretend the training data is the independent dataset
Error in val.surv(f, newdata = data.frame(age, sex), S = S, u = 5) : unused argument(s) (u = 5)

Many thanks John - Original Message - From: Frank Harrell f.harr...@vanderbilt.edu To: r-help@r-project.org Cc: Sent: Tuesday, August 16, 2011 4:23 AM Subject: Re: [R] calibration curve for cph() David Winsemius wrote: A combination of Predict (your newdata), cut2, and the plotting function of your choice ought to suffice. But thought that cross-validation was an option. Not at console at the moment (just off airplane.) Sent from my iPhone On Aug 15, 2011, at 5:26 PM, array chip <arrayprof...@yahoo.com> wrote: is there an R function that produces a calibration curve on independent data automatically, just like what calibrate() does on the training data itself? Thanks John From: Comcast <dwinsem...@comcast.net> To: array chip <arrayprof...@yahoo.com> Cc: r-help@r-project.org <r-help@r-project.org> Sent: Monday, August 15, 2011 2:04 PM Subject: Re: [R] calibration curve for cph() Build a prediction function using 'Function' that gets applied to set2. Calibrate and validate. 
-- David Sent from my iPhone On Aug 15, 2011, at 11:31 AM, array chip <arrayprof...@yahoo.com> wrote: Hi, the calibrate.cph() function in the rms package generates a calibration curve for a Cox model on the same dataset the model was derived from, using bootstrapping or cross-validation. If I have the model built on dataset 1, and now I want to produce a calibration curve for this model on an independent dataset 2, how can I do that? Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] calibration curve for cph()
Oops, thanks for reminding me. I found that an in-house package interfered with the rms package, which caused the error. Thanks David! John - Original Message - From: David Winsemius dwinsem...@comcast.net To: array chip arrayprof...@yahoo.com Cc: Frank Harrell f.harr...@vanderbilt.edu; r-help@r-project.org r-help@r-project.org Sent: Tuesday, August 16, 2011 11:27 AM Subject: Re: [R] calibration curve for cph()

On Aug 16, 2011, at 1:57 PM, array chip wrote: Dear Frank, Thanks for suggesting the val.surv() function from the rms package. It's exactly what I need. I tried the example on the help page and tweaked it towards my situation, but got an error message. Could you suggest what went wrong?

It is interesting that neither my mail client nor the archives show the code that was sent to Nabble. For posterity it was:

require(rms)
?val.surv
library(rms)
set.seed(123) # so can reproduce results
n <- 1000
age <- 50 + 12*rnorm(n)
sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
cens <- 15*runif(n)
h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
t <- -log(runif(n))/h
units(t) <- 'Year'
label(t) <- 'Time to Event'
ev <- ifelse(t <= cens, 1, 0)
t <- pmin(t, cens)
S <- Surv(t, ev)
f <- cph(S ~ age + sex, x=TRUE, y=TRUE)
val.surv(f, newdata=data.frame(age,sex), S=S, u=5) ## pretend the training data is the independent dataset

Unable to reproduce. I get sensible output. You may need to post sessionInfo(). Do you have all the dependencies installed? I noticed that even after rms was loaded with messages about also loading Hmisc, survival, and splines, there was a further message saying polspline was being loaded. Error in val.surv(f, newdata = data.frame(age, sex), S = S, u = 5) : unused argument(s) (u = 5) --david. 
sessionInfo()
R version 2.13.1 RC (2011-07-03 r56263)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale: [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages: [1] splines stats graphics grDevices utils datasets methods base
other attached packages: [1] polspline_1.1.5 rms_3.3-1 Hmisc_3.8-3 survival_2.36-9 [5] TTR_0.20-3 xts_0.8-0 zoo_1.6-5 sos_1.3-0 [9] brew_1.0-6 lattice_0.19-30
loaded via a namespace (and not attached): [1] cluster_1.14.0 grid_2.13.1 tools_2.13.1

- Original Message - From: Frank Harrell f.harr...@vanderbilt.edu To: r-help@r-project.org Cc: Sent: Tuesday, August 16, 2011 4:23 AM Subject: Re: [R] calibration curve for cph() David Winsemius wrote: A combination of Predict (your newdata), cut2, and the plotting function of your choice ought to suffice. But thought that cross-validation was an option. Not at console at the moment (just off airplane.) Sent from my iPhone On Aug 15, 2011, at 5:26 PM, array chip <arrayprof...@yahoo.com> wrote: is there an R function that produces a calibration curve on independent data automatically, just like what calibrate() does on the training data itself? Thanks John From: Comcast <dwinsem...@comcast.net> To: array chip <arrayprof...@yahoo.com> Cc: r-help@r-project.org <r-help@r-project.org> Sent: Monday, August 15, 2011 2:04 PM Subject: Re: [R] calibration curve for cph() Build a prediction function using 'Function' that gets applied to set2. Calibrate and validate. --David Sent from my iPhone On Aug 15, 2011, at 11:31 AM, array chip <arrayprof...@yahoo.com> wrote: Hi, the calibrate.cph() function in the rms package generates a calibration curve for a Cox model on the same dataset the model was derived from, using bootstrapping or cross-validation. If I have the model built on dataset 1, and now I want to produce a calibration curve for this model on an independent dataset 2, how can I do that? 
Thanks John David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] cuminc() in cmprsk package for cumulative incidence
Hi, To use cuminc() from the cmprsk package, if a subject has 2 events (both the event of interest and the competing-risk event), should I create 2 observations for this subject in the dataset, one for each event with a different fstatus (1 and 2), or just 1 observation with whichever event happened first? My analysis objective is to calculate the cumulative incidence of the event of interest. Thanks! John __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
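For reference, cuminc() expects one row per subject, coded by the first event observed. A minimal sketch with made-up data (the coding 0 = censored, 1 = event of interest, 2 = competing event is the usual convention, not taken from the post):

```r
library(cmprsk)
set.seed(7)
n <- 100
ftime <- rexp(n)                          # time to first event or censoring
fstatus <- sample(0:2, n, replace = TRUE) # 0 = censored, 1 = event of
                                          # interest, 2 = competing event
ci <- cuminc(ftime, fstatus)              # one curve per nonzero event code
names(ci)
```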
[R] calibration curve for cph()
Hi, the calibrate.cph() function in the rms package generates a calibration curve for a Cox model on the same dataset the model was derived from, using bootstrapping or cross-validation. If I have the model built on dataset 1, and now I want to produce a calibration curve for this model on an independent dataset 2, how can I do that? Thanks John [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
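As the val.surv() thread earlier in this archive shows, rms::val.surv() is the route John eventually took. A condensed self-contained sketch, treating the training data as if it were the external dataset purely for illustration (assumes a current rms with polspline available for the fixed-time u= option):

```r
library(rms)
set.seed(123)
n <- 300
age <- 50 + 12 * rnorm(n)
cens <- 15 * runif(n)
h <- 0.02 * exp(0.04 * (age - 50))
t <- -log(runif(n)) / h
ev <- ifelse(t <= cens, 1, 0)
t <- pmin(t, cens)
S <- Surv(t, ev)
f <- cph(S ~ age, x = TRUE, y = TRUE, surv = TRUE)
# validate calibration of 5-year predictions against the
# (here, pretend-)independent data in newdata
v <- val.surv(f, newdata = data.frame(age), S = S, u = 5)
```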