Re: [R] bwplot change whiskers position to percentile 5 and P95
See also the panel.bpplot function in the Hmisc package. This gives you many options for extended box plots. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/bwplot-change-whiskers-position-to-percentile-5-and-P95-tp2993755p2998578.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
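A minimal sketch of the suggestion above (assumes the Hmisc and lattice packages are installed; the data are simulated). panel.bpplot's probs argument controls which quantiles the extended box plot marks; including 0.05 makes the outermost segment run from the 5th to the 95th percentile.

```r
# Sketch, assuming Hmisc and lattice are installed; data are simulated
library(Hmisc)
library(lattice)

set.seed(1)
d <- data.frame(g = rep(c("A", "B"), each = 200),
                y = c(rnorm(200), rnorm(200, mean = 1)))

# probs gives the lower quantiles marked by the extended box plot;
# 0.05 here makes the outermost segment span the 0.05-0.95 interval
bwplot(g ~ y, data = d, panel = panel.bpplot,
       probs = c(0.05, 0.25), means = TRUE)
```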
[R] Frank Harrell's 2011 RMS Short Course-March 9,10 and 11 (fwd)
2011 Regression Modeling Strategies Short Course by Frank E. Harrell, Jr., Ph.D., Professor and Chair, Department of Biostatistics, Vanderbilt University School of Medicine.

2011 Dates: Wednesday, March 9 (8:00AM-11:00AM); Thursday, March 10 (8:00AM-4:00PM); and Friday, March 11, 2011 (8:00AM-4:00PM).
Location: Vanderbilt Campus; Sarratt Student Center (SARR) 216/220.
Requirements: strong competence in multiple regression models.
Target audience: statisticians and related quantitative researchers who want to learn some general model development strategies, including approaches to missing data imputation, data reduction, model validation, and relaxing linearity assumptions.
Registration: please email interest to Audrey Carvajal (audrey.carva...@vanderbilt.edu) no later than December 31, 2010.
Cancellation policy: February 15, 2011; send cancellation notification to Audrey via email.
For more information, including lodging, hotels, course schedules, and fees, please visit http://biostat.mc.vanderbilt.edu/RmS or respond to this email with questions.

Audrey R. Carvajal
Department of Biostatistics
School of Medicine
Vanderbilt University
(615) 322-2001

[Attachment: Flyer_RMS2011.pdf]
Re: [R] Error message in fit.mult.impute (Hmisc package)
I tried your code with the rms package (replacement for the Design package; see http://biostat.mc.vanderbilt.edu/Rrms) and it worked fine. Note that multiple imputation needs the outcome variable in the imputation model. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
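A hedged sketch of the workflow the reply describes (assumes the rms package, which loads Hmisc, is installed; data and variable names here are invented). Note the point made above: the outcome y appears in the aregImpute formula.

```r
# Sketch, assuming rms (and Hmisc) is installed; simulated data
library(rms)   # loads Hmisc as well

set.seed(2)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1 + x2))
x1[sample(n, 30)] <- NA                 # introduce some missing values

d <- data.frame(y, x1, x2)

# The imputation model includes the outcome y, as noted above
a <- aregImpute(~ y + x1 + x2, data = d, n.impute = 5)

# Fit the logistic model across the multiple imputations
f <- fit.mult.impute(y ~ x1 + x2, lrm, a, data = d)
f
```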
Re: [R] Multiple imputation for nominal data
The aregImpute function in the Hmisc package can do this through predictive mean matching and canonical variates (Fisher's optimum scoring algorithm). Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
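A minimal sketch of imputing a nominal variable with aregImpute (assumes Hmisc is installed; the data are simulated).

```r
# Sketch, assuming Hmisc is installed; imputing a nominal (factor) variable
library(Hmisc)

set.seed(3)
n   <- 300
x   <- rnorm(n)
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
grp[sample(n, 40)] <- NA                # 40 missing nominal values

# Categorical targets use Fisher's optimum scoring internally;
# draws are made by predictive mean matching by default
a <- aregImpute(~ x + grp, data = data.frame(x, grp), n.impute = 5)

a$imputed$grp[1:3, ]                    # imputed category codes per imputation
```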
Re: [R] An example for predab.resample in r
predab.resample can be called by users but usually is not. It is called by calibrate.* and validate.* functions in the rms and Design packages (you neglected to say which package you were using; update to rms if using Design). The calibrate and validate functions make it easy to use .632. Note that .632 only seems to help when you are using an improper scoring rule (e.g., proportion classified correctly), not when using smoother scoring rules such as Brier score, ROC area, etc. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
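A sketch of the usual route into predab.resample via validate (assumes rms is installed; the data are simulated). method=".632" selects the variant discussed above.

```r
# Sketch, assuming rms is installed; bootstrap validation of an lrm fit
library(rms)

set.seed(4)
n  <- 300
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1))

# x=TRUE, y=TRUE stores the design matrix and response for resampling
f <- lrm(y ~ x1 + x2, x = TRUE, y = TRUE)

validate(f, method = "boot", B = 50)    # optimism-corrected indexes
# validate(f, method = ".632", B = 50)  # the .632 variant, if desired
```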
Re: [R] Bootstrap confidence intervals using bootcov from the rms package
Unfortunately, bootcov is not meant to operate on fit objects produced by fit.mult.impute. bootcov gets there too late in the process and does not know how to penalize for imputation. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] leaps::regsubsets p-value
P-values have little meaning in this context. Nor do regression coefficient estimates and especially standard errors (they are biased). Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] Fit a distribution to an ECDF
Fitting curves to an ECDF will result in a fit that has the same precision as the ECDF if variances are calculated correctly. So why not stop with the ECDF as your estimator? Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
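A base-R illustration of using the ECDF itself as the estimator, as suggested above (the data are simulated).

```r
# Base-R sketch: the ECDF as the estimator, no curve fitting needed
set.seed(5)
x  <- rexp(500)
Fn <- ecdf(x)                 # step-function estimate of the CDF

Fn(1)                         # estimated P(X <= 1)
quantile(x, c(0.05, 0.95))    # quantiles taken straight from the data
```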
Re: [R] Plotting number of patients at risk below survival curve
Use the rms package, a replacement for Design. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
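A minimal sketch of the rms approach to an at-risk table (assumes the rms and survival packages are installed; the data are simulated). survplot's n.risk argument prints the numbers at risk below the curves.

```r
# Sketch, assuming rms (and survival) is installed; simulated data
library(rms)

set.seed(6)
n  <- 200
tt <- rexp(n)
ev <- rbinom(n, 1, 0.7)
g  <- factor(sample(c("A", "B"), n, replace = TRUE))

f <- npsurv(Surv(tt, ev) ~ g)      # Kaplan-Meier fit usable by survplot
survplot(f, n.risk = TRUE)         # numbers at risk appear below the curves
```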
Re: [R] calculating martingale residual on new data using
The tendency is to use residual-like diagnostics on the entire dataset that was available for model development. For test data we typically run predictive accuracy analyses. For example, one of the strongest validations is to show, in a high-resolution calibration plot, that absolute predictions (e.g., probability of survival at 2 years) are accurate. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?
What would make you want to delete a variable because P > 0.05? That will invalidate every aspect of statistical inference for the model. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] xYplot error
If the x-axis variable is really a factor, xYplot will not handle it. You probably need a dot chart instead (see Hmisc's Dotplot). Note that it is unlikely that the confidence intervals are really symmetric.

Frank

On Tue, 27 Jul 2010, Kang Min wrote:

Hi, I'm trying to plot a graph with error bars using xYplot in the Hmisc package. My data looks like this.

mort        stand       site   type
0.042512776 0.017854525 Plot A ST
0.010459803 0.005573305 PF     ST
0.005188321 0.006842107 MSF    ST
0.004276068 0.011592129 YSF    ST
0.044586495 0.035225266 Plot A LD
0.038810662 0.037355408 PF     LD
0.027567430 0.020523820 MSF    LD
0.024698872 0.020320976 YSF    LD

Having read previous posts on xYplot being unable to plot the x-axis as factors, I used numericScale, but I still get this error.

Error in label.default(xv, units = TRUE, plot = TRUE, default = as.character(xvname), :
  the default string cannot be of length greater then one

I used:

xYplot(cbind(mort, mort + stand, mort - stand) ~ numericScale(site) | type, method='bars')

Am I missing something or doing something wrong? Thanks. KM
Re: [R] how to generate a random data from a empirical distribition
Easiest thing is to sample with replacement from the original data. This is the idea behind the bootstrap, which is sampling from the empirical CDF.

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Tue, 27 Jul 2010, Greg Snow wrote:

Another option for fitting a smooth distribution to data (and generating future observations from the smooth distribution) is to use the logspline package.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center, Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: xin wei
Sent: Monday, July 26, 2010 12:36 PM
To: r-help@r-project.org
Subject: [R] how to generate a random data from a empirical distribition

Hi, this is more a statistical question than an R question, but I do want to know how to implement this in R. I have 10,000 data points. Is there any way to generate an empirical probability distribution from them (the problem is that I do not know what exactly this distribution follows: normal, beta?)? My ultimate goal is to generate an additional 20,000 data points from this empirical distribution created from the existing 10,000 data points. Thank you all in advance.
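A base-R sketch of the advice above: sampling with replacement from the observed data is exactly drawing from its empirical CDF (the "observed data" here are simulated).

```r
# Base-R sketch: sampling with replacement draws from the ECDF of the data
set.seed(7)
x <- rnorm(10000)                        # stand-in for the observed 10,000 points

new_x <- sample(x, 20000, replace = TRUE)  # 20,000 additional draws

# Summaries of the resample track those of the original sample
mean(new_x); mean(x)
```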
Re: [R] Rank ANCOVA
This has been shown to yield unreliable analyses. Use the more formal proportional odds ordinal logistic model. This is a generalization of the Wilcoxon-Mann-Whitney-Kruskal-Wallis statistic, and is implemented in the rms package and elsewhere.

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Tue, 27 Jul 2010, Dennis Fisher wrote:

Colleagues, I am trying to replicate some analyses performed in SAS using a rank ANCOVA model. I assume that this is a non-parametric ANOVA with covariates, but I have not been able to get confirmation from the people who did the original analyses. Can anyone direct me to the comparable function for R? Thanks in advance. Not that it matters in this situation, but the platform is OS X; R version is 2.11.

Dennis
Dennis Fisher MD
P < (The P Less Than Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
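A minimal sketch of the proportional odds model as an alternative to rank ANCOVA. The thread's suggestion is rms::lrm; MASS::polr (which ships with R) is used here as a stand-in, with simulated data and invented variable names.

```r
# Sketch using MASS::polr (ships with R); rms::lrm fits the same model
library(MASS)

set.seed(8)
n   <- 300
trt <- factor(rep(c("ctl", "trt"), each = n / 2))
age <- rnorm(n, 50, 10)                       # a covariate, as in rank ANCOVA
lp  <- 0.8 * (trt == "trt") + 0.03 * (age - 50)
y   <- cut(lp + rlogis(n), breaks = c(-Inf, -1, 0, 1, Inf),
           labels = c("poor", "fair", "good", "excellent"),
           ordered_result = TRUE)

f <- polr(y ~ trt + age)                      # proportional odds model
summary(f)                                    # treatment effect adjusted for age
```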
Re: [R] xYplot error
Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Wed, 28 Jul 2010, Kang Min wrote:

> Hi Frank, thanks for the suggestion. Using numericScale() does work for Dotplot, but there are still a few issues:
> 1. My factor names are Plot A, PF, MSF, and YSF, so numericScale turns that into 3, 2, 1, 4 and the x-axis is plotted 1, 2, 3, 4. Is there any way I can retain the same order on the graph?

Not sure why you are using numericScale. You can use the original factor variable. If you need to re-order its levels for plotting, use reorder.factor.

> 2. I can't get the error bars displayed even after using method=bars, only the mean, lower and upper bounds of the data as points. This is the line I used:
> Dotplot(cbind(mort, mort + stand, mort - stand) ~ numericScale(site) | type, data = mort, method='bands')

That looks OK but I can't test it right now. Please continue to have a look, and if you still don't see the problem provide a tiny reproducible example with self-contained data I can access.

Frank

> Thanks for your help. KM
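A hedged sketch of the dot-chart-with-intervals approach suggested in this thread (assumes Hmisc is installed). Hmisc's Cbind (capital C) attaches the interval limits so Dotplot draws error bars; the numbers are rounded values from the data above, and setting the factor levels explicitly preserves the display order without numericScale.

```r
# Sketch, assuming Hmisc is installed; rounded values from the ST rows above
library(Hmisc)

d <- data.frame(
  mort  = c(0.0425, 0.0105, 0.0052, 0.0043),
  stand = c(0.0179, 0.0056, 0.0068, 0.0116),
  site  = factor(c("Plot A", "PF", "MSF", "YSF"),
                 levels = c("Plot A", "PF", "MSF", "YSF"))  # keep display order
)

# Cbind() carries the lower/upper limits; Dotplot draws them as intervals
Dotplot(site ~ Cbind(mort, mort - stand, mort + stand), data = d)
```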
Re: [R] how to generate a random data from a empirical distribition
This is true by definition. Read about the bootstrap, which may give you some good background information.

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Wed, 28 Jul 2010, xin wei wrote:

Hi Frank: how can we make sure the randomly sampled data follow the same distribution as the original dataset? I assume each data point has the same probability of being selected in a simple random sampling scheme. Thanks.
Re: [R] Problems with normality req. for ANOVA
To add to David's note, the Kruskal-Wallis test is the nonparametric counterpart to one-way ANOVA. You can get a series of K-W tests for several grouping or continuous independent variables (but note these are SEPARATE analyses) using the Hmisc package's spearman2 function. The generalization of K-W to the case of multiple independent variables is the proportional odds ordinal logistic model (see e.g. the rms package's lrm function).

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Mon, 2 Aug 2010, David Winsemius wrote:

On Aug 2, 2010, at 9:33 AM, wwreith wrote:

I am conducting an experiment with four independent variables, each of which has three or more factor levels. The sample size is quite large, i.e. several thousand. The dependent variable data does not pass a normality test but visually looks close to normal, so is there a way to compute the effect this would have on the p-value for ANOVA, or is there a way to perform a nonparametric test in R that will handle this many independent variables? Simply saying ANOVA is robust to small departures from normality is not going to be good enough for my client.

The statistical assumption of normality for linear models does not apply to the distribution of the dependent variable, but rather to the residuals after a model is estimated. Furthermore, it is the homoskedasticity assumption that is more commonly violated and also a greater threat to validity. (And if you don't already know both of these points, then you desperately need to review your basic modeling practices.)

I need to compute an error amount for ANOVA or find a nonparametric equivalent.

You might get a better answer if you expressed the first part of that question in unambiguous terminology. What is "error amount"? For the second part, there is an entire Task View on Robust Statistical Methods.
-- David Winsemius, MD West Hartford, CT
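A base-R sketch of the separate Kruskal-Wallis analyses mentioned above, one per grouping variable (the data are simulated; spearman2 in Hmisc automates this across many variables).

```r
# Base-R sketch: one Kruskal-Wallis test per grouping variable
set.seed(10)
y  <- rnorm(120)
g1 <- factor(rep(1:3, each = 40))
g2 <- factor(rep(1:4, times = 30))

kruskal.test(y ~ g1)   # one independent variable at a time
kruskal.test(y ~ g2)   # a SEPARATE analysis, as noted above
```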
Re: [R] Problems with normality req. for ANOVA
In addition, the poster did not tell us what is wrong with a nonparametric test.

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Mon, 2 Aug 2010, Bert Gunter wrote:

My sympathies, but I don't think it's the business of list contributors to facilitate stupidity. A confidence interval for the p-value is nonsense. You could try sensitivity analyses via simulation, though.

Cheers, Bert Gunter
Genentech Nonclinical Biostatistics

On Mon, Aug 2, 2010 at 11:31 AM, wwreith reith_will...@bah.com wrote:

I am testing normality on the studentized residuals that are generated after performing ANOVA, and yes, I used Levene's test to see if the variances can be assumed equal. They in fact are not, but I have found a formula for determining whether the p-value for ANOVA will become larger or smaller as a result of unequal variances and unequal sample sizes. Fortunately it turns out the p-value is greater. Despite this, the ANOVA test is still significant with p=.000. The problem I have is that I am expected, by my client, to find a similar formula that states which way the p-value would be pushed by a lack of normality. Despite numerous citations that ANOVA is robust to departures from normality, my client does not care. They want numerical proof. This led to looking for a method for estimating the effects non-normality would have on the p-value for ANOVA. In other words, can I build a confidence interval for the p-value? Hence the "error term" I am speaking of would be the margin of error for a p-value confidence interval.

William W. Reith III
Business Analytics J9 SAC
(757)-203-3400 Best Contact From 7:00am-4:00pm
J9 Office (757)-203-3772
Booz Office (757) 466-3253
Mobile (434)-989-7948
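A base-R sketch of the simulation-based sensitivity analysis Bert suggests: compare the null rejection rate of the ANOVA F-test under normal versus skewed errors for the same design (all data simulated).

```r
# Base-R sketch of a sensitivity analysis via simulation:
# how does the ANOVA F-test behave under non-normal errors?
set.seed(9)
g <- factor(rep(1:3, each = 30))

pvals <- function(rerr, nsim = 500) {
  replicate(nsim, {
    y <- rerr(length(g))                    # null case: no group differences
    anova(lm(y ~ g))$`Pr(>F)`[1]
  })
}

p_norm <- pvals(rnorm)                      # normal errors
p_skew <- pvals(function(n) rexp(n) - 1)    # skewed errors, mean 0

# Under the null, rejection rates near 0.05 indicate robustness
mean(p_norm < 0.05)
mean(p_skew < 0.05)
```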
Re: [R] Specifying interactions in rms package... error
Hi Rob,

rms wants symmetry in the sense that the interactions need to use the same number and location of spline knots as the main effects. So if using the * notation, omit the main effects (which are generated automatically) and live with the equal knots. Or use the restricted interaction operator %ia% to not use doubly nonlinear terms in forming the interactions, which is a sensible restriction.

Frank

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Mon, 2 Aug 2010, Rob James wrote:

I am encountering an error I do not know how to debug. The error arises when I try to add an interaction term involving two continuous variables (defined using rcs functions) to an existing (and working) model. The new model reads:

model5 <- lrm( B_fainting ~ gender + rcs(exactage, 7) + rcs(DW_nadler_bv, 7) + rcs(drawtimefrom8am, 7) + DW_firsttime + DW_race_eth + rcs(DW_BPS, 5) + rcs(DW_BPD, 5) + rcs(pulse3, 5) + locat2 + (rcs(exactage, 3) * rcs(drawtimefrom8am, 3)) )

This gives rise to the following error:

Error in dimnames(X)[[2]] <- atr$colnames :
  length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In names(nmiss)[jz] <- fname[asm != 9] :
  number of items to replace is not a multiple of replacement length

The sessionInfo() output is below, but I think relatively unremarkable. The dataset is very large (about 550K observations), although the outcome (B_fainting) is relatively sparse, with only 1500 events in this sample. I do not get this error when the interaction term is replaced by two categorical variables (data not shown). I can replicate this error when there is just one continuous variable and one categorical variable. I'd welcome suggestions. Thank you in advance.
sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C   LC_TIME=en_CA.UTF-8
 [4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=C  LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8   LC_NAME=C      LC_ADDRESS=C
[10] LC_TELEPHONE=C  LC_MEASUREMENT=en_CA.UTF-8  LC_IDENTIFICATION=C

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] gmodels_2.15.0 rms_3.0-0 Hmisc_3.8-2 survival_2.35-8 odfWeave_0.7.14 XML_3.1-0
[7] lattice_0.18-8

loaded via a namespace (and not attached):
[1] cluster_1.13.1 gdata_2.8.0 grid_2.11.0 gtools_2.6.2 MASS_7.3-7
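A hedged sketch of the %ia% form Frank recommends (assumes rms is installed; data and variable names are simulated, not Rob's). %ia% omits the doubly nonlinear spline-by-spline terms from the interaction.

```r
# Sketch, assuming rms is installed; %ia% restricts the interaction so the
# doubly nonlinear spline terms are omitted
library(rms)

set.seed(11)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1 + 0.5 * x2 + 0.3 * x1 * x2))
dd <- datadist(x1, x2); options(datadist = "dd")

# Same knots (3) for main effects and the restricted interaction
f <- lrm(y ~ rcs(x1, 3) + rcs(x2, 3) + rcs(x1, 3) %ia% rcs(x2, 3))
anova(f)    # tests main effects and the restricted interaction
```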
Re: [R] How to extract se(coef) from cph?
In an upcoming release of the rms package, all fit objects can be printed using LaTeX, putting LaTeX code directly to the console (this is optimized for Sweave). You will be able to say print(fit, latex=TRUE).

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Thu, 5 Aug 2010, David Winsemius wrote:

On Aug 5, 2010, at 4:03 PM, Biau David wrote:

Hello, I am modeling some survival data with cph (Design). I have modeled a predictor which showed a nonlinear effect with restricted cubic splines. I would like to retrieve the se(coef) for the other, linear, predictors.

The cph object has a var component. The vcov function is an extractor function. You would probably be using something like:

diag(vcov(fit))^(1/2)

This is just to make nice LaTeX tables automatically.

Are you sure Frank has not already programmed that for you somewhere? Perhaps latex.cph?

I have the coefficients with coef(). How do I do that? Thanks, David Biau.

--
David Winsemius, MD
West Hartford, CT
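A base-R illustration of the extractor David describes: square roots of the diagonal of vcov() are the standard errors. An lm fit on a built-in dataset stands in here for a cph fit, since vcov works the same way for both.

```r
# Base-R sketch: sqrt of the diagonal of vcov() gives the SEs;
# the same extractor idiom applies to rms/Design fits such as cph
fit <- lm(mpg ~ wt + hp, data = mtcars)

se <- sqrt(diag(vcov(fit)))
se

# Cross-check against the SE column of the summary table
all.equal(unname(se), unname(summary(fit)$coefficients[, "Std. Error"]))
```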
Re: [R] plot the dependent variable against one of the predictors with other predictors as constant
There are many ways to do this. Here is one.

install.packages('rms')
require(rms)
dd <- datadist(x, y); options(datadist='dd')
f <- ols(z ~ x + y)
plot(Predict(f))        # plot all partial effects
plot(Predict(f, x))     # plot only the effect of x
plot(Predict(f, y))     # plot only the effect of y
f <- ols(z ~ pol(x,2)*pol(y,2))   # repeat, not assuming linearity

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Sat, 7 Aug 2010, Yi wrote:

Hi folks, happy work in weekends. My question is how to plot the dependent variable against one of the predictors with the other predictors held constant. Not for the original data, but after prediction; y is the predicted value of the dependent variable. The constant value of the other predictors may be the average or some fixed value.

###
y <- 1:10
x <- 10:1
z <- 2:11
lin_model <- lm(z ~ x + y)
x_new <- 11:2
###

How do I plot the predicted value of z from the regression model with x taking x_new and y held constant (let's say y=1)? I am thinking about using the 'predict' command to generate the prediction of z with the new data.frame, but there should be a better way. Thanks all. Yi
Re: [R] Logistic Regression in R (SAS -like output)
Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Mon, 9 Aug 2010, Harsh wrote:

> Hello useRs, I have a problem at hand which I'd think is fairly common amongst groups where R is being adopted for Analytics in place of SAS. Users would like to obtain results for logistic regression in R that they have become accustomed to in SAS. Towards this end, I was able to propose the Design package in R which contains many functions to extract the various metrics that SAS reports.

The replacement for Design, rms, has some new indexes.

> If you have suggestions pertaining to other packages, or sample code that replicates some of the SAS outputs for logistic regression, I would be glad to hear of them. Some of the requirements are:
> - Stepwise variable selection for logistic regression

an invalid procedure

> - Choose base level for factor variables

not relevant - get what you need from predicted values and differences in predicted values or contrasts - this automatically takes care of reference cells

> - The Hosmer-Lemeshow statistic

obsolete: low power and sensitive to choice of binning

> - concordant and discordant

see Hmisc's rcorr.cens

> - Tau C statistic

Those are two different statistics. tau and C are obtained by lrm in rms/Design.

Frank

> Thank you for your suggestions. Regards, Harsh Singhal
Re: [R] Logistic Regression in R (SAS -like output)
Note that stepwise variable selection based on AIC has all the problems of stepwise variable selection based on P-values. AIC is just a restatement of the P-value. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Mon, 9 Aug 2010, Gabor Grothendieck wrote:

On Mon, Aug 9, 2010 at 6:43 AM, Harsh singhal...@gmail.com wrote: Hello useRs, I have a problem at hand which I'd think is fairly common among groups where R is being adopted for analytics in place of SAS. Users would like to obtain results for logistic regression in R that they have become accustomed to in SAS. Towards this end, I was able to propose the Design package in R which contains many functions to extract the various metrics that SAS reports. If you have suggestions pertaining to other packages, or sample code that replicates some of the SAS outputs for logistic regression, I would be glad to hear of them. Some of the requirements are:

- Stepwise variable selection for logistic regression
- Choose base level for factor variables
- The Hosmer-Lemeshow statistic
- Concordant and discordant
- Tau, C statistic

For stepwise logistic regression using AIC see: library(MASS); ?stepAIC. For specifying the reference level: ?relevel
Re: [R] Logistic Regression in R (SAS -like output)
In the trivial case where all candidate predictors have one degree of freedom (which is unlikely, as some things will be nonlinear or have more than 2 categories), adding a variable if it improves (lowers) AIC is the same as adding it if its chi-square exceeds 2. This corresponds to an alpha level of 0.157 for a chi-square with 1 d.f. At least AIC leads people to use a more realistic alpha (a small alpha in stepwise regression leads to more bias in the retained regression coefficients). But you still have serious multiplicity problems, and non-replicable models. Things are different if you have a pre-defined group of variables you are thinking of adding. Suppose that this group of 10 variables required 15 d.f. Adding the group if AIC (based on 15 d.f.) improves would not be a bad strategy. This avoids the multiplicities of single-variable looks. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Mon, 9 Aug 2010, Kingsford Jones wrote:

On Mon, Aug 9, 2010 at 10:27 AM, Frank Harrell f.harr...@vanderbilt.edu wrote: Note that stepwise variable selection based on AIC has all the problems of stepwise variable selection based on P-values. AIC is just a restatement of the P-value.

I find the above statement very interesting, particularly because there are common misconceptions in the ecological community that AIC is a panacea for model selection problems and the theory behind P-values is deeply flawed. Can you direct me toward a reference for better understanding the relation? best, Kingsford Jones

Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Mon, 9 Aug 2010, Gabor Grothendieck wrote: On Mon, Aug 9, 2010 at 6:43 AM, Harsh singhal...@gmail.com wrote: Hello useRs, I have a problem at hand which I'd think is fairly common among groups where R is being adopted for analytics in place of SAS.
Users would like to obtain results for logistic regression in R that they have become accustomed to in SAS. Towards this end, I was able to propose the Design package in R which contains many functions to extract the various metrics that SAS reports. If you have suggestions pertaining to other packages, or sample code that replicates some of the SAS outputs for logistic regression, I would be glad to hear of them. Some of the requirements are:

- Stepwise variable selection for logistic regression
- Choose base level for factor variables
- The Hosmer-Lemeshow statistic
- Concordant and discordant
- Tau, C statistic

For stepwise logistic regression using AIC see: library(MASS); ?stepAIC. For specifying the reference level: ?relevel
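Frank's 0.157 figure above can be checked in one line: with AIC, a single 1-d.f. term is added exactly when its chi-square exceeds 2 (the 2*df penalty), and the tail area of a 1-d.f. chi-square beyond 2 is the implied alpha level.

```r
# Implied alpha when AIC is used to decide on a single 1-d.f. term:
# AIC improves iff the term's chi-square exceeds 2.
alpha <- 1 - pchisq(2, df = 1)
round(alpha, 3)   # ~0.157, as stated in the thread
```

The same arithmetic generalizes: for a q-d.f. group of variables, the AIC criterion corresponds to comparing the group chi-square with 2*q.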
Re: [R] Identification of Outliners and Extraction of Samples
On Mon, 9 Aug 2010, Alexander Eggel wrote:

Hello everybody, I need to know which samples (S1-S6) contain a value that is bigger than the median + five standard deviations of the column it is in. This is just an example; the command should be applied to a data frame which is a lot bigger (over 100 columns). Any solutions? Thank you very much for your help!!!

s
  Samples  A  B  C  E
1      S1  1  2  3  7
2      S2  4 NA  6  6
3      S3  7  8  9 NA
4      S4  4  5 NA  6
5      S5  2  5  6  7
6      S6  2  3  4  5

This loop works fine for a column without NA values. However, it doesn't work for the other columns. Ideally I should have a loop that I could apply to all columns in one command.

o <- data.frame()
for (i in 1:nrow(s)) {
  dd <- s[i, ]
  if (dd$A >= median(s$A, na.rm=TRUE) + 5 * sd(s$A, na.rm=TRUE))
    o <- rbind(o, dd)
}

[Frank:] Why not the 70th percentile plus 6 times the difference in the 85th and 75th percentiles :-) Frank

P.S. See
@Article{fin06cal,
  author  = {Finney, David J.},
  title   = {Calibration guidelines challenge outlier practices},
  journal = {The American Statistician},
  year    = 2006,
  volume  = 60,
  pages   = {309-313},
  annote  = {anticoagulant therapy; bias; causation; ethics; objectivity; outliers; guidelines for treatment of outliers; overview of types of outliers; letter to the editor and reply 61:187 May 2007}
}
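A vectorized alternative to the loop handles every numeric column, NAs included, in one pass. This is a sketch only; the data frame is reconstructed from the example in the thread, and with only six rows no value actually exceeds the median + 5 SD threshold.

```r
# Vectorized sketch of the median + 5*SD screen over all numeric columns.
s <- data.frame(Samples = paste0("S", 1:6),
                A = c(1, 4, 7, 4, 2, 2),
                B = c(2, NA, 8, 5, 5, 3),
                C = c(3, 6, 9, NA, 6, 4),
                E = c(7, 6, NA, 6, 7, 5))
num  <- s[, -1]                                  # numeric columns only
flag <- sapply(num, function(col)
  col > median(col, na.rm = TRUE) + 5 * sd(col, na.rm = TRUE))
flag[is.na(flag)] <- FALSE                       # treat NA as not extreme
extreme <- s$Samples[rowSums(flag) > 0]          # samples with any extreme value
```

Because sapply returns a rows-by-columns logical matrix, rowSums(flag) > 0 picks out every sample with at least one flagged cell, regardless of how many columns the real data frame has.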
Re: [R] Multiple imputation, especially in rms/Hmisc packages
On Mon, 9 Aug 2010, Mark Seeto wrote: Hello, I have a general question about combining imputations as well as a question specific to the rms and Hmisc packages. The situation is multiple regression on a data set where multiple imputation has been used to give M imputed data sets. I know how to get the combined estimate of the covariance matrix of the estimated coefficients (average the M covariance matrices from the individual imputations and add (M+1)/M times the between-imputation covariance matrix), and I know how to use this to get p-values and confidence intervals for the individual coefficients. What I don't know is how to combine imputations to obtain the multiple degree-of-freedom tests of whether a set of coefficients are all 0. If there were no missing values, this would be done using the general linear test, which involves fitting the full model and the reduced model and getting an F statistic based on the residual sums of squares and degrees of freedom. How does one correctly do this when there are multiple imputations? In the language of Hmisc, I'm asking how the values in anova(fmi) are calculated if fmi is the result of fit.mult.impute (I've looked at the anova.rms code but couldn't understand it). [Frank:] In ordinary regression, the "leave some variables out and compare R^2" F-test and the quicker contrast tests are equivalent. In general we use the latter, which is expedient since it is easy to estimate the imputation-corrected covariance matrix. In the rms and Hmisc packages, fit.mult.impute creates the needed fit object; then anova, contrast, summary, etc. give the correct Wald tests. [Mark continues:] The second question I have relates to rms/Hmisc specifically. When I use fit.mult.impute, the fitted values don't appear to match the values given by the predict function on the original data. Instead, they match the fitted values of the last imputation.
For example:

library(rms)
set.seed(1)
x1 <- rnorm(40)
x2 <- 0.5*x1 + rnorm(40,0,3)
y  <- x1^2 + x2 + rnorm(40,0,3)
x2[40] <- NA                                       # create 1 missing value in x2
a <- aregImpute(~ y + x1 + x2, n.impute=2, nk=3)   # 2 imputations
x2.imp1 <- x2; x2.imp1[40] <- a$imputed$x2[,1]     # first imputed x2 vector
x2.imp2 <- x2; x2.imp2[40] <- a$imputed$x2[,2]     # second imputed x2 vector
ols.imp1 <- ols(y ~ rcs(x1,3) + x2.imp1)           # model on imputation 1
ols.imp2 <- ols(y ~ rcs(x1,3) + x2.imp2)           # model on imputation 2
d <- data.frame(y, x1, x2)
fmi <- fit.mult.impute(y ~ rcs(x1,3) + x2, ols, a, data=d)
fmi$fitted
predict(fmi, d)

I would have expected the results of fmi$fitted and predict(fmi, d) to be the same except for the final NA, but they are not. Instead, fmi$fitted is the same as ols.imp2$fitted. Also, anova(ols.imp2) and anova(fmi) unexpectedly give the same result in the ERROR line. I would be most grateful if anyone could explain this to me.

[Frank:] I need to add some more warnings. For many of the calculations, the last imputation is used. Frank

Thanks, Mark

--
Mark Seeto
Statistician, National Acoustic Laboratories
http://www.nal.gov.au/
A Division of Australian Hearing
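The combination rule Mark describes (average the within-imputation covariance matrices, then add (M+1)/M times the between-imputation covariance) can be sketched generically. `combine_rubin` is a made-up helper name for illustration, not a function in Hmisc or rms.

```r
# Hypothetical helper implementing Rubin's rules as described in the thread.
combine_rubin <- function(coefs, vcovs) {
  M    <- length(coefs)
  qbar <- Reduce(`+`, coefs) / M               # pooled coefficient estimates
  W    <- Reduce(`+`, vcovs) / M               # mean within-imputation covariance
  B    <- Reduce(`+`, lapply(coefs, function(q)
            tcrossprod(q - qbar))) / (M - 1)   # between-imputation covariance
  list(coef = qbar, var = W + (1 + 1/M) * B)   # total = W + ((M+1)/M) * B
}

# Toy check with two 'imputations':
res <- combine_rubin(list(c(1, 2), c(3, 4)),
                     list(diag(2), diag(2)))
res$coef   # 2 3
```

Wald tests of several coefficients at once can then be formed from res$coef and res$var, which is exactly what fit.mult.impute sets up for anova.rms.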
Re: [R] Plotting confidence bands around regression line
Please give the prescription. The article is not available in our extensive online library. I wonder if the method can compete with the bootstrap. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Tue, 10 Aug 2010, Michal Figurski wrote:

David, I would consider myself intermediate in R, but a beginner in statistics. I need a formula that would allow me to calculate confidence boundaries of the regression line given the slope, intercept and their CIs (and *any* range). Passing-Bablok regression doesn't yet exist in R - I am developing it. Therefore I am sure there is no predict method for it ;) I believe I have provided sufficient data to address this problem, but if that would help anyone, here is more:

# data frame
a <- structure(list(
  x = c(0.1, 1.43, 4.21, 3.67, 3.23, 7.72, 5.99, 9.16, 10.6, 9.84,
        11.94, 12.03, 12.89, 11.26, 15.54, 15.58, 17.32, 17.65, 19.52,
        20.48, 20.44, 20.51, 22.27, 23.58, 25.83, 26.04, 26.92, 28.44,
        30.73, 28.78),
  y = c(1.08, 1.39, 1.84, 0.56, 7.23, 4.91, 3.35, 7.09, 3.16, 8.98,
        16.37, 7.46, 15.46, 23.2, 4.63, 11.13, 15.68, 13.92, 26.44,
        21.65, 21.01, 20.22, 22.69, 22.21, 23.6, 17.24, 45.24, 30.09,
        40, 49.6)),
  .Names = c("x", "y"), row.names = c(NA, -30L), class = "data.frame")

Then I run the regression procedure (in development - now part of the 'MethComp' package):

print(PBreg(a))
# Result of the Passing-Bablok regression on this data frame:
#            Estimate      5%CI     95%CI
# Intercept -4.306197 -9.948438 -1.374663
# Slope      1.257584  1.052696  1.679290

The original Passing-Bablok article on this method has an easy prescription for CIs on the coefficients, so I implemented that. Now I need a way to calculate CI boundaries for individual points - this may be basic handbook stuff - I just don't know it (I'm not a statistician). I would appreciate it if anyone could point me to a handbook or website where it is described.
Regarding 2 - the predict method for the 'nls' class currently *ignores* the interval parameter, as stated in its documentation. Regards

--
Michal J. Figurski, PhD
HUP, Pathology Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

On 2010-08-10 11:38, David Winsemius wrote: On Aug 10, 2010, at 11:23 AM, Michal Figurski wrote: David, I may have stated my problem incorrectly - my problem is to *obtain the coordinates* for the confidence boundary lines. As input data I have only CIs for slope and intercept. [David:] Wouldn't you also need to specify the range over which these estimates might be valid and to offer the means for the X values? What level of R knowledge are you at? You have provided no data or code. Many R methods offer predict methods that return CIs.
Re: [R] Plotting confidence bands around regression line
Thanks Michal, That's the method that Dana Quade taught me in his intro nonparametrics course at UNC in the mid 1970s, at least for a single predictor. His method did not incorporate the shift you mentioned, though. The method looks robust; not sure about efficiency. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Tue, 10 Aug 2010, Michal Figurski wrote:

Frank, I had to order this article through Inter-Library Loan and wait for it for a week! I'll try to make it short. In Passing-Bablok the principle is to calculate the slopes between all possible pairs of points in the dataset, and then to take a shifted median of those slopes, where the offset K is the number of slopes smaller than (-1). Because of this, the bootstrap is out of the question - it would take too much time. Let n be the number of data points, N the number of slopes, and K the offset. The locations of the CI boundaries in the sorted set of N slopes are then calculated with the formulas:

M1 <- (N - qnorm(1 - conf.level/2) * sqrt(n*(n-1)*(2*n+5)/18)) / 2
M2 <- N - M1 + 1

CIs for the intercept are calculated as the medians of y[i] - slopes[M1]*x[i] and y[i] - slopes[M2]*x[i]. I hope I don't confuse anyone. The article has a mathematical derivations section and justification for these formulas. It looks solid to me, although after 25 years the flaws may be apparent to some of you.

--
Michal J. Figurski, PhD
HUP, Pathology Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

On 2010-08-10 12:29, Frank Harrell wrote: Please give the prescription. The article is not available in our extensive online library. I wonder if the method can compete with the bootstrap. Frank On Tue, 10 Aug 2010, Michal Figurski wrote: David, I would consider myself intermediate in R, but a beginner in statistics.
[...]
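The index formulas Michal quotes can be wrapped in a small function. This is a sketch only: here conf.level plays the role of alpha (e.g. 0.05 for 95% limits), which is how the quoted qnorm term reproduces the usual 1.96, and rounding to the nearest rank is an assumption on my part, not taken from the thread.

```r
# Sketch of the Passing-Bablok CI-index formulas from the thread.
# n points give N = n*(n-1)/2 pairwise slopes; M1 and M2 are the positions
# of the lower and upper confidence slopes in the sorted slope set.
pb_ci_index <- function(n, conf.level = 0.05) {
  N  <- n * (n - 1) / 2
  w  <- qnorm(1 - conf.level / 2) *
        sqrt(n * (n - 1) * (2 * n + 5) / 18)
  M1 <- round((N - w) / 2)       # rounding convention assumed here
  M2 <- N - M1 + 1
  c(M1 = M1, M2 = M2)
}

pb_ci_index(30)   # for a 30-point data set like the one in the thread
```

The shift K (the number of pairwise slopes below -1) is then added to these ranks before indexing the sorted slopes, per the description above.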
Re: [R] extracting the standard error in lrm
On Wed, 11 Aug 2010, david dav wrote:

Hi, I would like to extract the coefficients of a logistic regression (estimates and standard errors as well) from lrm, as in glm with summary(fit.glm)$coef. Thanks, David

coef(fit)
sqrt(diag(vcov(fit)))

But these will not be very helpful except in the trivial case where everything is linear, nothing interacts, and factors have two levels. Frank
Re: [R] Multiple imputation, especially in rms/Hmisc packages
Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Tue, 10 Aug 2010, Mark Seeto wrote:

Hi Frank (and others), Thank you for your comments in reply to my questions. I had not encountered contrast tests before. I've looked in a few texts, but the only place I could find anything about contrast tests was your Regression Modeling Strategies book. You wrote that the "leave some variables out" F-test and the contrast tests are equivalent in ordinary regression, so I have tried to verify this. However, I got different results for the two tests, so I'm hoping you can point out what I'm doing incorrectly. The null hypothesis is CB = 0, where C is the contrast matrix and B is the vector of regression coefficients. I'm using the test statistic on p. 189 of RMS: W = (Cb)'(CVC')^{-1}(Cb), with b being the vector of coefficient estimates and V its covariance matrix. The model is y ~ rcs(x1,3) + x2, and I want to test the null hypothesis that the two x1 coefficients are both zero. This is equivalent to CB = 0, with C being the matrix [0 1 0 0 (first row); 0 0 1 0 (second row)].

library(rms)
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
y  <- 1 + x1 + x2 + rnorm(50,0,3)
ols.full <- ols(y ~ rcs(x1,3) + x2)        # full model
ols.red  <- ols(y ~ x2)                    # reduced model
RSS.full <- sum((ols.full$fitted - y)^2)   # residual sums of squares
RSS.red  <- sum((ols.red$fitted - y)^2)
F <- ((RSS.red - RSS.full)/(48 - 46))/(RSS.full/46)  # F-test statistic
F
[1] 2.576335
1 - pf(F, 2, 46)
[1] 0.08698798
anova(ols.full)

Analysis of Variance   Response: y

Factor       d.f.  Partial SS    MS           F     P
x1            2     39.95283222  19.97641611  2.58  0.0870
 Nonlinear    1      0.07360774   0.07360774  0.01  0.9228
x2            1     47.17322335  47.17322335  6.08  0.0174
REGRESSION    3     83.79617922  27.93205974  3.60  0.0203
ERROR        46    356.67539200   7.75381287

anova(ols.full)["x1", "F"]
[1] 2.576335   # confirms that anova matches the "leave some variables out" F-test
V <- ols.full$var
C <- matrix(c(0,1,0,0, 0,0,1,0), nrow=2, byrow=TRUE)
b <- ols.full$coef
W <- t(C %*% b) %*% solve(C %*% V %*% t(C)) %*% (C %*% b)
1 - pchisq(W, 2)
           [,1]
[1,] 0.07605226

Have I used the correct distribution for W? Should I expect the p-values from the two tests to be exactly the same?

Mark - you can see the code for this at the bottom of anova.rms. Compute W, divide by the numerator d.f., then compute the P-value using F with the numerator and error d.f. Frank

Thanks in advance; I really appreciate any help you can give. Mark

Frank Harrell wrote: On Mon, 9 Aug 2010, Mark Seeto wrote: Hello, I have a general question about combining imputations as well as a question specific to the rms and Hmisc packages. The situation is multiple regression on a data set where multiple imputation has been used to give M imputed data sets. I know how to get the combined estimate of the covariance matrix of the estimated coefficients (average the M covariance matrices from the individual imputations and add (M+1)/M times the between-imputation covariance matrix), and I know how to use this to get p-values and confidence intervals for the individual coefficients. What I don't know is how to combine imputations to obtain the multiple degree-of-freedom tests of whether a set of coefficients are all 0. If there were no missing values, this would be done using the general linear test, which involves fitting the full model and the reduced model and getting an F statistic based on the residual sums of squares and degrees of freedom. How does one correctly do this when there are multiple imputations? In the language of Hmisc, I'm asking how the values in anova(fmi) are calculated if fmi is the result of fit.mult.impute (I've looked at the anova.rms code but couldn't understand it). In ordinary regression, the "leave some variables out and compare R^2" F-test and the quicker contrast tests are equivalent.
In general we use the latter, which is expedient since it is easy to estimate the imputation-corrected covariance matrix. In the rms and Hmisc packages, fit.mult.impute creates the needed fit object; then anova, contrast, summary, etc. give the correct Wald tests. The second question I have relates to rms/Hmisc specifically. When I use fit.mult.impute, the fitted values don't appear to match the values given by the predict function on the original data. Instead, they match the fitted values of the last imputation. For example:

library(rms)
set.seed(1)
x1 <- rnorm(40)
x2 <- 0.5*x1 + rnorm(40,0,3)
y  <- x1^2 + x2 + rnorm(40,0,3)
x2[40] <- NA                                       # create 1 missing value in x2
a <- aregImpute(~ y + x1 + x2, n.impute=2, nk=3)   # 2 imputations
x2.imp1 <- x2; x2.imp1[40] <- a$imputed$x2[,1]     # first imputed x2 vector
x2.imp2 <- x2; x2.imp2[40] <- a$imputed$x2[,2]     # second imputed x2 vector
ols.imp1 <- ols(y ~ rcs(x1,3) + x2
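Frank's recipe (compute W, divide by the numerator d.f., refer to F with numerator and error d.f.) can be checked outside rms with an ordinary linear model, where the hand-computed contrast test and the model-comparison F test agree exactly. This is a self-contained sketch with simulated data, not rms code.

```r
# Contrast (Wald) test of H0: both slopes are zero, done by hand,
# then compared with the usual model-comparison ("leave variables out") F test.
set.seed(1)
x1 <- rnorm(50); x2 <- rnorm(50)
y  <- 1 + x1 + x2 + rnorm(50, 0, 3)

fit <- lm(y ~ x1 + x2)
b   <- coef(fit)
V   <- vcov(fit)
C   <- rbind(c(0, 1, 0),                    # pick out the x1 slope
             c(0, 0, 1))                    # pick out the x2 slope
W   <- drop(t(C %*% b) %*% solve(C %*% V %*% t(C)) %*% (C %*% b))
df1 <- nrow(C); df2 <- fit$df.residual
p_F <- 1 - pf(W / df1, df1, df2)            # Frank's recipe: W/df1 ~ F(df1, df2)

p_red <- anova(lm(y ~ 1), fit)$"Pr(>F)"[2]  # model-comparison F test
all.equal(p_F, p_red)                       # the two p-values coincide
```

Referring W itself to a chi-square, as in Mark's code, gives a slightly smaller p-value because it ignores the estimation of the residual variance; the F referral above removes that discrepancy.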
Re: [R] Plotting confidence bands around regression line
Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Wed, 11 Aug 2010, Michal Figurski wrote:

Peter, Frank, David and others, Thank you all for your ideas. I understand your lack of trust in the PB method. Setting that aside (it's beyond me anyway), please see below what I have finally come up with to calculate the CI boundaries. Given the slope and intercept with their 5% / 95% CIs, and a range of x = 1:50:

ints = c(-1, 0, 1)     # c(int05%, int, int95%)
slos = c(0.9, 1, 1.1)  # c(slo05%, slo, slo95%)
CIs  = data.frame(x=1:50, lo=NA, hi=NA)
for (i in 1:50) {
  CIs$lo[i] = min(ints + slos * CIs$x[i])
  CIs$hi[i] = max(ints + slos * CIs$x[i])
}

It looks like it works to me. Does it make sense?

[Frank:] It doesn't look like it takes the correlation of slope and intercept into account, but I may not understand.

Now, what about a 4-parameter 'nls' model? Do you guys think I could use the same approach?

[Frank:] This problem seems to cry out for one of the many available robust regression methods in R. Frank

Best regards,
--
Michal J. Figurski, PhD
HUP, Pathology Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

On 2010-08-10 13:12, Peter Dalgaard wrote: Michal Figurski wrote:

# Result of the Passing-Bablok regression on this data frame:
#            Estimate      5%CI     95%CI
# Intercept -4.306197 -9.948438 -1.374663
# Slope      1.257584  1.052696  1.679290

The original Passing-Bablok article on this method has an easy prescription for CIs on the coefficients, so I implemented that. Now I need a way to calculate the CI boundaries for individual points - this may be basic handbook stuff - I just don't know it (I'm not a statistician). [Peter:] The answer is that you can't. You can't even do it with ordinary linear regression without knowing the correlation between slope and intercept. However, if you can get a CI for the intercept, then you could subtract x0 from all the x's and get a CI for the value at x0.
(This brings echoes from a distant past. My master's thesis was about some similar median-type estimators. I can't remember whether I looked at the Passing-Bablok paper at the time (1985!!), but my general recollection is that this group of methods is littered with unstated assumptions.)
Re: [R] Plotting confidence bands around regression line
I may be missing something, but I don't see how PB handles errors in variables any differently from other regression methods that ignore this problem. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Wed, 11 Aug 2010, S Ellison wrote:

Frank Harrell f.harr...@vanderbilt.edu 11/08/2010 17:02:03 wrote: "This problem seems to cry out for one of the many available robust regression methods in R." Not sure that would be much more appropriate, although it would _appear_ to work. The PB method is a sort of nonparametric/robust approach to an errors-in-variables problem, intended to provide an indication of consistency of results between two different measurement methods, often with similar error variance. So the aim is to handle the errors-in-variables problem at least consistently, to avoid the bias that results from assuming no error in the predictors. The M-estimator and related robust regression methods in packages like MASS and robustbase don't handle errors in the predictors. Of course, with small errors in predictors that won't matter much; rlm and the like will be pretty much as defensible then as they ever are. But perhaps one could construct a more formal robust equivalent of errors-in-variables regression by using a maximum likelihood functional relationship model with bivariate t errors (choosing arbitrarily low df) instead of bivariate Gaussian errors? Unfortunately I haven't tried that, so no help beyond the thought ... Steve Ellison

*** This email and any attachments are confidential. Any u...{{dropped:9}}
Re: [R] Growth Curves with lmer
Classification accuracy is an improper scoring rule, and one of the problems with it is that the proportion classified correctly can be quite good even if the model uses no predictors. [Hence omitting the intercept is also potentially problematic.]

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Wed, 11 Aug 2010, Michael Scharkow wrote:

Dear all, I have some growth curve data from an experiment that I try to fit using lm and lmer. The curves describe the growth of classification accuracy with the amount of training data t, so basically y ~ 0 + t (there is no intercept because y = 0 at t = 0). Since the growth is somewhat nonlinear *and* in order to estimate the treatment effect on the growth curve, the final model is

y ~ 0 + t + t.squared + t:treat + t.squared:treat

This yields:

   t     t.sq   t:treat  t.sq:treat
1.08   -0.007      0.39      -0.006

This fits the data fairly well, but I have replicated data for 12 different classifiers. First, I tried 12 separate regressions, which yielded results with different positive values for t and t:treat. Finally, I tried to estimate a varying-coefficient model using lmer:

lmer(y ~ 0 + t + t.squared + t:treat + t.squared:treat
       + (0 + t + t.squared + t:treat + t.squared:treat | classifier))

The fixed effects are similar to the pooled regression, but most of the random effects for t and t:treat are implausible (negative). What's wrong with the lmer model? Did I misspecify something? Greetings, Michael
Re: [R] How to compare the effect of a variable across regression models?
David, In the Cox and many other regression models, the effect of a variable is context-dependent. There is an identifiability problem in what you are doing, as discussed by

@Article{for95mod,
  author  = {Ford, Ian and Norrie, John and Ahmadi, Susan},
  year    = 1995,
  title   = {Model inconsistency, illustrated by the {Cox} proportional hazards model},
  journal = {Statistics in Medicine},
  volume  = 14,
  pages   = {735-746},
  annote  = {covariable adjustment; adjusted estimates; baseline imbalances; RCT; model misspecification; model identification}
}

One possible remedy, which may not work for your goals, is to embed all models in a grand model that is used for inference. When coefficients ARE comparable in some sense, you can use the bootstrap to get confidence bands for differences in regressor effects between models. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Fri, 13 Aug 2010, Biau David wrote:

Hello, I would like, if it is possible, to compare the effect of a variable across regression models. I have looked around but I haven't found anything. Maybe someone could help? Here is the problem: I am studying the effect of a variable (age) on an outcome (local recurrence: lr). I have built 3 models:

- model 1: lr ~ age; y = \beta_{a1} age
- model 2: lr ~ age + presentation variables (X_p); y = \beta_{a2} age + \BETA_{p2} X_p
- model 3: lr ~ age + presentation variables + treatment variables (X_t); y = \beta_{a3} age + \BETA_{p3} X_p + \BETA_{t3} X_t

Presentation variables include variables such as tumor grade, tumor size, etc.; the physician cannot interfere with these variables. Treatment variables include variables such as chemotherapy, radiation, and surgical margins (a surrogate for adequate surgery). I have used cph for the models and restricted cubic splines (Design library) for age. I have noted that the effect of age decreases from model 1 to 3.
I would like to compare the effect of age on the outcome across the different models. A test of \beta_{a1} = \beta_{a2} = \beta_{a3}, and then two-by-two comparisons or a global trend test maybe? Is that possible? Thank you for your help, David Biau. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
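Frank's bootstrap suggestion can be sketched in base R. A minimal, hedged illustration (simulated data; the logistic models, variable names, and effect sizes are all hypothetical stand-ins for the cph fits discussed above):

```r
## Bootstrap the *difference* between the age coefficient in a crude
## model and in an adjusted model (simulated data, hypothetical names).
set.seed(1)
n    <- 200
age  <- rnorm(n, 60, 10)
pres <- rnorm(n)                                  # stand-in "presentation" covariate
lr   <- rbinom(n, 1, plogis(-2 + 0.04 * age + 0.5 * pres))

boot.diff <- replicate(500, {
  i  <- sample(n, replace = TRUE)                 # resample cases
  b1 <- coef(glm(lr[i] ~ age[i], family = binomial))[2]
  b2 <- coef(glm(lr[i] ~ age[i] + pres[i], family = binomial))[2]
  b1 - b2                                         # crude minus adjusted age effect
})
quantile(boot.diff, c(0.025, 0.975))              # percentile confidence interval
```

If the interval excludes 0 there is evidence that adjustment changes the age effect; as Frank notes, this only makes sense when the coefficients are comparable across models.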
Re: [R] val.prob in the Design package - Calibrated Brier Score
Please check the code. I hope that Brier is on the uncalibrated probabilities. Calibrated probabilities are from 1/(1 + exp(-[a + b logit(uncalibrated prob)])), where a and b are maximum likelihood estimates (they will be 0 and 1 in training data). Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Fri, 13 Aug 2010, Kim Fernandes wrote: Hello, I am using the val.prob function in the Design package. I understand how the Brier quadratic error score is calculated, but I do not know how the Brier score computed on the calibrated rather than raw predicted probabilities (B cal) is calculated. My question is: how are the calibrated probabilities calculated? Any explanation of this, or references to explanations of this, would be most greatly appreciated. Thanks for your help, Kim
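The recalibration Frank describes can be sketched in base R (simulated data; this follows the formula as stated above, not necessarily val.prob's exact internals, so check the Design source as advised):

```r
## Logistic recalibration: fit a and b by maximum likelihood on the
## logit of the uncalibrated probabilities, then recompute Brier scores.
set.seed(2)
p   <- runif(200, 0.05, 0.95)                          # uncalibrated predicted probs
y   <- rbinom(200, 1, plogis(-0.3 + 1.4 * qlogis(p)))  # deliberately miscalibrated truth
lp  <- qlogis(p)                                       # logit(uncalibrated probs)
fit <- glm(y ~ lp, family = binomial)                  # intercept a, slope b
p.cal <- plogis(coef(fit)[1] + coef(fit)[2] * lp)      # 1/(1+exp(-(a + b*logit(p))))
c(Brier = mean((p - y)^2), Brier.cal = mean((p.cal - y)^2))
```

On the data used to fit the original model, a and b come out 0 and 1, so the two Brier scores coincide there; the difference only shows up on new data.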
Re: [R] Re : Re : How to compare the effect of a variable across regression models?
Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Fri, 13 Aug 2010, Biau David wrote: This is all very interesting indeed. So I appreciate that the effect of a variable will depend on the presence of other variables and that the effect of this variable has a statistical meaning only in a specific model, with the particularity of inconsistency if data arise from a non-normal distribution, as for the Cox model. It's worse than that. In the Cox PH model the degree of non-proportional hazards depends on which variables are adjusted for. I have two comments though; the first is statistical and the second more methodological: - first, either I am strict on the inconsistency phenomenon and, given the fact that it is very unlikely that I even come close to finding all relevant variables for a model, all models I may build are necessarily biased, OR I have to be less strict. How to use tests (score, Wald, LR) to compare models when, at most, only one of them is unbiased? - second (relevance depends on the answer to the comment above, I guess), I study the effect of age without adjustment (model 1), after adjusting for presentation variables (model 2), and after adjusting for presentation and treatment variables (model 3). I certainly do not pretend that I am truly looking at the effect of age, and I know that what I call age across the different models is not the same (otherwise I would not compare it, or I would hope not to find any difference). I am merely looking at the effect of something that is measured by 'age' and that encompasses imbalances at presentation, treatment, and remaining unknown confounders (first model), imbalances in treatment and remaining unknown confounders (second model), and imbalances in the remaining confounders (third model).
The remaining unknown confounders potentially include a true effect of age, if such a thing exists, and of other variables I don't know about (which in fact are still presentation and treatment variables since there are no other possibilities!). As everything fits in a cohesive interpretation: older patients present with worse tumors and are undertreated (as shown by descriptive statistics), AND as this is shown by the modelling of the effect by an erosion of the effect of 'age' from model 1 to 3, I was tempted, with all due precautions, to conclude that the increased risk for older patients was due in part to the fact that they present with worse tumors, in part to the fact that they are undertreated, and in part to something else. Would that be overinterpreting the data? Should I even drop the idea of building and presenting these models? I'm not sure. But often we learn by predicting a predictor from the other predictors. Frank Thanks again, David Biau. _ From: Bert Gunter gunter.ber...@gene.com To: Biau David djmb...@yahoo.fr Cc: Frank Harrell f.harr...@vanderbilt.edu; r help list r-help@r-project.org Sent: Fri, 13 Aug 2010, 18:22:58 Subject: Re: [R] Re : How to compare the effect of a variable across regression models? Just to amplify a bit on what Frank said... Except in special circumstances (orthogonal designs, say), regression models are only guaranteed to produce useful predictions -- they may not tell you anything meaningful about the relative effects of the regressors because, as Frank said, that depends on both the study design AND the true effects. So, for example, typically if one removes a regressor from a model and refits, the values of the remaining coefficients will change. Moreover, it is difficult to understand even conceptually what "the effect of a variable" should mean if, as is typical, it interacts with other variables: the effect of the variable then depends on what the values of the other variables are.
A very simple example of this that I used to use in teaching was the effect of barbiturates and alcohol on sleepiness. In the absence of barbiturates, the effect of a few drinks is to make you modestly sleepy; in the presence of barbiturates, the effect of a few drinks is to make you modestly dead! The point is that we live in a multivariate, interactive world. Terms like "the effect of a variable" derive from univariate thinking and need to be very carefully defined to make sense in such a world -- and then studies need to be carefully designed -- happenstance data are rarely sufficient -- to quantify them. Not what scientists like to hear, I think, but this is the reality. Further comments and criticisms welcome, of course. Cheers, Bert Gunter, Genentech Nonclinical Statistics On Fri, Aug 13, 2010 at 6:59 AM, Biau David djmb...@yahoo.fr wrote: OK, thank you very much for the answer. I will look into that. Hopefully I'll find something that will work out
Re: [R] Stepwise Regression + entry/exit significance level
The values of slentry and slstay that will avoid ruining the statistical properties of the result are slentry=1.0 and slstay=1.0. Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Sat, 14 Aug 2010, Shubha Vishwanath Karanth wrote: Hi R, Does the step function used to perform stepwise regression have the option to specify the entry/exit significance levels for the independent variables? (This is similar to the 'slentry' and 'slstay' options in 'Proc reg' of SAS.) Or do we have any other package which does the above? Thanks. Thanks and Regards, Shubha This e-mail may contain confidential and/or privileged i...{{dropped:13}}
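To answer the original question concretely: base R's step() selects by AIC and has no slentry/slstay analogue. A minimal sketch (simulated data with predictors unrelated to y, so the final model may well be intercept-only):

```r
## AIC-based stepwise selection with step(); there are no p-value
## entry/exit thresholds as in SAS PROC REG.
set.seed(3)
d <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
full <- lm(y ~ x1 + x2 + x3, data = d)
sel  <- step(full, direction = "both", trace = 0)   # silent stepwise search
formula(sel)                                        # whatever survives AIC
```

Frank's answer stands regardless of the criterion: any automatic selection distorts the statistical properties of the result, which is why he recommends slentry = slstay = 1.0 (i.e., keep everything).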
Re: [R] How to add lines to lattice plot produced by rms::bplot
Once you guys figure all this out, I'm glad to modify bplot to pass more arguments to lattice if needed. Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Fri, 13 Aug 2010, David Winsemius wrote: On Aug 13, 2010, at 11:25 PM, Duncan Mackay wrote: Hi David I do not know if you have done something like this. I had tried a few efforts like that, starting with an examination of str(bp.plot) as you demonstrate. I tried str(bp.plot), which gave the section about the regions (for colours) as:

$ panel.args.common:List of 8
 ..$ x      : num [1:2500] 27 28 29 29.9 30.9 ...
 ..$ y      : num [1:2500] 141 141 141 141 141 ...
 ..$ z      : num [1:2500] -1.43 -1.41 -1.39 -1.36 -1.34 ...
 ..$ at     : num [1:10] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5
 ..$ region : logi FALSE
 ..$ zlab   :List of 3
 .. ..$ label: chr "log odds"
 .. ..$ rot  : num 90
 .. ..$ cex  : num 1
 ..$ labels : logi TRUE
 ..$ contour: logi TRUE

I tried (with a bplot object named bldLT40):

bldLT40$legend$right$args$key$at <- c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10)

... and then tried

bldLT40$panel.args$at <- c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10)

Neither of these efforts changed the boundaries between colors in the plot area. The first effort changed the legend scale, but that just created a misalignment of the colors of the plot area and the legend. I would be interested in either a strategy that lets one alter the color level changes of the z variable (which in bplot-created objects is zhat), or one that lets me specify the values at which contour lines are drawn in contourplot. Thanks for your efforts. -- David. I added the col.regions and colours from a levelplot that I had done to see what would occur to the trellis parameters. No colours were produced when plotted.
bp.plot <- bplot(p, lfun=contourplot, color.key = TRUE, col.regions = c(#FF, #00, #A9E2FF, #8080FF, #FF, #FFD18F, #FF))

$ panel.args.common:List of 10
 ..$ x          : num [1:2500] 27 28 29 29.9 30.9 ...
 ..$ y          : num [1:2500] 141 141 141 141 141 ...
 ..$ z          : num [1:2500] -1.43 -1.41 -1.39 -1.36 -1.34 ...
 ..$ at         : num [1:10] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5
 ..$ region     : logi FALSE
 ..$ color.key  : logi TRUE
 ..$ zlab       :List of 3
 .. ..$ label: chr "log odds"
 .. ..$ rot  : num 90
 .. ..$ cex  : num 1
 ..$ labels     : logi TRUE
 ..$ contour    : logi TRUE
 ..$ col.regions: chr [1:7] #FF #00 #A9E2FF #8080FF ...

So it has been added to panel.args.common; whether you can access these is another matter. I then tried

bp.plot <- bplot(p, lfun=contourplot, par.settings = list(axis.text = list(cex = 0.65)), color.key = TRUE, col.regions = c(#FF, #00, #A9E2FF, #8080FF, #FF, #FFD18F, #FF))

which changed the size of the axis text, so it may be a case of having to add col.regions etc. to the appropriate list in par.settings. I'll leave you to amend and access the colours. You may have to add values for the wireframe/levelplot arguments like at etc. and col.regions (I think that is the function) to produce an appropriate colour range of your choice. It is a while since I have delved into these sorts of plots. Need some sustenance. Regards Duncan Duncan Mackay, Department of Agronomy and Soil Science, University of New England, ARMIDALE NSW 2351. Email home: mac...@northnet.com.au At 10:33 14/08/2010, you wrote: I have a plot produced by function bplot (package = rms) that is really a lattice plot (class = trellis).
It is similar to this plot produced by a very minor modification of the first example on the bplot help page:

require(rms)
n <- 1000                        # define sample size
set.seed(17)                     # so can reproduce the results
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n, TRUE))
label(age)            <- 'Age'   # label is in Hmisc
label(cholesterol)    <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex)            <- 'Sex'
units(cholesterol)    <- 'mg/dl' # uses units.default in Hmisc
units(blood.pressure) <- 'mmHg'
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
     (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')
fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)), x=TRUE, y=TRUE)
p <- Predict(fit, age, cholesterol, sex='male', np=50)   # vary sex last
bp.plot <- bplot(p, lfun=contourplot)
bp.plot

I have tried a variety of efforts at using update (which I assume is a lattice function, although I can find no help page for it). It does appear in some of the lattice help pages and my
Re: [R] confidence Intervals for predictions in GLS
install.packages('rms')
require(rms)
?Gls
?plot.Predict

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Sat, 14 Aug 2010, Camilo Mora wrote: Hi everyone: Is there a function in R to calculate the confidence intervals for the predictions of a GLS (Generalized Least Squares) model? The function predict gives confidence intervals for the predictions of other types of models (lm, glm, etc.) but not gls. Any input will be much appreciated, Best, Camilo Camilo Mora, Ph.D., Department of Biology, Dalhousie University, Halifax, Canada. Phone: (902) 494-7720 http://as01.ucis.dal.ca/fmap/people.php?pid=53
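Following those pointers, a hedged sketch of the rms route (function and argument names per ?Gls and ?plot.Predict; the simulated AR(1) data are an illustrative assumption, and this is untested here):

```r
## Confidence intervals for GLS predictions via rms::Gls + Predict.
library(rms)    # also brings in Hmisc
library(nlme)   # corAR1 correlation structure
set.seed(4)
x <- runif(60)
y <- 2 + 3 * x + as.numeric(arima.sim(list(ar = 0.5), 60))  # AR(1) errors
d <- data.frame(x, y)
dd <- datadist(d); options(datadist = "dd")
f <- Gls(y ~ x, data = d, correlation = corAR1(form = ~ 1))
head(Predict(f, x))      # yhat with lower/upper confidence limits
plot(Predict(f, x))      # fitted line with confidence bands
```

Predict() returns the point estimates together with lower and upper limits, which is what plain predict.gls does not provide.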
Re: [R] How to add lines to lattice plot produced by rms::bplot
Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Sat, 14 Aug 2010, David Winsemius wrote: On Aug 14, 2010, at 9:59 AM, Frank Harrell wrote: Once you guys figure all this out, I'm glad to modify bplot to pass more arguments to lattice if needed. As always, Frank, I appreciate your support. In this case I think it's not needed. What seems to be needed is simply the correct use of the at argument. (I thought I had tried this before but apparently mucked it up somehow.) This gives the desired color-level separation and labeling of the default levelplot version of bplot output:

bldLT40 <- bplot(mdl.pred, perim=boundaries, at=c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10))
bldLT40

And this produces the expected output with its contourplot version:

bldLT40c <- bplot(mdl.pred, perim=boundaries, lfun=contourplot, at=c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10))
bldLT40c

My only quibble with that last one was that the labels had three digits to the right of the decimal point, but that happily went away when I changed the low end to 0.25. All is good here. -- David. Excellent. Thanks David. If you think of anything that would be a good addition to the examples or other parts of the help file, shoot me a few lines and I'll paste them in. Frank
Re: [R] HMisc/rms package questions
Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Tue, 17 Aug 2010, Rob James wrote: 1) How does one capture the plots from the plsmo procedure? Simply inserting a routing call to a graphical device (such as jpeg, png, etc.) and then running the plsmo procedure (and then dev.off()) does not route the output to the file system. 1b) Related to the above, has anyone thought of revising the plsmo procedure to use ggplot? I'd like to capture several such graphs into a faceted arrangement. Hi Rob, plsmo in Hmisc uses base graphics, and I have captured its output many times using pdf() or postscript(). I'll bet that Hadley Wickham has an example that will help. For lattice there is panel.plsmo. 2) The 2nd issue is more about communications than software. I have developed a model using lrm() and am using plot to display the model. All that is fairly easy. However, my coauthors are used to traditional methods, where baseline categories are rather broadly defined (e.g. males, age 25-40, height 170-180cm, BP 120-140, etc.) and results are reported as odds ratios, not as probabilities of outcomes. Therefore, and understandably, they are finding the graphs which arise from lrm-Predict-plot difficult to interpret. Specifically, in one graph, the adjusted-to population is defined one way, and in another graph of the same model (displaying new predictors) there will be a new adjusted-to population. Sometimes the adjusted populations are substantially distinct, giving rise to event rates that vary dramatically across graphs. This can prove challenging when trying to present the set of graphs as parts of a whole. It all makes sense; it just adds complexity to introducing these new methods. A very simple example might help us with this one. But odds ratios resulting from categorizing continuous variables are invalid. They do not have the claimed interpretation.
In fact they have no interpretation, in the sense that their interpretation is a function of the entire set of sample values. You can get whatever odds ratios you need (with exact interpretations) using summary or contrast. You can also modify plot to plot relative odds, relative to something of your choosing. Frank One strategy might be to manually define the baseline population across graphs; this way I could attempt to impose some content-specific coherence on the graphs by selecting the baseline populations. Clearly this is do-able, but I have yet to see it done. I'd welcome suggestions and comments. Thanks, Rob
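Frank's point about capturing plsmo's base-graphics output can be sketched as follows (simulated data; requires the Hmisc package):

```r
## Route plsmo output to a file: open the device first, plot, then close.
library(Hmisc)
set.seed(5)
x <- runif(100)
y <- rbinom(100, 1, plogis(2 * x - 1))
pdf("plsmo-example.pdf")   # or postscript(), png(), ...
plsmo(x, y)                # lowess-smoothed plot of y against x
dev.off()                  # closing the device flushes the file to disk
```

If the file comes out empty, the usual cause is plotting before the device is open, or forgetting dev.off().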
Re: [R] R reports
What do low-level proc print and proc report have over Sweave or http://biostat.mc.vanderbilt.edu/wiki/pub/Main/StatReport/summary.pdf? If proc print and proc report are 4G, let's move back a generation. Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Thu, 19 Aug 2010, Donald Paul Winston wrote: I don't see much in the way of an ability to write reports in R the way you can with SAS. You basically have to write a program with R in a 3G way, unlike SAS with its 4G proc print and proc report. Are there similar R functions and packages? -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2330733.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] ROCR predictions
At the heart of this you have a problem of incomplete conditioning. You are computing things like Prob(X > x) when you know X = x. Working with a statistician who is well versed in probability models will undoubtedly help. Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Thu, 19 Aug 2010, Assa Yeroslaviz wrote: Hello everybody, yes I'm sorry. I can see it is not so easy to understand. I'll try to explain a bit more. The experiment was used to compare two (protein domain) databases and find out whether or not the results found in one are comparable to the second DB. The first column shows the list of the various inputs in the DB, the second lists the various domains for each gene. The p-value column gives the probability that the domain found in column four (Expected) was found by chance. In column five the expected values are listed. The calculation of TP, TN, FP, FN was made many times, each time with a different p-value (from p=1, ..., p=10E-12) as a threshold to calculate the various values of TP, TN, etc. The goal of this calculation was to find the optimal p-value with a maximum of TP and a minimum of FP. To do so I thought about making the column of p-values my predictions and the values in the column Is.Expected (TRUE, FALSE) my labels. This is how I calculated my first ROC curve:

pValue <- read.delim(file = "p=1.txt", as.is = TRUE)
desc1  <- pValue[["p.value"]]
label1 <- pValue[["Is.Expected"]]  # after changing the values to TRUE = 0, FALSE = 1
pred <- prediction(desc1, label1)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize = TRUE)

My questions are as follows: 1. Am I right in my way of thinking, that the p-values here are predictions? I know you said I need to decide it for myself, but I'm not sure. If they are, then I will have the same predictions for each and every calculation of ROCR. Will it make any difference to the prediction? 2. How can I calculate the other p-value thresholds?
Do I need to do each separately, or is there a way of combining them? I hope you can still help me with some hints or further advice. Thanks Assa On Wed, Aug 18, 2010 at 07:55, Claudia Beleites cbelei...@units.it wrote: Dear Assa, you need to call prediction with continuous predictions and a _binary_ true class label. You are the only one who can tell whether the p-values are actually predictions and what the class labels are. For the list readers p is just the name of whatever variable, and you didn't even vaguely say what you try to classify, nor did you offer any explanation of what the columns are. The only information we get from your table is that p-value has small and continuous values. From what I see the p-values could also be fitting errors of the predictions (e.g. expressed as a probability that the similarity to the predicted class is random). Claudia Assa Yeroslaviz wrote: Dear Claudia, thank you for your fast answer. I add again the table of the data as an example (TP/FN/FP/TN = true positive, false negative, false positive, true negative):

Protein ID    Pfam Domain      p-value      Expected         Is Expected  TP  FN  FP  TN
NP_11.2       APH              1.15E-05     APH              TRUE          1   0   0   0
NP_11.2       MutS_V           0.0173       APH              FALSE         0   0   1   0
NP_62.1       CBS              9.40E-08     CBS              TRUE          1   0   0   0
NP_66.1       APH              3.83E-06     APH              TRUE          1   0   0   0
NP_66.1       CobU             0.009        APH              FALSE         0   0   1   0
NP_66.1       FeoA             0.3975       APH              FALSE         0   0   1   0
NP_66.1       Phage_integr_N   0.0219       APH              FALSE         0   0   1   0
NP_000161.2   Beta_elim_lyase  6.25E-12     Beta_elim_lyase  TRUE          1   0   0   0
NP_000161.2   Glyco_hydro_6    0.002        Beta_elim_lyase  FALSE         0   0   1   0
NP_000161.2   SurE             0.0059       Beta_elim_lyase  FALSE         0   0   1   0
NP_000161.2   SapB_2           0.0547       Beta_elim_lyase  FALSE         0   0   1   0
NP_000161.2   Runt             0.1034       Beta_elim_lyase  FALSE         0   0   1   0
NP_000204.3   EGF              0.004666118  EGF              TRUE          1   0   0   0
NP_000229.1   PAS              3.13E-06     PAS              TRUE          1   0   0   0
NP_000229.1   zf-CCCH          0.2067       PAS              FALSE         0   1   1   0
NP_000229.1   E_raikovi_mat    0.0206       PAS              FALSE         0   0   0   0
NP_000388.2   NAD_binding_1    8.21E-24     NAD_binding_1    TRUE          1   0   0   0
NP_000388.2   ABM              1.40E-08     NAD_binding_1    FALSE         0   0   1   0
NP_000483.3   MMR_HSR1         1.98E-05     MMR_HSR1         TRUE          1   0   0   0
NP_000483.3   DEAD             2.30E-05     MMR_HSR1         FALSE         0   0   1   0
NP_000483.3   APS_kinase       1.80E-09     MMR_HSR1
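A self-contained ROCR sketch along the lines discussed (simulated stand-in for the p-value table; note the plotting argument is colorize):

```r
## ROC curve from p-value-like scores with ROCR. Smaller p-values should
## indicate positives, so reverse the score before calling prediction().
library(ROCR)
set.seed(6)
pv    <- runif(300)                      # p-values acting as scores
truth <- rbinom(300, 1, 1 - pv)          # small p-value -> likely a true hit
pred  <- prediction(1 - pv, truth)       # larger score = more positive
perf  <- performance(pred, "tpr", "fpr")
plot(perf, colorize = TRUE)              # colour the curve by cutoff
performance(pred, "auc")@y.values[[1]]   # area under the curve
```

The whole range of thresholds is swept in one prediction() call; there is no need to recompute TP/FP counts separately for each p-value cutoff.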
Re: [R] logistic regression tree
It would be good to tell us the frequency of observations in each category of Y, and the number of continuous X's. Recursive partitioning will require perhaps 50,000 observations in the less frequent Y category for its structure and predicted values to validate, depending on X and the signal:noise ratio. Hence the use of combinations of trees nowadays as opposed to single trees. Or logistic regression. Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Fri, 20 Aug 2010, Kay Cichini wrote: Hello Gavin, Achim, thanks for responding. By logistic regression tree I meant a regression tree for a binary response variable, but as you say I could also use a classification tree - in my case with only two outcomes. I'm not aware whether there are substantial differences to expect between the two approaches (logistic regression tree vs. classification tree with two outcomes). As I'm new to trees / boosting / etc. I also might be advised to use the more comprehensible method, i.e. a function whose arguments can be understood without having to climb a steep learning ladder. At the moment I don't know which this would be. Regarding the meaning of absences at stands: as these species are frequent in the area and hence there is no limitation by propagules, I guess absence is really due to unfavourable conditions. Thanks a lot, Kay - Kay Cichini, Postgraduate student, Institute of Botany, Univ. of Innsbruck -- View this message in context: http://r.789695.n4.nabble.com/logistic-regression-tree-tp2331847p2332447.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] logistic regression tree
On Fri, 20 Aug 2010, Kay Cichini wrote: Hello, my data collection is not yet finished, but I have already started investigating possible analysis methods. Below I give a very close simulation of my future data set; however, there might be more nominal explanatory variables - there will be no continuous ones at all (maybe some ordered nominal). I tried several packages today, but the one I fancied most was ctree of the party package. I can't see why the given no. of data points (n=100) might pose a problem here - but please teach me better, as I might be naive. See http://biostat.mc.vanderbilt.edu/wiki/Main/ComplexDataJournalClub#Sebastiani_et_al_Nature_Genetics The recursive partitioning simulation there will give you an idea - you can modify the R code to simulate a situation more like yours. When you simulate the true patterns and see how far the tree is from discovering the true patterns, you'll be surprised. Frank I'd be very glad about comments on the use of ctree on such a dataset, and whether I overlook possible pitfalls. Thank you all, Kay

##
# an example with 3 nominal explanatory variables:
# Y is presence of a certain invasive plant species
# introduced effect for fac1 and fac3, fac2 without effect.
# presence with prob. 0.75 in factor combination fac1=I (say fac1 is geogr. region) and
# fac3 = a|b|c (say all richer substrates).
# presence is not influenced by fac2, which might be vegetation type, i.e.
##
library(party)
dat <- cbind(expand.grid(fac1=c("I","II"), fac2=LETTERS[1:5], fac3=letters[1:10]))
print(dat <- dat[order(dat$fac1, dat$fac2, dat$fac3), ])
dat$fac13 <- paste(dat$fac1, dat$fac3, sep="")
for (i in 1:nrow(dat)) {
  ifelse(dat$fac13[i]=="Ia" | dat$fac13[i]=="Ib" | dat$fac13[i]=="Ic",
         dat$Y[i] <- rbinom(1,1,0.75),
         dat$Y[i] <- rbinom(1,1,0))
}
dat$Y <- as.factor(dat$Y)
tr <- ctree(Y ~ fac1 + fac2 + fac3, data=dat)
plot(tr)
##
- Kay Cichini, Postgraduate student, Institute of Botany, Univ.
of Innsbruck -- View this message in context: http://r.789695.n4.nabble.com/logistic-regression-tree-tp2331847p2333073.html Sent from the R help mailing list archive at Nabble.com.
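Frank's suggestion to simulate and check recovery can be sketched against the setup quoted above (requires the party package; the recovery criterion here, "did the tree split at all", is a deliberately crude assumption):

```r
## Repeat the simulation and ask how often ctree finds *any* structure.
library(party)
recovered <- replicate(100, {
  dat  <- expand.grid(fac1 = c("I","II"), fac2 = LETTERS[1:5], fac3 = letters[1:10])
  prob <- with(dat, ifelse(fac1 == "I" & fac3 %in% c("a","b","c"), 0.75, 0))
  dat$Y <- factor(rbinom(nrow(dat), 1, prob))
  tr <- ctree(Y ~ fac1 + fac2 + fac3, data = dat)
  length(unique(where(tr))) > 1          # more than one terminal node?
})
mean(recovered)                          # proportion of runs with any split
```

Comparing the splits actually found against the known fac1/fac3 structure (and how often fac2 is wrongly selected) makes Frank's point about the instability of single trees at n = 100 concrete.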
Re: [R] R reports
Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Sat, 21 Aug 2010, Donald Paul Winston wrote: Sweave and LaTeX are way too much overhead to deal with. There should be a built-in standard report() function analogous to plot(). Something like the following is necessary if you want real people to take R seriously: report(data=aDataFrame, vars=vectorOfColumnNames, label=vectorOfColumnNames, by=vectorOfColumnNames, sum=vectorOfColumnNames, title=vectorOfStrings, footer=vectorOfStrings, pageBy=vectorOfColumnNames, sumBy=vectorOfColumnNames, filename=string, fileType=text|csv|pdf.etc) Did I say real people? I've been Palinized. -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2333264.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] R reports
Your notes are not well thought out. You'll find that r-help is a friendly place for new users that do not come in with an attitude. I once used SAS (for 23 years) and know it very well. I wrote the first SAS procedures for a graphics device, percentiles, logistic regression, and Cox regression. Its reporting for the first 30 years of SAS' existence was quite primitive, and since then it is not what I'd call advanced and certainly not esthetically pleasing. Considering that SAS has had tens of billions of $ at its disposal for research and development, the result is quite unimpressive. Look at the fake PROC EXPORT if you want to get a real laugh - it can't even put out valid csv files if there are any unmatched quotes inside of character variable values. It is not even a procedure, just a front end to a trivial macro. The report function you outlined is in many ways more primitive than many functions already available in R. See the summary.formula function, for example, in the Hmisc package. Among other things, some of the functions give you flexibility in specifying the criteria by which a variable is considered continuous vs. discrete numeric. They also allow you to override statistical tests to include in the table with your own functions. Now one of the functions even automatically creates micrographics inside of table cells. You are welcome to write any R functions you'd like. The language for doing so is richer than the SAS language by a significant margin. Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Sat, 21 Aug 2010, Donald Paul Winston wrote: People have been generating reports with a computer for many years. R is supposed to be an analytical engine. Report writing is fundamental to any kind of analysis tool. SAS has had several report procedures/functions since the very beginning (1960s?). SAS stands for Statistical Analysis System.
Do you really expect users to have to piece together a half dozen or so bits of R code to create a report? It's not like it's difficult to do! I see this new company called Revolution Analytics who thinks R is the next big thing. Good grief. Maybe they can rescue it from the ghettoized academic world. -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2333267.html Sent from the R help mailing list archive at Nabble.com.
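Frank's reply above points to Hmisc's summary.formula as one of the existing reporting tools. A minimal sketch of the kind of table it produces, on simulated data (the data frame and its column names are invented for illustration; argument choices are one option among several):

```r
library(Hmisc)
set.seed(1)
d <- data.frame(treat = sample(c("A", "B"), 200, replace = TRUE),
                age   = rnorm(200, 50, 10),
                sex   = factor(sample(c("F", "M"), 200, replace = TRUE)))
## method = "reverse" summarizes each right-hand variable stratified by treat
s <- summary(treat ~ age + sex, data = d, method = "reverse")
print(s)      # plain-text descriptive table
latex(s)      # the same table typeset for inclusion in a LaTeX report
```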
Re: [R] R reports
On Sat, 21 Aug 2010, Donald Winston wrote: The point is SAS has had simple reporting for 30 years. R apparently doesn't have any. Why is it so hard to accept that a report function analogous to a plot function would be a good thing? R has had more advanced reporting features than SAS since not long after R's first production release in the 1990s. Users who are unwilling to study R documentation and examples will not see that, of course. Let's see if you want to put your money where your mouth is. How much are you willing to pay every year for SAS for its rigid approach to reporting? What is the current licensing fee to your organization? You are wrong on yet another piece of this discussion. SAS legally removed the phrase Statistical Analysis System from the company more than 15 years ago. SAS doesn't stand for anything. Frank On Aug 21, 2010, at 8:38 AM, Frank Harrell wrote: [quoted message repeated above omitted] -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2333267.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] R reports
On Sat, 21 Aug 2010, Donald Paul Winston wrote: Good grief. Adding a report function is not going to make R less flexible. Don't you want to use a tool that's relevant to the rest of the world? That world is much bigger than your world. This is ridiculous. Looks like some people are complaining about me criticizing R and the people who defend it. Good grief again. I think your philosophy is now more clear, Donald. People who don't like learning new things place themselves at a competitive disadvantage. The R community, where learning is highly valued, may be fundamentally incompatible with your philosophy. You may do well to stay with SAS. Frank From: David Hajage-2 [via R] ml-node+248-1227050197-138...@n4.nabble.com Sent: Sat, August 21, 2010 4:54:12 AM Subject: Re: R reports Just show us what kind of report you want to do, and you will perhaps get a solution to reproduce it. Then, if you don't like the way to do that, write your own code or don't use R; no one forces you. The majority of R users are satisfied with the way to generate reports, because it is flexible. There is ABSOLUTELY NO WARRANTY with R; this also means you have no warranty of finding exactly what you want, or what you can find in SAS. Just deal with it. 2010/8/21 Donald Paul Winston [hidden email] [quoted message repeated above omitted] -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2333264.html?by-user=t Sent from the R help mailing list archive at Nabble.com.
Re: [R] AUC
Samuel, Since the difference in AUCs has insufficient power and doesn't really take into account the pairing of predictions, I recommend the Hmisc package's rcorrp.cens function. Its method has good power and asks the question "is one predictor more concordant than the other in the same pairs of observations?". Frank Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Mon, 23 Aug 2010, Samuel Okoye wrote: Hello, Is there any R function that computes the AUC for paired data? Many thanks, Samuel [[alternative HTML version deleted]]
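A hedged sketch of the call shape for rcorrp.cens, on simulated data. The variable names are invented; I am assuming (based on the convention of Hmisc's rcorr.cens) that an uncensored numeric outcome may be passed in place of a Surv object and is treated as event times with no censoring — check the Hmisc help page before relying on this.

```r
library(Hmisc)
set.seed(2)
n  <- 200
x1 <- rnorm(n)                        # first predictor
y  <- rbinom(n, 1, plogis(x1))        # binary outcome generated from x1
x2 <- x1 + rnorm(n, sd = 2)           # second, noisier predictor of the same y
## tests whether x1 is more concordant with y than x2 within the same pairs
rcorrp.cens(x1, x2, y)
```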
Re: [R] modify a nomogram
Update to the rms package, which is the version now being actively supported. New features will not be added to Design. The nomogram function in rms separates the plotting into a plot method for easier understanding. You can control all axes - some experimentation can tell you if you can do what you want, as I haven't tried exactly what you are doing. Frank Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Wed, 25 Aug 2010, david dav wrote: Hi, I would like to emphasize (zoom) the zone of a nomogram where the probabilities are < 0.01 (nomogram built with nomogram, Design). As a consequence, I don't need to draw the part of the Total points axis with score < 60, equivalent in my case to a linear predictor < -4.5. - As far as I know, this is not possible with the arguments of the function. - Changing the code of the function is beyond my abilities - I cannot even create a nomo.f function with the same body:
body(nomo) <- expression({ conf.lp <- match.arg(conf.lp) . rest of the function body
this does not even work - I am not sure to find the piece of code responsible for defining the axis.
nomogram(logistic.fit, fun = plogis, fun.at = c(seq(0.01, 1, by = .2)), lmgp = .2, lp = T, maxscale = 100, total.sep.page = T)
Thanks David
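The separation Frank describes - building the nomogram object, then controlling its appearance in plot() - can be sketched as follows on a made-up logistic model (data, variable names, and the xfrac choice are all illustrative):

```r
library(rms)
set.seed(3)
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
d$y <- rbinom(300, 1, plogis(-2 + d$x1))
dd <- datadist(d); options(datadist = "dd")

f <- lrm(y ~ x1 + x2, data = d)
## construction step: choose the probability axis tick locations here
nom <- nomogram(f, fun = plogis, fun.at = c(.01, .05, .1, .25, .5, .75))
## plotting step: layout is controlled separately from construction
plot(nom, xfrac = .2)
```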
Re: [R] Speeding up prediction of survival estimates when using `survifit'
Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Mon, 30 Aug 2010, Ravi Varadhan wrote: Hi, I fit a Cox PH model to estimate the cause-specific hazards (in a competing risks setting). Then, I compute the survival estimates for all the individuals in my data set using the `survfit' function. I am currently playing with a data set that has about 6000 observations and 12 covariates. I am finding that the survfit function is very slow. Here is a simple simulation example (modified from Frank Harrell's example for `cph') that illustrates the problem:
n <- 500
set.seed(4321)
age <- 50 + 12*rnorm(n)
sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
cens <- 5 * runif(n)
h <- 0.02 * exp(0.04 * (age-50) + 0.8 * (sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens, 1, 0)
dt <- pmin(dt, cens)
Srv <- Surv(dt, e)
f <- coxph(Srv ~ age + sex, x=TRUE, y=TRUE)
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
When I run the above code with sample sizes, n, taking on values of 500, 1000, 2000, and 4000, the times it takes for survfit to run are as follows:
# n <- 500
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
0.16 0.00 0.15
# n <- 1000
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
1.45 0.00 1.48
# n <- 2000
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
10.19 0.00 10.25
# n <- 4000
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
72.87 0.05 74.87
I eventually want to use `survfit' on a data set with roughly 50K observations, which I am afraid is going to be painfully slow. I would much appreciate hints/suggestions on how to make `survfit' faster or any other faster alternatives.
Ravi, If you don't need standard errors/confidence limits, the rms package's survest and related functions can speed things up greatly if you fit the model using cph(, surv=TRUE). [cph calls coxph, and calls survfit once to estimate the underlying survival curve]. Frank Thanks. Best, Ravi. Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu
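Frank's suggestion can be sketched by reusing Ravi's simulated data: fitting with cph(..., surv=TRUE) stores the underlying survival curve in the fit, so survest() does not need to rerun survfit for each request. The times chosen and the newdata values are illustrative.

```r
library(rms)   # also loads survival
set.seed(4321)
n    <- 500
age  <- 50 + 12 * rnorm(n)
sex  <- factor(sample(c('Male', 'Female'), n, replace = TRUE, prob = c(.6, .4)))
cens <- 5 * runif(n)
h    <- 0.02 * exp(0.04 * (age - 50) + 0.8 * (sex == 'Female'))
dt   <- -log(runif(n)) / h
e    <- ifelse(dt <= cens, 1, 0)
dt   <- pmin(dt, cens)

## surv=TRUE stores the estimated underlying survival curve in the fit
f <- cph(Surv(dt, e) ~ age + sex, x = TRUE, y = TRUE, surv = TRUE)

## fast survival estimates at chosen times, skipping standard errors
survest(f, newdata = data.frame(age = 60, sex = 'Female'),
        times = c(1, 3), se.fit = FALSE)
```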
Re: [R] [Q] Goodness-of-fit test of a logistic regression model using rms package
On Wed, 1 Sep 2010, GMail (KU) wrote: Hello, I was looking for a way to evaluate the goodness-of-fit of a logistic regression model. After googling, I found that I could use the resid(fit, 'gof') method implemented in the rms package. However, since I am not used to the le Cessie-van Houwelingen normal test statistic, I do not know which of the statistics returned from the resid(fit, 'gof') call I could use to evaluate the goodness of fit. When I ran resid(fit, 'gof'), I got the following results:
##
## Sum of squared errors   Expected value|H0          SD
##           6844.684594         6805.672315    2.790969
##                     Z                   P
##             13.978043                0.00
##
I tried to read le Cessie and van Houwelingen's original paper, but I found that it required prerequisite knowledge I don't currently have. Could someone explain how to interpret the results from the resid(fit, 'gof') call? Any help would be much appreciated. Young-Jin Lee Young-Jin, I think everyone has trouble interpreting omnibus tests of lack of fit, so don't feel bad. You just know that something somewhere is probably wrong with the model. I focus on directed tests such as allowing all continuous variables to have nonlinear effects or allowing selected interactions, and finding out how important the complex model terms are. Frank Harrell
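The "directed test" approach Frank describes can be sketched as follows: fit a model that relaxes linearity and allows an interaction, then let anova() report how much the complex terms matter. The data, variable names, and knot count are invented for illustration.

```r
library(rms)
set.seed(5)
d <- data.frame(age = rnorm(400, 50, 10),
                sex = factor(sample(c("F", "M"), 400, replace = TRUE)))
## true effect of age is mildly nonlinear
d$y <- rbinom(400, 1, plogis(-4 + .08 * d$age - .0005 * (d$age - 50)^2))
dd <- datadist(d); options(datadist = "dd")

## 4-knot restricted cubic spline in age, interacting with sex
f <- lrm(y ~ rcs(age, 4) * sex, data = d)
anova(f)   # the Nonlinear and interaction rows are the directed tests
```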
Re: [R] Is there any package or function perform stepwise variable selection under GEE method?
The absence of stepwise methods works to your advantage, as these yield invalid statistical inference and inflated regression coefficients. Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Thu, 2 Sep 2010, 黃敏慈 wrote: Hi, I use library(gee), library(geepack), and library(yags) to perform GEE data analysis, but none of them can do variable selection! Both step and stepAIC can do variable selection based on the AIC criterion under linear regression and glm, but they do not work when the model is based on GEE. I want to ask whether any variable selection function or package under the GEE model is available now? Thanks! Best, [[alternative HTML version deleted]]
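Frank's claim about invalid inference is easy to demonstrate with a small base-R simulation (not GEE-specific, and my own illustration, not his): run AIC-based stepwise selection on pure noise and look at the retained coefficients.

```r
## Stepwise selection on pure noise still "finds" predictors, and the
## reported p-values for the retained terms are anti-conservative.
set.seed(6)
n <- 100; p <- 10
X <- as.data.frame(matrix(rnorm(n * p), n, p))
X$y <- rnorm(n)                       # outcome unrelated to every predictor
sel <- step(lm(y ~ ., data = X), trace = 0)   # AIC-based backward selection
summary(sel)$coefficients             # retained "effects" are spurious
```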
Re: [R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor
On Thu, 2 Sep 2010, stephenb wrote: sorry to bump in late, but I am doing similar things now and was browsing. IMHO anova is not appropriate here. it applies when the richer model has p more variables than the simpler model. this is not the case here. the competing models use different variables. A simple approach is to have the factor variable in the model and to formally test for added information given by the continuous variable (linear, quadratic, spline, etc). AIC could also be used. you are left with information criteria. by transforming a continuous variable into a categorical one you are smoothing, which is the idea of GAM. if you look at what is offered in GAMs you may find better approximations f(age) as well as tools for testing among different f(age) transformations. I don't follow that comment. Smoothing uses the full continuous variable to begin with. A restricted cubic spline function in age is a simple approach. E.g.:
require(rms)
dd <- datadist(mydata); options(datadist='dd')
f <- cph(Surv(dtime,death) ~ rcs(age,4) + sex, data=mydata)
plot(Predict(f, age))
Note that you can always expect the categorized version of age not to fit the data except sometimes when behavior is dictated by law (driving, drinking, military service, medicare). Frank regards. S.
Re: [R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
On Sun, 5 Sep 2010, st...@wittongilbert.free-online.co.uk wrote: David Winsemius wrote: 1. is glm the right thing to use before I waste my time Yes, but if your outcome variable is binomial then the family argument should be binomial. (And if you thought it should be poisson, then why below did you use gaussian?) I used gaussian below because it was the example from the docs. That's not my data; it's example data, which was not binomial. and 2. how do I interpret the result! Result? What result? I don't see any description of your data, nor any code. I didn't provide MY DATA because I thought that would complicate things even further. So I was hoping for some advice on how to interpret the result of the example data so that I could then apply that to my data. I haven't even tried to run my data as I couldn't see what the output of the examples was trying to tell me. However, as you've snipped it because it was not relevant, that's useful to know. I often find this problem with the examples in the R docs: they suddenly take a dataset that I have no knowledge of, play with it, and produce an 'answer'. The examples are presumably provided to enable me to work through how the code works etc. So what I was hoping for was someone to point to somewhere on-line that documents how to use the function for logistic regression and to explain what all that table of data it spits out actually means. Someone has VERY KINDLY posted me something off list which I believe helps. I think you need to consult a statistician or someone who has taken the time to read that statistical mumbo jumbo you don't want to learn. This mailing list is not set up to be a tutorial site.
I have access to stats advice, but I don't (a) want to turn up to them with a pile of paper from R and have them say glm() may be the wrong analysis, (b) they don't do R so they can't tell me if I've used R wrongly, and (c) I completely expect they'd ask which of the values in the table matter, since no paper I've ever seen published showed a logistic regression with a table of numbers. Clearly the time to consult a statistician is before you have done any statistical analysis. Frank Harrell I have a couple of Kleinbaum's (et al) other texts and find them to be well written and reasoned, so I suspect the citation above would be as accessible as any. Thank you, that is useful. There is a real problem when buying R text books. None of the bookshops round here stock any, which means you can't tell if they are much good. I've looked at some and they seem to be re-writes of the help files.
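For readers in the same position as the original poster, a minimal self-contained glm() logistic example (simulated data, made-up coefficients) showing the two pieces of output most papers actually report: coefficients on the log-odds scale, and the corresponding odds ratios with Wald confidence intervals.

```r
set.seed(7)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-1 + 0.8 * x))     # true log-odds: -1 + 0.8*x

fit <- glm(y ~ x, family = binomial)
summary(fit)$coefficients                     # Estimate column = log-odds scale
exp(cbind(OR = coef(fit), confint.default(fit)))  # odds ratios with Wald CIs
```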
Re: [R] some questions about longitudinal study with baseline
Baseline should appear only as a baseline and should be removed from the set of longitudinal responses. This is often done with a merge( ) operation. Frank Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Tue, 7 Sep 2010, array chip wrote: Hi all, I asked this before the holiday and didn't get any response, so I would like to resend the message and hope to get some fresh attention. Since this is not a purely lme technical question, I also cc-ed the R general mailing list, hoping to get some suggestions from there as well. I asked some questions previously on how to analyze a longitudinal study with only 2 time points (baseline and a follow-up). I appreciate many useful comments from some members, especially Dennis Murphy and Marc Schwartz, who referred the following paper addressing specifically this type of study with only 2 time points: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1121605/ Basically, with only 2 time points (baseline and one follow-up), ANCOVA with follow-up as dependent variable and baseline as covariate should be used: follow-up = a + b*baseline + treatment. Now I have a regular longitudinal study with 6 time points, 7 treatments (vehicle, A, B, C, D, F, G), measuring a response variable y. The dataset is attached. I have some questions, and appreciate any suggestions on how to analyze the dataset.
dat <- read.table("dat.txt", sep='\t', header=T, row.names=NULL)
library(MASS)
library(lattice)   # for xyplot
dat$trt <- relevel(dat$trt, 'vehicle')
xyplot(y ~ time, groups=trt, data=dat, ylim=c(3,10), col=c(1:6,8), lwd=2,
       type=c('g','a'), xlab='Days', ylab='response',
       key = list(lines=list(col=c(1:6,8), lty=1, lwd=2),
                  text = list(lab = levels(dat$trt)),
                  columns = 3, title = "Treatment"))
So as you can see, there is some curvature between glucose level and time, so a quadratic fit might be needed.
dat$time2 <- dat$time * dat$time
A straight fit like below seems reasonable:
library(lme4)   # for lmer
fit <- lmer(y ~ trt*time + trt*time2 + (time|id), dat)
Checking on random effects, it appears that the variance component for the random slope is very small, so a simpler model with random intercept only may be sufficient:
fit <- lmer(y ~ trt*time + trt*time2 + (1|id), dat)
Now, I want to incorporate the baseline response into the model in order to account for any baseline imbalance. I need to generate a new variable baseline based on glucose levels at time=0:
dat <- merge(dat, dat[dat$time==0, c('id','y')], by.x='id', by.y='id', all.x=T)
colnames(dat)[c(4,6)] <- c('y','baseline')
so the new fit adding baseline into the mixed model is:
fit <- lmer(y ~ baseline + trt*time + trt*time2 + (1|id), dat)
Now my questions are: 1) Is the above model a reasonable thing to do? 2) When baseline is included as a covariate, should I remove the data points at baseline from the dataset? I am kind of unsure whether it's reasonable to use the baseline both as a covariate and as part of the dependent variable values. The next thing I want to do with this dataset is multiple comparisons between each treatment (A, B, C, D, F, G) vs. vehicle at a given time point, say time=56 (the last time point), after adjusting for the baseline imbalance. This seems to be done using the Dunnett test. When I say "after adjusting baseline imbalance", I mean the comparisons should be done based on the difference between time=56 and time=0 (baseline), i.e. "is there any difference in the change from baseline for treatment A (or B, C, D, F, G) vs. vehicle?". How can we test this? Will glht() in multcomp work for an lmer fit? If yes, how can I specify the syntax? Finally, with the above model, how to estimate the difference (and the standard error) between time=56 and time=0 (baseline) for each treatment group? Thank you all for your attention.
John
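Frank's advice - pull the baseline response out as a covariate and keep only post-baseline rows as the longitudinal outcome - can be sketched with merge() on a toy dataset (six subjects, three times; column names follow the thread, but the data are invented):

```r
set.seed(8)
dat <- expand.grid(id = 1:6, time = c(0, 7, 14))
dat$y <- rnorm(nrow(dat), 5)

## extract the baseline measurement as its own column ...
base <- dat[dat$time == 0, c("id", "y")]
names(base)[2] <- "baseline"

## ... and merge it onto the post-baseline rows only, so the baseline
## value appears as a covariate, not among the longitudinal responses
long <- merge(dat[dat$time > 0, ], base, by = "id")
head(long)
```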
[R] Where to find R-help options web page
Dear Group: I must be missing something obvious but I can't find from www.r-project.org a link to a page for managing my r-help subscription. Due to getting a new smart phone I'm changing to nabble to manage r-help traffic. I need to turn off receiving mail directly to my e-mail address. Nabble sent a link to turn off e-mail but the r-help mail service rejected nabble's command. Thanks Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Where-to-find-R-help-options-web-page-tp2535123p2535123.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Where to find R-help options web page
Thanks for your note Ista. I had seen that page. I don't see on it where you can modify your subscription characteristics. I'd like to remain subscribed but not have e-mail delivered; that way I can just see messages in nabble. It may be that I'm misunderstanding how to use nabble (and to be able to post on it). Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Where-to-find-R-help-options-web-page-tp2535123p2535148.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Where to find R-help options web page
Aha. Thanks David. I somehow thought that that whole section was for administrators only. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Where-to-find-R-help-options-web-page-tp2535123p2535172.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] SF-8 (not 36) questionnaire scoring for R?
I know someone who has R code for SF-36 and perhaps SF-12. Aren't there copyright issues relating to SF-* even if it is reprogrammed? Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/SF-36-questionnaire-scoring-for-R-tp2537198p2537404.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] SF-8 (not 36) questionnaire scoring for R?
Yes the company behind that probably received federal funds for some of the research and has been very careful to minimize their contribution to the community. I didn't understand your parenthetical remark. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/SF-36-questionnaire-scoring-for-R-tp2537198p2537480.html Sent from the R help mailing list archive at Nabble.com.
[R] [R-pkgs] New version of rms package on CRAN
CRAN has a significant update to rms. Windows and unix/linux versions are available and I expect the Mac version to be available soon. The most significant improvement is the addition of latex=TRUE arguments to model-fit print methods, made especially for use with Sweave. Here is a summary of changes since the previous version.
Changes in version 3.1-0 (2010-09-12)
* Fixed gIndex to not use scale for labeling unless character
* Changed default na.action in Gls to na.omit and added a note in the help file that na.delete does not work with Gls
* Added terms component to Gls fit object (latex was not working)
* Added examples in robcov help file testing sandwich covariance estimator
* Added reference related to the effects package under help file for plot.Predict
* Added more examples and simulations to gIndex
* Fixed ancient bug in lrm.fit Fortran code to handle case where initial estimates are nearly perfect (was trying to step-halve); thanks: Dan Hogan
* Changed survdiffplot to use gray(.85) for bands instead of gray(.95)
* Fixed formatting problem in print.psm
* Added prStats and reVector functions to rmsMisc.s
* Changed formatting of all print.* functions for model fits to use new prStats function
* Added latex=TRUE option to all model fit print methods; requires LaTeX package needspace
* Re-wrote printing routines to make use of more general model
* Removed long and scale options from cph printing-related routines
* Prepared for version 2.36-1 of the survival package by adding censor=FALSE argument to survfit.coxph
* Added type="ccterms" to various predict methods
* Made type="ccterms" the default for partial g-indexes in gIndex, i.e., combine all indirectly related (through interactions) terms
* Added Spiegelhalter calibration test to val.prob
* Added a check in cph to trigger an error if strata() is used in formula
* Fixed drawing of polygon for shaded confidence bands for survplot.survfit (thanks to Patrick Breheny patrick.breh...@uky.edu)
* Changed default adjust.subtitle in bplot to depend on ref.zero, thanks to David Winsemius dwinsem...@comcast.net
* Used a namespace and simplified references to a few survival package functions that survival actually exports
Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] predict.lrm ( Design package)
You sent a private note about this, which I just took the time to answer. Please send only one note, and please post my reply to you on r-help. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
View this message in context: http://r.789695.n4.nabble.com/predict-lrm-Design-package-tp2546894p2547011.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] predict.lrm ( Design package)
Thanks Peter. I think you're right. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] predict.lrm ( Design package) poor performance?
% correct is an improper scoring rule, and a discontinuous one to boot, so it will not always agree with more proper scoring rules. When you have a more difficult task, e.g., discriminating among more categories, indexes such as the generalized c-index that utilize all the categories will recognize the difficulty of the task and give a lower value. No cause for alarm. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
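For reference, the generalized c-index and the related rank correlation Dxy mentioned above can be computed from predicted probabilities with the Hmisc package (a sketch on simulated binary data; somers2 covers the binary case, and rcorr.cens the more general one):

```r
## Sketch: discrimination indexes that use all the information (simulated data)
library(Hmisc)
set.seed(2)
x <- rnorm(200)
y <- ifelse(runif(200) < plogis(x), 1, 0)
p <- plogis(x)     # predicted probabilities from some model
somers2(p, y)      # C (ROC area) and Somers' Dxy = 2*(C - 0.5)
```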
Re: [R] Validation / Training - test data
Split-sample validation is highly unstable with your sample size. The rms package can help with bootstrapping or cross-validation, assuming you have all modeling steps repeated for each resample. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
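The bootstrapping and cross-validation mentioned here look roughly like the following in rms (a sketch; the data are simulated, and the fit must be stored with x=TRUE, y=TRUE for validate to work):

```r
## Sketch: resampling validation with rms (simulated data)
library(rms)
set.seed(3)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- ifelse(runif(100) < plogis(d$x1), 1, 0)
f <- lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)
validate(f, B = 200)                              # bootstrap-corrected Dxy, slope, etc.
# validate(f, method = "crossvalidation", B = 10) # 10-fold cross-validation instead
```

Note that validate repeats the fitting for each resample; any variable selection done outside the call is not accounted for.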
Re: [R] Validation / Training - test data
It all depends on the ultimate use of the results. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
John, don't conclude that one category is the most probable when its probability of being equaled or exceeded is a maximum. The first category would always be the winner if that were the case. When you say y=best, remember that you are dealing with a probability model. Nothing is forcing you to classify an observation, and unless the category's probability is high, doing so may be dangerous. You might do well to consider a smoother approach such as using the generalized ROC area (c-index) or its related rank correlation measure Dxy. There are also odds ratios. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Why assign them at all? Is this a forced-choice-at-gunpoint problem? Remember what probabilities mean. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Well put Greg. The job of the statistician is to produce good estimates (probabilities in this case). Those cannot be translated into action without subject-specific utility functions. Classification during the analysis or publication stage is not necessary. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
You still seem to be hung up on making arbitrary classifications. Instead, look at tendencies using odds ratios or rank correlation measures. My book Regression Modeling Strategies covers this. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] A problem about nomogram--thank you for you help
Please take the time to study the subject matter, and note that a nomogram is just a graphical method. It is not a statistical model or a process. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
I may be missing a point, but the proportional odds model easily gives you odds ratios for Y >= j (independent of j by the PO assumption). Other options include examining a rank correlation between the linear predictor and Y, or (if Y is numeric and the spacings between categories are meaningful) you can get the predicted mean Y (see Mean.lrm in the R rms package, a replacement for the Design package). Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
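The Mean.lrm route mentioned here can be sketched as follows (simulated ordinal data; Mean and lrm are rms functions):

```r
## Sketch: predicted mean Y from a proportional odds fit (simulated data)
library(rms)
set.seed(4)
x <- rnorm(150)
y <- cut(x + rnorm(150), breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)  # ordinal 1..4
f <- lrm(y ~ x)
M <- Mean(f)          # Mean.lrm returns a function of the linear predictor
head(M(predict(f)))   # predicted mean of Y for each observation
```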
Re: [R] multiple moderated regression steps
How does it help to center the variables? What statistical principles are you using to fit your model, and how are you ensuring statistical validity of inferential estimates? Why do it in multiple steps? Why use a mixture of software tools? What is the utility of standardized regression coefficients? Frank

stamkiral wrote: Hi, I am studying the moderating effects of perceived social support and just-world belief on the relationship between stress coping strategies and depression level. I have never run this analysis before, so I want to check whether my steps are correct. First I ran the regression: in step 1, centered independent variables and centered moderators; in step 2, two-way interactions; in step 3, three-way interactions. The results showed significant two-way and three-way interactions. After this, for the two-way interactions, I ran 2 slopes following Winnifred's suggestion (http://www.docstoc.com/docs/21151269/Moderated-Multiple-Regression-v5 p.6); my criterion for significance was the centered independent variable (the one with a significant interaction) in Block 2. Is this correct? Second, to plot this slope I used Dawson's 2-way unstandardised Excel spreadsheet (http://www.jeremydawson.co.uk/slopes.htm). The spreadsheet requires the unstandardised regression coefficients of the IV, the moderator, and the interaction. From which block should I take the IV and moderator coefficients: block 1 (main effects) or block 2 (interactions)? One more question: the three-way spreadsheet requires the variance of the IV*Moderator coefficient. In the covariance matrix, which block is correct for this value, the second or the third? Likewise, for the covariance of the IV*Mod1 and IV*Mod2 coefficients, which covariance block should be used? Thanks so much. Note: excuse my English.

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] multiple moderated regression steps
That concern has nothing to do with centering variables before including them in a model. Your multiple significance testing strategy is not based on statistical principles and will distort all inferences you obtain from the final model. Frank

stamkiral wrote: The variables were centered (except the DV) because in the social sciences zero is rarely a meaningful point on a scale (Cohen, Cohen, West, Aiken, 2006). For example, in the perceived social support questionnaire there is no value of zero; it is a Likert-type questionnaire where 1 = strongly yes and 7 = strongly no. So Aiken and West (1991) suggested centering predictors and moderators to give zero a meaningful value in the regression equation. Also, the multilevel steps were run to avoid multicollinearity. In the results, .05 was taken as significant (in the coefficient table), and slope tests were done for the interactions. For plotting the slopes, unstandardised regression coefficients were used.

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Speed Advice for R --- avoid data frames
Data frame access is about 5,000 times slower than matrix column access, and 300 times slower than matrix row access. R's data frame operational speed is an amazing 40 data accesses per second. I have not seen access numbers this low for decades. How to avoid it? Not easy. One way is to create multiple matrices and group them as an object; of course, this loses a lot of features of R. Another way is to copy all data used in calculations out of the data frame into a matrix, do the operations, and then copy them back. Not ideal, either. In my opinion, this is an R design flaw. Data frames are the fundamental unit of much statistical analysis and should be fast. I think R lacks any indexing into data frames; turning on indexing of data frames should at least be an optional feature. I hope this post helps others. /iaw

Ivo Welch (ivo.we...@gmail.com) http://www.ivo-welch.info/

Frank Harrell, Department of Biostatistics, Vanderbilt University
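The data-frame-versus-matrix gap described above is easy to check with base R alone (a sketch; the exact ratios depend heavily on R version and hardware, so treat the numbers quoted in the post as indicative rather than universal):

```r
## Compare repeated element-wise access: data frame vs. matrix (base R only)
n  <- 1e4
df <- data.frame(a = numeric(n), b = numeric(n))
m  <- as.matrix(df)
system.time(for (i in 1:n) df[i, 1] <- i)  # element-wise data frame writes: slow
system.time(for (i in 1:n) m[i, 1]  <- i)  # the same writes on a matrix: much faster
```

Vectorized operations (df$a <- 1:n) largely avoid the penalty; it is the scalar access pattern above that is pathological.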
Re: [R] GLS - Plotting Graphs with 95% conf interval
The Gls function in the rms package is a front end to gls that allows you to use all the graphics and other methods available in rms. Frank

SamiC wrote: Hi, I am trying to plot the original data with the line of the model using the predict function. I want to add SEs to the graph, but am not sure how to get them out, as the predict function for gls does not appear to allow for an SE=TRUE argument. Here is my code so far:

f1 <- formula(MaxNASC40_50 ~ hu3 + flcmax + TidalFlag)
vf1Exp <- varExp(form = ~ hu3)
B1D <- gls(f1, correlation = corGaus(form = Lat ~ Lon, nugget = TRUE), weights = vf1Exp, data = ocean)
ochu3 <- seq(from = 2.9, to = 4, length = 120)
ocflc <- seq(from = 0, to = 0.8, length = 120)
tidal1 <- rep(c(1), length = 120)
mydata1 <- data.frame(TidalFlag = factor(tidal1), hu3 = ochu3, flcmax = ocflc)
lineNASC1 <- predict(B1D, newdata = mydata1, type = "response")
lineNASC1 <- as.data.frame(lineNASC1)
plot(ocean$MaxNASC40_50[ocean$TidalFlag == 1] ~ ocean$flcmax[ocean$TidalFlag == 1])
lines(lineNASC1$lineNASC1 ~ mydata1$flcmax)

TidalFlag is a factor (so I assume I have to plot separate graphs for each level). When I have tried to use the effects package I get the error: Error in x$formula : object of type 'symbol' is not subsettable. Also, when I have been trying to predict values from a zero-inflated negative binomial, I get the same line of fit regardless of what is on my x-axis (despite different variables having positive and negative relationships). Any input on any of these problems would be appreciated. Thanks.

Frank Harrell, Department of Biostatistics, Vanderbilt University
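A minimal sketch of the rms route suggested above (Gls wraps nlme::gls; the formula and data here are invented placeholders, not the poster's dataset):

```r
## Sketch: Gls fit with rms graphics and confidence bands (simulated data)
library(rms)    # Gls, datadist, Predict
library(nlme)   # corGaus
set.seed(5)
d <- data.frame(x = runif(80), lat = runif(80), lon = runif(80))
d$y <- 2 * d$x + rnorm(80)
dd <- datadist(d); options(datadist = "dd")
f <- Gls(y ~ x, data = d, correlation = corGaus(form = ~ lat + lon, nugget = TRUE))
plot(Predict(f, x))   # fitted line with pointwise confidence bands
```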
Re: [R] Named numeric vectors with the same value but different names return different results when used as thresholds for calculating true positives
Also note that the statistical method you are using does not seem in line with decision theory, and you are assuming that the threshold actually exists. It is seldom the case that the relationship of a predictor with the response is flat on at least one side of a threshold. A smooth prediction model may be in order. Frank

Eik Vettorazzi wrote: Hi, Am 11.07.2011 22:57, schrieb Lyndon Estes:

ctch[ctch$threshold == 3.5, ]
# [1] threshold val tp fp tn fn tpr fpr tnr fnr
# 0 rows (or 0-length row.names)

This is the very effective FAQ 7.31 trap: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f Welcome to the first circle of Patrick Burns' R Inferno! Also, unname() is a more intuitive way of removing names. And I think your code is quite inefficient, because you calculate quantiles many times, which involves repeated ordering of x, and you may be using an inefficient bin size (either too small, and therefore calculating the same split many times, or too large, and then missing some splits). I'm a bit puzzled about what x and y are in your code, so any further advice is vague, but you might have a look at any package that calculates ROC curves, such as ROCR or pROC (and many more). Hth

Eik Vettorazzi, Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg

Frank Harrell, Department of Biostatistics, Vanderbilt University
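The FAQ 7.31 trap Eik refers to can be reproduced in a few lines of base R (a sketch):

```r
## Floating point equality: why ctch$threshold == 3.5 can match nothing
x <- seq(0, 3.5, by = 0.7)
x[6] == 3.5                    # FALSE: 5 * 0.7 is not exactly 3.5 in binary floating point
isTRUE(all.equal(x[6], 3.5))   # TRUE: comparison with a numerical tolerance
abs(x[6] - 3.5) < 1e-8         # the usual hand-rolled alternative for subsetting
```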
Re: [R] Sum weights of independent variables across models (AIC)
Why go to so much trouble? Why not fit a single full model and use it? Even better, why not use a quadratic penalty on the full model to get optimum cross-validation? Frank

nofunsally wrote: Hello, I'd like to sum the weights of each independent variable across linear models that have been evaluated using AIC. For example:

library(MuMIn)
data(Cement)
lm1 <- lm(y ~ ., data = Cement)
dd <- dredge(lm1, beta = TRUE, eval = TRUE, rank = "AICc")
get.models(dd, subset = delta < 4)

There are 5 models with a delta AIC score of less than 4. I would like to sum the weights for each of the independent variables across the five models. How can I do that? Thanks, Mike

Frank Harrell, Department of Biostatistics, Vanderbilt University
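The quadratic-penalty route Frank suggests is available in rms via pentrace (a sketch on simulated data; the penalty grid is arbitrary):

```r
## Sketch: penalized full model instead of model averaging (simulated data)
library(rms)
set.seed(6)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- d$x1 + 0.5 * d$x2 + rnorm(100)
f <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)
p <- pentrace(f, seq(0, 20, by = 2))  # search for the penalty maximizing effective AIC
update(f, penalty = p$penalty)        # refit the full model with the optimum penalty
```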
Re: [R] validate survival with val.surv
The documentation for val.surv tried to be clear about that. Note that val.surv is only for the case where the predictions were developed on a separate dataset, i.e., where the validation is truly 'external'. Frank

yz wrote: Dear R users: I want to externally validate a model with val.surv. Can I use only the calculated survival (at 1 year) and actual survival? Or do I need the survival function and actual survival? Thanks. Yao Zhu, Department of Urology, Fudan University Shanghai Cancer Center, Shanghai, China

Frank Harrell, Department of Biostatistics, Vanderbilt University
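For a fixed-time external validation like the 1-year question above, the call is roughly as follows (a heavily hedged sketch: est1yr, futime, and event are placeholder names, and the exact argument set should be checked against ?val.surv):

```r
## Sketch: external validation of 1-year survival predictions with rms::val.surv
library(rms)       # val.surv
library(survival)  # Surv
# est1yr: predicted 1-year survival probabilities computed from the
#         previously developed (external) model -- placeholder name
# futime, event: observed follow-up time and event indicator in the
#                validation sample -- placeholder names
# v <- val.surv(est.surv = est1yr, S = Surv(futime, event), u = 1)
# plot(v)
```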
Re: [R] Design Survplot performance
Replace the soon-to-be-retired Design package with rms. Specify surv=TRUE to cph so that approximate rather than fully correct standard errors will be computed by survplot/survest. Frank

christopher.a.hane wrote: Hi, I have a Cox PH model that's large for my server: 120K rows, ~300 factors with 3 levels each, so about 1000 columns. The 300 factors all pass a preliminary test of association with the outcome. Solving this with cph from Design takes about 3 hours. I have created the fit with x=T, y=T to save the model data. I want to validate the PH assumption by calling survplot(fit, gender=NA, logt=TRUE, loglog=TRUE) for many of the factors (here gender is one column name). Just creating this one plot takes 40 minutes. I'd be happy to sample from the fitted model to create these tests, or figure out another way to check assumptions in the model. Has anyone done something similar, or have other suggestions for tests that scale better? Thanks.

Frank Harrell, Department of Biostatistics, Vanderbilt University
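In rms terms, the suggested setup looks like this (a sketch; the data and variable names are invented, and surv=TRUE is the option Frank recommends for fast approximate standard errors):

```r
## Sketch: cph with surv=TRUE and a log-log proportional hazards check
library(rms)
library(survival)
set.seed(7)
d <- data.frame(time   = rexp(200),
                status = rbinom(200, 1, 0.7),
                gender = factor(sample(c("F", "M"), 200, replace = TRUE)))
dd <- datadist(d); options(datadist = "dd")
f <- cph(Surv(time, status) ~ gender, data = d,
         x = TRUE, y = TRUE, surv = TRUE)          # surv=TRUE: approximate SEs, fast
survplot(f, gender = NA, logt = TRUE, loglog = TRUE) # parallel curves suggest PH holds
```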
[R] Error in plotmath
Under R version 2.13.1 (2011-07-08), platform x86_64-pc-linux-gnu (svn rev 56322), I get a double quote mark in place of <= in the y-axis label when I run the following.

set.seed(6)
x <- rnorm(30, 100, 20)
xs <- seq(50, 150, length=150)
cdf <- pnorm(xs, 100, 20)
plot(xs, cdf, type='l', ylim=c(0,1),
     xlab=expression(x),
     ylab=expression(paste("Prob[", X <= x, "]")))
lines(ecdf(x), cex=.5)

The problem also occurs if I use instead ylab=expression(prob(X <= x)). All is well if I remove <= but I need <=. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] How to make a nomogam and Calibration plot
Kindly do not attach questions in a separate document. Install and read the documentation for the R rms package, and see the handouts at http://biostat.mc.vanderbilt.edu/rms Frank

sytangping wrote: Dear R users, I am a new R user and something stops me when I try to write an academic article. I want to make a nomogram to predict the risk of prostate cancer (PCa) using several factors selected from a logistic regression run in SPSS. A calibration plot is also needed to validate the prediction accuracy of the nomogram. I have tried many times and read a lot of posts on this topic, but I still couldn't figure out how to draw the nomogram and the calibration plot. My dataset and detailed questions are in the two attached files. I would be very grateful if someone could spare the time to help with my questions. Warmest regards! Ping Tang http://r.789695.n4.nabble.com/file/n3710068/Dataset.xls Dataset.xls http://r.789695.n4.nabble.com/file/n3710068/R_help.doc R_help.doc

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] How to make a nomogam and Calibration plot
The nomogram you included was produced by the Design package, the precursor to the rms package. You will have to take the time to read the rms package documentation intensively. Note that how you developed the model (e.g., allowing for non-linearity in log PSA, not using stepwise regression, which invalidates the results, making sure all clinically relevant predictors are in the model, ...) is the most important step. The process you are going through generally requires an M.S. in biostatistics. Frank

sytangping wrote: Dear Harrell, Many thanks for your quick response! However, after trying and trying, I still have difficulty solving my questions. I post my questions again. I hope someone can help me run the data and draw the nomogram and calibration plot for me. I know that is not good, but indeed I have no other way to go. The problems have almost driven me mad! Best regards! Ping Tang

Logistic Regression Classification Table:

  Observed              Predicted PCa-YN     Percentage Correct
                        0        1
  Step 0  PCa-YN  0     295      0           100.0
                  1     218      0           .0
  Overall Percentage                         57.5

Variables in the Equation:

            B        S.E.    Wald     df   Sig.    Exp(B)    95.0% C.I. for Exp(B)
  Age       .031     .015    4.491    1    .034    1.032     1.002 - 1.062
  DRE       1.173    .266    19.492   1    .000    3.233     1.920 - 5.443
  LogPV     -2.857   .509    31.532   1    .000    .057      .021 - .156
  LogPSA    2.316    .246    88.416   1    .000    10.132    6.253 - 16.419
  Constant  -1.024   1.273   .648     1    .421    .359

The equation:

  Probability = exp(-1.024 + 0.031*Age + 1.173*DRE - 2.857*LogPV + 2.316*LogPSA) /
                (1 + exp(-1.024 + 0.031*Age + 1.173*DRE - 2.857*LogPV + 2.316*LogPSA))

My questions are: 1. How to draw a nomogram (similar to figure 1 below) to predict the probability of cancer using R? 2. How to make the calibration plot (similar to figure 2 below) used to validate the prediction accuracy of the nomogram using R? And how to calculate the concordance index (c-index)?
http://r.789695.n4.nabble.com/file/n3714477/untitled.jpg
http://r.789695.n4.nabble.com/file/n3714477/%E9%99%84%E4%BB%B62.jpg
http://r.789695.n4.nabble.com/file/n3714477/Dataset.xls Dataset.xls

Frank Harrell, Department of Biostatistics, Vanderbilt University
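For the record, the rms workflow for the requested nomogram, calibration plot, and c-index looks roughly like this (a sketch using simulated stand-ins for the poster's variables, with coefficients loosely echoing the SPSS table; not the poster's actual dataset):

```r
## Sketch: nomogram, calibration plot, and c-index with rms (simulated data)
library(rms)
set.seed(8)
d <- data.frame(Age = rnorm(500, 65, 8), DRE = rbinom(500, 1, 0.4),
                LogPV = rnorm(500, 1.5, 0.2), LogPSA = rnorm(500, 1, 0.4))
d$PCa <- ifelse(runif(500) < plogis(-1 + 0.03 * d$Age + 1.2 * d$DRE
                                    - 2.9 * d$LogPV + 2.3 * d$LogPSA), 1, 0)
dd <- datadist(d); options(datadist = "dd")
f <- lrm(PCa ~ Age + DRE + LogPV + LogPSA, data = d, x = TRUE, y = TRUE)
plot(nomogram(f, fun = plogis, funlabel = "Probability of PCa"))
plot(calibrate(f, B = 200))  # bootstrap overfitting-corrected calibration curve
f$stats["C"]                 # apparent c-index; validate(f, B = 200) gives corrected Dxy
```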
Re: [R] How to make a nomogam and Calibration plot
I want to add a comment related to the calibration plot presented in a previous post (which probably cannot be done optimally in SPSS). The plot lacks sufficient resolution in the x-axis values that are calibrated. It is far better to use loess to estimate a smooth nonparametric calibration curve, with no (arbitrary) binning of x. And it is not adequate to validate only the 3-4 points that were plotted. Frank

yz wrote: A nomogram is user-friendly, but the equation is also acceptable. It should be kept in mind that the process of model development really counts. BTW: You can calculate the c-index (AUC) in SPSS. A calibration plot can also be made (perhaps manually) from the SPSS results. Yao Zhu, Department of Urology, Fudan University Shanghai Cancer Center, Shanghai, China

2011/8/1 sytangping <surgeon666...@yahoo.com.cn>

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Goodness of fit of binary logistic model
Please provide the data or, better, the R code for simulating data that shows the problem. Then we can look further into this. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Goodness-of-fit-of-binary-logistic-model-tp3721242p3721997.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Goodness of fit of binary logistic model
Exactly right Peter. Thanks. There should be some way for me to detect such situations so that they do not produce an impressive P-value. Ideas welcomed! This is a great example of why users should post a toy example in the first posting, as we can immediately see that this model MUST fit the data, so that any evidence for lack of fit has to be misleading. Frank

Peter Dalgaard-2 wrote: On Aug 5, 2011, at 23:16, Paul Smith wrote: Thanks, Frank. The following piece of code generates data that exhibit the problem I reported:

set.seed(123)
intercept <- -1.32
beta <- 1.36
xtest <- rbinom(1000, 1, 0.5)
linpred <- intercept + xtest * beta
prob <- exp(linpred) / (1 + exp(linpred))
runis <- runif(1000, 0, 1)
ytest <- ifelse(runis < prob, 1, 0)
xtest <- as.factor(xtest)
ytest <- as.factor(ytest)
require(rms)
model <- lrm(ytest ~ xtest, x = TRUE, y = TRUE)
model
residuals(model, 'gof')

Basically, what you have is zero divided by zero, except that floating-point inaccuracy turns it into the ratio of two small numbers. So the Z statistic is effectively rubbish. This comes about because the SSE minus its expectation has effectively zero variance, which makes it rather useless for testing whether the model fits. Since the model is basically a full model for a 2x2 table, it is not surprising to me that goodness-of-fit tests behave poorly. In fact, I would conjecture that no sensible g.o.f. test exists for that case. Paul On Fri, Aug 5, 2011 at 7:58 PM, Frank Harrell <f.harr...@vanderbilt.edu> wrote: Please provide the data or, better, the R code for simulating data that shows the problem. Then we can look further into this. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Goodness-of-fit-of-binary-logistic-model-tp3721242p3721997.html Sent from the R help mailing list archive at Nabble.com.
-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com Døden skal tape! --- Nordahl Grieg - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Goodness-of-fit-of-binary-logistic-model-tp3721242p3723388.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] help with predict for cr model using rms package
The help files do not make this clear, but predict and Mean in rms do not support computation of predicted means for the continuation ratio model. The help file for cr.setup shows you how to compute the unconditional probabilities you would need for doing that. Frank

apeer wrote: Dear list, I'm currently trying to use the rms package to get predicted ordinal responses from a continuation ratio model. As you will see below, my model seems to fit the data well; however, I'm having trouble getting predicted mean (or fitted) ordinal response values using the predict function. I have a feeling I'm missing something simple, but I haven't been able to determine what that is. Thanks in advance for your help. Adam

dd <- datadist(all.data2.stand)
options(datadist = 'dd')
bp.cat2 <- all.data2.stand$bp.cat2
u <- cr.setup(bp.cat2)
u
b.mean <- rep(all.data2.stand$b.mean, u$reps)
r.mean <- rep(all.data2.stand$r.mean, u$reps)
mean.ova.energy <- rep(all.data2.stand$mean.ova.energy, u$reps)
y <- u$y            # constructed binary response
cohort <- u$cohort
attach(all.data2.stand[u$subs, ])
dd <- datadist(dd, cohort)
ord.cr <- lrm(y ~ cohort + mean.ova.energy + b.mean + r.mean,
              x = TRUE, y = TRUE, na.action = na.delete)
summary(ord.cr)
Effects   Response : y

Factor                    Low       High     Diff.     Effect    S.E.
mean.ova.energy           0.36902   1.00810  0.63906   -27.32    11.74
b.mean                   -0.98219   0.18109  1.16330    -6.76     3.14
r.mean                   -0.50416   0.89758  1.40170    11.75     4.84
cohort - bp.cat2=2:all    1.0       2.0      NA         43.07    18.37
cohort - bp.cat2=3:all    1.0       3.0      NA         55.38    23.52

ord.cr
Logistic Regression Model
lrm(formula = y ~ cohort + mean.ova.energy + b.mean + r.mean,
    na.action = na.delete, x = TRUE, y = TRUE)
Obs 182, LR chi2 174.09, d.f. 5, Pr(> chi2) < 0.0001, max |deriv| 6e-07,
R2 0.953, g 33.065, gp 0.338, Brier 0.013, C 0.998, Dxy 0.996,
gamma 0.996, tau-a 0.337

                   Coef      S.E.     Wald Z  Pr(>|Z|)
Intercept         -20.6064    8.5979  -2.40   0.0165
cohort=bp.cat2=2   43.0670   18.3684   2.34   0.0190
cohort=bp.cat2=3   55.3845   23.5159   2.36   0.0185
mean.ova.energy   -42.7469   18.3663  -2.33   0.0199
b.mean             -5.8150    2.6984  -2.16   0.0312
r.mean              8.3840    3.4523   2.43   0.0152

p.cr <- predict(ord.cr, all.data2.stand, type = 'mean', codes = TRUE)
Error in model.frame.default(Terms, newdata, na.action = na.action, ...) :
  variable lengths differ (found for 'mean.ova.energy')
In addition: Warning message:
'newdata' had 72 rows but variable(s) found have 182 rows
pred.mean2 <- data.frame(p.cr)
Error in data.frame(p.cr) : object 'p.cr' not found
pred.mean2
- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/help-with-predict-for-cr-model-using-rms-package-tp3723763.html Sent from the R help mailing list archive at Nabble.com.
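The unconditional probabilities Frank refers to follow from the continuation ratio structure: for an ordered outcome, P(Y = j) is the fitted conditional probability P(Y = j | Y >= j) times the probability of "surviving" all earlier categories. A hedged arithmetic sketch with hypothetical conditional probabilities for a three-level outcome:

```r
## Hedged sketch: converting continuation-ratio conditional probabilities
## (as fitted on cr.setup()-expanded data) to unconditional category
## probabilities for one subject. The values in p_cond are hypothetical.
p_cond <- c(0.30, 0.55)       # P(Y=1 | Y>=1), P(Y=2 | Y>=2)
p_uncond <- numeric(3)
p_uncond[1] <- p_cond[1]
p_uncond[2] <- (1 - p_cond[1]) * p_cond[2]
p_uncond[3] <- (1 - p_cond[1]) * (1 - p_cond[2])  # remainder category
sum(p_uncond)                 # == 1 by construction
expected_y <- sum(1:3 * p_uncond)  # predicted mean ordinal response
```

This is the computation that predict(..., type = 'mean') cannot do for you for the CR model; you apply it per subject using the cohort-specific fitted probabilities.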
Re: [R] help annotating plot.xmean.ordinaly in rms package
Chris, for my purposes I need the output tied to the model's variable names, so I'm not sure I want to add new naming options to the function. You might just add a line after nam <- nam[-1] to specify your vector of names, starting with the response variable, e.g., nam <- c('velocity m/s', 'age', 'blood pressure'). Frank

Christopher DiBari wrote: Hello List: Does anyone know if there's a way to annotate the plots produced by the plot.xmean.ordinaly function in the rms package? I'd like to produce publication-quality figures, but first need to annotate the axis labels. Is there a way to do that? I appreciate any advice. Chris

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/help-annotating-plot-xmean-ordinaly-in-rms-package-tp3724864p3724937.html Sent from the R help mailing list archive at Nabble.com.
[R] useR!2012 Nashville TN: Your Input Needed
Dear R and S-Plus Community: The 2012 R User Conference - useR! 2012 - will be held in Nashville, Tennessee, USA, June 12-15, 2012 on the campus of Vanderbilt University. We would like to begin estimating the number of attendees, their areas of interest, and the number seeking hotel vs. lower-cost housing. If you are likely to attend useR! 2012, please go to the following link to answer two questions: https://redcap.vanderbilt.edu/surveys/?s=2wiqWo There are many fun things to do in Nashville -- Music City USA -- around the time of the meeting. Vanderbilt is 2.2 miles (3.5 km) from the center of the action. I hope to see many of you at useR! 2011 at the University of Warwick in Coventry, UK in just over a week. If anyone knows of another site to which I should send this announcement, please e-mail me your suggestion. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Revolutions Blog: July Roundup
David, as always this is a terrific roundup. I note in passing that the big-data logistic function rxLogit used on the 1B-observation dataset (impressive run time!) inappropriately used the t distribution for testing the coefficients in the logistic model [at least if the notation used is any clue]. It should have used the normal distribution. Cheers, Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Revolutions-Blog-July-Roundup-tp3731426p3731597.html Sent from the R help mailing list archive at Nabble.com.
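The point Frank is making can be illustrated with base R: Wald statistics for logistic regression coefficients are referred to the standard normal distribution, which is why glm() reports "z value" rather than a t statistic.

```r
## Illustration of the normal-vs-t point above, using a built-in dataset.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
z <- coef(summary(fit))[, "z value"]
p <- 2 * pnorm(-abs(z))   # normal-based P-values, as glm() reports
all.equal(unname(p), unname(coef(summary(fit))[, "Pr(>|z|)"]))  # TRUE
```

Using pt() with some finite degrees of freedom here would give systematically larger P-values, which is the error being flagged.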
[R] useR! 2012 Nashville TN: Your Input Needed
So far we have received over 70 responses to our survey. If you have NOT responded already and are likely to attend useR! 2012, please take the extremely short survey today. The link is below. More information about Nashville may be seen at http://biostat.mc.vanderbilt.edu/UseR-2012 In a day or two the conference web address will be http://www.r-project.org/useR-2012/ Thanks! -- The 2012 R User Conference - useR! 2012 - will be held in Nashville, Tennessee, USA, June 12-15, 2012 on the campus of Vanderbilt University. We would like to begin estimating the number of attendees, their areas of interest, and the number seeking hotel vs. lower-cost housing. If you are likely to attend useR! 2012, please go to the following link to answer two questions: https://redcap.vanderbilt.edu/surveys/?s=2wiqWo There are many fun things to do in Nashville -- Music City USA -- around the time of the meeting. Vanderbilt is 2.2 miles (3.5 km) from the center of the action. I hope to see many of you at useR! 2011 at the University of Warwick in Coventry, UK in just over a week. If anyone knows of another site to which I should send this announcement, please e-mail me your suggestion. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] fit.mult.impute() in Hmisc
For your approach, how do you know that either summary or vcov used multiple imputation? You are using a non-rms fitting function, so be careful. Compare with using the lrm fitting function. Also, replace Design with the rms package. Please omit confidentiality notices from your e-mails. Frank

I tried multiple imputation with aregImpute() and fit.mult.impute() in Hmisc 3.8-3 (June 2010) and R-2.12.1. The warning message below suggests that summary(f) of fit.mult.impute() would only use the last imputed data set. Thus, the whole imputation process is ignored: "Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use vcov(fit) to get the correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e." But the standard errors in summary(f) agree with the values from sqrt(diag(vcov(f))) to the 4th decimal place. It would seem that summary(f) actually adjusts for multiple imputation? Does summary(f) in Hmisc 3.8-3 actually adjust for MI? If it does not adjust for MI, then how do I get the MI-adjusted coefficients and standard errors? I can't seem to find answers in the documentation, including rereading section 8.10 of the Harrell (2001) book. Googling located a thread in R-help back in 2003, which seemed dated. Many thanks in advance for the help, Yuelin.
http://idecide.mskcc.org
---
library(Hmisc)
Loading required package: survival
Loading required package: splines
data(kyphosis, package = "rpart")
kp <- lapply(kyphosis, function(x)
  { is.na(x) <- sample(1:length(x), size = 10); x })
kp <- data.frame(kp)
kp$kyp <- kp$Kyphosis == "present"
set.seed(7)
imp <- aregImpute( ~ kyp + Age + Start + Number, data = kp, n.impute = 10,
                   type = "pmm", match = "closest")
Iteration 13
f <- fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp,
                     family = binomial, data = kp)
Variance Inflation Factors Due to Imputation:
(Intercept)   Age  Start  Number
       1.06  1.28   1.17    1.12
Rate of Missing Information:
(Intercept)   Age  Start  Number
       0.06  0.22   0.14    0.10
d.f. for t-distribution for Tests of Single Coefficients:
(Intercept)      Age   Start  Number
    2533.47   193.45  435.79  830.08
The following fit components were averaged over the 10 model fits:
fitted.values linear.predictors
Warning message: In fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp, : Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use vcov(fit) to get the correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e.
f
Call: fitter(formula = formula, family = binomial, data = completed.data)
Coefficients:
(Intercept)      Age    Start  Number
    -3.6971   0.0118  -0.1979  0.6937
Degrees of Freedom: 80 Total (i.e. Null); 77 Residual
Null Deviance: 80.5  Residual Deviance: 58  AIC: 66
sqrt(diag(vcov(f)))
(Intercept)        Age      Start     Number
  1.5444782  0.0063984  0.0652068  0.2454408
-0.1979/0.0652068
[1] -3.0350
summary(f)
Call: fitter(formula = formula, family = binomial, data = completed.data)
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.240  -0.618  -0.288  -0.109   2.409
Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)   -3.6971      1.5445    -2.39    0.0167
Age            0.0118      0.0064     1.85    0.0649
Start         -0.1979      0.0652    -3.03    0.0024
Number         0.6937      0.2454     2.83    0.0047
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 80.508 on 80 degrees of freedom
Residual deviance: 57.965 on 77 degrees of freedom
AIC: 65.97
Number of Fisher Scoring iterations: 5
- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/fit-mult-impute-in-Hmisc-tp3419037p3741881.html Sent from the R help mailing list archive at Nabble.com.
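Frank's suggestion (use the rms fitter lrm rather than glm) can be sketched directly against the kyphosis toy example above. This is a hedged sketch, not a tested analysis; with an rms fitting function, the printed fit and vcov() both use the imputation-adjusted covariance matrix, so the warning disappears:

```r
## Hedged sketch of the advice above: fit.mult.impute with fitter = lrm
## so that the printed Wald tests use MI-adjusted variances.
library(rms)   # loads Hmisc
data(kyphosis, package = "rpart")
set.seed(7)
kp <- data.frame(lapply(kyphosis,
        function(x) { is.na(x) <- sample(seq_along(x), 10); x }))
kp$kyp <- kp$Kyphosis == "present"
imp <- aregImpute(~ kyp + Age + Start + Number, data = kp, n.impute = 10)
f <- fit.mult.impute(kyp ~ Age + Start + Number, fitter = lrm,
                     xtrans = imp, data = kp)
f                    # Wald tests based on the adjusted covariance matrix
sqrt(diag(vcov(f)))  # same MI-adjusted standard errors
```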
Re: [R] Labelling all variables at once (using Hmisc label)
Do require(Hmisc); ?label to see the help file for label. It will show you how to do this.

Monsieur Do wrote: The labels can contain much more than just names. In my case, they are variable descriptions (items from a questionnaire). I need to keep the names as they are, hence the need for Hmisc's labels.

On Mon, Aug 15, 2011 at 3:53 PM, Monsieur Do <nonaupourr...@yahoo.ca> wrote: I have a dataset and a list of labels. I simply want to apply the labels to the variables, all at once. The only way I was able to do it was using a loop:

for (i in 1:length(data)) label(data[, i]) <- data.labels[i]

I'd like to find the non-loop way to do it, using apply or the like... Any help appreciated.

Would you not be better off with names()?

data <- 1:10
data.labels <- letters[1:10]
names(data) <- data.labels
data
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10

What are you trying to do with label() that names() doesn't accomplish? -- Sarah Goslee http://www.functionaldiversity.org

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Labelling-all-variables-at-once-using-Hmisc-label-tp3745660p3746928.html Sent from the R help mailing list archive at Nabble.com.
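The vectorized form Frank is pointing to in the label help file can be sketched as follows (hedged; check ?label in your Hmisc version for the exact assignment interface):

```r
## Hedged sketch: assign a vector of labels to all columns of a data
## frame in one call, with no loop, via Hmisc's label<- method.
library(Hmisc)
d <- data.frame(x = 1:3, y = letters[1:3])
label(d, self = FALSE) <- c("first variable", "second variable")
label(d$x)   # "first variable"
```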
Re: [R] calibration curve for cph()
David Winsemius wrote: A combination of Predict (your newdata), cut2, and the plotting function of your choice ought to suffice. But I thought that cross-validation was an option. Not at a console at the moment (just off an airplane). Sent from my iPhone

On Aug 15, 2011, at 5:26 PM, array chip <arrayprof...@yahoo.com> wrote: Is there an R function that produces a calibration curve on an independent dataset automatically, just like what calibrate() does on the training data itself? Thanks, John

From: Comcast <dwinsem...@comcast.net> To: array chip <arrayprof...@yahoo.com> Cc: r-help@r-project.org Sent: Monday, August 15, 2011 2:04 PM Subject: Re: [R] calibration curve for cph() Build a prediction function using 'Function' that gets applied to set2. Calibrate and validate. -- David Sent from my iPhone

On Aug 15, 2011, at 11:31 AM, array chip <arrayprof...@yahoo.com> wrote: Hi, the calibrate.cph() function in the rms package generates a calibration curve for a Cox model on the same dataset where the model was derived, using bootstrapping or cross-validation. If I have the model built on dataset 1 and now want to produce a calibration curve for this model on an independent dataset 2, how can I do that? Thanks, John
- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/calibration-curve-for-cph-tp3745328p3746931.html Sent from the R help mailing list archive at Nabble.com.
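David's "Predict + cut2 + plot" suggestion can be sketched manually. This is a hedged sketch, not a tested recipe: `set1`, `set2`, and the predictors are hypothetical, and it assumes complete follow-up through the chosen time point in each group.

```r
## Hedged sketch: external calibration of a cph() model at t = 365 days.
## Fit on set1, predict survival on independent set2, group predictions
## with cut2(), and compare with Kaplan-Meier estimates per group.
library(rms)   # also attaches survival
f <- cph(Surv(time, event) ~ age + sex, data = set1,
         x = TRUE, y = TRUE, surv = TRUE)
p <- as.numeric(survest(f, newdata = set2, times = 365)$surv)
g <- cut2(p, g = 4)                          # quartiles of predicted S(365)
km <- survfit(Surv(time, event) ~ g, data = set2)
obs  <- summary(km, times = 365)$surv        # observed S(365) per group
pred <- tapply(p, g, mean)                   # mean predicted S(365) per group
plot(pred, obs, xlab = "Mean predicted S(365)",
     ylab = "Observed (KM) S(365)")
abline(0, 1)                                 # ideal calibration line
```

Grouping is used here only for display; a loess smooth of observed vs. predicted (in the spirit of Frank's earlier comment) avoids the arbitrary binning.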
Re: [R] contrast package with interactions in gls model
You did not follow the posting guide: you did not specify which packages you were using. It appears that you are mixing the rms package with some other functions such as gls. If you want to use rms, use the Gls function instead of gls, and type ?contrast.rms to see examples of the use of contrast(). Frank

Marylin Bejarano wrote: Hi! I am trying to explain the effects of (1) the forest where I took the soil samples (*Lugar*: categorical variable with three levels) and (2) nitrogen addition treatments (*Tra*: categorical variable with two levels) on the total carbon concentration of the soil samples (*C*: continuous variable) over four months of sampling (*Time*: categorical, ordered variable with four levels). I fitted the following final model with the gls function:

var1 <- varIdent(form = ~ 1 | Lugar * factor(Time))
FINAL <- gls(C ~ Tra + Lugar + Time + Time*Tra + Tra*Lugar,
             data = datos, weights = var1, method = "REML")

The summary function produced this output (I omit the correlation matrix):

Generalized least squares fit by REML
  Model: C ~ Tra + Lugar + Time + Time * Tra + Tra * Lugar
  Data: datos
       AIC       BIC     logLik
  1129.458  1191.982  -540.7291

Variance function: different standard deviations per stratum (~1 | Lugar * factor(Time)); parameter estimates:
Chixchulub*0 1.0000, Xmatkuil*0 0.7759, Hobonil*0 0.5301,
Chixchulub*2 0.9641, Xmatkuil*2 0.8201, Hobonil*2 0.2967,
Chixchulub*3 0.9553, Xmatkuil*3 1.7350, Hobonil*3 0.3430,
Chixchulub*4 0.6242, Xmatkuil*4 0.9574, Hobonil*4 0.4652

Coefficients:
                        Value     Std.Error    t-value  p-value
(Intercept)           260.48540   16.48991    15.797    0.0000
Tra0                   -9.38703   23.74893    -0.395    0.6935
LugarChixchulub        -0.15377   19.60260    -0.008    0.9938
LugarHobonil         -173.21354   15.89736   -10.896    0.0000
Time2                 -14.74999   14.55909    -1.013    0.3135
Time3                  14.42177   15.64594     0.922    0.3589
Time4                  14.77803   16.72367     0.884    0.3790
Tra0:Time2             17.93859   20.78257     0.863    0.3901
Tra0:Time3            -48.77118   22.17628    -2.199    0.0302
Tra0:Time4            -52.63611   23.20192    -2.269    0.0254
Tra0:LugarChixchulub   74.43956   28.11275     2.648    0.0094
Tra0:LugarHobonil      43.03416   23.32391     1.845    0.0680

The anova function generated this table (Denom. DF: 100):
             numDF    F-value  p-value
(Intercept)      1  1693.1234   <.0001
Tra              1     5.3225   0.0231
Lugar            2   247.7047   <.0001
Time             3     0.4767   0.6992
Tra:Time         3     6.0531   0.0008
Tra:Lugar        2     3.5061   0.0338

I want to detect differences between levels of the Tra:Lugar interaction. For example: 1. Tra0:LugarChixchulub vs Tra1:LugarChixchulub (between treatment levels within the same forest), or 2. Tra0:LugarChixchulub vs Tra0:LugarHobonil (same treatment among forest levels). I used the contrast function (contrast package) with the following script to test hypothesis 1:

con <- contrast(FINAL, list(Lugar = 'Xmatkuil', Tra = 1),
                       list(Lugar = 'Xmatkuil', Tra = 0))

but I found this error message:

Error en gendata.default(fit = list(modelStruct = list(varStruct = c(-0.253689933940456, : not enough factors

I would be grateful if somebody could tell me what I'm doing wrong in my contrast call. Thanks in advance, Marylin Bejarano, PhD candidate in Ecology, Institute of Mexico's National Autonomous University

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/contrast-package-with-interactions-in-gls-model-tp3751255p3751269.html Sent from the R help mailing list archive at Nabble.com.
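Frank's advice (refit with rms::Gls so that rms::contrast() works) can be sketched against the model in the post. This is a hedged sketch, not a tested analysis; it reuses the poster's data and variable names, and assumes Tra is a factor with levels "0" and "1". Predictors not mentioned in contrast() are set to their datadist adjust-to values.

```r
## Hedged sketch of Frank's suggestion: use rms::Gls instead of nlme::gls
## so that rms::contrast() can form the Tra-within-Lugar comparison.
library(rms)
library(nlme)   # for varIdent()
dd <- datadist(datos); options(datadist = "dd")
var1 <- varIdent(form = ~ 1 | Lugar * factor(Time))
FINAL <- Gls(C ~ Tra + Lugar + Time + Time*Tra + Tra*Lugar,
             data = datos, weights = var1, method = "REML")
## Hypothesis 1: Tra = 1 vs Tra = 0 within forest Xmatkuil
con <- contrast(FINAL, list(Tra = "1", Lugar = "Xmatkuil"),
                       list(Tra = "0", Lugar = "Xmatkuil"))
print(con)
```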
Re: [R] Labelling all variables at once (using Hmisc label)
I'm puzzled. I provided a solution that did not require looping. Frank

Monsieur Do wrote: I did read the help page before posting, but didn't find the direct way... My function here works fine, but just for learning purposes I'd like to be able to avoid the loop...

with.labels <- function(x, labels = NULL, csvfile = NULL) {
  if (!is.null(csvfile))
    labels <- read.csv(csvfile, sep = "\t", header = FALSE,
                       stringsAsFactors = FALSE)[, 1]
  for (i in 1:length(x)) label(x[, i]) <- labels[i]
  if (length(labels) != length(x))
    cat("Warning: data and labels are not of same length\n")
  return(x)
}

Thanks

Date: Tue, 16 Aug 2011 04:22:07 -0700 (PDT) From: Frank Harrell <f.harr...@vanderbilt.edu> To: r-help@r-project.org Subject: Re: [R] Labelling all variables at once (using Hmisc label)

Do require(Hmisc); ?label to see the help file for label. It will show you how to do this.

Monsieur Do wrote: I have a dataset and a list of labels. I simply want to apply the labels to the variables, all at once. The only way I was able to do it was using a loop: for (i in 1:length(data)) label(data[, i]) <- data.labels[i] I'd like to find the non-loop way to do it, using apply or the like... Any help appreciated.

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Labelling-all-variables-at-once-using-Hmisc-label-tp3745660p3751273.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] contrast package with interactions in gls model
Thanks. The reason I responded that way is that the error message mentioned a function gendata that is in the rms and Design packages. I guess it's in the contrast package too. Frank

MarylinBejarano wrote: Hi, sorry! I fitted the model with gls from the nlme package, and I am trying to contrast the interaction with the contrast function from the contrast package. Thanks for your reply. Regards, Marylin Bejarano

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/contrast-package-with-interactions-in-gls-model-tp3751255p3753135.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Labelling all variables at once (using Hmisc label)
Sorry about the Nabble problem. At any rate, do require(Hmisc) then ?label to see how to associate a vector of labels with all the variables in a data frame at once. Frank

do999 wrote: Indeed, as David pointed out, the portion that used a courier font (all the good stuff) was absent from the email posting. Thanks for your answers.

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Labelling-all-variables-at-once-using-Hmisc-label-tp3745660p3756459.html Sent from the R help mailing list archive at Nabble.com.