[R] Regression Modeling Strategies and the rms Package 4-Day Short Course May 2018
*RMS Short Course 2018* Frank E. Harrell, Jr., Ph.D., Professor Department of Biostatistics, Vanderbilt University School of Medicine fharrell.com @f2harrell *May 15-18, 2018* With Optional R Workshop May 14 9:00am - 4:00pm Alumni Hall Vanderbilt University Nashville Tennessee USA See http://biostat.mc.vanderbilt.edu/RMSShortCourse2018 for details. The course includes statistical methodology, case studies, and use of the R rms package. Please email interest to Jill Shell-- Frank E Harrell Jr Professor School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help understanding why glm and lrm.fit runs with my data, but lrm does not
Fixed 'maxiter' in the help file. Thanks. Please give the original source of that dataset. That dataset is a tiny sample of GUSTO-I and not large enough to fit this model very reliably. A nomogram using the full dataset (not publicly available to my knowledge) is already available in http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf Use lrm, not lrm.fit for this. Adding maxit=20 will probably make it work on the small dataset but still not clear on why you are using this dataset. Frank -- Frank E Harrell Jr Professor School of Medicine Department of *Biostatistics* *Vanderbilt University* On Thu, Sep 14, 2017 at 10:48 AM, David Winsemiuswrote: > > > On Sep 14, 2017, at 12:30 AM, Bonnett, Laura < > l.j.bonn...@liverpool.ac.uk> wrote: > > > > Dear all, > > > > I am using the publically available GustoW dataset. The exact version I > am using is available here: https://na01.safelinks. > protection.outlook.com/?url=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid% > 3D0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk=02%7C01%7Cf.harrell%40vanderbilt.edu% > 7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80fa > ecad%7C0%7C0%7C636410009046132507=UZgX3%2Ba% > 2FU2Eeh8ybHMI6JnF0Npd2XJPXAzlmtEhDgOY%3D=0 > > > > I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP, > HRT and ANT. I have successfully fitted a logistic regression model using > the "glm" function as shown below. > > > > library(rms) > > gusto <- spss.get("GustoW.sav") > > fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family= > binomial(link="logit"),data=gusto,x=TRUE,y=TRUE) > > > > However, my review of the literature and other websites suggest I need > to use "lrm" for the purposes of producing a nomogram. When I run the > command using "lrm" (see below) I get an error message saying: > > Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) : > > Unable to fit model using "lrm.fit" > > > > My code is as follows: > > gusto2 <- gusto[,c(1,3,5,8,9,10)] > > gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes")) > > gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4")) > > gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes")) > > gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes")) > > var.labels=c(DAY30="30-day Mortality", AGE="Age in Years", > KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior > Infarct Location") > > label(gusto2)=lapply(names(var.labels),function(x) > label(gusto2[,x])=var.labels[x]) > > > > ddist = datadist(gusto2) > > options(datadist='ddist') > > > > fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2) > > > > Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) : > > Unable to fit model using "lrm.fit" > > > > Online solutions to this problem involve checking whether any variables > are redundant. However, the results for my data suggest that none are. > > redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2) > > > > Redundancy Analysis > > > > redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2) > > > > n: 2188 p: 5nk: 3 > > > > Number of NAs: 0 > > > > Transformation of target variables forced to be linear > > > > R-squared cutoff: 0.9 Type: ordinary > > > > R^2 with which each variable can be predicted from all other variables: > > > > AGEHYP KILLIPHRTANT > > 0.028 0.032 0.053 0.046 0.040 > > > > No redundant variables > > > > I've also tried just considering "lrm.fit" and that code seems to run > without error too: > > lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP, > gusto2$HRT,gusto2$ANT),gusto2$DAY30) > > > > Logistic Regression Model > > > > lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT, > > gusto2$ANT), y = gusto2$DAY30) > > > > Model Likelihood DiscriminationRank > Discrim. > > Ratio Test Indexes Indexes > > Obs 2188LR chi2 233.59R2 0.273C > 0.846 > > 0 2053d.f. 5g1.642Dxy > 0.691 > > 1135Pr(> chi2) <0.0001gr 5.165gamma > 0.696 > > max |deriv| 4e-09 gp 0.079tau-a > 0.080 > >Brier0.048 > > > > Coef S.E. Wald Z Pr(>|Z|) > > Intercept -13.8515 0.9694 -14.29 <0.0001 > > x[1]0.0989 0.0103 9.58 <0.0001 > > x[2]0.9030 0.1510 5.98 <0.0001 > > x[3]1.3576 0.2570 5.28 <0.0001 > > x[4]0.6884 0.2034 3.38 0.0007 > > x[5]0.6327 0.2003 3.16 0.0016 > > > > I was therefore hoping someone would explain why the "lrm" code is > producing an error message, while "lrm.fit" and "glm" do not. In > particular I would welcome a solution to ensure I can produce a nomogram. > > Try this: > > lrm # look at code, do a search on "fail" > ?lrm.fit # read the structure of the returned value of lrm.fit > > my.fit <- lrm.fit(x =
Re: [R] rms::latex.anova broken?
In recent versions of rms on CRAN there was a non-downward compatible change. To get latex output for summary, anova, and print on fit objects you leave off file="" (because we're usually using knitr anyway) and use options(prType='latex') anova(f) # LaTeX output You can use options(prType='html') to get special html output for RMarkdown html reports and html notebooks. There are small changes in Hmisc, e.g., leave off file="" for latex(describe()). Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* On Wed, Feb 8, 2017 at 12:19 PM, Kevin E. Thorpewrote: > Hi Frank. > > I sent this to r-help yesterday but have not received any answers from the > list readers. I thought I'd send to you directly to see if you had any > insight. Since I posted this yesterday I have updated my R installation to > the latest patched version and still get the errors. > > I use latex(anova(...)) quite a bit so a fix or workaround would by most > appreciated. > > All the best, > > Kevin > > > Forwarded Message > Subject: rms::latex.anova broken? > Date: Tue, 7 Feb 2017 14:23:46 -0500 > From: Kevin E. Thorpe > To: R Help Mailing List > > I am re-running some logistic regression analyses using lrm from the rms > package but latex(anova(...)) appears to be broken on my system. > > Here is some anova() output followed by the latex() error for two models > since the error changes. My sessionInfo() follows the other output. I have > updated all my packages and re-installed Hmisc and rms plus dependencies. > The only thing I haven't done yet is update R completely. Has anyone else > encountered this and know how to solve it? > > anova(full) >> > Wald Statistics Response: id14 > > Factor Chi-Square d.f. P > birthweight_kilo 0.87 1 0.3517 > ageinmonth 4.12 1 0.0423 > zbmi 6.49 1 0.0108 > maxtbf 15.16 1 0.0001 > cowsmilk 6.54 1 0.0106 > Male 1.96 1 0.1611 > multivitamin 0.76 1 0.3819 > bottleuse0.13 1 0.7194 > preterm 0.50 1 0.4811 > AGEINTRO_cowsmilk0.06 1 0.8032 > AGEINTRO_complefood 0.61 1 0.4356 > TOTAL 30.49 11 0.0013 > >> latex(anova(full),file="",table.env=FALSE,booktabs=TRUE) >> > Error in ifelse(sn %nin% c("d.f.", "MS", "Partial SS"), math(sn), sn) : > could not find function "math" > >> anova(full.nl) >> > Wald Statistics Response: id14 > > Factor Chi-Square d.f. P > birthweight_kilo 3.68 2 0.1588 > Nonlinear 2.65 1 0.1037 > ageinmonth 16.25 2 0.0003 > Nonlinear 13.45 1 0.0002 > zbmi 4.07 2 0.1310 > Nonlinear 0.23 1 0.6323 > maxtbf 15.81 2 0.0004 > Nonlinear 2.57 1 0.1092 > cowsmilk 3.34 2 0.1880 > Nonlinear 1.16 1 0.2821 > Male 1.21 1 0.2711 > multivitamin 0.57 1 0.4494 > bottleuse0.06 1 0.8100 > preterm 0.01 1 0.9418 > AGEINTRO_cowsmilk3.65 2 0.1612 > Nonlinear 3.28 1 0.0700 > AGEINTRO_complefood 5.40 2 0.0671 > Nonlinear 4.00 1 0.0455 > TOTAL NONLINEAR 25.41 7 0.0006 > TOTAL 52.13 18 <.0001 > >> latex(anova(full.nl),file="",table.env=FALSE,booktabs=TRUE) >> > Error in paste0(specs$lspace, specs$italics(substring(rowl, 2)), sep = "") > : > attempt to apply non-function > > sessionInfo() >> > R version 3.2.3 Patched (2016-01-31 r70055) > Platform: x86_64-pc-linux-gnu (64-bit) > Running under: Slackware 14.2 > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8LC_COLLATE=C > [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rms_5.1-0 SparseM_1.74Hmisc_4.0-2 ggplot2_2.2.1 > [5] Formula_1.2-1 survival_2.40-1 lattice_0.20-34 knitr_1.15.1 > > loaded via a namespace (and not attached): > [1] Rcpp_0.12.9 RColorBrewer_1.1-2 plyr_1.8.4 > [4] base64enc_0.1-3 tools_3.2.3 rpart_4.1-10 > [7] digest_0.6.12 polspline_1.1.12tibble_1.2 > [10] gtable_0.2.0htmlTable_1.9 checkmate_1.8.2 > [13] nlme_3.1-131Matrix_1.2-8mvtnorm_1.0-5 > [16]
[R] [R-pkgs] Major Update to rms package: 5.0-0
A major new version of the rms package is now on CRAN. The most user-visible changes are: - interactive plotly graphic methods for model fits. The best example of this is survplot for npsurv (Kaplan-Meier) estimates where the number of risk pop up as you hover over the curves, and you can click to bring up confidence bands for differences in survival curves - html methods for model fit summaries especially when using Rmarkdown html notebooks - instead of running print(fit, latex=TRUE) use options(prType='latex') print(fit) Or: options(prType='html') A complete list of changes is below. See http://biostat.mc.vanderbilt.edu/Rrms for an html notebook showing examples of new features. Changes in version 5.0-0 (2016-10-31) * plot.summary.rms: implemented plotly interactive plots if options(grType='plotly') in effect * plot.anova.rms: implemented plotly interactive plots if options(grType='plotly') in effect; remove psmall argument; changed margin default to chisq and P * ggplot.Predict: implemented plotly interactive plots if options(grType='plotly') in effect * print(fit, md=TRUE), prModFit, prStats: added latex/html methods using htmlTable package and MathJax for latex math * html.anova.rms, html.validate, html.summary.rms: new functions for use with html and MathJax/knitr/RStudio * latex methods for model fits: added md=TRUE argument to produce MathJax-compatible latex and html code for fitted models when using R Markdown * html: new methods for model fit objects for use with R Markdown * formatNP: fixed error when digits=NA * latex.anova.rms: fixed error in not rounding enough columns doe to using all.is.numeric intead of is.numeric * catg: corrected bug that disallowed explicit catg() in formulas * ggplot.Predict: added height and width for plotly * survplot: respected xlim with diffbands. Thanks: Toni G * reVector: changed to reListclean and stored model stats components as lists so can handle mixture of numeric and character, e.g., name of clustering variable * survplotp.npsurv: new function for interactive survival curve graphs using plotly * anova, summary, latex, print for model fits: use options(grType='html') or 'latex' to set output type, output htmltools::HTML marked html so that chunk header doesn't need results='asis' * latex methods - set file default to '' * GiniMd: moved to Hmisc package * plot.nomogram: fixed bug where abbreviations were being ignored. Thanks: Zongheng Zhang * nomogram: improved examples in help file * survplot.npsurv: fixed n.risk when competing risks * survest.cph, survfit.cph, others: fixed large problems due to incompatibility with survival 2.39-5 for survival predictions; changed object Strata to strata in cph to be compatible with survival package * new test survest.r to more comprehensively check survfit.cph and survest.cph * ggplot.Predict: quit ignoring xlim; suppress confidence bands if two group bariables because ggplot2 geom_ribbon doesn't support multiple aesthetics * predictrms: set contrasts for ordered factors to contr.treatment instead of contr.poly * val.prob: changed line of identity to use wide grayscale line instead of dashed line -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] Massive Update to Hmisc package
Hmisc 4.0-0 is now on CRAN. The package has undergone a massive update. The most user-visable changes are; - support for Rmarkdown html notebooks - advanced html tables using the htmlTable package and summaryM function; can copy and paste into word processors - support for plotly interactive graphics, e.g. options(grType='plotly') plot(describe(mydata)) - new function ggfreqScatter - new object markupSpecs that drives LaTeX and html translations - describe function computes new statistics and relabels Unique as Distinct The full list of changes is below. You can look at examples of html notebook reports produced using the new Hmisc at http://biostat.mc.vanderbilt.edu/Hmisc . These include interactive plotly graphics. Changes in version 4.0-0 (2016-10-31) * summaryP: fixed exclude1 logic - was not excluding enough levels (inconsistent use of variable names vs. labels) * latexTranslate: any NAs in first argument changed to "" before conversion * minor.tick: added arguments to pass through (thanks: vhorvath) * tests/latexpng.r: test conversion of LaTeX table to png * latexTabular: made it properly call latexTranslate separately for each column, convert argument to matrix or data frame if a vector * tests/latexTabular.r: new test * latexDotchart: added call to latexTranslate to escape special characters for LaTeX such as underscore and pound sign * ggfreqScatter: new function * grType: new non-exported service function to sense if plotly is in effect * plot.describe: new function to make plotly graph for describe objects * describe: changed output: 3 decimal places for Info, new format for frequency table, separated logic for 10 extreme values from logic for frequency table, significant changes to print.describe * dotchartp: new version of dotchart3 for plotly charts * summaryD: modified to use dotchartp if options(grType='plotly') is in effect * nFm: added argument html * label.default, label.Surv, labelPlotmath: added html argument to implement HTML markup in place of plotmath (for plotly) * now imports htmlTable, viridis * html.contents.data.frame: return invisibly if file='' * htmlVerbatim: new function * html.contents.data.frame: improved html * html.data.frame: changed column headers from h4 to * ggfreqScatter: added html argument * knitrSet: was ignoring default figure h and w * html.summaryM: new function using new version of latex.summaryM * markupSpecs: new list object defining markup elements for latex and html, used by summaryM * show.html, print.html: removed; conflicted with htmltools/rmarkdown * label: required exact match on attribute name "label" to not retrieve the "labels" attribute (thanks: Ivan Puzek) * htmlSN: new function to convert floating point to scientific format usint html * upData, cleanup.import: fix NA becoming new factor levels (thanks: Beau Bruce) * htmlTranslate: new function * plotlySave: new function * histSpikep: new function * upData: new argument html * plot.describe: added ggplot2 graphics, improved continuous display using ggplotly * labelPlotmath: cleaned up by using markupSpecs * capitalize: fixed bug - did not cap. first letter if other letters after it were upper case (thanks: dchiu911) * html.summaryM: added brmsd argument to put mean, SD on separate line * Save: changed default back to use gzip for speed * knitrSet: used new knitr way to set aliases for option names * latexTabular: made translate argument apply to body of table also; implemented multi-line header * html.data.frame: added several arguments * describe: added html method * html markupSpecs object: added bibliographic database utility functions * bppltp: new service function for extended box plots for plotly * plot.summaryM: new plotly method * prType: new service function used to detect user settings for html, latex for print methods * html.contents.data.frame: removed file and append arguments and output an html character vector instead * prList: new function * GiniMd: moved to Hmisc from rms * describe: added GMD (Gini's mean difference) and relabeled unique to distinct * latex.summaryM: added Needspace{2.7in} before 2nd and later strata * ggplot.summaryP: added point labels for use with plotly as hover text * latex.summaryP: fixed bad linebreak at strata boundaries * dotchartpl: new plotly dot chart function especially for summaryP * plot.summaryP: new plotly method using dotchartpl when options(grType='plotly') * putHfig: new function to render graphics with captions in HTML reports * pngNeedle: new function, like latexNeedle but useful with HTML reports * html.data.frame: added width and caption arguments * plotlyParm: list object with plotly helper functions * summaryD, dotchartp: added symbol and col arguments * upData: improved efficiency, added labelled class to variables without it but having labels (e.g., data imported using haven package) * gbayesMixPost: mixed error in posterior odds of the mixing parameter that resulted in incorrect posterior probabilities when the
Re: [R] Trouble getting rms::survplot(..., n.risk=TRUE) to behave properly
This happens when you have not strat variables in the model. -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* On Thu, Jun 2, 2016 at 10:55 AM, Steve Lianoglouwrote: > Hello foks, > > I'm trying to plot the number of patients at-risk by setting the > `n.risk` parameter to `TRUE` in the rms::survplot function, however it > looks as if the numbers presented in the rows for each category are > just summing up the total number of patients at risk in all groups for > each timepoint -- which is to say that the numbers are equal in each > category down the rows, and they don't seem to be the numbers specific > to each group. > > You can reproduce the observed behavior by simply running the code in > the Examples section of ?survplot, which I'll paste below for > convenience. > > Is the error between the chair and the keyboard, here, or is this perhaps > a bug? > > === code === > library(rms) > n <- 1000 > set.seed(731) > age <- 50 + 12*rnorm(n) > label(age) <- "Age" > sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4))) > cens <- 15*runif(n) > h <- .02*exp(.04*(age-50)+.8*(sex=='Female')) > dt <- -log(runif(n))/h > label(dt) <- 'Follow-up Time' > e <- ifelse(dt <= cens,1,0) > dt <- pmin(dt, cens) > units(dt) <- "Year" > dd <- datadist(age, sex) > options(datadist='dd') > S <- Surv(dt,e) > > f <- cph(S ~ rcs(age,4) + sex, x=TRUE, y=TRUE) > survplot(f, sex, n.risk=TRUE) > === > > I'm using the latest version of rms (4.5-0) running on R 3.3.0-patched. > > === Output o sessionInfo() === > R version 3.3.0 Patched (2016-05-26 r70671) > Platform: x86_64-apple-darwin13.4.0 (64-bit) > Running under: OS X 10.11.4 (El Capitan) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rms_4.5-0 SparseM_1.7 Hmisc_3.17-4ggplot2_2.1.0 > [5] Formula_1.2-1 survival_2.39-4 lattice_0.20-33 > > loaded via a namespace (and not attached): > [1] Rcpp_0.12.5 cluster_2.0.4 MASS_7.3-45 > [4] splines_3.3.0 munsell_0.4.3 colorspace_1.2-6 > [7] multcomp_1.4-5 plyr_1.8.3 nnet_7.3-12 > [10] grid_3.3.0 data.table_1.9.6gtable_0.2.0 > [13] nlme_3.1-128quantreg_5.24 TH.data_1.0-7 > [16] latticeExtra_0.6-28 MatrixModels_0.4-1 polspline_1.1.12 > [19] Matrix_1.2-6gridExtra_2.2.1 RColorBrewer_1.1-2 > [22] codetools_0.2-14acepack_1.3-3.3 rpart_4.1-10 > [25] sandwich_2.3-4 scales_0.4.0mvtnorm_1.0-5 > [28] foreign_0.8-66 chron_2.3-47zoo_1.7-13 > === > > > Thanks, > -steve > > > -- > Steve Lianoglou > Computational Biologist > Genentech > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Recommendation for updating packages such as nlme
I use this function to update my installed R packages: updatePac <- function (checkBuilt = FALSE, ...) update.packages(repos = "http://cran.rstudio.com;, instlib = "/usr/local/lib/R/site-library", checkBuilt = checkBuilt, ...) When I type updatePac() I get: Warning: package 'MASS' in library '/usr/lib/R/library' will not be updated Warning: package 'Matrix' in library '/usr/lib/R/library' will not be updated Warning: package 'mgcv' in library '/usr/lib/R/library' will not be updated Warning: package 'nlme' in library '/usr/lib/R/library' will not be updated Warning: package 'nnet' in library '/usr/lib/R/library' will not be updated Warning: package 'spatial' in library '/usr/lib/R/library' will not be updated Should I not try to update recommended packages in /usr/local/lib/R/site-library on my Ubuntu system but specify /usr/lib/R/library instead? -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Ordinal regression with some categories combined for some data
This does seem to be a good situation for ordinal regression. The R rms package's orm function allows for thousands of categories in Y. But it doesn't handle censoring. This discussion would be better for stats.stackexchange.com Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Correct notation for functions, packages when using LaTex
Rolf I believe \textsf is the correct font to use for the symbol R but not necessarily for the names of R variables and functions. I'd like more discussion about that. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help with improveProb function in Hmisc in R
Please note that Kirsten is cross-posting to stats.stackexchange.com creating extra work for everyone. -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] Major updates to Hmisc and rms packages
New versions of Hmisc and rms are available on CRAN. Changes are listed below. The most significant change to Hmisc is the addition of the ffCompress function that creates an optimal ff package object for large data frames by computing the maximum number of bits used by each numeric or logical variable in the data frame. And the html.latex method now implements conversion of latex code generated by Hmisc to html for dynamic insertion in R Markdown documents (if you install the system package TeX4ht). The most significant changes to rms are the correct handling of offset variables and updating the demos. Hmisc Changes in version 3.17-0 (2015-09-20) * format.df (used by latex.default): added space after textless, textgreater * label: changed default for units to value of plot * getRs: replaced where argument with guser, grepo, gdir, dir to allow easy fetching of updated functions from Hmisc etc. * Separated sas.get source code from other .get functions and from upData/cleanup.import by putting into 3 separate files. Moved stata.get into misc.get.s * upData: for Stat/Transfer exported R workspaces, change variables into factors to incorporate value labels when present; added subset argument and reporting of number of observations pre and post subsetting * latex.default: added comma after botcap directive for ctable. Thanks: Paul Trowbridge * Hmisc-internal.Rd: removed alias{[.terms} * latex.default: for longtable when no caption is given, subtract one from table counter * latex.summaryM: quit ignoring insert.bottom if it is a character string (thanks: JoAnn Alvarez) * minor.tick: revised version by Earl Belllinger that fixes problem reported in https://github.com/harrelfe/Hmisc/issues/28 * several functions: used new names when assigning temporary functions * NAMESPACE: add imports to base functions to avoid new R CMD CHECK warnings * ffCompress: new function * knitrSet: changed fig.path default to '' instead of NULL to work with knitr 1.11 * html.latex: added argument rmarkdown * htmltools: added to suggests in DESCRIPTION * tests: new test script latex-html.Rmd for latex -> html under Rmarkdown/knitr/Rstudio, new test for cut2 * plsmo, panel.plsmo: added method='intervals', mobs, ifun arguments rms Changes in version 4.4-0 (2015-09-28) * contrast.rms: made SE a vector not a matrix, added 4 list logic for nvary, added new test from JoAnn Alvarez * plot.summary.rms: correct bug where pch was ignored. Thanks: Tamas Ferenci * prModFit: fix print(fit, latex=FALSE) when fit is result of robcov, bootcov * NAMESPACE: added imports for base functions used to avoid warnings with R CMD CHECK; new test rcs.r * prModFit: added rmarkdown argument. All print.* methods can pass this argument. * All print methods for fit objects: left result as prModFit instead of invisible() so that rmarkdown will work * demo/all.R: updated for plot and ggplot methods, npsurv * cph, predictrms, rms, rms.trans, rmsMisc: changed Design function to return new objects sformula (formula without cluster()) and mmcolnames which provides a new way to get rid of strat() main effects and interactions involving non-reference cells; handle offsets in cph and predict() (not yet in Predict); new internal function removeFormulaTerms that does character manipulation to remove terms like cluster() or offset() or the dependent variable(s). This gets around the problem with [.terms messing up offset terms when you subset on non-offset terms * Glm, ols: fixed offset * bj, Gls, lrm, orm, psm, Rq: change to new offset method and sformula * Predict: added offset=list(offsetvariable=value) * several: made temporary function names unique to avoid warnings with R CMD CHECK * ggplot.Predict: changed facet_wrap_labeller to not mess with class of returned object from ggplotGrob * Design: fixed column names for matrix predictors * Design, cph: handled special case where model is fit on a fit$x matrix * dxy.cens: exported * cph: added debug argument * tests/cph4.r: new tests for various predictor types * rms: changed warning to error if an ordered factor appears in the model and options(contrasts) is not set properly * rms transformation functions: made more robust by checking ! length instead of is.null -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] ___ R-packages mailing list r-packa...@r-project.org
Re: [R] Nagelkerke Pseudo R-squared
It's implemented in the R rms package's lrm and orm functions. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Nagelkerke-Pseudo-R-squared-tp4710014p4710031.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1
Thank you very much Terry. I'm still puzzled at why this worked a year ago. What changed? I'd very much like to reverse the change by setting an argument somewhere or manipulating the terms object. I echo your sentiments about the general approach. Frank On 06/15/2015 09:05 AM, Therneau, Terry M., Ph.D. wrote: Frank, I don't think there is any way to fix your problem except the way that I did it. library(survival) tdata - data.frame(y=c(1,3,3,5, 5,7, 7,9, 9,13), x1=factor(letters[c(1,1,1,1,1,2,2,2,2,2)]), x2= c(1,2,1,2,1,2,1,2,1,2)) fit1 - lm( y ~ x1 * strata(x2) - strata(x2), tdata) coef(fit1) (Intercept)x1b x1a:strata(x2)x2=2 x1b:strata(x2)x2=2 3.00 5.00 1.00 1.67 Your code is calling model.matrix with the same model frame and terms structure as the lm call above (I checked). In your case you know that the underlying model has 2 intercepts (strata), one for the group with x2=1 and another for the group with x2=2, but how is the model.matrix routine supposed to guess that? It can't, so model.matrix returns the proper result for the lm call. As seen above the result is not singular, while for the Cox model it is singular due to the extra intercept. This is simply an extension of leaving the intercept term in the model and then removing that column from the returned X matrix, which is necessary to have the correct coding for ordinary factor variables, something we've both done since day 1. In order for model.matrix to do the right thing with interactions, it has to know how many intercepts there actually are. I've come to the conclusion that the entire thrust of 'contrasts' in S was wrong headed, i.e., the remove redundant columns from the X matrix ahead of time logic. It is simply not possible for the model.matrix routine to guess correctly for all y and x combinations, something that been acknowledged in R by changing the default for singular.ok to TRUE. Dealing with this after the fact via a good contrast function (a la SAS -- heresy!) would have been a much better design choice. But as long as I'm in R the coxph routine tries to be a good citizen. Terry T. -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1
Terry - your example didn't demonstrate the problem because the variable that interacted with strata (zed) was not a factor variable. But I had stated the problem incorrectly. It's not that there are too many strata terms; there are too many non-strata terms when the variable interacting with the stratification factor is a factor variable. Here is a simple example, where I have attached no packages other than the basic startup packages. strat - function(x) x d - expand.grid(a=c('a1','a2'), b=c('b1','b2')) d$y - c(1,3,2,4) f - y ~ a * strat(b) m - model.frame(f, data=d) Terms - terms(f, specials='strat', data=d) specials - attr(Terms, 'specials') temp - survival:::untangle.specials(Terms, 'strat', 1) Terms - Terms[- temp$terms] model.matrix(Terms, m) (Intercept) aa2 aa1:strat(b)b2 aa2:strat(b)b2 1 1 0 0 0 2 1 1 0 0 3 1 0 1 0 4 1 1 0 1 . . . The column corresponding to a='a1' b='b2' should not be there (aa1:strat(b)b2). This does seem to be a change in R. Any help appreciated. Note that after subsetting out strat terms using Terms[ - temp$terms], Terms attributes factor and term.labels are: attr(,factors) a a:strat(b) y0 0 a1 2 strat(b) 0 1 attr(,term.labels) [1] a a:strat(b) Frank On 06/11/2015 08:44 AM, Therneau, Terry M., Ph.D. wrote: Frank, I'm not sure what is going on. The following test function works for me in both 3.1.1 and 3.2, i.e, the second model matrix has fewer columns. As I indicated to you earlier, the coxph code removes the strata() columns after creating X because I found it easier to correctly create the assign attribute. Can you create a worked example? require(survival) testfun - function(formula, data) { tform - terms(formula, specials=strata) mf - model.frame(tform, data) terms2 - terms(mf) strat - untangle.specials(terms2, strata) if (length(strat$terms)) terms2 - terms2[-strat$terms] X - model.matrix(terms2, mf) X } tdata - data.frame(y= 1:10, zed = 1:10, grp = factor(c(1,1,1,2,2,2,1,1,3,3))) testfun(y ~ zed*grp, tdata) testfun(y ~ strata(grp)*zed, tdata) Terry T. - original message -- For building design matrices for Cox proportional hazards models in the cph function in the rms package I have always used this construct: Terms - terms(formula, specials=c(strat, cluster, strata), data=data) specials - attr(Terms, 'specials') stra- specials$strat Terms.ns - Terms if(length(stra)) { temp - untangle.specials(Terms.ns, strat, 1) Terms.ns - Terms.ns[- temp$terms]#uses [.terms function } X - model.matrix(Terms.ns, X)[, -1, drop=FALSE] The Terms.ns logic removes stratification factor main effects so that if a stratification factor interacts with a non-stratification factor, only the interaction terms are included, not the strat. factor main effects. [In a Cox PH model stratification goes into the nonparametric survival curve part of the model]. Lately this logic quit working; model.matrix keeps the unneeded main effects in the design matrix. Does anyone know what changed in R that could have caused this, and possibly a workaround? --- -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Different behavior of model.matrix between R 3.2 and R 3.1.1
For building design matrices for Cox proportional hazards models in the cph function in the rms package I have always used this construct: Terms - terms(formula, specials=c(strat, cluster, strata), data=data) specials - attr(Terms, 'specials') stra- specials$strat Terms.ns - Terms if(length(stra)) { temp - untangle.specials(Terms.ns, strat, 1) Terms.ns - Terms.ns[- temp$terms]#uses [.terms function } X - model.matrix(Terms.ns, X)[, -1, drop=FALSE] The Terms.ns logic removes stratification factor main effects so that if a stratification factor interacts with a non-stratification factor, only the interaction terms are included, not the strat. factor main effects. [In a Cox PH model stratification goes into the nonparametric survival curve part of the model]. Lately this logic quit working; model.matrix keeps the unneeded main effects in the design matrix. Does anyone know what changed in R that could have caused this, and possibly a workaround? Note that cph is a kind of front-end to the survival package's coxph function. Therry Therneau uses more complex logic to construct the design matrix reliably. I'd like to avoid that logic because it creates an overly wide design matrix before removing the unneeded columns. Thanks for any assistance, Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] Version 3.16-0 of Hmisc now on CRAN
Several updates have been made to the Hmisc package: Changes in version 3.16-0 (2015-04-25) * html.contents.data.frame: corrected html space character to add semicolon * ggplot.summaryP: added size of points according to denominators * colorFacet: new function * labelPlotmath: added chexpr argument (used by rms::ggplot.Predict) * rcsplineFunction: added type='integral' * summaryP: fixed bug with sort=FALSE using mfreq when shouldn't * summaryP: stored levels(val) in original levels order * summaryM: removed observations added by addMarginal when computing N in left column of tables * html.latex: added method for htlatex, added where argument, cleaned up code, implemented file='' for knitr when using html/Rmarkdown * summaryM, summary.formula: changed calls to translate to gsub() * summaryP: corrected but in exclude1 logic, moved exclude1 to methods that operate on summaryP objects and out of summaryP itself * addMarginal: respect original levels, add argument margloc * added latticeExtra:: in front of function calls * numeric.string, all.is.numeric: replaced options(warn=-1) with suppressWarnings() (thanks: Yihui) * arrGrob, print.arrGrob: new functions * wtd.var: added maximum likelihood method, fixed unbiased method, improved documentation (all provided by Benjamin Tyner) * Changed all any(duplicated()) to anyDuplicated(); thanks Benjamin Tyler * getRs: new function to interact with https://github.com/harrelfe/rscripts * knitrSet: new function to setup knitr with nice defaults for books etc. * rcorr: fixed sensing of NAs and diagonal elements of n matrix; thanks: Keith Jewell, Campden BRI Group; similar for hoeffd -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Correction: Regression Modeling Strategies 4-Day Short Course March 2015
*RMS Short Course 2015* Frank E. Harrell, Jr., Ph.D., Professor and Chair Department of Biostatistics, Vanderbilt University School of Medicine *March 3, 4, 5 6, 2015* With Optional R Workshop March 2 9:00am - 4:00pm Student Life Center Board of Trust Room Vanderbilt University Nashville Tennessee USA See http://biostat.mc.vanderbilt.edu/2015RMSShortCourse for details. The course includes statistical methodology, case studies, and use of the R rms package. Please email interest to Ashlee Bartley CORRECTED EMAIL: ashlee.bart...@vanderbilt.edu - If your web browser handles redirection in a non-standard way, go to http://biostat.mc.vanderbilt.edu/wiki/Main/2015RMSShortCourse -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Regression Modeling Strategies 4-Day Short Course March 2015
Subject: Regression Modeling Strategies 4-Day Short Course March 2015 *RMS Short Course 2015* Frank E. Harrell, Jr., Ph.D., Professor and Chair Department of Biostatistics, Vanderbilt University School of Medicine *March 3, 4, 5 6, 2015* With Optional R Workshop March 2 9:00am - 4:00pm Student Life Center Board of Trust Room Vanderbilt University Nashville Tennessee USA See http://biostat.mc.vanderbilt.edu/2015RMSShortCourse for details. The course includes statistical methodology, case studies, and use of the R rms package. Please email interest to Ashlee Bartley ashlee.f.bart...@vanderbilt.edu -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of *Biostatistics* *Vanderbilt University* [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] CRAN submission of Hmisc 3.14-5
An update to the Hmisc package is now on CRAN. Some recent changes: * latex.summaryM: fixed bug in caption with test=TRUE. Thanks: Yonghao Pua * combined.levels: sensed all NA vector, now return non-factor numeric instead * dataframeReduce: handle all-NA factor variable * subplot: replaced with latest version from TeachingDemos package by Greg Snow * latexTabular: fixed error in example in help file; result is not a file * latex: another test added in tests/latex.s * summaryP: removed observations with a right-hand-side variable missing * latex.summaryP: fixed bug with wrong column labels due to reshape reordering columns coming from factor levels alphabetically instead of by original levels * format.df: added % = = to list of characters handled, the last two by going into math mode * latex.summaryP: use blank if denominator 0, instead of NaN * summary.formula: fixed problem with deparse formula. Thanks: Andreas Kiermeier * describe: added relative information measure for numeric variables - a measure of how continuous the variable is * wtd.table: detect duplications using duplicated() instead of diff(x) to handle Inf. Thanks: Benjamin Tyner * DESCRIPTION, NAMESPACE: multiple function changes to work in R-devel See http://biostat.mc.vanderbilt.edu/Hmisc, https://github.com/harrelfe/Hmisc for more information. Frank Harrell Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] CRAN update: rms 4.2-1
An update to the rms package is now on CRAN. Some recent changes: * plot.summary.rms: allowed a vector for lwd, and passed lwd to confbar. Thanks: Michael Friendly * gendata: Starting in R 3.1.0, as.data.frame.labelled or as.data.frame.list quit working when length vary; workaround * predictrms, ols: handle offset in formula. Thanks: Max Gordon * pentrace: neatened code, added new argument noaddzero if user wants to prevent unpenalized model from being tried; add new test script in tests * bplot: fixed bug whereby xlabrot was ignored. Thanks: Sven Krackow sven.krac...@gmail.com; new test for bplot in tests directory * plot.Predict: fixed bug in which 2nd argument to perim was not correct * validate.ols: Shane McIntosh fixed the passing of the tolerance argument to predab.resample * predictrms: computed offset earlier so always defined no matter the value of type * plot.Predict: added scaletrans argument, fixed use of subscripts in pan * lrm, lrm.fit: added scale argument * orm, orm.fit: added scale argument * vcov.orm: accounted for scale when extracting covariance matrix * npsurv: was not passing type argument * npsurv: start storing all classes created by survfit.formula * logLik.Gls: added. Makes AIC(Gls object) work. * NAMESPACE: several changes See http://biostat.mc.vanderbilt.edu/rms and https://github.com/harrelfe/rms for more information. Frank Harrell Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Predictions from coxph or cph objects
When using cph in the rms package there is a function Mean that operates on cph objects to produce an R function for computing the mean or restricted mean life time. Frank __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using subplot (from Hmisc) along with par(mfrow)
Greg I just re-copied the latest subplot and its help file from TeachingDemos to Hmisc for the next release. Thanks for pointing this out. Frank __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survplot invert number at risk labels
Please try hard to create a simple reproducible example, then I'll work on it. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SAS VS R - What type of analysis is better?
I can't think of an example where R does not work better than SAS except for a few cases of mixed effects regression models and for processing enormous datasets when the R user does not want to learn about the latest R tools for large datasets. I quit using SAS in 1991 (in favor of S-Plus and transitioned to R around 2000) and have never looked back. Lately what has really made R powerful is its ability to interface with other languages and especially the way it works in a reproducible analysis/dynamic report document context. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] Updates to Hmisc, rms, and greport packages
The Hmisc package has had a number of updates/fixes: Changes in version 3.14-4 (2014-04-13) * rcspline.eval: stop new logic for ignoring outer values when there are many ties when there are also many ties on interior values. Added new logic to use interior unique values of x when the number of unique x is small. * latexBuild: generalized with insert argument * latex.default: modified to use mods of latexBuild, fixed bug with caption.loc='bottom' (thanks: YacineH) * latex.default: fixed bug where comma did not appear after caption={} for ctable (thanks: Johannes Hofrichter) * tests: fit.mult.impute.bootstrap.r: added new example (thanks: Jane Cook) * fit.mult.impute: added call for fit.mult.impute in returned object, replacing call from fitter; makes update work for fit.mult.impute * summary.formula: fixed recognition of multiple left-hand-side variables to trigger call to summaryM (thanks: Kevin Thorpe) * summaryM: changed group label to '' instead of 0 for formulas like age + sex ~ 1 * Ecdf: added what argument to all functions * nobsY: return constructed id vector * addMarginal: instead of .marginal. being logical, make it contain names of variables being marginalized over * mChoice.c: fixed some inconsistencies The rms package has also had some changes, and the survfit.formula function has been REMOVED, replaced by the npsurv function: Changes in version 4.2-0 (2014-04-13) * Deprecated survfit.formula so would not overlap with function in survival * Added function npsurv, survplot.npsurv * REMOVED survfit.formula * Used new type argument to label.Surv for fitting functions * cph: added weights argument to residuals.coxph (Thanks: Thomas Lumley) * survfit.cph: fixed bug in using wrong type variable. Thanks: Zhiyuan Sun * cph: added weighted=TRUE in call to residuals.coxph (Thanks: T Lumley) * orm.fit: improved ormfit to not try to deal with NaN in V, assuming that step-halving will happen Some improvements have been made in the greport package: Changes in version 0.5-1 (2014-04-15) * survReport: changed to use npsurv instead of survfit.formula * exReport: changed order of output so that analysis of randomized patients marked for exclusions appears last; use LaTeX chngpage package to allow detailed table to go into left margin so as to be centered on page * exReport: added adjustwidth argument * accrualReport: allowed enrollment target N to be omitted * exReport: fine tuning * nriskReport: new report to show number of subjects still being followed at each day * Merge: added support for data.table * nriskReport: added id() variable * exReport: fixed bug when there is an exclusion with 0 frequency * accrualReport: improved graphics formatting, added minrand argument * accrualReport: added enrollmax argument, clarified notation * exReport: added ignoreExcl, ignoreRand arguments * all: added greportOption texwhere; default is 'gentex'; can specify texwhere='' to write non-appendix LaTeX code to console as for knitr * dReport: for byx for discrete Y, sense when Y is binary and use Wilson interval instead of bootstrap; adjust SE using confidence interval if proportion is 0 or 1 * dReport: changed discreteness non-binary classification to use maximum number of unique values or Y instead of minimum * add globalVariables call to nriskReport -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] New version of Hmisc package on CRAN
Recent changes include the following. Changes in version 3.14-2 (2014-02-26) * latex.default: improved logic using new function in Misc: latexBuild * latex.default: fixed bug with ctable=TRUE with no caption by removing default label * latex.default: improved formatting for insert.top * latex.default: added tests, fixed insert.bottom * latex.summaryM: return stat summary key as legend attribute, use this according to insert.bottom argument * latex.summary.formula.response: fixed bug related to computation of cdec. Thanks: Kevin Thorpe * latex.default: added new argument star: ctables uses this to spread over two columns when the LaTeX document is in \twocolumn mode. Thanks: David Whiting Changes in version 3.14-1 (2014-02-25) * Added latexNeedle function * Change latexTherm, latexNeedle to use user LaTeX macro \tooltipn to do the pop-up * latex.default: changed line breaks around \end{tabular} * latex.summaryM: put insert.bottom text in minipage so \tooltip will not devote wide space to it * sas.get: added defaultencoding argument and logic (Thanks: Reinhold Koch) * plot.summaryP: omit tick marks for proportion 1.0 * format.df (used by latex): fixed na.blank logic for character var * latex: removed newlines when ending environments, added hyperref argument * latex: added center='centerline', fixed 'centering' * upData, cleanup.import, dataframeReduce: changed argument pr to print * rcspline.eval: added more evasive action in case of extreme ties Changes in version 3.14-0 (2014-01-22) * Added trans argument to varclus * Removed recode, existsFunction functions, under.unix object, survfitKM, functions used only by S-Plus: comment, mem, mulbar.chart, p.sunflowers * as.category, is.category, untangle.special: removed * Removed reference to .R. from many functions * Remove oldClass, oldUnclass, getFunction * latex.default: changed 'rotate' to 'sideways' for ctable mode. Thanks: Simon Zehnder szehn...@uni-bonn.de * gView: removed * ldBands: removed * summaryP: new function - graphical replacement for tables of proportions * ynbind: new function for combining related yes/no variables into a matrix with a label * added file argument to prn * summaryP: added autoarrange * added addMarginal and nobsY functions * pBlock: new function for blocking variables for summaryP * summaryP: changed text positioning to grid absolutes, added text.at argument * scat1d, histSpike: if grid used and y has grid units, fixed logic for frac * plsmo, panel.plsmo: added scat1d.opts argument * label.Surv, units.Surv: added, removed ::: in survival calls * summarize: added keepcolnames argument * Suppressed startup message unless options(Hverbose=TRUE) is set * summaryS: new function - multi-panel lattice xy and dot plots * summaryD: added ylab argument * dotchart3: quit letting left margin be less than pre-existing one * multLines: new function * Improved nobsY to respect subject IDs when counting number of subjects, and to return an attribute 'formula' without id variable; changed bpplotM, summaryP, summaryS to use this * Removed nobsY calculations from bpplotM, summaryP, summaryS, enhanced nobsY to allow stratification by treatment * panel.bpplot: added violin and violin.opts arguments * summaryS: added medvPanel support during-plot vertical violin plots * plot.summaryP: padded x-axis limits * latexTabular: added translate and hline arguments; moved to its own file and help page * latexTherm: added tooltip using LaTeX ocgtools package * summaryP: stopped reversing order of panels * summaryM: added table.env argument, changed how model.frame built * latex.summaryM: changed to print proportions by default, added round='auto' * character.table: added xpd=NA; thanks: Dale * summaryP: added latex method * latex.default: added insert.top argument * summaryM: added stratification (multiple tables) -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] rms package updated
Changes are listed below. NOTE: The next release of rms will replace the survfit.formula function with a new function named npsurv (nonparametric survival estimates) so as to not conflict with the survival package. This is a NON-DOWNWARD-COMPATIBLE change. To get Kaplan-Meier estimates in the NEXT release use npsurv() instead of survfit(). The web site for more information about rms is http://biostat.mc.vanderbilt.edu/Rrms Changes in version 4.1-3 (2014-03-02) * num.intercepts: removed (is in Hmisc) * survfit.formula, cph, psm: changed to use inputAttributes attribute of Surv objects (introduced earlier in survival package so that rms could drop its customized Surv function) * Exported survfit.formula * Changed survival fitting functions and residuals to use units.Surv Changes in version 4.1-2 (2014-02-28) * psm: Fixed bug to allow computation of Dxy with left censoring * val.prob: Fixed recently introduced bug that made calibration intercept and slope always 0,1. Thanks: lars.engerst...@lio.se * plot.Predict: added between to leave space between panels * orm.fit: fixed error in kmid calculation when heavy ties at first level of y. Thanks: Yuwei Zhu * setPb: changed default to now use tktcl to show progress bars for simulations * predictrms: fixed bug with type='terms' * val.surv: handle case where survival estimates=0 or 1 when using log-log transform Changes in version 4.1-1 (2014-01-22) * Removed use of under.unix in anova.rms, latex.summary, plot.nomogram * Removed use of oldUnclass, oldClass, is.category * Fixed class of Rq object; had failed with bootcov. Thanks: Max Gordon * survplot: preserved par() * Srv: removed, changed all uses to Surv() for new survival package that preserves attributes for Surv inputs * survplot.survfit, survdiffplot: added conf='diffbands' * predictrms: fixed num. intercepts calculation order * survplot, survdiffplot: used original standard error for survdiffplot, and fun * dyx.cens: allow left-censoring -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] New package on CRAN: greport
The first submission of the new greport package is now on CTAN. This package facilitates graphical reporting of clinical trials an is highly tied to LaTeX, Hmisc, and lattice. It creates hyperlinks and pop-up tooltips in the resulting pdf report. Tables are greatly de-emphasized, but hyperlinks take you from primary graphics to supporting tables in an appendix, which are hyperlinked back to the graphic. The package's web page, with example pdf reports, is at http://biostat.mc.vanderbilt.edu/Greport -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Regression Modeling Strategies 4-Day Short Course March 2014
My yearly Regression Modeling Strategies course is expanded to 4 days this year to be able relax the pace a bit. Details are below. Questions welcomed. - *RMS Short Course 2014* Frank E. Harrell, Jr., Ph.D., Professor and Chair Department of Biostatistics, Vanderbilt University School of Medicine *March 4, 5, 6 7, 2014* 9:00am - 4:00pm (9:00am - 2:00pm March 7) Alumni Hall Vanderbilt University Nashville Tennessee USA See http://biostat.mc.vanderbilt.edu/2014RMSShortCourse for details. The course includes statistical methodology, case studies, and use of the R rms package. Please email interest to Audrey Carvajal {audrey.carva...@vanderbilt.edu} -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Handlig large SAS file in R
For that you need to purchase Stat/Transfer. Frank hans012 wrote Hey Guys I have a .sas7bdat file of 1.79gb that i want to read. I am using the .sas7bdat package to read the file and after i typed the command read.sas7bdat('filename.sas7bdat') it has been 3 hours with no result so far. Is there a way that i can see the progress of the read? Or is there another way to read the file with less computing time? I do not have access to SAS, the file was sent to me. Let me know what you guys think KR Hans - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Handlig-large-SAS-file-in-R-tp4684212p4684250.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] Major update to Hmisc package available on CRAN
Hmisc has had the following updates, with some of the new functions demonstrated on http://biostat.mc.vanderbilt.edu/HmiscNew . Much of the new work is motivated by the need to replace tables with graphics. Changes in version 3.14-0 (2014-01-22) * Added trans argument to varclus * Removed recode, existsFunction functions, under.unix object, survfitKM, functions used only by S-Plus: comment, mem, mulbar.chart, p.sunflowers * as.category, is.category, untangle.special: removed * Removed reference to .R. from many functions * Remove oldClass, oldUnclass, getFunction * latex.default: changed 'rotate' to 'sideways' for ctable mode. Thanks: Simon Zehnder szehn...@uni-bonn.de * gView: removed * ldBands: removed * summaryP: new function - graphical replacement for tables of proportions * ynbind: new function for combining related yes/no variables into a matrix with a label * added file argument to prn * summaryP: added autoarrange * added addMarginal and nobsY functions * pBlock: new function for blocking variables for summaryP * summaryP: changed text positioning to grid absolutes, added text.at argument * scat1d, histSpike: if grid used and y has grid units, fixed logic for frac * plsmo, panel.plsmo: added scat1d.opts argument * label.Surv, units.Surv: added, removed ::: in survival calls * summarize: added keepcolnames argument * Suppressed startup message unless options(Hverbose=TRUE) is set * summaryS: new function - multi-panel lattice xy and dot plots * summaryD: added ylab argument * dotchart3: quit letting left margin be less than pre-existing one * multLines: new function * Improved nobsY to respect subject IDs when counting number of subjects, and to return an attribute 'formula' without id variable; changed bpplotM, summaryP, summaryS to use this * Removed nobsY calculations from bpplotM, summaryP, summaryS, enhanced nobsY to allow stratification by treatment * panel.bpplot: added violin and violin.opts arguments * summaryS: added medvPanel support during-plot vertical violin plots * plot.summaryP: padded x-axis limits * latexTabular: added translate and hline arguments; moved to its own file and help page * latexTherm: added tooltip using LaTeX ocgtools package * summaryP: stopped reversing order of panels * summaryM: added table.env argument, changed how model.frame built * latex.summaryM: changed to print proportions by default, added round='auto' * character.table: added xpd=NA; thanks: Dale * summaryP: added latex method * latex.default: added insert.top argument * summaryM: added stratification (multiple tables) -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Regression Modeling Strategies 4-Day Short Course March 2013
My yearly Regression Modeling Strategies course is expanded to 4 days this year to be able relax the pace a bit. Details are below. Questions welcomed. - *RMS Short Course 2014* Frank E. Harrell, Jr., Ph.D., Professor and Chair Department of Biostatistics, Vanderbilt University School of Medicine *March 4, 5, 6 7, 2014* 9:00am - 4:00pm (9:00am - 2:00pm March 7) Alumni Hall Vanderbilt University Nashville Tennessee USA See http://biostat.mc.vanderbilt.edu/2014RMSShortCourse for details. The course includes statistical methodology, case studies, and use of the R rms package. Please email interest to Audrey Carvajal {audrey.carva...@vanderbilt.edu} -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] howto join matrices produced by rcorr()
This is an unstable process. I suggest using the bootstrap to get a confidence interval for the rank of each correlation coefficient among all non-diagonal correlations. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/howto-join-matrices-produced-by-rcorr-tp4682867p4682932.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Possible to vary widths and/or heights of lattice panels?
Thanks very much Rich and Duncan. latticeExtra's resizePanels function was a perfect solution. Frank __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Possible to vary widths and/or heights of lattice panels?
Is it possible to vary the size of the panels in lattice within one page? The examine I have in mind is a 3 row by 2 column display where I vary the y-axis scales in a dot plot and there are only a few levels on the y-axis for one of the rows. I'd like to remove wasted space in that row. On a related issue http://stackoverflow.com/questions/9654244/multipage-lattice-panel-arrangement has some good information. Thanks for any pointers. Frank __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Simple way to define a function to be used in a formula object inside another function
I would like to do this: f - function(formula, data=NULL) { gg - sqrt model.frame(formula, data=data) } x - y - 1:10 f(y ~ gg(x)) Error in eval(expr, envir, enclos) : could not find function gg Is there a simple way to get access to gg from within the model.frame invocation inside f? Thanks Frank __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple way to define a function to be used in a formula object inside another function
Thank you Bill, that worked perfectly. Frank __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] rms 4.1-0
The rms package has had several updates in version 4.1-0: * Fixed orm.fit to not create penalty matrix if not needed (penalties are not yet implemented anyway) * Added yscale argument to plot.Predict * Added Wald test simulation to orm help file * Added example in help file for plot.anova.rms of adding a line combining the effects of two predictors in dot chart * Fixed grid interpretation error in survplot.survfit * Changed plot.anova.rms to use dotchart3 instead of dotchart2 * Fixed bug in summary.rms - was taking reciprocal of effect ratio with orm even if not loglog family (thanks: Yong Hao Pua puayong...@gmail.com * Removed link to print.lm, summary.lm in ols.Rd * Added ntrans argument to plot.anova.rms * Fixed handling of intercepts in Rq, validate.Rq * Removed residuals.Glm, residuals.rms (also from Rd, NAMESPACE) * Removed other .rms methods and other remnants from fooling S+ dispatcher * Fixed bug in lm.pfit when penalty used (thanks: Yong Hao Pua puayong...@gmail.com) * Fixed bug in calibrate.default for ols (thanks: Andy Bush) * Change print.contrast.rms to insert NA for SE if fun is not the identity function * Added margin argument to plot.anova.rms to print selected stats in right margin of dot chart * Added anova argument to plot.Predict to allow overall association test statistics to be added to panels * Fixed bug in val.prob in which the logistic model was re-fitted instead of fixing coefficients at 0,1. This resulted in model statistics (including c-index) to always be favorable even when predictions were worse than change. Thanks: Kirsen Van Hoorde kirsten.vanhoo...@esat.kuleuven.be * Fixed bug in survdiffplot where conf.int was always overridden by value from survfit. Thanks: Kamil Fijorek kamilfijo...@gmail.com * Fixed bug in grid= for survplot.* and survdiffplot. Thanks: Kamil Fijorek * Fixed rms.s to account for possible offset in names(nmiss). Thanks: Larry Hunsicker * Fixed psm.s to not compute Dxy if simple right censoring is not in effect. Thanks: I.M. Nolte * rcs: respect system option fractied, passed to rcspline.eval; can be used to get old behavior * Gls: as nlme 3.1-113 exports more functions, removed nlme::: -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] Hmisc package 3.13-0
A significant update to the Hmisc package is now available on CRAN for all platforms. Hmisc source is now on github at https://github.com/harrelfe/Hmisc and the full change log may be found at https://github.com/harrelfe/Hmisc/commits/master The most important updates are additions of new graphics functions for summarizing and displaying data with an aim of replacing tables. There is also an interface to Duncan Murdoch's tables package for advanced LaTeX table making, allowing Hmisc variable labels and units of measurement to be automatically incorporated into table row or column labels. New functions are overviewed and exemplified at http://biostat.mc.vanderbilt.edu/HmiscNew . There is also a new function latexTherm that writes a LaTeX picture environment for including a series of thermometers inside LaTeX text. This is intended especially for figure captions where the analyst wishes to provide a snapshot of the fraction of observations that are used in the current graphic. The dotchart3 function is an improvement of dotchart2. The rcspline.eval function, used by the rms package's rcs function, has better default knot placement for restricted cubic splines (natural splines) when there are ties in the predictor at the lowest or highest value (e.g., clumping at zero). The most recent updates to Hmisc are * Changed n location (nloc argument) in bpplotM * Improved dotchart3 to better compute string widths when there is a mixture of expressions and regular strings for auxdata/auxtitle * Changed rlegend to not take logs if log axes are in effect. Fixes Ecdf(..., log='x', label.curves=list(keys=1:3)). Thanks: Bayazid Sarker sarkarbaya...@gmail.com * Extended non-panel (regular) version of plsmo to handle matrix y * Likewise for summaryRc * Added xlim to bpplotM * Added latexTherm function to create LaTeX picture environments to add a series of thermometers to LaTeX text * Fixed deff to handle the case where R^2 = 1. Thanks: Matthieu Stigler matthieu.stig...@gmail.com * Added new test file for wtd.mean, wtd.quantile * New test aregImpute3.r for glm Poisson regression * Improved describe.vector to format single unique values * Took aware warning about var, s.e., t, p in fit.mult.impute * Changed wtd.loess.noiter to use loess instead of stats:::simpleLoess -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] A problem about nomogram
You did not follow the posting guide, did you use pure ascii email, and used illegal characters in your source code. This caused extra work. Once I cleaned up your characters and made the example self-contained, the labels worked fine for me. Here's the cleaned-up code: library(rms) x1 - runif(20) x2 - runif(20) y - sample(0:1, 20, TRUE) label(x1) - 'A' label(x2) - 'B' ddist - datadist(x1,x2) options(datadist='ddist') f - lrm(y~x1+x2) x-nomogram(f, fun=function(x)1/(1+exp(-x)), fun.at=c(.001,.01,.05,seq(0,1,by=.1),.95,.99,.999),vnames=labels) plot(x) -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Code Book from SPSS Data
Not certain about .por but this works with ordinary SPSS files: require(Hmisc) dat - spss.get(...) # gets variable labels, etc. contents(dat) html(contents(dat), ...) The last command produces a hyperlinked data dictionary, e.g., for each variable the number of levels is given and you click on that number to see the levels. Variables having the same levels are combined in the latter part. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] missclassification in estimating proportions
This is dichotomania and is an inappropriate use of continuous variables. Use an information-preserving approach such as rank correlation. Also, this is not an R question. Frank --- Assume that I want to compare two methods, say skinfold measurement and BMI, in how they classify subjects into four categories. In a 2x2 table I would simply calculate sensitivity and specificity and NPV an PPV, using the standard formulas, but can the same be directly applied in the 4x4 table? Or are other methods preferable? Thanks in advance Patrick -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Interperting results of glmnet and coxph plot, Brier score and Harrel's C-Index - am I doing something wrong ???
), replace = TRUE) ## random sampling with replacement # create a dataframe for trainSet with time, status and selected variables in binary representation for evaluation in pec reformat_trainSet - reformat_dataSet [train,] # glmnet.cox only with meaningful features selected by stepwise bidirectional AIC feature selection glmnet.cox.meaningful.test - step(coxph(Surv(time,status) ~ .,data=reformat_dataSet),direction=both) selectedVarCox - predict_matrix[,attr(glmnet.cox.meaningful.test$terms,term.labels)] reformat_testSet - as.data.frame(cbind(surv_obj,selectedVarCox)) reformat_testSet - reformat_dataSet [-train,] # compute c-index (Harrell's) for cox-glmnet models if (is.null(glmnet.cox.meaningful)){ cIndexCoxglmnet - c(cIndexCoxglmnet,NULL) }else{ cIndexCoxglmnet - c(cIndexCoxglmnet, 1-rcorr.cens(predict(glmnet.cox.meaningful, reformat_testSet),Surv(reformat_testSet$time,reformat_testSet $status))[1]) } } #Get average C-Index cIndex- mean (unlist(cIndexCoxglmnet),rm.na=TRUE) #create a list of all the objects generated assign (name,c(eval(parse(text=name)),glmnet.cv=list(glmnet.cv),glmnet.obj=li st(glmnet.obj), selectedVar=list(colnames(selectedVar)),glmnet.cox=list(glmnet.cox), glmnet .cox.meaningful=list(glmnet.cox.meaningful),ipec.coxglmnet=list(ipec.c oxglmnet), cIndex=cIndex)) # save image of the workspace after each iteration save.image(final_subgroup_analysissubgroup_analysis.RData) __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Alameda, CA, USA __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Interperting-results-of-glmnet-and-coxph-plot-Brier-score-and-Harrel-s-C-Index-am-I-doing-something--tp4677166p4677175.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regression model for predicting ranks of the dependent variable
require(rms) ?orm# ordinal regression model For a case study see Handouts in http://biostat.mc.vanderbilt.edu/CourseBios330 Since you have lost the original values, one part of the case study will not apply: the use of Mean(). Frank - I have a dataset which has several predictor variables and a dependent variable, score (which is numeric). The score for each row is calculated using a formula which uses some of the predictor variables. But, the score figures are not explicitly given in the dataset. The scores are only arranged in ascending order, and the ranks of the numbers are given (like 1, 2, 3, 4, etc.; rank 1 means that the particular row had the highest score, 2 means it had the second highest score and so on). So, if the data has 100 rows, the output has ranks from 1 to 100. I don't think it would be proper to treat the output column as a numeric one, since it is an ordinal variable, and the distance (difference in scores) between ranks 1 and 2 may not be the same as that between ranks 2 and 3. However, most R regression models for ordinal regression are made for output such as (high, medium, low), where each level of the output does not necessarily correspond to a unique row. In my case, each output (rank) corresponds to a unique row. So please suggest me what models I could use for this problem. Will treating the output as numeric instead of ordinal be a reasonable approximation? Or will the usual models for ordinal regression work on this dataset as well? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] regex challenge
Bill I found a workaround: f - ff(formula, lab) f - as.formula(gsub(`, , as.character(deparse(f Thanks for your elegant solution. Frank -- Thanks Bill. The problem is one of the results of convertName might be 'Heading(Age in Years)*age' (this is for the tables package), and as.name converts this to `Heading(...)*age` and the backticks cause the final formula to have a mixture of regular elements and ` ` quoted expression elements, making the formula invalid. Best, Frank --- -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] regex challenge
Thanks Bill. The problem is one of the results of convertName might be 'Heading(Age in Years)*age' (this is for the tables package), and as.name converts this to `Heading(...)*age` and the backticks cause the final formula to have a mixture of regular elements and ` ` quoted expression elements, making the formula invalid. Best, Frank --- The following makes the name converter function an argument to ff (and restores the colon operator to the list of formula operators), but I'm not sure what you need the converter to do. ff - function(expr, convertName = function(name)paste0(toupper(name), z)) { if (is.call(expr) is.name(expr[[1]]) is.element(as.character(expr[[1]]), c(~,+,-,*,/,%in%,(, :))) { for(i in seq_along(expr)[-1]) { expr[[i]] - Recall(expr[[i]], convertName = convertName) } } else if (is.name(expr)) { expr - as.name(convertName(expr)) } expr } Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: [hidden email] [mailto:[hidden email]] On Behalf Of Frank Harrell Sent: Thursday, August 15, 2013 7:47 PM To: RHELP Subject: Re: [R] regex challenge Bill that is very impresive. The only problem I'm having is that I want the paste0(toupper(...)) to be a general function that returns a character string that is a legal part of a formula object that can't be converted to a 'name'. Frank --- Oops, I left ( out of the list of operators. ff - function(expr) { if (is.call(expr) is.name(expr[[1]]) is.element(as.character(expr[[1]]), c(~,+,-,*,/,%in%,())) { for(i in seq_along(expr)[-1]) { expr[[i]] - Recall(expr[[i]]) } } else if (is.name(expr)) { expr - as.name(paste0(toupper(as.character(expr)), z)) } expr } ff(a) CATz + (AGEz + Heading(Females) * (sex == Female) * SBPz) * Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() * COUNTRYz * Heading() * SEXz Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: [hidden email] [mailto:[hidden email]] On Behalf Of William Dunlap Sent: Thursday, August 15, 2013 6:03 PM To: Frank Harrell; RHELP Subject: Re: [R] regex challenge Try this one ff - function (expr) { if (is.call(expr) is.name(expr[[1]]) is.element(as.character(expr[[1]]), c(~, +, -, *, /, :, %in%))) { # the above list should cover the standard formula operators. for (i in seq_along(expr)[-1]) { expr[[i]] - Recall(expr[[i]]) } } else if (is.name(expr)) { # the conversion itself expr - as.name(paste0(toupper(as.character(expr)), z)) } expr } ff(a) CATz + (age + Heading(Females) * (sex == Female) * sbp) * Heading() * Gz + (age + sbp) * Heading() * TRIOz ~ Heading() * COUNTRYz * Heading() * SEXz Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: [hidden email] [mailto:[hidden email]] On Behalf Of Frank Harrell Sent: Thursday, August 15, 2013 4:45 PM To: RHELP Subject: Re: [R] regex challenge I really appreciate the excellent ideas from Bill Dunlap and Greg Snow. Both suggestions almost work perfectly. Greg's recognizes expressions such as sex=='female' but not ones such as age 21, age 21, a - b 0, and possibly other legal R expressions. Bill's idea is similar to what Duncan Murdoch suggested to me. Bill's doesn't catch the case when a variable appears both in an expression and as a regular variable (sex in the example below): f - function(formula) { trms - terms(formula) variables - as.list(attr(trms, variables))[-1] ## the 'variables' attribute is stored as a call to list(), ## so we changed the call to a list and removed the first element ## to get the variables themselves. if (attr(trms, response) == 1) { ## terms does not pull apart right hand side of formula, ## so we assume each non-function is to be renamed. responseVars - lapply(all.vars(variables[[1]]), as.name) variables - variables[-1] } else { responseVars - list() } ## omit non-name variables from list of ones to change. ## This is where you could expand calls to certain functions. variables - variables[vapply(variables, is.name, TRUE)] variables - c(responseVars, variables) # all are names now names(variables) - vapply(variables, as.character, ) newVars - lapply(variables, function(v) as.name(paste0(toupper(v), z))) formula(do.call(substitute, list(formula, newVars)), env=environment(formula)) } a - cat
Re: [R] regex challenge
I really appreciate the excellent ideas from Bill Dunlap and Greg Snow. Both suggestions almost work perfectly. Greg's recognizes expressions such as sex=='female' but not ones such as age 21, age 21, a - b 0, and possibly other legal R expressions. Bill's idea is similar to what Duncan Murdoch suggested to me. Bill's doesn't catch the case when a variable appears both in an expression and as a regular variable (sex in the example below): f - function(formula) { trms - terms(formula) variables - as.list(attr(trms, variables))[-1] ## the 'variables' attribute is stored as a call to list(), ## so we changed the call to a list and removed the first element ## to get the variables themselves. if (attr(trms, response) == 1) { ## terms does not pull apart right hand side of formula, ## so we assume each non-function is to be renamed. responseVars - lapply(all.vars(variables[[1]]), as.name) variables - variables[-1] } else { responseVars - list() } ## omit non-name variables from list of ones to change. ## This is where you could expand calls to certain functions. variables - variables[vapply(variables, is.name, TRUE)] variables - c(responseVars, variables) # all are names now names(variables) - vapply(variables, as.character, ) newVars - lapply(variables, function(v) as.name(paste0(toupper(v), z))) formula(do.call(substitute, list(formula, newVars)), env=environment(formula)) } a - cat + (age + Heading(Females) * (sex == Female) * sbp) * Heading() * g + (age + sbp) * Heading() * trio ~ Heading() * country * Heading() * sex f(a) Output: CATz + (AGEz + Heading(Females) * (SEXz == Female) * SBPz) * Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() * COUNTRYz * Heading() * SEXz The method also doesn't work if I replace sex == 'Female' with x3 4, converting to X3z 4. I'm not clear on how to code what kind of expressions to ignore. Thanks! Frank __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] regex challenge
Bill that is very impresive. The only problem I'm having is that I want the paste0(toupper(...)) to be a general function that returns a character string that is a legal part of a formula object that can't be converted to a 'name'. Frank --- Oops, I left ( out of the list of operators. ff - function(expr) { if (is.call(expr) is.name(expr[[1]]) is.element(as.character(expr[[1]]), c(~,+,-,*,/,%in%,())) { for(i in seq_along(expr)[-1]) { expr[[i]] - Recall(expr[[i]]) } } else if (is.name(expr)) { expr - as.name(paste0(toupper(as.character(expr)), z)) } expr } ff(a) CATz + (AGEz + Heading(Females) * (sex == Female) * SBPz) * Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() * COUNTRYz * Heading() * SEXz Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: [hidden email] [mailto:[hidden email]] On Behalf Of William Dunlap Sent: Thursday, August 15, 2013 6:03 PM To: Frank Harrell; RHELP Subject: Re: [R] regex challenge Try this one ff - function (expr) { if (is.call(expr) is.name(expr[[1]]) is.element(as.character(expr[[1]]), c(~, +, -, *, /, :, %in%))) { # the above list should cover the standard formula operators. for (i in seq_along(expr)[-1]) { expr[[i]] - Recall(expr[[i]]) } } else if (is.name(expr)) { # the conversion itself expr - as.name(paste0(toupper(as.character(expr)), z)) } expr } ff(a) CATz + (age + Heading(Females) * (sex == Female) * sbp) * Heading() * Gz + (age + sbp) * Heading() * TRIOz ~ Heading() * COUNTRYz * Heading() * SEXz Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: [hidden email] [mailto:[hidden email]] On Behalf Of Frank Harrell Sent: Thursday, August 15, 2013 4:45 PM To: RHELP Subject: Re: [R] regex challenge I really appreciate the excellent ideas from Bill Dunlap and Greg Snow. Both suggestions almost work perfectly. Greg's recognizes expressions such as sex=='female' but not ones such as age 21, age 21, a - b 0, and possibly other legal R expressions. Bill's idea is similar to what Duncan Murdoch suggested to me. Bill's doesn't catch the case when a variable appears both in an expression and as a regular variable (sex in the example below): f - function(formula) { trms - terms(formula) variables - as.list(attr(trms, variables))[-1] ## the 'variables' attribute is stored as a call to list(), ## so we changed the call to a list and removed the first element ## to get the variables themselves. if (attr(trms, response) == 1) { ## terms does not pull apart right hand side of formula, ## so we assume each non-function is to be renamed. responseVars - lapply(all.vars(variables[[1]]), as.name) variables - variables[-1] } else { responseVars - list() } ## omit non-name variables from list of ones to change. ## This is where you could expand calls to certain functions. variables - variables[vapply(variables, is.name, TRUE)] variables - c(responseVars, variables) # all are names now names(variables) - vapply(variables, as.character, ) newVars - lapply(variables, function(v) as.name(paste0(toupper(v), z))) formula(do.call(substitute, list(formula, newVars)), env=environment(formula)) } a - cat + (age + Heading(Females) * (sex == Female) * sbp) * Heading() * g + (age + sbp) * Heading() * trio ~ Heading() * country * Heading() * sex f(a) Output: CATz + (AGEz + Heading(Females) * (SEXz == Female) * SBPz) * Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() * COUNTRYz * Heading() * SEXz The method also doesn't work if I replace sex == 'Female' with x3 4, converting to X3z 4. I'm not clear on how to code what kind of expressions to ignore. Thanks! Frank __ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ [hidden email] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ... [show rest of quote] -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org
[R] regex challenge
I would like to be able to use gsub or gsubfn to process a formula and to translate the variables but to ignore expressions in the formula. Supposing that the R formula has already been transformed into a character string and that the transformation is to convert variable names to upper case and to append z to the names, an example would be to convert y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i to Y1z + Y2z ~ Az*(Bz + Cz) + Dz + Fz * (h == 3) + (sex == 'male')*Iz. Any expression that is not just a simple variable name would be left alone. Does anyone want to try their hand at creating a regex that would accomplish this? Thanks Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rms plot.Predict when type=p
Mike, 2 out of 3 of your requests are handled simply. I've tidied up your code and show how to include extra parameters into the plot( ) call below. The request to change plotting symbol color for each factor level is not consistent with the lattice philosophy for non-superposed variables (here on the x-axis) where the labels tell all. The last 2 lines in the code uses superpositioning for group but the col argument affects the color of confidence bands; I didn't look into that further. Frank Harrell -- require(rms) n = 30 group = factor(sample(c('a','b','c'), n, TRUE)) x1= runif(n) dat = data.frame(group, x1, y = as.numeric(group) + 0.2*x1 + rnorm(n) ) d - datadist(dat) ; options(datadist=d) f - ols(y ~ x1 + group, data=dat) p - Predict(f, group) plot(p, ~group, nlines=TRUE, type='p', ylab='fitted Y', xlab='Treatment', pch=4, lwd=3) p - Predict(f, x1=seq(0,1,by=.1), group) plot(p, ~ x1, groups='group', col=3:1) -- Hi all, I'm trying to change the pch, color, lty, and lwd in a type=p plot produced by plot.Predict in the rms package. What I'm shooting for is a separate plotting symbol symbol color for each level of a factor (and also to manage the line width for the CIs). Here's what I hope is a working example #make up some data . . . Mike Babyak Department of Psychiatry and Behavioral Sciences __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Does a general latex table-making function exist?
Our Hmisc package summary.formula function and its latex methods can make some fairly advanced tables. But the tables have to be regular. For example, all rows of the tables are based on the same data frame. I'm thinking that what is needed is a ggplot2-like set of functions for building a table row-by-row or row-by-block of rows. Different row blocks could have different denominators, e.g., the first part of the table might be on everyone and a latter block of rows be for females, with different summary statistics computed. Has anyone already written functions creating LaTeX markup with such functionality? Thanks Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Does a general latex table-making function exist?
Duncan, I had read your excellent tables package vignette at http://cran.r-project.org/web/packages/tables/vignettes/tables.pdf when it first came out. It is extremely impressive. I'm glad to be reminded to give it another look. Is there a way to make the special symbols n and 1 refer to the number of non-missing observations rather than the length of a vector? Do you feel like taking on this challenge? An example of an irregular table I'm thinking of is the following Females Males Q1 Med Q3 (n) Q1 Med Q3 (n) Age 25 49 63 (1016) 26 50 64 (1767) Canadians Weight (kg) 57 63 74 ( 243) 67 73 90 ( 401) Canadians could mean country=='Canada'. Thanks! Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Jul 26, 2013; 12:34am
Thanks Rich and Jim and apologies for omitting the line x - c(285, 43.75, 94, 150, 214, 375, 270, 350, 41.5, 210, 30, 37.6, 281, 101, 210) But the fundamental problem remains that vertical spacing is not correct unless I waste a lot of image space at the top. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can't figure out why short figure won't work
[Sorry I can't quote past messages as I get mail on Nabble and have been told I can't reply through Nabble]. Thanks Kennel for recommended a narrowing range for usr y-limits. That does help quite a bit. But I found a disappointing aspect of the graphics system: When you change height= on the device call you have to change the y-coordinates to keep the same absolute vertical positioning. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Can't figure out why short figure won't work
I can't get the points and a and b to render correctly (they are squeezed into the axis) unless I make the height of the figure waste a lot of space. I'd appreciate any ideas. The code is below. Thanks! png('/tmp/z.png', width=480, height=100) par(mar=c(3,.5,1,.5)) plot.new() par(usr=c(-10, 410, -.04, 1.04)) par(mgp=c(1.5,.5,0)) axis(1, at=seq(0, 400, by=50)) axis(1, at=seq(0, 400, by=25), tcl=-.25, labels=FALSE) title(xlab='Mean Count') points(x, rep(-.02, length(x))) lines(rep(.25*1500, 2), c(-.04, .04), col='blue') lines(rep(.196*1500, 2), c(-.04, .04), col='blue') text(.25*1500, .07, 'a', col='blue') text(.196*1500,.07, 'b', col='blue') dev.off() It works if I use height=350. Frank version _ platform x86_64-pc-linux-gnu arch x86_64 os linux-gnu system x86_64, linux-gnu status major 3 minor 0.1 year 2013 month 05 day16 svn rev62743 language R version.string R version 3.0.1 (2013-05-16) nickname Good Sport -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] Major Update to rms package
The rms (Regression Modeling Strategies) package has undergone a massive update. The entire list of updates is at the bottom of this note. CRAN has the update for linux and will soon have it for Windows and Mac - check http://cran.r-project.org/web/packages/rms/ for availability. This rms update relies on a major update of the Hmisc package. The most user-visible changes are: * Addition of a major new modeling function orm (ordinal regression model) that supports 5 distribution families (one being the logistic, for proportional odds models) and is meant for continuous Y. Sparse matrix operations (helped greatly by the SparseM package) allows thousands of intercepts to be fitted when there are thousands of unique Y values. New function generators Mean, Quantile, and ExProb (exceedance probability distribution) have been added. A detailed case study is in http://biostat.mc.vanderbilt.edu/CourseBios330 Chapter 15. Ordinal regression is now a direct competitor to linear models, and far more robust, even allowing spikes in the distribution of Y. * More generality and ease of obtaining bootstrap confidence limits for all estimands (predictions, contrasts). The basic bootstrap is now implemented and tends to work better than the percentile bootstrap in terms of confidence coverage. * Change of Surv to Srv * Added tk/tcl progress bars for bootstrap and other repeated calculations. This can be turned off by specifying options(showprogress=FALSE) or options(showprogress='console') to use cat(). Use options(showevery=50) to update the progress bar only every 50 iterations. * Made all design matrices stored in model fits exclude any intercepts, with columns of ones added as needed with Predict() etc. * Dxy and c-index are now calculated with Therneau's survival package survConcordance functions which is blazing fast so can be used routinely in all model validations that used Dxy * plot.summary.rms now produces cleaner output with fewer confidence levels * Several bug corrections - Changes in version 4.0-0 (2013-07-10) * Cleaned up label logic in Surv, made it work with interval2 (thanks:Chris Andrews) * Fixed bug in val.prob - wrong denominator for Brier score if obs removed for logistic calibration * Fixed inconsistency in predictrms where predict() for Cox models used a design matrix that was centered on medians and modes rather than means (thanks: David van Klaveren d.vanklavere...@erasmusmc.nl) * Added mean absolute prediction error to Rq output * Made pr argument passed to predab.resample more encompassing * Fixed logLik method for ols * Made contrast.rms and summary.rms automatically compute bootstrap nonparametric confidence limits if fit was run through bootcov * Fixed bug in Predict where conf.type='simultaneous' was being ignored if bootstrap coefficients were present * For plot.Predict made default gray scale shaded confidence bands darker * For bootcov exposed eps argument to fitters and default to lower value * Fixed bug in plot.pentrace regarding effective.df plotting * Added setPb function for pop-up progress bars for simulations; turn off using options(showprogress=FALSE) or options(showprogress='console') * Added progress bars for predab.resample (for validate, calibrate) and bootcov * Added bootBCa function * Added seed to bootcov object * Added boot.type='bca' to Predict, contrast.rms, summary.rms * Improved summary.rms to use t critical values if df.residual defined * Added simultaneous contrasts to summary.rms * Fixed calculation of Brier score, g, gp in lrm.fit by handling special case of computing linear predictor when there are no predictors in the model * Fixed bug in prModFit preventing successful latex'ing of penalized lrms * Removed \synopsis from two Rd files * Added prmodsel argument to predab.resample * Correct Rd files to change Design to rms * Restricted NAMESPACE to functions expected to be called by users * Improved Fortran code to use better dimensions for array declarations * Added the basic bootstrap for confidence limits for bootBCa, contrast, Predict, summary * Fixed bug in latex.pphsm, neatened pphsm code * Neatened code in rms.s * Improved code for bootstrapping ranks of variables in anova.rms help file * Fixed bug in Function.rms - undefined Nam[[i]] if strat. Thanks: douglaswilk...@yahoo.com * Made quantreg be loaded at end of search list in Rq so it doesn't override latex generic in Hmisc * Improved plot.summary.rms to use blue of varying transparency instead of polygons to show confidence intervals, and to use only three confidence levels by default: 0.9 0.95 0.99 * Changed Surv to Srv; use of Surv in fitting functions will result in lack of time labels and assumption of Day as time unit; no longer override Surv in
[R] Appropriate forum for announcing R package updates
I have been confused about the appropriate e-mail address to use to make announcements to r-help for major package update. In the past I've submitted to r-packa...@lists.r-project.org without seeing the announcement appear on r-help. Thanks for any guidance. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Appropriate forum for announcing R package updates
Thank you Marc Frank Marc Schwartz-3 wrote On Jul 10, 2013, at 1:29 PM, Frank Harrell lt; f.harrell@ gt; wrote: I have been confused about the appropriate e-mail address to use to make announcements to r-help for major package update. In the past I've submitted to R-packages@.r-project without seeing the announcement appear on r-help. Thanks for any guidance. Frank Hi Frank, R-packages (https://stat.ethz.ch/mailman/listinfo/r-packages) should be the correct list for those announcements. A quick check of a few recent posts shows that they are being forwarded to R-Help as well. It is however a low volume list. 116 posts in 2010, 77 in 2011, 72 in 2012 and 25 so far in 2013. The trend would seem to be downward. Regards, Marc Schwartz __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Appropriate-forum-for-announcing-R-package-updates-tp4671238p4671242.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Quantile Regression/(package (quantreg))
Mike, Do something like: require(rms) dd - datadist(mydatarame); options(datadist='dd') f - Rq(y ~ rcs(age,4)*sex, tau=.5) # use rq function in quantreg summary(f) # inter-quartile-range differences in medians of y (b/c tau=.5) plot(Predict(f, age, sex)) # show age effect on median as a continuous variable For more help type ?summary.rms and ?Predict Frank When performing quantile regression (r package I used quantreg), the value of the quantile refers to the quantile value of the dependent variable. Typically when trying to predict, since the information we have are the independent variables, I am interested in trying to estimate the coefficients based on the quantile values of the independent variables' distribution. So that I can get an understanding, for certain ranges of the predictor/independent variable values, the (target/dependent variable) has (a certain level of exposure to the predictors)/(coefficients). Is there any way I can achieve that? Just in case, if I am incorrect about my understanding on the way quantiles are interpreted when using the package quantreg, please let me know. Thanks Mike -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Nomogram (rms) for model with shrunk coefficients
You are using an informal shrinkage method. It is much better to use penalized maximum likelihood estimation, built in to lrm. If you really want to go the informal route, compute the linear predictor from your final estimates and use ols( ) to predict that from the component variables (you'll get an R^2 of 1.0 so specify sigma= to ols) and use nomogram on the result. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] stepwise discriminant analysis using wilks lambda
Unless you have detailed simulations to back up the performance of this method I would avoid it. It violates several statistical principles. Frank Hari wrote Hello R geeks, Waiting for an reply. Thanks, Hari - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/stepwise-discriminant-analysis-using-wilks-lambda-tp4668817p4669235.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rmean in survfit
One approach is to use the rms package's cph and Mean.cph functions. Mean.cph (cph calls coxph and can compute Kaplan-Meier and other survival estimates) can compute mean restricted life. Frank Dinesh W wrote I am using survfit to generate a survival curve. My population is such that my x axis is in days and i have a starting population of say 10,000 of which say only 2000 are left as of day 365. When I try to print rmean it does not print and in any case I am not interested in that. I actually want to sum up all the daily survival values starting at 1.0 (S_0) for day 1 through 0.2 (S_365) through day 365. I then want to assume r% retained each day and compute the remaining integral as sum of a geometric series through infinity ( (S_365 * r) * 1/(1-r) ) or through a specific future time period (730 for example in which case the summation portion 1/ (1-r) would change). Is there an easy way to do this? - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/rmean-in-survfit-tp4667668p4667708.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Weighted regression in rms/Hmisc
You should be able to pass a weights argument through fit.mult.impute to ols. I assume that weights are with respect to variables that you, for some reason, do not want to have as predictors. Otherwise you are already conditioning on the weighting factors and no weights are needed. Frank Mullah Abu Shadeque wrote Hi all, While using the following package and code: library(rms) library(Hmisc) test.trans-aregImpute( ~ NearestWeekGestation + MaternalBMI + MomAge_Years + WtGain + ethnicity , n.impute=10 ,data=data.1) test.mod-fit.mult.impute(BirthWeight_g ~ rcs(NearestWeekGestation,4) + rcs(MaternalBMI,4) + rcs(MomAge_Years,4) + rcs(WtGain,4) + ethnicity ,fitter=ols,xtrans=test.trans,data=data.1) How can I specify for weighted OLS here where I have a defined weight vector w? I would be highly helped and grateful to you if you kindly reply to my mail. With Regards, SADIK [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Weighted-regression-in-rms-Hmisc-tp4667437p4667446.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to do an external validation with R
Splitting one dataset into training and test is still internal validation, and it requires an enormous sample size (around 20,000 typically) in order to be competitive with the bootstrap. Note that the Design package has been replaced with rms, and rms has two functions for external validation (val.prob and val.surv, should you have had external data unlike your case) and many more for internal validation. Frank beginner wrote I would like to do external validation using R software. So far I have used packages like Design and DAAG. However they perform internal validation rather than external one. In order to perform external validation I would have to split my data beforehand into training and test set, leave the test set on the site and use only the training set to select a model. I would then test the selected model with the sample set left initially on the site. I would like to repeat this process several times to make sure that all the samples are included at least once in a test set. I thought that I need to use a loop function in R to perform this process automatically. As I am new to R I don't know how to make a loop. Could you please help me with this or suggest an R package ? I would be very very grateful for help with this task ! - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/how-to-do-an-external-validation-with-R-tp4667404p4667426.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Tiff plot Resolution issues
Why do you need TIFF? I've never seen a science journal that claims they want TIFF figure submissions who are really serious about that. This is a very wasteful format. Most journals want PDF. Frank Aldo wrote I am trying to create a high resolution tiff. It is not working. I am on Windows XP 32-bit R 3.0.0 Code input: tiff(file=test.tiff,width=6.83,height=6.83,units=in, res=1200) Return Message: Error in tiff(file = test.tiff, width = 6.83, height = 6.83, units = in, : unable to start tiff() device In addition: Warning messages: 1: In tiff(file = test.tiff, width = 6.83, height = 6.83, units = in, : unable to allocate bitmap 2: In tiff(file = test.tiff, width = 6.83, height = 6.83, units = in, : opening device failed Also capabilities(tiff) returns TRUE and It must be a tiff, that is what the journal requires. Thank You -- Michael V. Clawson Predoctoral Research Associate School of Environmental and Forest Sciences, University of Washington [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Tiff-plot-Resolution-issues-tp4665945p4665948.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Convert continuous variable into discrete variable
It is important to check for lack of fit of the categorized variable. One way to do this is to test for the additional predictive ability of the original continuous variable after adjusting for its categorized version. It is very uncommon for a categorized continuous variable to fit well, because its assumed discontinuities seldom exist in nature and most relationships are not piecewise flat. Frank levanovd wrote Or even simpler (no need to specify labels): x-runif(100,0,100) u - cut(x, breaks = c(0, 3, 4.5, 6, 8, Inf), labels = FALSE) - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Convert-continuous-variable-into-discrete-variable-tp3671032p4665699.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Stepwise regression for multivariate case in R?
Since stepwise methods do not work as advertised in the univariate case I'm wondering why they should work in the multivariate case. Frank Jonathan Jansson wrote Hi! I am trying to make a stepwise regression in the multivariate case, using Wilks' Lambda test. I've tried this: greedy.wilks(cbind(Y1,Y2) ~ . , data=my.data ) But it only returns: Error in model.frame.default(formula = X[, j] ~ grouping, drop.unused.levels = TRUE) : variable lengths differ (found for 'grouping') What can be wrong here? I have checked and all variables in my.data is of the same length. //Jonathan [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Stepwise-regression-for-multivariate-case-in-R-tp4665505p4665526.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Hosmer Lemeshow test
The H-L test is now considered to be obsolete by many. One replacement is the Hosmer - le Cessie test as implemented in the rms package residuals.lrm function. This is a one degree of freedom test. One problem with H-L is its use of arbitrary binning and suboptimal power. The new test requires no binning at all. Probably better still are directed tests such as tests of linearity and additivity. Frank David Carlson wrote The warning is commented out or it would have been printed: # warning(Some expected counts are less than 5. Use smaller number of groups) For 23 data points, the default of 10 bins is too many since one of the bins is 0. Eight bins gives the warning, but prints results. You didn't indicate how many bins SPSS used. - David L Carlson Associate Professor of Anthropology Texas AM University College Station, TX 77840-4352 -Original Message- From: r-help-bounces@ [mailto: r-help-bounces@ ] On Behalf Of Endy BlackEndy Sent: Tuesday, April 23, 2013 10:31 AM To: r-help@ Subject: [R] Hosmer Lemeshow test Hi to everybody. I use the following routine (i found it in the internet) to compute the Hosmer-Lemeshow test in the framework of logistic regression. hosmerlemeshow = function(obj, g=10) { # first, check to see if we fed in the right kind of object stopifnot(family(obj)$family==binomial family(obj)$link==logit) y = obj$model[[1]] # the double bracket (above) gets the index of items within an object if (is.factor(y)) y = as.numeric(y)==2 yhat = obj$fitted.values cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE) obs = xtabs(cbind(1 - y, y) ~ cutyhat) expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat) if (any(expect 5)) # warning(Some expected counts are less than 5. Use smaller number of groups) chisq = sum((obs - expect)^2/expect) P = 1 - pchisq(chisq, g - 2) cat(H-L2 statistic,round(chisq,4), and its p value,P,\n) # by returning an object of class htest, the function will perform like the # built-in hypothesis tests return(structure(list( method = c(paste(Hosmer and Lemeshow goodness-of-fit test with, g, bins, sep= )), data.name = deparse(substitute(obj)), statistic = c(X2=chisq), parameter = c(df=g-2), p.value = P ), class='htest')) return(list(chisq=chisq,p.value=P)) } However when i run it with the data set below i get the value NaN. Using the same data set with SPSS i get a specific value. FlightNo Temp ThermalDisast 1 66 0 2 70 1 3 69 0 4 68 0 5 67 0 6 72 0 7 73 0 8 70 0 9 57 1 10 63 1 11 70 1 12 78 0 13 67 0 14 53 1 15 67 0 16 75 0 17 70 0 18 81 0 19 76 0 20 79 0 21 75 1 22 76 0 23 58 1 Any suggestions will greatly appreciated. With regards Endy [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Hosmer-Lemeshow-test-tp4665115p4665154.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] NAMESPACE and imports
I am cleaning up the rms package to not export functions not to be called directly by users. rms uses generic functions defined in other packages. For example there is a latex method in the Hmisc package, and rms has a latex method for objects of class anova.rms so there are anova.rms and latex.anova.rms functions in rms. I use: export(asis,bj,bjplot,bootBCa,bootcov,bootplot,bplot,calibrate,cph,catg,combineRelatedPredictors,confplot,contrast,coxphFit,cph,cr.setup,datadist,effective.df,fastbw,formatNP,gendata,gIndex,GiniMd,Glm,Gls,groupkm,Hazard,hazard.ratio.plot,histdensity,%ia%,ie.setup,interactions.containing,legend.nomabbrev,lm.pfit,lrm,lrtest,lsp,matinv,matrx,Newlabels,Newlevels,nomogram,num.intercepts,ols,ols.influence,oos.loglik,pantext,Penalty.matrix,Penalty.setup,pentrace,perimeter,perlcode,plot.xmean.ordinaly,pol,pphsm,predab.resample,Predict,psm,rcs,related.predictors,reVector,robcov,Rq,sascode,scored,sensuc,setPb,show.influence,specs,strat,Surv,[.Surv,survdiffplot,survest,Survival,survplot,univarLR,validate,val.prob,val.probg,val.surv,vif,which.influence) importFrom(Hmisc) S3method(anova, rms) S3method(latex, anova.rms, bj, cph, Glm, Gls, lrm, naprint.delete, ols, pphsm, psm, rms, Rq, summary.rms, validate) When doing R CMD INSTALL I get: [using R 2.15.3] ** preparing package for lazy loading Error : c(bad 'S3method' directive: S3method(latex, anova.rms, bj, cph, Glm, Gls, lrm, naprint.delete, , bad 'S3method' directive: ols, pphsm, psm, rms, Rq, summary.rms, validate)) ERROR: lazy loading failed for package ‘rms’ Any advice appreciated. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] NAMESPACE and imports
Right, I should have said import(Hmisc) instead of importFrom(Hmisc), but that does not explain the error message. Blaser Nello wrote Not sure this fixes your problem, but as far as I can know (and can tell from the manual: http://cran.r-project.org/doc/manuals/r-release/R-exts.pdf), importFrom needs to know what functions you are importing [e.g. importFrom(Hmisc, latex) importFrom(stats, anova)]. -Original Message- From: r-help-bounces@ [mailto: r-help-bounces@ ] On Behalf Of Frank Harrell Sent: Freitag, 19. April 2013 16:26 To: r-help@ Subject: [R] NAMESPACE and imports I am cleaning up the rms package to not export functions not to be called directly by users. rms uses generic functions defined in other packages. For example there is a latex method in the Hmisc package, and rms has a latex method for objects of class anova.rms so there are anova.rms and latex.anova.rms functions in rms. I use: export(asis,bj,bjplot,bootBCa,bootcov,bootplot,bplot,calibrate,cph,catg,combineRelatedPredictors,confplot,contrast,coxphFit,cph,cr.setup,datadist,effective.df,fastbw,formatNP,gendata,gIndex,GiniMd,Glm,Gls,groupkm,Hazard,hazard.ratio.plot,histdensity,%ia%,ie.setup,interactions.containing,legend.nomabbrev,lm.pfit,lrm,lrtest,lsp,matinv,matrx,Newlabels,Newlevels,nomogram,num.intercepts,ols,ols.influence,oos.loglik,pantext,Penalty.matrix,Penalty.setup,pentrace,perimeter,perlcode,plot.xmean.ordinaly,pol,pphsm,predab.resample,Predict,psm,rcs,related.predictors,reVector,robcov,Rq,sascode,scored,sensuc,setPb,show.influence,specs,strat,Surv,[.Surv,survdiffplot,survest,Survival,survplot,univarLR,validate,val.prob,val.probg,val.surv,vif,which.influence) importFrom(Hmisc) S3method(anova, rms) S3method(latex, anova.rms, bj, cph, Glm, Gls, lrm, naprint.delete, ols, pphsm, psm, rms, Rq, summary.rms, validate) When doing R CMD INSTALL I get: [using R 2.15.3] ** preparing package for lazy loading Error : c(bad 'S3method' directive: S3method(latex, anova.rms, bj, cph, Glm, Gls, lrm, naprint.delete, , bad 'S3method' directive: ols, pphsm, psm, rms, Rq, summary.rms, validate)) ERROR: lazy loading failed for package ‘rms’ Any advice appreciated. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718.html Sent from the R help mailing list archive at Nabble.com. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718p4664728.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] NAMESPACE and imports
Now I see it. S3method() wants two arguments so I need to create multiple S3method() statements for each generic. Frank Frank Harrell wrote Right, I should have said import(Hmisc) instead of importFrom(Hmisc), but that does not explain the error message. Blaser Nello wrote Not sure this fixes your problem, but as far as I can know (and can tell from the manual: http://cran.r-project.org/doc/manuals/r-release/R-exts.pdf), importFrom needs to know what functions you are importing [e.g. importFrom(Hmisc, latex) importFrom(stats, anova)]. -Original Message- From: r-help-bounces@ [mailto: r-help-bounces@ ] On Behalf Of Frank Harrell Sent: Freitag, 19. April 2013 16:26 To: r-help@ Subject: [R] NAMESPACE and imports I am cleaning up the rms package to not export functions not to be called directly by users. rms uses generic functions defined in other packages. For example there is a latex method in the Hmisc package, and rms has a latex method for objects of class anova.rms so there are anova.rms and latex.anova.rms functions in rms. I use: export(asis,bj,bjplot,bootBCa,bootcov,bootplot,bplot,calibrate,cph,catg,combineRelatedPredictors,confplot,contrast,coxphFit,cph,cr.setup,datadist,effective.df,fastbw,formatNP,gendata,gIndex,GiniMd,Glm,Gls,groupkm,Hazard,hazard.ratio.plot,histdensity,%ia%,ie.setup,interactions.containing,legend.nomabbrev,lm.pfit,lrm,lrtest,lsp,matinv,matrx,Newlabels,Newlevels,nomogram,num.intercepts,ols,ols.influence,oos.loglik,pantext,Penalty.matrix,Penalty.setup,pentrace,perimeter,perlcode,plot.xmean.ordinaly,pol,pphsm,predab.resample,Predict,psm,rcs,related.predictors,reVector,robcov,Rq,sascode,scored,sensuc,setPb,show.influence,specs,strat,Surv,[.Surv,survdiffplot,survest,Survival,survplot,univarLR,validate,val.prob,val.probg,val.surv,vif,which.influence) importFrom(Hmisc) S3method(anova, rms) S3method(latex, anova.rms, bj, cph, Glm, Gls, lrm, naprint.delete, ols, pphsm, psm, rms, Rq, summary.rms, validate) When doing R CMD INSTALL I get: [using R 2.15.3] ** preparing package for lazy loading Error : c(bad 'S3method' directive: S3method(latex, anova.rms, bj, cph, Glm, Gls, lrm, naprint.delete, , bad 'S3method' directive: ols, pphsm, psm, rms, Rq, summary.rms, validate)) ERROR: lazy loading failed for package ‘rms’ Any advice appreciated. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718.html Sent from the R help mailing list archive at Nabble.com. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718p4664744.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] the joy of spreadsheets (off-topic)
What a terrific article. Thanks for sharing! The more we critically examine how research is actually done the more frightened we become. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Composite Quantile Regression
Does anyone know of R functions for doing composite quantile regression (Hou and Yuan Ann Stat 36:1108, 2008)? The paper's authors do not talk about software in their paper or on their web sites. Thanks Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Composite-Quantile-Regression-tp4663463.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Model Selection based on individual t-values with the largest possible number of variables in regression
To say that these strategies represent bad statistical practice is to put it mildly. Frank mister_O wrote Dear R-Community, When writing my master thesis, I faced with difficult issue. Analyzing the capital structure determinants I have one dependent variable (Total debt ratio = TD) and 15 independent ones. At the first stage I normalized my data by deleting outliers from each variable (Pairwise deletion) and in the result I got every variable to be with different length. Now when selecting relevant variables for the best model, neither stepwise nor forward or backward procedures don't work perfectly since there are a number of other combinations of variables wich have also high t-values. Thus, wichever model I pick, you never know whether this model is trustworthy. I tried to calculate all possible combinations of independent variables, but since I have 15 ones, there are thousands of such combinations and R simply refuses to calculate them! (computer crashes) I wonder if you can help me to write the code in R in order to find the model wich include as many variables as it possible with significant t-values? cheers, Oleg - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Model-Selection-based-on-individual-t-values-with-the-largest-possible-number-of-variables-in-regresn-tp4663174p4663202.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Re-order variables listed in nomogram?
nomogram does not have an option for specifying a new order for the variables. With some exceptions you should be able to reorder the variables in the model formula to achieve the desired result. Frank Eleni Rapsomaniki-3 wrote Hi, I am using the function nomogram in the rms package for survival analysis. How is the order in which variables determined and how can I change it? I use it with a cph() model. Many Thanks Eleni Rapsomaniki Clinical Epidemiology Group Department of Epidemiology and Public Health University College London __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Re-order-variables-listed-in-nomogram-tp4662070p4662139.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Export R generated tables and figures to MS Word
The best rendering of advanced tables is done by converting from pdf to Word. See http://biostat.mc.vanderbilt.edu/SweaveConvert Frank Robert Baer wrote R2wd (http://cran.r-project.org/web/packages/R2wd/R2wd.pdf lt;http://cran.r-project.org/web/packages/R2wd/R2wd.pdfgt;) might do what you want. Rob On 3/12/2013 7:02 PM, Santosh wrote: Dear Rxperts, I am aware of Sweave that generates reports into a pdf, but do know of any tools to generate to export to a MS Word document... Is there a way to use R to generate and export report/publication quality tables and figures and export them to MS word (for reporting purposes)? Thanks so much, Santosh [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Robert W. Baer, Ph.D. Professor of Physiology Kirksille College of Osteopathic Medicine A. T. Still University of Health Sciences Kirksville, MO 63501 USA [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Export-R-generated-tables-and-figures-to-MS-Word-tp4661132p4661180.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accuracy some classifiers
Which accuracy score are you using? If it is proportion correctly classified, that is a misleading improper scoring rule. Frank rkok wrote I am using machine learning for one researching. I am using some classifiers with 5-fold CV . I would like to know how it is possible to extract the accuracy, for example, for KNN,neural networks and J48, for each one of 5-fold because when I apply CV to my classifier, I obtain the mean accuracy of 5-fold but each accuracy/error of each fold is not returned. Any help is welcome and grateful. Thanks in advance! Regards!! - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Accuracy-some-classifiers-tp4661109p4661181.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Bootstrap BCa confidence limits with your own resamples
I like to bootstrap regression models, saving the entire set of bootstrapped regression coefficients for later use so that I can get confidence limits for a whole set of contrasts derived from the coefficients. I'm finding that ordinary bootstrap percentile confidence limits can provide poor coverage for odds ratios for binary logistic models with small N. So I'm exploring BCa confidence intervals. Because the matrix of bootstrapped regression coefficients is generated outside of the boot packages' boot() function, I need to see if it is possible to compute BCa intervals using boot's boot.ci() function after constructing a suitable boot object. The function below almost works, but it seems to return bootstrap percentile confidence limits for BCa limits. The failure is probably due to my being new to BCa limits and not understanding the inner workings. Has anyone successfully done something similar to this? Thanks very much, Frank bootBCa - function(estimate, estimates, n, conf.int=0.95) { require(boot) if(!exists('.Random.seed')) runif(1) w - list(sim= 'ordinary', stype = 'i', t0 = estimate, t = as.matrix(estimates), R = length(estimates), data= 1:n, strata = rep(1, n), weights = rep(1/n, n), seed = .Random.seed, statistic = function(...) 1e10, call = list('rigged', weights='junk')) np - boot.ci(w, type='perc', conf=conf.int)$percent m - length(np) np - np[c(m-1, m)] bca - tryCatch(boot.ci(w, type='bca', conf=conf.int), error=function(...) list(fail=TRUE)) if(length(bca$fail) bca$fail) { bca - NULL warning('could not obtain BCa bootstrap confidence interval') } else { bca - bca$bca m - length(bca) bca - bca[c(m-1, m)] } list(np=np, bca=bca) } - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Bootstrap-BCa-confidence-limits-with-your-own-resamples-tp4661045.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bootstrap BCa confidence limits with your own resamples
That's very helpful John. Did you happen to run a test to make sure that boot.ci(..., type='bca') in fact gives the BCa intervals or that they at least disagree with percentile intervals? Frank John Fox wrote Dear Frank, I'm not sure that it will help, but you might look at the bootSem() function in the sem package, which creates objects that inherit from boot. Here's an artificial example: -- snip -- library(sem) for (x in names(CNES)) CNES[[x]] - as.numeric(CNES[[x]]) model.cnes - specifyModel() F - MBSA2, lam1 F - MBSA7, lam2 F - MBSA8, lam3 F - MBSA9, lam4 F - F, NA, 1 MBSA2 - MBSA2, the1 MBSA7 - MBSA7, the2 MBSA8 - MBSA8, the3 MBSA9 - MBSA9, the4 sem.cnes - sem(model.cnes, data=CNES) summary(sem.cnes) set.seed(12345) # for reproducibility system.time(boot.cnes - bootSem(sem.cnes, R=5000)) class(boot.cnes) boot.ci(boot.cnes) -- snip -- I hope this helps, John -Original Message- From: r-help-bounces@ [mailto:r-help-bounces@r- project.org] On Behalf Of Frank Harrell Sent: Tuesday, March 12, 2013 11:27 AM To: r-help@ Subject: [R] Bootstrap BCa confidence limits with your own resamples I like to bootstrap regression models, saving the entire set of bootstrapped regression coefficients for later use so that I can get confidence limits for a whole set of contrasts derived from the coefficients. I'm finding that ordinary bootstrap percentile confidence limits can provide poor coverage for odds ratios for binary logistic models with small N. So I'm exploring BCa confidence intervals. Because the matrix of bootstrapped regression coefficients is generated outside of the boot packages' boot() function, I need to see if it is possible to compute BCa intervals using boot's boot.ci() function after constructing a suitable boot object. The function below almost works, but it seems to return bootstrap percentile confidence limits for BCa limits. The failure is probably due to my being new to BCa limits and not understanding the inner workings. Has anyone successfully done something similar to this? Thanks very much, Frank bootBCa - function(estimate, estimates, n, conf.int=0.95) { require(boot) if(!exists('.Random.seed')) runif(1) w - list(sim= 'ordinary', stype = 'i', t0 = estimate, t = as.matrix(estimates), R = length(estimates), data= 1:n, strata = rep(1, n), weights = rep(1/n, n), seed = .Random.seed, statistic = function(...) 1e10, call = list('rigged', weights='junk')) np - boot.ci(w, type='perc', conf=conf.int)$percent m - length(np) np - np[c(m-1, m)] bca - tryCatch(boot.ci(w, type='bca', conf=conf.int), error=function(...) list(fail=TRUE)) if(length(bca$fail) bca$fail) { bca - NULL warning('could not obtain BCa bootstrap confidence interval') } else { bca - bca$bca m - length(bca) bca - bca[c(m-1, m)] } list(np=np, bca=bca) } - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Bootstrap- BCa-confidence-limits-with-your-own-resamples-tp4661045.html Sent from the R help mailing list archive at Nabble.com. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Bootstrap-BCa-confidence-limits-with-your-own-resamples-tp4661045p4661077.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SAS and R complement each other
I'm not sure why you posted the original note. I quit using SAS in 1991 and haven't needed it yet. Frank RogerJDeAngelis wrote Sorry about the double post. But I keep getting 'post' rejections, so I resubmitted about an hour later. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/SAS-and-R-complement-each-other-tp4660157p4660190.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple left hand side variables in a formula
Achim this is perfect. I had not seen Formula before. Thanks for writing it! Frank Achim Zeileis-4 wrote On Fri, 1 Mar 2013, Frank Harrell wrote: Thank you Bill. A temporary re-arrangement of the formula will allow me to do the usual subset= na.action= processing afterwards. Nice idea. I don't need the dot notation very often for this application. That's what the Formula package provides. It allows for multiple parts and multiple responses on both side of the ~. Internally, the formula is decomposed and separate terms are produced. And using an auxiliary formula it is assured that a single model frame (with unified NA processing). And all of this is hidden from the user by providing methods that are as standard as possible, see: vignette(Formula, package = Formula) hth, Z Frank William Dunlap wrote I don't know how much of the information that model.frame supplies you need, but you could make a data.frame containing all the variables on both sides of them formula by changing lhs~rhs into ~lhs+rsh before calling model.frame. E.g., f - function (formula) { if (length(formula) == 3) { # has left hand side envir - environment(formula) formula - formula(call(~, call(+, formula[[2]], formula[[3]]))) environment(formula) - envir } formula } This doesn't quite take care of the wild-card dot in the formula: straight variables are omitted from dot's expansion but functions of variables are not: colnames(model.frame(f(log(mpg)+hp ~ .), data=mtcars)) [1] log(mpg) hp mpg cyl [5] disp drat wt qsec [9] vs am gear carb Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-bounces@ [mailto: r-help-bounces@ ] On Behalf Of Frank Harrell Sent: Friday, March 01, 2013 4:17 PM To: r-help@ Subject: [R] Multiple left hand side variables in a formula The lattice package uses special logic to allow for multiple left-hand-side variables in a formula, e.g. y1 + y2 ~ x. Is there an elegant way to do this outside of lattice? I'm trying to implement a data summarization function that logically takes multiple dependent variables. The usual invocation of model.frame( ) causes R to try to do arithmetic addition to create a single dependent variable. Thanks Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Multiple-left-hand-side- variables-in-a-formula-tp4660060.html Sent from the R help mailing list archive at Nabble.com. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660065.html Sent from the R help mailing list archive at Nabble.com. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660080.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple left hand side variables in a formula
Hi Gabor, This is not for a regression function but for a major update I'm working on for the summary.formula function in the Hmisc package. So I need to handle several data types in the formula. Thanks Frank Gabor Grothendieck wrote Gabor Grothendieck wrote On Fri, Mar 1, 2013 at 7:16 PM, Frank Harrell lt; f.harrell@ gt; wrote: The lattice package uses special logic to allow for multiple left-hand-side variables in a formula, e.g. y1 + y2 ~ x. Is there an elegant way to do this outside of lattice? I'm trying to implement a data summarization function that logically takes multiple dependent variables. The usual invocation of model.frame( ) causes R to try to do arithmetic addition to create a single dependent variable. Try: lm( cbind(Sepal.Length, Sepal.Width) ~., iris) On Fri, Mar 1, 2013 at 8:02 PM, Frank Harrell lt; f.harrell@ gt; wrote: Thanks for your reply Gabor. That doesn't handle a mixture of factor and numeric variables on the left hand side. Frank It can handle 2 level factors lm(cbind(Sepal.Length, setosa = Species == setosa) ~ ., iris) and more with some manual effort: lm(cbind(virginica = Species == virginica, setosa = Species == setosa) ~ ., iris) Typically you don't see more than that as a dependent variable. Do you actually need more? -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660081.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Expressions in lattice conditional variables
I would like to have a lattice conditioning ( | var ) variable have expression() as values because I want panel labels to be able to use plotmath notation for subscripts, etc. lattice barks at this. Does anyone know of a trick workaround? An attempted example program is below. Thanks -Frank require(lattice) set.seed(1) var - c(rep('A', 100), rep('B', 100)) trt - sample(c('T1','T2'), 200, TRUE) x - c(runif(100), 10*runif(100)) y - x + c(runif(100)/10, runif(100)) N - tapply(x, llist(var, trt), function(x) sum(!is.na(x))) print(N) vn - vector('expression', length(var)) for(v in unique(var)) { i - var == v n - tapply(!is.na(x[i]), trt[i], sum) nam - names(n) w - sprintf('paste(%s, (, n[%s]==%g,~~n[%s]==%g,))', v, nam[1], n[1], nam[2], n[2]) cat(w, '\n') vn[var == v] - parse(text=w) n - sprintf('%s (n%s=%g, n%s=%g)', v, nam[1],n[1], nam[2],n[2]) vn[var == v] - n } trt - factor(trt) xyplot(as.integer(trt) ~ x | vn, panel=panel.bpplot, ylim=c(0,3), scale=list(y=list(at=1:2, labels=levels(trt)), x=list(relation='free', limits=list(c(0,1),c(0,13, ylab='Treatment', layout=c(1,2)) Error in unique.default(x) : unimplemented type 'expression' in 'HashTableSetup' - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Expressions-in-lattice-conditional-variables-tp4660089.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Expressions in lattice conditional variables
Whoops - these 2 lines should have been omitted from the program: n - sprintf('%s (n%s=%g, n%s=%g)', v, nam[1],n[1], nam[2],n[2]) vn[var == v] - n Frank Harrell wrote I would like to have a lattice conditioning ( | var ) variable have expression() as values because I want panel labels to be able to use plotmath notation for subscripts, etc. lattice barks at this. Does anyone know of a trick workaround? An attempted example program is below. Thanks -Frank require(lattice) set.seed(1) var - c(rep('A', 100), rep('B', 100)) trt - sample(c('T1','T2'), 200, TRUE) x - c(runif(100), 10*runif(100)) y - x + c(runif(100)/10, runif(100)) N - tapply(x, llist(var, trt), function(x) sum(!is.na(x))) print(N) vn - vector('expression', length(var)) for(v in unique(var)) { i - var == v n - tapply(!is.na(x[i]), trt[i], sum) nam - names(n) w - sprintf('paste(%s, (, n[%s]==%g,~~n[%s]==%g,))', v, nam[1], n[1], nam[2], n[2]) cat(w, '\n') vn[var == v] - parse(text=w) n - sprintf('%s (n%s=%g, n%s=%g)', v, nam[1],n[1], nam[2],n[2]) vn[var == v] - n } trt - factor(trt) xyplot(as.integer(trt) ~ x | vn, panel=panel.bpplot, ylim=c(0,3), scale=list(y=list(at=1:2, labels=levels(trt)), x=list(relation='free', limits=list(c(0,1),c(0,13, ylab='Treatment', layout=c(1,2)) Error in unique.default(x) : unimplemented type 'expression' in 'HashTableSetup' - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Expressions-in-lattice-conditional-variables-tp4660089p4660090.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] xYplot help
xYplot has many options that are passed to panel.xYplot. Did you read the documentation? You can suppress labels, use an automatically generated Key() function to control where you want keys, and use several other options. Start with label.curve=FALSE and go from there. Frank bwr87 wrote I'm trying to work the lattice xYplot function to plot confidence intervals for two groups on the same plot. The problem is that it automatically labels the groups (1 and 2) on the plot itself, and the labels get obscured by the CI lines and make it look bad. This is my code: xYplot(Cbind(Mean,L95,U95) ~ jitter(Time), type='b', data=A,main=Bouts Vs. Time,col=c('black', 'gray'), groups=Group,title=Bouts,xlab=Time, ylab=Bouts, method=bars, lwd=3, ylim=c(0,30)) A is the data, with columns Time, Group, Mean, L95, U95. L95 and U95 are the lower and upper confidence interval estimates. Group is either 1 or 2, each measured at the same number of time points. The goal is simply to take the labels off of the plot itself. Then I can use an auto.key to put the labels on the side. Please help! -Ben [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/xYplot-help-tp4660102p4660108.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Multiple left hand side variables in a formula
The lattice package uses special logic to allow for multiple left-hand-side variables in a formula, e.g. y1 + y2 ~ x. Is there an elegant way to do this outside of lattice? I'm trying to implement a data summarization function that logically takes multiple dependent variables. The usual invocation of model.frame( ) causes R to try to do arithmetic addition to create a single dependent variable. Thanks Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple left hand side variables in a formula
Thanks for your reply Gabor. That doesn't handle a mixture of factor and numeric variables on the left hand side. Frank Gabor Grothendieck wrote On Fri, Mar 1, 2013 at 7:16 PM, Frank Harrell lt; f.harrell@ gt; wrote: The lattice package uses special logic to allow for multiple left-hand-side variables in a formula, e.g. y1 + y2 ~ x. Is there an elegant way to do this outside of lattice? I'm trying to implement a data summarization function that logically takes multiple dependent variables. The usual invocation of model.frame( ) causes R to try to do arithmetic addition to create a single dependent variable. Try: lm( cbind(Sepal.Length, Sepal.Width) ~., iris) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660062.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Multiple left hand side variables in a formula
Thank you Bill. A temporary re-arrangement of the formula will allow me to do the usual subset= na.action= processing afterwards. Nice idea. I don't need the dot notation very often for this application. Frank William Dunlap wrote I don't know how much of the information that model.frame supplies you need, but you could make a data.frame containing all the variables on both sides of them formula by changing lhs~rhs into ~lhs+rsh before calling model.frame. E.g., f - function (formula) { if (length(formula) == 3) { # has left hand side envir - environment(formula) formula - formula(call(~, call(+, formula[[2]], formula[[3]]))) environment(formula) - envir } formula } This doesn't quite take care of the wild-card dot in the formula: straight variables are omitted from dot's expansion but functions of variables are not: colnames(model.frame(f(log(mpg)+hp ~ .), data=mtcars)) [1] log(mpg) hp mpg cyl [5] disp drat wt qsec [9] vs am gear carb Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com -Original Message- From: r-help-bounces@ [mailto: r-help-bounces@ ] On Behalf Of Frank Harrell Sent: Friday, March 01, 2013 4:17 PM To: r-help@ Subject: [R] Multiple left hand side variables in a formula The lattice package uses special logic to allow for multiple left-hand-side variables in a formula, e.g. y1 + y2 ~ x. Is there an elegant way to do this outside of lattice? I'm trying to implement a data summarization function that logically takes multiple dependent variables. The usual invocation of model.frame( ) causes R to try to do arithmetic addition to create a single dependent variable. Thanks Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Multiple-left-hand-side- variables-in-a-formula-tp4660060.html Sent from the R help mailing list archive at Nabble.com. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660065.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Selecting First Incidence from Longitudinal Data
= ID COMPL SEX HEREDITY 1 0 1 2 1 0 1 2 1 3 1 2 2 0 0 1 2 1 0 1 2 2 0 1 2 2 0 1 3 0 0 1 3 0 0 1 3 0 0 1 3 0 0 1 3 2 0 1 4 0 1 2 4 0 1 2 , header = TRUE) aggregate(. ~ ID, data = subset(dat, COMPL != 0), head, 1) Hope this helps, Rui Barradas Em 23-02-2013 14:28, Tasnuva Tabassum escreveu: I have a longitudinal competing risk data of the form: ID COMPL SEX HEREDITY 1 0 1 2 1 0 1 2 1 3 1 2 2 0 0 1 2 1 0 1 2 2 0 1 2 2 0 1 3 0 0 1 3 0 0 1 3 0 0 1 3 0 0 1 3 2 0 1 4 0 1 2 4 0 1 2. Where, COMPL= health complication of diabetic patients which has value labels as 0= no complication,1=coronary heart disease, 2=retinopathy, 3= nephropathy. I want to select only the first complication that occurred to each patient. What R function can I use? [[alternative HTML version deleted]] __** R-help@ mailing list https://stat.ethz.ch/mailman/**listinfo/r-helplt;https://stat.ethz.ch/mailman/listinfo/r-helpgt; PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html lt;http://www.R-project.org/posting-guide.htmlgt; and provide commented, minimal, self-contained, reproducible code. __** R-help@ mailing list https://stat.ethz.ch/mailman/**listinfo/r-helplt;https://stat.ethz.ch/mailman/listinfo/r-helpgt; PLEASE do read the posting guide http://www.R-project.org/** posting-guide.html lt;http://www.R-project.org/posting-guide.htmlgt; and provide commented, minimal, self-contained, reproducible code. -- == Xiaogang Su, Ph.D. Associate Professor Statistician School of Nursing, University of Alabama Birmingham, AL 35294-1210 (205) 934-2355 [Office] xgsu@ xiaogangsu@ https://sites.google.com/site/xgsu00/ [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- == Xiaogang Su, Ph.D. Associate Professor Statistician School of Nursing, University of Alabama Birmingham, AL 35294-1210 (205) 934-2355 [Office] xgsu@ xiaogangsu@ https://sites.google.com/site/xgsu00/ __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Selecting-First-Incidence-from-Longitudinal-Data-tp4659455p4659530.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Plotting survival curves after multiple imputation
I haven't seen anyone solve this. I think it would be reasonable to do a time point by time point averaging (over multiple imputations) of the underlying survival curve, although there is some question about whether to freeze the centering constant (sum of beta times covariate mean). What will be harder to get is confidence intervals for predicted probabilities after multiple imputation. I hope someone will take up the challenge. Frank Robert Long wrote I am working with some survival data with missing values. I am using the mice package to do multiple imputation. I have found code in this thread which handles pooling of the MI results: https://stat.ethz.ch/pipermail/r-help/2007-May/132180.html Now I would like to plot a survival curve using the pooled results. Here is a reproducible example: require(survival) require(mice) set.seed(2) dt - colon fit - coxph(Surv(time,etype)~rx + sex + age, data=colon) dummy - data.frame(sex=c(1,1,1),rx=c(Obs,Lev,Lev+5FU),age=c(40,40,40)) plot(survfit(fit, newdata=dummy) ) # now create some missing values in the data dt - colon dt$rx[sample(1:nrow(dt),50)] - NA dt$sex [sample(1:nrow(dt),50)] - NA dt$age[sample(1:nrow(dt),50)] - NA imp -mice(dt) fit.imp - coxph.mids(Surv(time,etype)~rx + sex + age,imp) # Note, this function is defined below... imputed=summary.impute(pool.impute(fit.imp)) print(imputed) # now, how to plot a survival curve with the pooled results ? ## begin code from linked thread above coxph.mids - function (formula, data, ...) { call - match.call() if (!is.mids(data)) stop(The data must have class mids) analyses - as.list(1:data$m) for (i in 1:data$m) { data.i- complete(data, i) analyses[[i]] - coxph(formula, data = data.i, ...) } object - list(call = call, call1 = data$call, nmis = data$nmis, analyses = analyses) return(object) } pool.impute - function (object, method = smallsample) { if ((m - length(object$analyses)) 2) stop(At least two imputations are needed for pooling.\n) analyses - object$analyses k - length(coef(analyses[[1]])) names - names(coef(analyses[[1]])) qhat - matrix(NA, nrow = m, ncol = k, dimnames = list(1:m,names)) u - array(NA, dim = c(m, k, k), dimnames = list(1:m, names, names)) for (i in 1:m) { fit - analyses[[i]] qhat[i, ] - coef(fit) u[i, , ] - vcov(fit) } qbar - apply(qhat, 2, mean) ubar - apply(u, c(2, 3), mean) e - qhat - matrix(qbar, nrow = m, ncol = k, byrow = TRUE) b - (t(e) %*% e)/(m - 1) t - ubar + (1 + 1/m) * b r - (1 + 1/m) * diag(b/ubar) f - (1 + 1/m) * diag(b/t) df - (m - 1) * (1 + 1/r)^2 if (method == smallsample) { if( any( class(fit) == coxph ) ){ ### this loop is the hack for survival analysis ### status - fit$y[ , 2] n.events - sum(status == max(status)) p- length( coefficients( fit ) ) dfc - n.events - p } else { dfc - fit$df.residual } df - dfc/((1 - (f/(m + 1)))/(1 - f) + dfc/df) } names(r) - names(df) - names(f) - names fit - list(call = call, call1 = object$call, call2 = object$call1, nmis = object$nmis, m = m, qhat = qhat, u = u, qbar = qbar, ubar = ubar, b = b, t = t, r = r, df = df, f = f) return(fit) } summary.impute - function(object){ if (!is.null(object$call1)){ cat(Call: ) dput(object$call1) } est - object$qbar se - sqrt(diag(object$t)) tval - est/se df - object$df pval - 2 * pt(abs(tval), df, lower.tail = FALSE) coefmat - cbind(est, se, tval, pval) colnames(coefmat) - c(Estimate, Std. Error, t value, Pr(|t|)) cat(\nCoefficients:\n) printCoefmat( coefmat, P.values=T, has.Pvalue=T, signif.legend=T ) cat(\nFraction of information about the coefficients missing due to nonresponse:,\n) print(object$f) ans - list( coefficients=coefmat, df=df, call=object$call1, fracinfo.miss=object$f ) invisible( ans ) } ### end code from linked thread above __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Plotting-survival-curves-after-multiple-imputation
Re: [R] Nomogram after Cox Random Effect (frailty) model
This can't be done directly, but if you can output predicted values from coxme and fit an rms ols model that predicts these predicted values with an R^2 of 1.0 you can use nomogram() on the ols fit to get what you want. Frank george boul wrote Dear R-users, I am a novice R-user with some experience in using the RMS package for taking nomograms after various survival models. This time, I am trying to plot a nomogram after a Random Effects Cox, implemented by the coxme package. My questions are: 1. Is it possible to take a nomogram directly after the coxme survival function? 2. If not is there a way to take the linear predictor results and plug them into the RMS cox model and get the nomogram I want? Any instructions/suggestions would be highly appreciable? Thank you in advance George Dr. George Bouliotis Research Fellow in BiostatisticsMRC-MHTMR (Midland Hub for Trials Methodology Research) University of BirminghamPublic Health BuildingBirmingham B115 2TTUnited Kingdom [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Nomogram-after-Cox-Random-Effect-frailty-model-tp4658544p4658545.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Relative Risk in logistic regression
I am curious why one would want risk ratios. Unlike odds ratios, they are not interpretable without reference to the base risk. For example a risk ratio of 2 cannot possibly apply to anyone with a starting risk exceeding 1/2. I think it is most helpful to use one of the existing nomograms to show someone how the base risk and odds ratio translate to final risk, for a range of base risk. Frank David Winsemius wrote On Jan 30, 2013, at 5:49 AM, aminreza Aamini wrote: Hi all, I am very grateful to all those who write to me 1) how i can obtain relative risk (risk ratio) in logistic regression in R. 2) how to obtain the predicted risk for a certain individual using fitted regression model in R. You obtain the predicted probabilities with something like: predict(model, data.frame(x1=a, x2=30), type = response) See ?predict.glm This would give the odds ratios (similar but larger than the risk ratios): exp(coef(model)) -- David Winsemius, MD Alameda, CA, USA __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Relative-Risk-in-logistic-regression-tp4657040p4657163.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Regression Modeling Strategies 3-Day Short Course March 2013
*RMS Short Course 2013* Frank E. Harrell, Jr., Ph.D., Professor and Chair Department of Biostatistics, Vanderbilt University School of Medicine *March 4, 5 6, 2013* 8:00am - 4:30pm Student Life Center Board of Trust Room Vanderbilt University Nashville Tennessee USA See http://biostat.mc.vanderbilt.edu/2013RMSShortCourse for details. The course includes statistical methodology, case studies, and use of the R rms package. Please email interest to Eve Anderson { eve.a.ander...@vanderbilt.edu } -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression
For lrm fits, predict(fit, type='mean') predicts the mean Y, not a probability. Frank __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression
lrm does some binning to make the calculations faster. The exact calculation is obtained by running f - lrm(...) rcorr.cens(predict(f), DA), which results in: C IndexDxy S.D. nmissing 0.96814404 0.93628809 0.0380833632. 0. uncensored Relevant Pairs Concordant Uncertain 32. 722. 699. 0. I.e., C=.968 instead of .963. But this is even farther away than the value from SAS you reported. If you don't believe the rcorr.cens result, create a tiny example and do the calculations by hand. Frank blackscorpio81 wrote Dear R users, Please allow to me ask for your help. I am currently using Frank Harrell Jr package rms to model ordinal logistic regression with proportional odds. In order to assess model predictive ability, C concordance index is displayed and equals to 0.963. This is the code I used with the data attached data.csv http://r.789695.n4.nabble.com/file/n4656409/data.csv : require(rms) a-read.csv2(/data.csv,row.names = 1,na.strings = c(, ),dec=.) lrm(DA~SJ+TJ,data=a) Logistic Regression Model lrm(formula = DA~SJ+TJ, data = a) Frequencies of Responses 1 2 3 4 6 13 9 4 Model Likelihood Discrimination Rank Discrim. Ratio Test Indexes Indexes Obs32 LR chi2 53.14 R2 0.875 C 0.963 max |deriv| 6e-06 d.f. 2g 8.690Dxy 0.925 Pr( chi2) 0.0001 gr 5942.469gamma 0.960 gp 0.486 tau-a 0.673 Brier0.022 Coef S.E.Wald Z Pr(|Z|) y=2 -0.6161 0.6715-0.92 0.3589 y=3 -6.5949 2.3750-2.78 0.0055 y=4-16.23585.3737 -3.02 0.0025 SJ 1.4341 0.5180 2.77 0.0056 TJ 0.5312 0.2483 2.14 0.0324 I wanted to compare the results with SAS. I found the same slopes and intercept with opposite signs, which is normal since R models the probabilities P(Y=k|X) whereas SAS models the probabilities P(Y=k|X) (see pdf attached, page 2 , table Association des probabilités prédites et des réponses observées). SAS_Report_-_Logistic_Regression.pdf http://r.789695.n4.nabble.com/file/n4656409/SAS_Report_-_Logistic_Regression.pdf I chose the order for levels. I controlled that the corresponding probabilities P(Y=k|X) are the same with both softwares. But I can't understand why in SAS the C index drops from 0.963 down to 0.332. I read a lot of things about this and it seems to me that both softwares use slightly different technique to compute the C index ; it is nevertheless surprising to me to observe such a shift in the results. Does anyone have a clue on this ? Thank you very much for you help Blackscorpio - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Difference-between-R-and-SAS-in-Corcordance-index-in-ordinal-logistic-regression-tp4656409p4656508.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression
Please define 'mean probabilities'. To compute the C-index or Dxy you need anything that is monotonically related to the prediction of interest, including the linear combination of covariates ignoring all intercepts. In other words you don't need to go to the trouble of computing probabilities unless you are binning, as the binning is usually done on a controllable 0-1 scale. When I bin I just choose the middle intercept, I seem to recall. Also try running SAS with a very tiny BINWIDTH and see if you get 1 - .968 as the answer for C. [I wrote the original algorithm SAS uses for this in the old SAS PROC LOGIST. Binning was just for speed.] You might also re-run SAS after negating the response variable. Frank blackscorpio wrote Dear Dr Harrell, Thank you very much for your answer. Actually I also tried to found the C index by hand on these data using the mean probabilities and I found 0.968, as you just showed. I understand now why I had a slight difference with the outpout of lrm. I am thus convinced that this result is correct. I read on the SAS help that the procedure logistic also proceed to some binning (BINWIDTH option) : http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect010.htm But I cannot explain why the difference between the two softwares is that huge, especially since the class probabilities are the same. Do you think it could be due to the fact that mean probabilities are computed differently ? Thank for your help and best regards, OC Date: Thu, 24 Jan 2013 05:28:13 -0800 From: f.harrell@ To: r-help@ Subject: Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression lrm does some binning to make the calculations faster. The exact calculation is obtained by running f - lrm(...) rcorr.cens(predict(f), DA), which results in: C IndexDxy S.D. n missing 0.96814404 0.93628809 0.0380833632. 0. uncensored Relevant Pairs Concordant Uncertain 32. 722. 699. 0. I.e., C=68 instead of .963. But this is even farther away than the value from SAS you reported. If you don't believe the rcorr.cens result, create a tiny example and do the calculations by hand. Frank blackscorpio81 wrote Dear R users, Please allow to me ask for your help. I am currently using Frank Harrell Jr package rms to model ordinal logistic regression with proportional odds. In order to assess model predictive ability, C concordance index is displayed and equals to 0.963. This is the code I used with the data attached data.csv lt;http://r.789695.n4.nabble.com/file/n4656409/data.csvgt; : require(rms) a-read.csv2(/data.csv,row.names =,na.strings = c(, ),dec=.) lrm(DA~SJ+TJ,data= Logistic Regression Model lrm(formula =A~SJ+TJ, data = a) Frequencies of Responses 1 2 3 4 6 13 9 4 Model Likelihood Discrimination Rank Discrim. Ratio Test Indexes Indexes Obs32 LR chi2 53.14 R2 0.875 C 0.963 max |deriv| 6e-06 d.f. 2g 8.690Dxy 0.925 Pr( chi2) 0.0001 gr 5942.469gamma 0.960 gp 0.486 tau-a 0.673 Brier0.022 Coef S.E.Wald Z Pr(|Z|) y=-0.6161 0.6715-0.92 0.3589 y=-6.5949 2.3750-2.78 0.0055 y= -16.23585.3737 -3.02 0.0025 SJ 1.4341 0.5180 2.77 0.0056 TJ 0.5312 0.2483 2.14 0.0324 I wanted to compare the results with SAS. I found the same slopes and intercept with opposite signs, which is normal since R models the probabilities P(Y=X) whereas SAS models the probabilities P(Y=k|X) (see pdf attached, page 2 , table Association des probabilités prédites et des réponses observées). SAS_Report_-_Logistic_Regression.pdf lt;http://r.789695.n4.nabble.com/file/n4656409/SAS_Report_-_Logistic_Regression.pdfgt; I chose the order for levels. I controlled that the corresponding probabilities P(Y=X) are the same with both softwares. But I can't understand why in SAS the C index drops from 0.963 down to 0.332. I read a lot of things about this and it seems to me that both softwares use slightly different technique to compute the C index ; it is nevertheless surprising to me to observe such a shift in the results. Does anyone have a clue on this ? Thank you very much for you help Blackscorpio __ R-help@r-project.org
Re: [R] Does psm::Surv handle interval2 data?
Chris, I've fixed Surv in rms. The fix will be in the next release. For now you can do source('http://biostat.mc.vanderbilt.edu/tmp/Surv.s') after issuing require(rms). Frank Frank Harrell wrote Chris, Thanks for sending the specifics. It appears that I've let Surv in rms fall behind recent versions of Surv in survival. It will take me a few days to get this fixed. I'll send a follow-up note then. Frank Andrews, Chris wrote Does Surv in psm handle interval2 data? The argument list seems to indicate it does but I get an error. Thanks, Chris # code library('survival') left - c(1, 3, 5, NA) right -c(2, 3, NA, 4) Surv(left, right, type='interval2') survreg(Surv(left, right, type='interval2') ~ 1) library('rms') Surv(left, right, type='interval2') # error args(Surv) psm(Surv(left, right, type='interval2') ~ 1) # same error (of course) psm(survival::Surv(left, right, type='interval2') ~ 1) # runs # output R version 2.15.2 (2012-10-26) -- Trick or Treat Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-w64-mingw32/x64 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. library('survival') Loading required package: splines left - c(1, 3, 5, NA) right -c(2, 3, NA, 4) Surv(left, right, type='interval2') [1] [1, 2] 3 5+ 4- survreg(Surv(left, right, type='interval2') ~ 1) Call: survreg(formula = Surv(left, right, type = interval2) ~ 1) Coefficients: (Intercept) 1.317943 Scale= 0.6098782 Loglik(model)= -5.3 Loglik(intercept only)= -5.3 n= 4 library('rms') Loading required package: Hmisc Hmisc library by Frank E Harrell Jr Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview') to see overall documentation. NOTE:Hmisc no longer redefines [.factor to drop unused levels when subsetting. To get the old behavior of Hmisc type dropUnusedLevels(). Attaching package: 'Hmisc' The following object(s) are masked from 'package:survival': untangle.specials The following object(s) are masked from 'package:base': format.pval, round.POSIXt, trunc.POSIXt, units Attaching package: 'rms' The following object(s) are masked from 'package:survival': Surv Surv(left, right, type='interval2') # error Error in Surv(left, right, type = interval2) : argument event is missing, with no default args(Surv) function (time, time2, event, type = c(right, left, interval, counting, interval2), origin = 0) NULL psm(Surv(left, right, type='interval2') ~ 1) # same error (of course) Error in Surv(left, right, type = interval2) : argument event is missing, with no default psm(survival::Surv(left, right, type='interval2') ~ 1) # runs Parametric Survival Model: Weibull Distribution psm(formula = survival::Surv(left, right, type = interval2) ~ 1) Model LikelihoodDiscrimination Ratio Test Indexes Obs4LR chi2 0.00R2 0.000 Events 6d.f. 0g0.000 sigma 0.6099gr 1.000 CoefS.E. Wald Z Pr(|Z|) (Intercept) 1.3179 0.3598 3.66 0.0002 Log(scale) -0.4945 0.5977 -0.83 0.4081 ** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Does-psm-Surv-handle-interval2-data-tp4655472p4656092.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Does psm::Surv handle interval2 data?
Chris, Thanks for sending the specifics. It appears that I've let Surv in rms fall behind recent versions of Surv in survival. It will take me a few days to get this fixed. I'll send a follow-up note then. Frank Andrews, Chris wrote Does Surv in psm handle interval2 data? The argument list seems to indicate it does but I get an error. Thanks, Chris # code library('survival') left - c(1, 3, 5, NA) right -c(2, 3, NA, 4) Surv(left, right, type='interval2') survreg(Surv(left, right, type='interval2') ~ 1) library('rms') Surv(left, right, type='interval2') # error args(Surv) psm(Surv(left, right, type='interval2') ~ 1) # same error (of course) psm(survival::Surv(left, right, type='interval2') ~ 1) # runs # output R version 2.15.2 (2012-10-26) -- Trick or Treat Copyright (C) 2012 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-w64-mingw32/x64 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. library('survival') Loading required package: splines left - c(1, 3, 5, NA) right -c(2, 3, NA, 4) Surv(left, right, type='interval2') [1] [1, 2] 3 5+ 4- survreg(Surv(left, right, type='interval2') ~ 1) Call: survreg(formula = Surv(left, right, type = interval2) ~ 1) Coefficients: (Intercept) 1.317943 Scale= 0.6098782 Loglik(model)= -5.3 Loglik(intercept only)= -5.3 n= 4 library('rms') Loading required package: Hmisc Hmisc library by Frank E Harrell Jr Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview') to see overall documentation. NOTE:Hmisc no longer redefines [.factor to drop unused levels when subsetting. To get the old behavior of Hmisc type dropUnusedLevels(). Attaching package: 'Hmisc' The following object(s) are masked from 'package:survival': untangle.specials The following object(s) are masked from 'package:base': format.pval, round.POSIXt, trunc.POSIXt, units Attaching package: 'rms' The following object(s) are masked from 'package:survival': Surv Surv(left, right, type='interval2') # error Error in Surv(left, right, type = interval2) : argument event is missing, with no default args(Surv) function (time, time2, event, type = c(right, left, interval, counting, interval2), origin = 0) NULL psm(Surv(left, right, type='interval2') ~ 1) # same error (of course) Error in Surv(left, right, type = interval2) : argument event is missing, with no default psm(survival::Surv(left, right, type='interval2') ~ 1) # runs Parametric Survival Model: Weibull Distribution psm(formula = survival::Surv(left, right, type = interval2) ~ 1) Model LikelihoodDiscrimination Ratio Test Indexes Obs4LR chi2 0.00R2 0.000 Events 6d.f. 0g0.000 sigma 0.6099gr 1.000 CoefS.E. Wald Z Pr(|Z|) (Intercept) 1.3179 0.3598 3.66 0.0002 Log(scale) -0.4945 0.5977 -0.83 0.4081 ** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues [[alternative HTML version deleted]] __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Does-psm-Surv-handle-interval2-data-tp4655472p4655475.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] suggestions about import SAS results to R.
SAS' export to csv is buggy (improper quote matching, etc.). Instead consider having SAS create a version 5 transport file, then use the Hmisc package's sasxport.get function to handle labels, etc. Frank Ista Zahn wrote Well it was worth a try. If you haven't solved the problem some other way I suggest exporting from SAS to a plain text format like .csv and read into R with read.table. Best, Ista On Wed, Jan 2, 2013 at 5:44 PM, Yuan, Rebecca lt; rebecca.yuan@ gt; wrote: Hello Ista, Thanks for mention this package, however, it will not support big file, where I got the error message: -- Error in read.sas7bdat(C:/Documents and Settings/test.sas7bdat) : big endian files are not supported -- Cheers, Rebecca -Original Message- From: Ista Zahn [mailto: istazahn@ ] Sent: Wednesday, January 02, 2013 5:25 PM To: Yuan, Rebecca Cc: R help; sas-l@.uga Subject: Re: [R] suggestions about import SAS results to R. Hi Rebecca, I've had success with the sas7bdat package ( http://cran.r-project.org/web/packages/sas7bdat/ ) Best, Ista On Wed, Jan 2, 2013 at 5:15 PM, Yuan, Rebecca lt; rebecca.yuan@ gt; wrote: Hello all, Thanks for the suggestions. I tried to import data from sas to R using foreign package. In the first step, I tried to use read.ssd() to convert the sas file sales.sas7bdat to SAS Transport formats. But I got an error message as --- source(testss.r) Error in file.symlink(oldPath, linkPath) : symbolic links are not supported on this version of Windows --- While in testss.r, --- require(foreign) rm(list=ls()) sashome - C:/Program Files/SAS/SASFoundation/9.2/ checkfile - read.ssd(C:/Documents and Settings/test,sales,sascmd = file.path(sashome, sas.exe)) --- I am using windows xp and R 2.15.2. Thanks! Rebecca -Original Message- From: mehmet.suzen@ [mailto: mehmet.suzen@ ] On Behalf Of Suzen, Mehmet Sent: Wednesday, January 02, 2013 4:14 PM To: David Winsemius Cc: Yuan, Rebecca; R help; sas-l@.uga Subject: Re: [R] suggestions about import SAS results to R. On Jan 2, 2013, at 12:37 PM, Yuan, Rebecca wrote: I am wondering if there is an efficient way to read SAS data directly in R, or what would be a better connection between SAS and R if I need to use R to deal with data achieved from SAS? You may try foreign and SASxport packages: http://cran.r-project.org/web/packages/foreign/index.html http://cran.r-project.org/web/packages/SASxport/index.html But I heard that there might be some issues with the newest SAS version but give it a try... -- This message, and any attachments, is for the intended r...{{dropped:2}} __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- This message, and any attachments, is for the intended...{{dropped:6}} __ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/suggestions-about-import-SAS-results-to-R-tp465p4654521.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] suggestions about import SAS results to R.
Not clear why you responded to my note since you were not using the solution I suggested. Frank By the way if you need to read SAS7BDAT FILES frequently it is worth the licensing fee for STAT/Transfer, which can create binary R .rda objects. I have code that finishes the job by putting labels on individual variables. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/suggestions-about-import-SAS-results-to-R-tp465p4654554.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.