[R] Regression Modeling Strategies and the rms Package 4-Day Short Course May 2018

2018-03-29 Thread Frank Harrell
*RMS Short Course 2018*
Frank E. Harrell, Jr., Ph.D., Professor
Department of Biostatistics, Vanderbilt University School of Medicine
fharrell.com @f2harrell

*May 15-18, 2018* With Optional R Workshop May 14
9:00am - 4:00pm
Alumni Hall
Vanderbilt University
Nashville Tennessee USA

See http://biostat.mc.vanderbilt.edu/RMSShortCourse2018 for details.

The course includes statistical methodology, case studies, and use of the R
rms package.

Please email interest to Jill Shell 

--
Frank E Harrell Jr  Professor  School of Medicine

Department of *Biostatistics*  *Vanderbilt University*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help understanding why glm and lrm.fit runs with my data, but lrm does not

2017-09-14 Thread Frank Harrell
Fixed 'maxiter' in the help file.  Thanks.

Please give the original source of that dataset.

That dataset is a tiny sample of GUSTO-I and not large enough to fit this
model very reliably.

A nomogram using the full dataset (not publicly available to my knowledge)
is already available in http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf

Use lrm, not lrm.fit for this.  Adding maxit=20 will probably make it work
on the small dataset but still not clear on why you are using this dataset.

Frank


--
Frank E Harrell Jr  Professor  School of Medicine

Department of *Biostatistics*  *Vanderbilt University*

On Thu, Sep 14, 2017 at 10:48 AM, David Winsemius 
wrote:

>
> > On Sep 14, 2017, at 12:30 AM, Bonnett, Laura <
> l.j.bonn...@liverpool.ac.uk> wrote:
> >
> > Dear all,
> >
> > I am using the publically available GustoW dataset.  The exact version I
> am using is available here: https://na01.safelinks.
> protection.outlook.com/?url=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%
> 3D0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk=02%7C01%7Cf.harrell%40vanderbilt.edu%
> 7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80fa
> ecad%7C0%7C0%7C636410009046132507=UZgX3%2Ba%
> 2FU2Eeh8ybHMI6JnF0Npd2XJPXAzlmtEhDgOY%3D=0
> >
> > I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP,
> HRT and ANT.  I have successfully fitted a logistic regression model using
> the "glm" function as shown below.
> >
> > library(rms)
> > gusto <- spss.get("GustoW.sav")
> > fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=
> binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)
> >
> > However, my review of the literature and other websites suggest I need
> to use "lrm" for the purposes of producing a nomogram.  When I run the
> command using "lrm" (see below) I get an error message saying:
> > Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
> >  Unable to fit model using "lrm.fit"
> >
> > My code is as follows:
> > gusto2 <- gusto[,c(1,3,5,8,9,10)]
> > gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
> > gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
> > gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
> > gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
> > var.labels=c(DAY30="30-day Mortality", AGE="Age in Years",
> KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior
> Infarct Location")
> > label(gusto2)=lapply(names(var.labels),function(x)
> label(gusto2[,x])=var.labels[x])
> >
> > ddist = datadist(gusto2)
> > options(datadist='ddist')
> >
> > fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)
> >
> > Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
> >  Unable to fit model using "lrm.fit"
> >
> > Online solutions to this problem involve checking whether any variables
> are redundant.  However, the results for my data suggest  that none are.
> > redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)
> >
> > Redundancy Analysis
> >
> > redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)
> >
> > n: 2188 p: 5nk: 3
> >
> > Number of NAs:   0
> >
> > Transformation of target variables forced to be linear
> >
> > R-squared cutoff: 0.9   Type: ordinary
> >
> > R^2 with which each variable can be predicted from all other variables:
> >
> >   AGEHYP KILLIPHRTANT
> > 0.028  0.032  0.053  0.046  0.040
> >
> > No redundant variables
> >
> > I've also tried just considering "lrm.fit" and that code seems to run
> without error too:
> > lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,
> gusto2$HRT,gusto2$ANT),gusto2$DAY30)
> >
> > Logistic Regression Model
> >
> > lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
> > gusto2$ANT), y = gusto2$DAY30)
> >
> >   Model Likelihood DiscriminationRank
> Discrim.
> >  Ratio Test   Indexes   Indexes
> > Obs  2188LR chi2 233.59R2   0.273C
>  0.846
> >  0   2053d.f. 5g1.642Dxy
>  0.691
> >  1135Pr(> chi2) <0.0001gr   5.165gamma
>  0.696
> > max |deriv| 4e-09  gp   0.079tau-a
>  0.080
> >Brier0.048
> >
> >   Coef S.E.   Wald Z Pr(>|Z|)
> > Intercept -13.8515 0.9694 -14.29 <0.0001
> > x[1]0.0989 0.0103   9.58 <0.0001
> > x[2]0.9030 0.1510   5.98 <0.0001
> > x[3]1.3576 0.2570   5.28 <0.0001
> > x[4]0.6884 0.2034   3.38 0.0007
> > x[5]0.6327 0.2003   3.16 0.0016
> >
> > I was therefore hoping someone would explain why the "lrm" code is
> producing an error message, while "lrm.fit" and "glm" do not.  In
> particular I would welcome a solution to ensure I can produce a nomogram.
>
> Try this:
>
> lrm  # look at code, do a search on "fail"
> ?lrm.fit  # read the structure of the returned value of lrm.fit
>
> my.fit <- lrm.fit(x = 

Re: [R] rms::latex.anova broken?

2017-02-08 Thread Frank Harrell
In recent versions of rms on CRAN there was a non-downward compatible
change.  To get latex output for summary, anova, and print on fit objects
you leave off file="" (because we're usually using knitr anyway) and use

options(prType='latex')
anova(f) # LaTeX output

You can use options(prType='html') to get special html output for RMarkdown
html reports and html notebooks.

There are small changes in Hmisc, e.g., leave off file="" for
latex(describe()).

Frank

--
Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*  *Vanderbilt University*

On Wed, Feb 8, 2017 at 12:19 PM, Kevin E. Thorpe 
wrote:

> Hi Frank.
>
> I sent this to r-help yesterday but have not received any answers from the
> list readers. I thought I'd send to you directly to see if you had any
> insight. Since I posted this yesterday I have updated my R installation to
> the latest patched version and still get the errors.
>
> I use latex(anova(...)) quite a bit so a fix or workaround would by most
> appreciated.
>
> All the best,
>
> Kevin
>
>
>  Forwarded Message 
> Subject: rms::latex.anova broken?
> Date: Tue, 7 Feb 2017 14:23:46 -0500
> From: Kevin E. Thorpe 
> To: R Help Mailing List 
>
> I am re-running some logistic regression analyses using lrm from the rms
> package but latex(anova(...)) appears to be broken on my system.
>
> Here is some anova() output followed by the latex() error for two models
> since the error changes. My sessionInfo() follows the other output. I have
> updated all my packages and re-installed Hmisc and rms plus dependencies.
> The only thing I haven't done yet is update R completely. Has anyone else
> encountered this and know how to solve it?
>
> anova(full)
>>
> Wald Statistics  Response: id14
>
>  Factor  Chi-Square d.f. P
>  birthweight_kilo 0.87   1   0.3517
>  ageinmonth   4.12   1   0.0423
>  zbmi 6.49   1   0.0108
>  maxtbf  15.16   1   0.0001
>  cowsmilk 6.54   1   0.0106
>  Male 1.96   1   0.1611
>  multivitamin 0.76   1   0.3819
>  bottleuse0.13   1   0.7194
>  preterm  0.50   1   0.4811
>  AGEINTRO_cowsmilk0.06   1   0.8032
>  AGEINTRO_complefood  0.61   1   0.4356
>  TOTAL   30.49  11   0.0013
>
>> latex(anova(full),file="",table.env=FALSE,booktabs=TRUE)
>>
> Error in ifelse(sn %nin% c("d.f.", "MS", "Partial SS"), math(sn), sn) :
>   could not find function "math"
>
>> anova(full.nl)
>>
> Wald Statistics  Response: id14
>
>  Factor  Chi-Square d.f. P
>  birthweight_kilo 3.68   2   0.1588
>   Nonlinear   2.65   1   0.1037
>  ageinmonth  16.25   2   0.0003
>   Nonlinear  13.45   1   0.0002
>  zbmi 4.07   2   0.1310
>   Nonlinear   0.23   1   0.6323
>  maxtbf  15.81   2   0.0004
>   Nonlinear   2.57   1   0.1092
>  cowsmilk 3.34   2   0.1880
>   Nonlinear   1.16   1   0.2821
>  Male 1.21   1   0.2711
>  multivitamin 0.57   1   0.4494
>  bottleuse0.06   1   0.8100
>  preterm  0.01   1   0.9418
>  AGEINTRO_cowsmilk3.65   2   0.1612
>   Nonlinear   3.28   1   0.0700
>  AGEINTRO_complefood  5.40   2   0.0671
>   Nonlinear   4.00   1   0.0455
>  TOTAL NONLINEAR 25.41   7   0.0006
>  TOTAL   52.13  18   <.0001
>
>> latex(anova(full.nl),file="",table.env=FALSE,booktabs=TRUE)
>>
> Error in paste0(specs$lspace, specs$italics(substring(rowl, 2)), sep = "")
> :
>   attempt to apply non-function
>
> sessionInfo()
>>
> R version 3.2.3 Patched (2016-01-31 r70055)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Slackware 14.2
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=C
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> other attached packages:
> [1] rms_5.1-0   SparseM_1.74Hmisc_4.0-2 ggplot2_2.2.1
> [5] Formula_1.2-1   survival_2.40-1 lattice_0.20-34 knitr_1.15.1
>
> loaded via a namespace (and not attached):
>  [1] Rcpp_0.12.9 RColorBrewer_1.1-2  plyr_1.8.4
>  [4] base64enc_0.1-3 tools_3.2.3 rpart_4.1-10
>  [7] digest_0.6.12   polspline_1.1.12tibble_1.2
> [10] gtable_0.2.0htmlTable_1.9   checkmate_1.8.2
> [13] nlme_3.1-131Matrix_1.2-8mvtnorm_1.0-5
> [16] 

[R] [R-pkgs] Major Update to rms package: 5.0-0

2016-11-04 Thread Frank Harrell
A major new version of the rms package is now on CRAN.  The most
user-visible changes are:

- interactive plotly graphic methods for model fits.  The best example of
this is survplot for npsurv (Kaplan-Meier) estimates where the number of
risk pop up as you hover over the curves, and you can click to bring up
confidence bands for differences in survival curves

- html methods for model fit summaries especially when using Rmarkdown html
notebooks

- instead of running print(fit, latex=TRUE) use
   options(prType='latex')
   print(fit)
   Or: options(prType='html')

A complete list of changes is below.  See
http://biostat.mc.vanderbilt.edu/Rrms for an html notebook showing examples
of new features.

Changes in version 5.0-0 (2016-10-31)
   * plot.summary.rms: implemented plotly interactive plots if
options(grType='plotly') in effect
* plot.anova.rms: implemented plotly interactive plots if
options(grType='plotly') in effect; remove psmall argument; changed margin
default to chisq and P
* ggplot.Predict: implemented plotly interactive plots if
options(grType='plotly') in effect
* print(fit, md=TRUE), prModFit, prStats: added latex/html methods using
htmlTable package and MathJax for latex math
* html.anova.rms, html.validate, html.summary.rms: new functions for use
with html and MathJax/knitr/RStudio
* latex methods for model fits: added md=TRUE argument to produce
MathJax-compatible latex and html code for fitted models when using R
Markdown
* html: new methods for model fit objects for use with R Markdown
* formatNP: fixed error when digits=NA
* latex.anova.rms: fixed error in not rounding enough columns doe to using
all.is.numeric intead of is.numeric
* catg: corrected bug that disallowed explicit catg() in formulas
* ggplot.Predict: added height and width for plotly
* survplot: respected xlim with diffbands.  Thanks: Toni G
* reVector: changed to reListclean and stored model stats components as
lists so can handle mixture of numeric and character, e.g., name of
clustering variable
* survplotp.npsurv: new function for interactive survival curve graphs
using plotly
* anova, summary, latex, print for model fits: use options(grType='html')
or 'latex' to set output type, output htmltools::HTML marked html so that
chunk header doesn't need results='asis'
* latex methods - set file default to ''
* GiniMd: moved to Hmisc package
* plot.nomogram: fixed bug where abbreviations were being ignored.  Thanks:
Zongheng Zhang
* nomogram: improved examples in help file
* survplot.npsurv: fixed n.risk when competing risks
* survest.cph, survfit.cph, others: fixed large problems due to
incompatibility with survival 2.39-5 for survival predictions; changed
object Strata to strata in cph to be compatible with survival package
* new test survest.r to more comprehensively check survfit.cph and
survest.cph
* ggplot.Predict: quit ignoring xlim; suppress confidence bands if two
group bariables because ggplot2 geom_ribbon doesn't support multiple
aesthetics
* predictrms: set contrasts for ordered factors to contr.treatment instead
of contr.poly
* val.prob: changed line of identity to use wide grayscale line instead of
dashed line

--
Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*  *Vanderbilt University*

[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Massive Update to Hmisc package

2016-11-03 Thread Frank Harrell
Hmisc 4.0-0 is now on CRAN.  The package has undergone a massive update.
The most user-visable changes are;

- support for Rmarkdown html notebooks

- advanced html tables using the htmlTable package and summaryM function;
can copy and paste into word processors

- support for plotly interactive graphics, e.g.
  options(grType='plotly')
  plot(describe(mydata))

- new function ggfreqScatter

- new object markupSpecs that drives LaTeX and html translations

- describe function computes new statistics and relabels Unique as Distinct

The full list of changes is below.  You can look at examples of html
notebook reports produced using the new Hmisc at
http://biostat.mc.vanderbilt.edu/Hmisc .  These include interactive plotly
graphics.

Changes in version 4.0-0 (2016-10-31)
   * summaryP: fixed exclude1 logic - was not excluding enough levels
(inconsistent use of variable names vs. labels)
* latexTranslate: any NAs in first argument changed to "" before conversion
* minor.tick: added arguments to pass through (thanks: vhorvath)
* tests/latexpng.r: test conversion of LaTeX table to png
* latexTabular: made it properly call latexTranslate separately for each
column, convert argument to matrix or data frame if a vector
* tests/latexTabular.r: new test
* latexDotchart: added call to latexTranslate to escape special characters
for LaTeX such as underscore and pound sign
* ggfreqScatter: new function
* grType: new non-exported service function to sense if plotly is in effect
* plot.describe: new function to make plotly graph for describe objects
* describe: changed output: 3 decimal places for Info, new format for
frequency table, separated logic for 10 extreme values from logic for
frequency table, significant changes to print.describe
* dotchartp: new version of dotchart3 for plotly charts
* summaryD: modified to use dotchartp if options(grType='plotly') is in
effect
* nFm: added argument html
* label.default, label.Surv, labelPlotmath: added html argument to
implement HTML markup in place of plotmath (for plotly)
* now imports htmlTable, viridis
* html.contents.data.frame: return invisibly if file=''
* htmlVerbatim: new function
* html.contents.data.frame: improved html
* html.data.frame: changed column headers from h4 to 
* ggfreqScatter: added html argument
* knitrSet: was ignoring default figure h and w
* html.summaryM: new function using new version of latex.summaryM
* markupSpecs: new list object defining markup elements for latex and html,
used by summaryM
* show.html, print.html: removed; conflicted with htmltools/rmarkdown
* label: required exact match on attribute name "label" to not retrieve the
"labels" attribute (thanks: Ivan Puzek)
* htmlSN: new function to convert floating point to scientific format usint
html
* upData, cleanup.import: fix NA becoming new factor levels (thanks: Beau
Bruce)
* htmlTranslate: new function
* plotlySave: new function
* histSpikep: new function
* upData: new argument html
* plot.describe: added ggplot2 graphics, improved continuous display using
ggplotly
* labelPlotmath: cleaned up by using markupSpecs
* capitalize: fixed bug - did not cap. first letter if other letters after
it were upper case (thanks: dchiu911)
* html.summaryM: added brmsd argument to put mean, SD on separate line
* Save: changed default back to use gzip for speed
* knitrSet: used new knitr way to set aliases for option names
* latexTabular: made translate argument apply to body of table also;
implemented multi-line header
* html.data.frame: added several arguments
* describe: added html method
* html markupSpecs object: added bibliographic database utility functions
* bppltp: new service function for extended box plots for plotly
* plot.summaryM: new plotly method
* prType: new service function used to detect user settings for html, latex
for print methods
* html.contents.data.frame: removed file and append arguments and output an
html character vector instead
* prList: new function
* GiniMd: moved to Hmisc from rms
* describe: added GMD (Gini's mean difference) and relabeled unique to
distinct
* latex.summaryM: added Needspace{2.7in} before 2nd and later strata
* ggplot.summaryP: added point labels for use with plotly as hover text
* latex.summaryP: fixed bad linebreak at strata boundaries
* dotchartpl: new plotly dot chart function especially for summaryP
* plot.summaryP: new plotly method using dotchartpl when
options(grType='plotly')
* putHfig: new function to render graphics with captions in HTML reports
* pngNeedle: new function, like latexNeedle but useful with HTML reports
* html.data.frame: added width and caption arguments
* plotlyParm: list object with plotly helper functions
* summaryD, dotchartp: added symbol and col arguments
* upData: improved efficiency, added labelled class to variables without it
but having labels (e.g., data imported using haven package)
* gbayesMixPost: mixed error in posterior odds of the mixing parameter that
resulted in incorrect posterior probabilities when the 

Re: [R] Trouble getting rms::survplot(..., n.risk=TRUE) to behave properly

2016-06-02 Thread Frank Harrell
This happens when you have not strat variables in the model.


--
Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*  *Vanderbilt University*

On Thu, Jun 2, 2016 at 10:55 AM, Steve Lianoglou 
wrote:

> Hello foks,
>
> I'm trying to plot the number of patients at-risk by setting the
> `n.risk` parameter to `TRUE` in the rms::survplot function, however it
> looks as if the numbers presented in the rows for each category are
> just summing up the total number of patients at risk in all groups for
> each timepoint -- which is to say that the numbers are equal in each
> category down the rows, and they don't seem to be the numbers specific
> to each group.
>
> You can reproduce the observed behavior by simply running the code in
> the Examples section of ?survplot, which I'll paste below for
> convenience.
>
> Is the error between the chair and the keyboard, here, or is this perhaps
> a bug?
>
> === code ===
> library(rms)
> n <- 1000
> set.seed(731)
> age <- 50 + 12*rnorm(n)
> label(age) <- "Age"
> sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
> cens <- 15*runif(n)
> h <- .02*exp(.04*(age-50)+.8*(sex=='Female'))
> dt <- -log(runif(n))/h
> label(dt) <- 'Follow-up Time'
> e <- ifelse(dt <= cens,1,0)
> dt <- pmin(dt, cens)
> units(dt) <- "Year"
> dd <- datadist(age, sex)
> options(datadist='dd')
> S <- Surv(dt,e)
>
> f <- cph(S ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
> survplot(f, sex, n.risk=TRUE)
> ===
>
> I'm using the latest version of rms (4.5-0) running on R 3.3.0-patched.
>
> === Output o sessionInfo() ===
> R version 3.3.0 Patched (2016-05-26 r70671)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X 10.11.4 (El Capitan)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> other attached packages:
> [1] rms_4.5-0   SparseM_1.7 Hmisc_3.17-4ggplot2_2.1.0
> [5] Formula_1.2-1   survival_2.39-4 lattice_0.20-33
>
> loaded via a namespace (and not attached):
>  [1] Rcpp_0.12.5 cluster_2.0.4   MASS_7.3-45
>  [4] splines_3.3.0   munsell_0.4.3   colorspace_1.2-6
>  [7] multcomp_1.4-5  plyr_1.8.3  nnet_7.3-12
> [10] grid_3.3.0  data.table_1.9.6gtable_0.2.0
> [13] nlme_3.1-128quantreg_5.24   TH.data_1.0-7
> [16] latticeExtra_0.6-28 MatrixModels_0.4-1  polspline_1.1.12
> [19] Matrix_1.2-6gridExtra_2.2.1 RColorBrewer_1.1-2
> [22] codetools_0.2-14acepack_1.3-3.3 rpart_4.1-10
> [25] sandwich_2.3-4  scales_0.4.0mvtnorm_1.0-5
> [28] foreign_0.8-66  chron_2.3-47zoo_1.7-13
> ===
>
>
> Thanks,
> -steve
>
>
> --
> Steve Lianoglou
> Computational Biologist
> Genentech
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Recommendation for updating packages such as nlme

2016-02-16 Thread Frank Harrell
I use this function to update my installed R packages:

updatePac <- function (checkBuilt = FALSE, ...)
update.packages(repos = "http://cran.rstudio.com;, instlib = 
"/usr/local/lib/R/site-library",
 checkBuilt = checkBuilt, ...)

When I type updatePac() I get:

Warning: package 'MASS' in library '/usr/lib/R/library' will not be updated
Warning: package 'Matrix' in library '/usr/lib/R/library' will not be 
updated
Warning: package 'mgcv' in library '/usr/lib/R/library' will not be updated
Warning: package 'nlme' in library '/usr/lib/R/library' will not be updated
Warning: package 'nnet' in library '/usr/lib/R/library' will not be updated
Warning: package 'spatial' in library '/usr/lib/R/library' will not be 
updated

Should I not try to update recommended packages in 
/usr/local/lib/R/site-library on my Ubuntu system but specify 
/usr/lib/R/library instead?

-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Ordinal regression with some categories combined for some data

2016-01-17 Thread Frank Harrell
This does seem to be a good situation for ordinal regression.  The R rms 
package's orm function allows for thousands of categories in Y. But it 
doesn't handle censoring.

This discussion would be better for stats.stackexchange.com

Frank
-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Correct notation for functions, packages when using LaTex

2015-12-11 Thread Frank Harrell
Rolf I believe \textsf is the correct font to use for the symbol R but 
not necessarily for the names of R variables and functions.  I'd like 
more discussion about that.

Frank

-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with improveProb function in Hmisc in R

2015-10-02 Thread Frank Harrell
Please note that Kirsten is cross-posting to stats.stackexchange.com 
creating extra work for everyone.

-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Major updates to Hmisc and rms packages

2015-09-30 Thread Frank Harrell
New versions of Hmisc and rms are available on CRAN.  Changes are listed 
below.

The most significant change to Hmisc is the addition of the ffCompress 
function that creates an optimal ff package object for large data frames 
by computing the maximum number of bits used by each numeric or logical 
variable in the data frame.  And the html.latex method now implements 
conversion of latex code generated by Hmisc to html for dynamic 
insertion in R Markdown documents (if you install the system package 
TeX4ht).

The most significant changes to rms are the correct handling of offset 
variables and updating the demos.


Hmisc Changes in version 3.17-0 (2015-09-20)
* format.df (used by latex.default): added space after textless, 
textgreater
  * label: changed default for units to value of plot
  * getRs: replaced where argument with guser, grepo, gdir, dir to 
allow easy fetching of updated functions from Hmisc etc.
  * Separated sas.get source code from other .get functions and from 
upData/cleanup.import by putting into 3 separate files.  Moved stata.get 
into misc.get.s
  * upData: for Stat/Transfer exported R workspaces, change 
variables into factors to incorporate value labels when present; added 
subset argument and reporting of number of observations pre and post 
subsetting
  * latex.default: added comma after botcap directive for ctable.  
Thanks: Paul Trowbridge
* Hmisc-internal.Rd: removed  alias{[.terms}
  * latex.default: for longtable when no caption is given, subtract 
one from table counter
  * latex.summaryM: quit ignoring insert.bottom if it is a character 
string (thanks: JoAnn Alvarez)
  * minor.tick: revised version by Earl Belllinger that fixes 
problem reported in https://github.com/harrelfe/Hmisc/issues/28
  * several functions: used new names when assigning temporary functions
  * NAMESPACE: add imports to base functions to avoid new R CMD 
CHECK warnings
  * ffCompress: new function
  * knitrSet: changed fig.path default to '' instead of NULL to work 
with knitr 1.11
  * html.latex: added argument rmarkdown
  * htmltools: added to suggests in DESCRIPTION
  * tests: new test script latex-html.Rmd for latex -> html under 
Rmarkdown/knitr/Rstudio, new test for cut2
  * plsmo, panel.plsmo: added method='intervals', mobs, ifun arguments


rms Changes in version 4.4-0 (2015-09-28)
* contrast.rms: made SE a vector not a matrix, added 4 list logic 
for nvary, added new test from JoAnn Alvarez
  * plot.summary.rms: correct bug where pch was ignored.  Thanks: 
Tamas Ferenci
  * prModFit: fix print(fit, latex=FALSE) when fit is result of 
robcov, bootcov
  * NAMESPACE: added imports for base functions used to avoid 
warnings with R CMD CHECK; new test rcs.r
  * prModFit: added rmarkdown argument.  All print.* methods can 
pass this argument.
  * All print methods for fit objects: left result as prModFit 
instead of invisible() so that rmarkdown will work
  * demo/all.R: updated for plot and ggplot methods, npsurv
  * cph, predictrms, rms, rms.trans, rmsMisc: changed Design 
function to return new objects sformula (formula without cluster()) and 
mmcolnames which provides a new way to get rid of strat() main effects 
and interactions involving non-reference cells; handle offsets in cph 
and predict() (not yet in Predict); new internal function 
removeFormulaTerms that does character manipulation to remove terms like 
cluster() or offset() or the dependent variable(s).  This gets around 
the problem with [.terms messing up offset terms when you subset on 
non-offset terms
  * Glm, ols: fixed offset
  * bj, Gls, lrm, orm, psm, Rq: change to new offset method and sformula
  * Predict: added offset=list(offsetvariable=value)
  * several: made temporary function names unique to avoid warnings 
with R CMD CHECK
  * ggplot.Predict: changed facet_wrap_labeller to not mess with 
class of returned object from ggplotGrob
  * Design: fixed column names for matrix predictors
  * Design, cph: handled special case where model is fit on a fit$x 
matrix
  * dxy.cens: exported
  * cph: added debug argument
  * tests/cph4.r: new tests for various predictor types
  * rms: changed warning to error if an ordered factor appears in 
the model and options(contrasts) is not set properly
  * rms transformation functions: made more robust by checking ! 
length instead of is.null


-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org

Re: [R] Nagelkerke Pseudo R-squared

2015-07-18 Thread Frank Harrell
It's implemented in the R rms package's lrm and orm functions.



-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Nagelkerke-Pseudo-R-squared-tp4710014p4710031.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1

2015-06-15 Thread Frank Harrell
Thank you very much Terry.  I'm still puzzled at why this worked a year 
ago.  What changed?  I'd very much like to reverse the change by setting 
an argument somewhere or manipulating the terms object.

I echo your sentiments about the general approach.

Frank


On 06/15/2015 09:05 AM, Therneau, Terry M., Ph.D. wrote:
 Frank,
   I don't think there is any way to fix your problem except the way 
 that I did it.

 library(survival)
 tdata - data.frame(y=c(1,3,3,5, 5,7, 7,9, 9,13),
 x1=factor(letters[c(1,1,1,1,1,2,2,2,2,2)]),
 x2= c(1,2,1,2,1,2,1,2,1,2))

 fit1 - lm( y ~ x1 * strata(x2) - strata(x2), tdata)
 coef(fit1)
(Intercept)x1b x1a:strata(x2)x2=2 
 x1b:strata(x2)x2=2
   3.00   5.00   1.00 1.67

 Your code is calling model.matrix with the same model frame and terms 
 structure as the lm call above (I checked).  In your case you know 
 that the underlying model has 2 intercepts (strata), one for the group 
 with x2=1 and another for the group with x2=2, but how is the 
 model.matrix routine supposed to guess that?  It can't, so 
 model.matrix returns the proper result for the lm call.  As seen above 
 the result is not singular, while for the Cox model it is singular due 
 to the extra intercept.

 This is simply an extension of leaving the intercept term in the 
 model and then removing that column from the returned X matrix, which 
 is necessary to have the correct coding for ordinary factor variables, 
 something we've both done since day 1.  In order for model.matrix to 
 do the right thing with interactions, it has to know how many 
 intercepts there actually are.

 I've come to the conclusion that the entire thrust of 'contrasts' in S 
 was wrong headed, i.e., the remove redundant columns from the X 
 matrix ahead of time logic.  It is simply not possible for the 
 model.matrix routine to guess correctly for all y and x combinations, 
 something that been acknowledged in R by changing the default for 
 singular.ok to TRUE. Dealing with this after the fact via a good 
 contrast function (a la SAS -- heresy!) would have been a much better 
 design choice.  But as long as I'm in R the coxph routine tries to be 
 a good citizen.

 Terry T.

-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1

2015-06-14 Thread Frank Harrell
Terry - your example didn't demonstrate the problem because the variable 
that interacted with strata (zed) was not a factor variable.


But I had stated the problem incorrectly.  It's not that there are too 
many strata terms; there are too many non-strata terms when the variable 
interacting with the stratification factor is a factor variable.  Here 
is a simple example, where I have attached no packages other than the 
basic startup packages.


strat - function(x) x
d - expand.grid(a=c('a1','a2'), b=c('b1','b2'))
d$y - c(1,3,2,4)
f - y ~ a * strat(b)
m - model.frame(f, data=d)
Terms - terms(f, specials='strat', data=d)
specials - attr(Terms, 'specials')
temp - survival:::untangle.specials(Terms, 'strat', 1)
Terms - Terms[- temp$terms]
model.matrix(Terms, m)

  (Intercept) aa2 aa1:strat(b)b2 aa2:strat(b)b2
1   1   0  0  0
2   1   1  0  0
3   1   0  1  0
4   1   1  0  1
. . .

The column corresponding to a='a1' b='b2' should not be there 
(aa1:strat(b)b2).


This does seem to be a change in R.  Any help appreciated.

Note that after subsetting out strat terms using Terms[ - temp$terms], 
Terms attributes factor and term.labels are:


attr(,factors)
 a a:strat(b)
y0  0
a1  2
strat(b) 0  1
attr(,term.labels)
[1] a  a:strat(b)


Frank


On 06/11/2015 08:44 AM, Therneau, Terry M., Ph.D. wrote:

Frank,
   I'm not sure what is going on.  The following test function works for
me in both 3.1.1 and 3.2, i.e, the second model matrix has fewer
columns.  As I indicated to you earlier, the coxph code removes the
strata() columns after creating X because I found it easier to correctly
create the assign attribute.

   Can you create a worked example?

require(survival)
testfun - function(formula, data) {
 tform - terms(formula, specials=strata)
 mf - model.frame(tform, data)

 terms2 - terms(mf)
 strat - untangle.specials(terms2, strata)
 if (length(strat$terms)) terms2 - terms2[-strat$terms]
 X - model.matrix(terms2, mf)
 X
}

tdata - data.frame(y= 1:10, zed = 1:10, grp =
factor(c(1,1,1,2,2,2,1,1,3,3)))

testfun(y ~ zed*grp, tdata)

testfun(y ~ strata(grp)*zed, tdata)


Terry T.

- original message --

For building design matrices for Cox proportional hazards models in the
cph function in the rms package I have always used this construct:

Terms - terms(formula, specials=c(strat, cluster, strata),
data=data)
specials - attr(Terms, 'specials')
stra- specials$strat
Terms.ns - Terms
  if(length(stra)) {
temp - untangle.specials(Terms.ns, strat, 1)
Terms.ns - Terms.ns[- temp$terms]#uses [.terms function
  }
X - model.matrix(Terms.ns, X)[, -1, drop=FALSE]

The Terms.ns logic removes stratification factor main effects so that
if a stratification factor interacts with a non-stratification factor,
only the interaction terms are included, not the strat. factor main
effects. [In a Cox PH model stratification goes into the nonparametric
survival curve part of the model].

Lately this logic quit working; model.matrix keeps the unneeded main
effects in the design matrix.  Does anyone know what changed in R that
could have caused this, and possibly a workaround?


---




--

Frank E Harrell Jr  Professor and Chairman  School of 
Medicine
Department of *Biostatistics*   *Vanderbilt University*

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Different behavior of model.matrix between R 3.2 and R 3.1.1

2015-06-10 Thread Frank Harrell
For building design matrices for Cox proportional hazards models in the 
cph function in the rms package I have always used this construct:

Terms - terms(formula, specials=c(strat, cluster, strata), data=data)
specials - attr(Terms, 'specials')
stra- specials$strat
Terms.ns - Terms
 if(length(stra)) {
   temp - untangle.specials(Terms.ns, strat, 1)
   Terms.ns - Terms.ns[- temp$terms]#uses [.terms function
 }
X - model.matrix(Terms.ns, X)[, -1, drop=FALSE]

The Terms.ns logic removes stratification factor main effects so that 
if a stratification factor interacts with a non-stratification factor, 
only the interaction terms are included, not the strat. factor main 
effects. [In a Cox PH model stratification goes into the nonparametric 
survival curve part of the model].

Lately this logic quit working; model.matrix keeps the unneeded main 
effects in the design matrix.  Does anyone know what changed in R that 
could have caused this, and possibly a workaround?

Note that cph is a kind of front-end to the survival package's coxph 
function.  Therry Therneau uses more complex logic to construct the 
design matrix reliably.  I'd like to avoid that logic because it creates 
an overly wide design matrix before removing the unneeded columns.

Thanks for any assistance,
Frank


-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Version 3.16-0 of Hmisc now on CRAN

2015-05-04 Thread Frank Harrell
Several updates have been made to the Hmisc package:

Changes in version 3.16-0 (2015-04-25)
* html.contents.data.frame: corrected html space character to add 
semicolon
* ggplot.summaryP: added size of points according to denominators
* colorFacet: new function
* labelPlotmath: added chexpr argument (used by rms::ggplot.Predict)
* rcsplineFunction: added type='integral'
* summaryP: fixed bug with sort=FALSE using mfreq when shouldn't
* summaryP: stored levels(val) in original levels order
* summaryM: removed observations added by addMarginal when computing 
N in left column of tables
* html.latex: added method for htlatex, added where argument, 
cleaned up code, implemented file='' for knitr when using html/Rmarkdown
* summaryM, summary.formula: changed calls to translate to gsub()
* summaryP: corrected but in exclude1 logic, moved exclude1 to 
methods that operate on summaryP objects and out of summaryP itself
* addMarginal: respect original levels, add argument margloc
* added latticeExtra:: in front of function calls
* numeric.string, all.is.numeric: replaced options(warn=-1) with 
suppressWarnings() (thanks: Yihui)
* arrGrob, print.arrGrob: new functions
* wtd.var: added maximum likelihood method, fixed unbiased method, 
improved documentation (all provided by Benjamin Tyner)
* Changed all any(duplicated()) to anyDuplicated(); thanks Benjamin 
Tyler
* getRs: new function to interact with 
https://github.com/harrelfe/rscripts
* knitrSet: new function to setup knitr with nice defaults for books 
etc.
  * rcorr: fixed sensing of NAs and diagonal elements of n matrix; 
thanks: Keith Jewell, Campden BRI Group; similar for hoeffd

-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Correction: Regression Modeling Strategies 4-Day Short Course March 2015

2015-01-21 Thread Frank Harrell
*RMS Short Course 2015*
Frank E. Harrell, Jr., Ph.D., Professor and Chair
Department of Biostatistics, Vanderbilt University School of Medicine

*March 3, 4, 5  6, 2015* With Optional R Workshop March 2
9:00am - 4:00pm
Student Life Center Board of Trust Room
Vanderbilt University
Nashville Tennessee USA

See http://biostat.mc.vanderbilt.edu/2015RMSShortCourse for details.

The course includes statistical methodology, case studies, and use of 
the R rms package.

Please email interest to Ashlee Bartley
CORRECTED EMAIL:  ashlee.bart...@vanderbilt.edu

-
If your web browser handles redirection in a non-standard way, go to
http://biostat.mc.vanderbilt.edu/wiki/Main/2015RMSShortCourse

-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regression Modeling Strategies 4-Day Short Course March 2015

2015-01-19 Thread Frank Harrell
Subject: Regression Modeling Strategies 4-Day Short Course March 2015

*RMS Short Course 2015*
Frank E. Harrell, Jr., Ph.D., Professor and Chair
Department of Biostatistics, Vanderbilt University School of Medicine

*March 3, 4, 5  6, 2015* With Optional R Workshop March 2
9:00am - 4:00pm
Student Life Center Board of Trust Room
Vanderbilt University
Nashville Tennessee USA

See http://biostat.mc.vanderbilt.edu/2015RMSShortCourse for details.

The course includes statistical methodology, case studies, and use of 
the R rms package.

Please email interest to Ashlee Bartley
ashlee.f.bart...@vanderbilt.edu

-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] CRAN submission of Hmisc 3.14-5

2014-09-19 Thread Frank Harrell

An update to the Hmisc package is now on CRAN.  Some recent changes:

   * latex.summaryM: fixed bug in caption with test=TRUE.  Thanks: 
Yonghao Pua
   * combined.levels: sensed all NA vector, now return non-factor 
numeric instead

   * dataframeReduce: handle all-NA factor variable
   * subplot: replaced with latest version from TeachingDemos package 
by Greg Snow
   * latexTabular: fixed error in example in help file; result is not a 
file

   * latex: another test added in tests/latex.s
   * summaryP: removed observations with a right-hand-side variable missing
   * latex.summaryP: fixed bug with wrong column labels due to reshape 
reordering columns coming from factor levels alphabetically instead of 
by original levels
   * format.df: added %  = = to list of characters handled, the last 
two by going into math mode

   * latex.summaryP: use blank if denominator 0, instead of NaN
   * summary.formula: fixed problem with deparse formula.  Thanks: 
Andreas Kiermeier
   * describe: added relative information measure for numeric variables 
- a measure of how continuous the variable is
   * wtd.table: detect duplications using duplicated() instead of 
diff(x) to handle Inf.  Thanks: Benjamin Tyner

   * DESCRIPTION, NAMESPACE: multiple function changes to work in R-devel

See http://biostat.mc.vanderbilt.edu/Hmisc, 
https://github.com/harrelfe/Hmisc for more information.


Frank Harrell
Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] CRAN update: rms 4.2-1

2014-09-19 Thread Frank Harrell

An update to the rms package is now on CRAN.  Some recent changes:

   * plot.summary.rms: allowed a vector for lwd, and passed lwd to 
confbar.  Thanks: Michael Friendly
   * gendata: Starting in R 3.1.0, as.data.frame.labelled or 
as.data.frame.list quit working when length vary; workaround

   * predictrms, ols: handle offset in formula.  Thanks: Max Gordon
   * pentrace: neatened code, added new argument noaddzero if user 
wants to prevent unpenalized model from being tried; add new test script 
in tests
   * bplot: fixed bug whereby xlabrot was ignored.  Thanks: Sven 
Krackow sven.krac...@gmail.com; new test for bplot in tests directory

   * plot.Predict: fixed bug in which 2nd argument to perim was not correct
   * validate.ols: Shane McIntosh fixed the passing of the tolerance 
argument to predab.resample
   * predictrms: computed offset earlier so always defined no matter 
the value of type
   * plot.Predict: added scaletrans argument, fixed use of subscripts 
in pan

   * lrm, lrm.fit: added scale argument
   * orm, orm.fit: added scale argument
   * vcov.orm: accounted for scale when extracting covariance matrix
   * npsurv: was not passing type argument
   * npsurv: start storing all classes created by survfit.formula
   * logLik.Gls: added.  Makes AIC(Gls object) work.
   * NAMESPACE: several changes


See http://biostat.mc.vanderbilt.edu/rms and 
https://github.com/harrelfe/rms for more information.


Frank Harrell
Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Predictions from coxph or cph objects

2014-07-06 Thread Frank Harrell
When using cph in the rms package there is a function Mean that operates  
on cph objects to produce an R function for computing the mean or  
restricted mean life time.


Frank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using subplot (from Hmisc) along with par(mfrow)

2014-07-06 Thread Frank Harrell
Greg I just re-copied the latest subplot and its help file from  
TeachingDemos to Hmisc for the next release.  Thanks for pointing this out.

Frank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] survplot invert number at risk labels

2014-07-02 Thread Frank Harrell
Please try hard to create a simple reproducible example, then I'll work 
on it.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SAS VS R - What type of analysis is better?

2014-06-06 Thread Frank Harrell
I can't think of an example where R does not work better than SAS except 
for a few cases of mixed effects regression models and for processing 
enormous datasets when the R user does not want to learn about the 
latest R tools for large datasets.  I quit using SAS in 1991 (in favor 
of S-Plus and transitioned to R around 2000) and have never looked back. 
 Lately what has really made R powerful is its ability to interface 
with other languages and especially the way it works in a reproducible 
analysis/dynamic report document context.


Frank
--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Updates to Hmisc, rms, and greport packages

2014-04-16 Thread Frank Harrell

The Hmisc package has had a number of updates/fixes:

Changes in version 3.14-4 (2014-04-13)
   * rcspline.eval: stop new logic for ignoring outer values when there 
are many ties when there are also many ties on interior values.  Added 
new logic to use interior unique values of x when the number of unique x 
is small.

   * latexBuild: generalized with insert argument
   * latex.default: modified to use mods of latexBuild, fixed bug with 
caption.loc='bottom' (thanks: YacineH)
   * latex.default: fixed bug where comma did not appear after 
caption={} for ctable (thanks: Johannes Hofrichter)
   * tests: fit.mult.impute.bootstrap.r: added new example (thanks: 
Jane Cook)
   * fit.mult.impute: added call for fit.mult.impute in returned 
object, replacing call from fitter; makes update work for fit.mult.impute
   * summary.formula: fixed recognition of multiple left-hand-side 
variables to trigger call to summaryM (thanks: Kevin Thorpe)
   * summaryM: changed group label to '' instead of 0 for formulas like 
age + sex ~ 1

   * Ecdf: added what argument to all functions
   * nobsY: return constructed id vector
   * addMarginal: instead of .marginal. being logical, make it contain 
names of variables being marginalized over

   * mChoice.c: fixed some inconsistencies

The rms package has also had some changes, and the survfit.formula 
function has been REMOVED, replaced by the npsurv function:


Changes in version 4.2-0 (2014-04-13)
   * Deprecated survfit.formula so would not overlap with function in 
survival

   * Added function npsurv, survplot.npsurv
   * REMOVED survfit.formula
   * Used new type argument to label.Surv for fitting functions
   * cph: added weights argument to residuals.coxph (Thanks: Thomas Lumley)
   * survfit.cph: fixed bug in using wrong type variable.  Thanks: 
Zhiyuan Sun

   * cph: added weighted=TRUE in call to residuals.coxph (Thanks: T Lumley)
   * orm.fit: improved ormfit to not try to deal with NaN in V, 
assuming that step-halving will happen


Some improvements have been made in the greport package:

Changes in version 0.5-1 (2014-04-15)
   * survReport: changed to use npsurv instead of survfit.formula
	 * exReport: changed order of output so that analysis of randomized 
patients marked for exclusions appears last; use LaTeX chngpage package 
to allow detailed table to go into left margin so as to be centered on page

   * exReport: added adjustwidth argument
 * accrualReport: allowed enrollment target N to be omitted
   * exReport: fine tuning
   * nriskReport: new report to show number of subjects still being 
followed at each day

 * Merge: added support for data.table
   * nriskReport: added id() variable
   * exReport: fixed bug when there is an exclusion with 0 frequency
   * accrualReport: improved graphics formatting, added minrand argument
   * accrualReport: added enrollmax argument, clarified notation
   * exReport: added ignoreExcl, ignoreRand arguments
   * all: added greportOption texwhere; default is 'gentex'; can 
specify texwhere='' to write non-appendix LaTeX code to console as for knitr
   * dReport: for byx for discrete Y, sense when Y is binary and use 
Wilson interval instead of bootstrap; adjust SE using confidence 
interval if proportion is 0 or 1
   * dReport: changed discreteness non-binary classification to use 
maximum number of unique values or Y instead of minimum

   * add globalVariables call to nriskReport


--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] New version of Hmisc package on CRAN

2014-03-04 Thread Frank Harrell

Recent changes include the following.

Changes in version 3.14-2 (2014-02-26)
   * latex.default: improved logic using new function in Misc: latexBuild
	 * latex.default: fixed bug with ctable=TRUE with no caption by 
removing default label

   * latex.default: improved formatting for insert.top
   * latex.default: added tests, fixed insert.bottom
   * latex.summaryM: return stat summary key as legend attribute, use 
this according to insert.bottom argument
   * latex.summary.formula.response: fixed bug related to computation 
of cdec.  Thanks: Kevin Thorpe
   * latex.default: added new argument star: ctables uses this to 
spread over two columns when the LaTeX document is in \twocolumn mode. 
Thanks:  David Whiting


Changes in version 3.14-1 (2014-02-25)
   * Added latexNeedle function
   * Change latexTherm, latexNeedle to use user LaTeX macro \tooltipn 
to do the pop-up

   * latex.default: changed line breaks around \end{tabular}
   * latex.summaryM: put insert.bottom text in minipage so \tooltip 
will not devote wide space to it
   * sas.get: added defaultencoding argument and logic (Thanks: 
Reinhold Koch)

   * plot.summaryP: omit tick marks for proportion  1.0
   * format.df (used by latex): fixed na.blank logic for character var
   * latex: removed newlines when ending environments, added hyperref 
argument

   * latex: added center='centerline', fixed 'centering'
   * upData, cleanup.import, dataframeReduce: changed argument pr to print
   * rcspline.eval: added more evasive action in case of extreme ties

Changes in version 3.14-0 (2014-01-22)
   * Added trans argument to varclus
   * Removed recode, existsFunction functions, under.unix object, 
survfitKM, functions used only by S-Plus: comment, mem, mulbar.chart, 
p.sunflowers

   * as.category, is.category, untangle.special: removed
   * Removed reference to .R. from many functions
   * Remove oldClass, oldUnclass, getFunction
   * latex.default: changed 'rotate' to 'sideways' for ctable mode. 
Thanks: Simon Zehnder szehn...@uni-bonn.de

   * gView: removed
   * ldBands: removed
   * summaryP: new function - graphical replacement for tables of 
proportions
   * ynbind: new function for combining related yes/no variables into a 
matrix with a label

   * added file argument to prn
   * summaryP: added autoarrange
   * added addMarginal and nobsY functions
   * pBlock: new function for blocking variables for summaryP
   * summaryP: changed text positioning to grid absolutes, added 
text.at argument
   * scat1d, histSpike: if grid used and y has grid units, fixed logic 
for frac

   * plsmo, panel.plsmo: added scat1d.opts argument
   * label.Surv, units.Surv: added, removed ::: in survival calls
   * summarize: added keepcolnames argument
   * Suppressed startup message unless options(Hverbose=TRUE) is set
   * summaryS: new function - multi-panel lattice xy and dot plots
   * summaryD: added ylab argument
   * dotchart3: quit letting left margin be less than pre-existing one
   * multLines: new function
   * Improved nobsY to respect subject IDs when counting number of 
subjects, and to return an attribute 'formula' without id variable; 
changed bpplotM, summaryP, summaryS to use this
   * Removed nobsY calculations from bpplotM, summaryP, summaryS, 
enhanced nobsY to allow stratification by treatment

   * panel.bpplot: added violin and violin.opts arguments
   * summaryS: added medvPanel support during-plot vertical violin plots
   * plot.summaryP: padded x-axis limits
   * latexTabular: added translate and hline arguments; moved to its 
own file and help page

   * latexTherm: added tooltip using LaTeX ocgtools package
   * summaryP: stopped reversing order of panels
   * summaryM: added table.env argument, changed how model.frame built
   * latex.summaryM: changed to print proportions by default, added 
round='auto'

   * character.table: added xpd=NA; thanks: Dale
   * summaryP: added latex method
   * latex.default: added insert.top argument
   * summaryM: added stratification (multiple tables)



--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] rms package updated

2014-03-04 Thread Frank Harrell

Changes are listed below.

NOTE: The next release of rms will replace the survfit.formula function 
with a new function named npsurv (nonparametric survival estimates) so 
as to not conflict with the survival package.  This is a 
NON-DOWNWARD-COMPATIBLE change.  To get Kaplan-Meier estimates in the 
NEXT release use npsurv() instead of survfit().


The web site for more information about rms is 
http://biostat.mc.vanderbilt.edu/Rrms



Changes in version 4.1-3 (2014-03-02)
   * num.intercepts: removed (is in Hmisc)
	 * survfit.formula, cph, psm: changed to use inputAttributes attribute 
of Surv objects (introduced earlier in survival package so that rms 
could drop its customized Surv function)

   * Exported survfit.formula
   * Changed survival fitting functions and residuals to use units.Surv

Changes in version 4.1-2 (2014-02-28)
   * psm: Fixed bug to allow computation of Dxy with left censoring
   * val.prob: Fixed recently introduced bug that made calibration 
intercept and slope always 0,1.  Thanks: lars.engerst...@lio.se

   * plot.Predict: added between to leave space between panels
   * orm.fit: fixed error in kmid calculation when heavy ties at first 
level of y.  Thanks: Yuwei Zhu
   * setPb: changed default to now use tktcl to show progress bars for 
simulations

   * predictrms: fixed bug with type='terms'
   * val.surv: handle case where survival estimates=0 or 1 when using 
log-log transform


Changes in version 4.1-1 (2014-01-22)
   * Removed use of under.unix in anova.rms, latex.summary, plot.nomogram
   * Removed use of oldUnclass, oldClass, is.category
   * Fixed class of Rq object; had failed with bootcov.  Thanks: Max Gordon
   * survplot: preserved par()
   * Srv: removed, changed all uses to Surv() for new survival package 
that preserves attributes for Surv inputs

   * survplot.survfit, survdiffplot: added conf='diffbands'
   * predictrms: fixed num. intercepts calculation order
   * survplot, survdiffplot: used original standard error for 
survdiffplot, and fun

   * dyx.cens: allow left-censoring

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] New package on CRAN: greport

2014-03-04 Thread Frank Harrell
The first submission of the new greport package is now on CTAN.  This 
package facilitates graphical reporting of clinical trials an is highly 
tied to LaTeX, Hmisc, and lattice.  It creates hyperlinks and pop-up 
tooltips in the resulting pdf report.  Tables are greatly de-emphasized, 
but hyperlinks take you from primary graphics to supporting tables in an 
appendix, which are hyperlinked back to the graphic.


The package's web page, with example pdf reports, is at 
http://biostat.mc.vanderbilt.edu/Greport


--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regression Modeling Strategies 4-Day Short Course March 2014

2014-02-10 Thread Frank Harrell

My yearly Regression Modeling Strategies course is expanded to 4 days
this year to be able relax the pace a bit.  Details are below.
Questions welcomed.
-

*RMS Short Course 2014*
Frank E. Harrell, Jr., Ph.D., Professor and Chair
Department of Biostatistics, Vanderbilt University School of Medicine

*March 4, 5, 6  7, 2014*
9:00am - 4:00pm (9:00am - 2:00pm March 7)
Alumni Hall
Vanderbilt University
Nashville Tennessee USA

See http://biostat.mc.vanderbilt.edu/2014RMSShortCourse for details.

The course includes statistical methodology, case studies, and use of
the R rms package.

Please email interest to Audrey Carvajal {audrey.carva...@vanderbilt.edu}

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Handlig large SAS file in R

2014-01-27 Thread Frank Harrell
For that you need to purchase Stat/Transfer.
Frank


hans012 wrote
 Hey Guys
 I have a .sas7bdat file of 1.79gb that i want to read.
 I am using the .sas7bdat package to read the file and after i typed the
 command read.sas7bdat('filename.sas7bdat') it has been 3 hours with no
 result so far.
 Is there a way that i can see the progress of the read? 
 Or is there another way to read the file with less computing time?
 I do not have access to SAS, the file was sent to me.
 
 Let me know what you guys think
 KR
 Hans





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Handlig-large-SAS-file-in-R-tp4684212p4684250.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Major update to Hmisc package available on CRAN

2014-01-23 Thread Frank Harrell
Hmisc has had the following updates, with some of the new functions 
demonstrated on http://biostat.mc.vanderbilt.edu/HmiscNew .  Much of the 
new work is motivated by the need to replace tables with graphics.



Changes in version 3.14-0 (2014-01-22)
   * Added trans argument to varclus
   * Removed recode, existsFunction functions, under.unix object, 
survfitKM, functions used only by S-Plus: comment, mem, mulbar.chart, 
p.sunflowers

   * as.category, is.category, untangle.special: removed
   * Removed reference to .R. from many functions
   * Remove oldClass, oldUnclass, getFunction
   * latex.default: changed 'rotate' to 'sideways' for ctable mode. 
Thanks: Simon Zehnder szehn...@uni-bonn.de

   * gView: removed
   * ldBands: removed
   * summaryP: new function - graphical replacement for tables of 
proportions
   * ynbind: new function for combining related yes/no variables into a 
matrix with a label

   * added file argument to prn
   * summaryP: added autoarrange
   * added addMarginal and nobsY functions
   * pBlock: new function for blocking variables for summaryP
   * summaryP: changed text positioning to grid absolutes, added 
text.at argument
   * scat1d, histSpike: if grid used and y has grid units, fixed logic 
for frac

   * plsmo, panel.plsmo: added scat1d.opts argument
   * label.Surv, units.Surv: added, removed ::: in survival calls
   * summarize: added keepcolnames argument
   * Suppressed startup message unless options(Hverbose=TRUE) is set
   * summaryS: new function - multi-panel lattice xy and dot plots
   * summaryD: added ylab argument
   * dotchart3: quit letting left margin be less than pre-existing one
   * multLines: new function
   * Improved nobsY to respect subject IDs when counting number of 
subjects, and to return an attribute 'formula' without id variable; 
changed bpplotM, summaryP, summaryS to use this
   * Removed nobsY calculations from bpplotM, summaryP, summaryS, 
enhanced nobsY to allow stratification by treatment

   * panel.bpplot: added violin and violin.opts arguments
   * summaryS: added medvPanel support during-plot vertical violin plots
   * plot.summaryP: padded x-axis limits
   * latexTabular: added translate and hline arguments; moved to its 
own file and help page

   * latexTherm: added tooltip using LaTeX ocgtools package
   * summaryP: stopped reversing order of panels
   * summaryM: added table.env argument, changed how model.frame built
   * latex.summaryM: changed to print proportions by default, added 
round='auto'

   * character.table: added xpd=NA; thanks: Dale
   * summaryP: added latex method
   * latex.default: added insert.top argument
   * summaryM: added stratification (multiple tables)


--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regression Modeling Strategies 4-Day Short Course March 2013

2014-01-16 Thread Frank Harrell
My yearly Regression Modeling Strategies course is expanded to 4 days 
this year to be able relax the pace a bit.  Details are below. 
Questions welcomed.

-

*RMS Short Course 2014*
Frank E. Harrell, Jr., Ph.D., Professor and Chair
Department of Biostatistics, Vanderbilt University School of Medicine

*March 4, 5, 6  7, 2014*
9:00am - 4:00pm (9:00am - 2:00pm March 7)
Alumni Hall
Vanderbilt University
Nashville Tennessee USA

See http://biostat.mc.vanderbilt.edu/2014RMSShortCourse for details.

The course includes statistical methodology, case studies, and use of
the R rms package.

Please email interest to Audrey Carvajal {audrey.carva...@vanderbilt.edu}

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] howto join matrices produced by rcorr()

2014-01-01 Thread Frank Harrell
This is an unstable process.  I suggest using the bootstrap to get a
confidence interval for the rank of each correlation coefficient among all
non-diagonal correlations.



-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/howto-join-matrices-produced-by-rcorr-tp4682867p4682932.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Possible to vary widths and/or heights of lattice panels?

2013-12-22 Thread Frank Harrell
Thanks very much Rich and Duncan.  latticeExtra's resizePanels function 
was a perfect solution.


Frank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Possible to vary widths and/or heights of lattice panels?

2013-12-21 Thread Frank Harrell
Is it possible to vary the size of the panels in lattice within one 
page?  The examine I have in mind is a 3 row by 2 column display where I 
vary the y-axis scales in a dot plot and there are only a few levels on 
the y-axis for one of the rows.  I'd like to remove wasted space in that 
row.


On a related issue 
http://stackoverflow.com/questions/9654244/multipage-lattice-panel-arrangement 
has some good information.


Thanks for any pointers.
Frank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Simple way to define a function to be used in a formula object inside another function

2013-12-15 Thread Frank Harrell

I would like to do this:

f - function(formula, data=NULL) {
 gg - sqrt
 model.frame(formula, data=data)
 }
x - y - 1:10
f(y ~ gg(x))
Error in eval(expr, envir, enclos) : could not find function gg

Is there a simple way to get access to gg from within the model.frame 
invocation inside f?


Thanks
Frank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple way to define a function to be used in a formula object inside another function

2013-12-15 Thread Frank Harrell

Thank you Bill, that worked perfectly.
Frank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] rms 4.1-0

2013-12-06 Thread Frank Harrell

The rms package has had several updates in version 4.1-0:

   * Fixed orm.fit to not create penalty matrix if not needed 
(penalties are not yet implemented anyway)

   * Added yscale argument to plot.Predict
   * Added Wald test simulation to orm help file
   * Added example in help file for plot.anova.rms of adding a line 
combining the effects of two predictors in dot chart

   * Fixed grid interpretation error in survplot.survfit
   * Changed plot.anova.rms to use dotchart3 instead of dotchart2
   * Fixed bug in summary.rms - was taking reciprocal of effect ratio 
with orm even if not loglog family (thanks: Yong Hao Pua 
puayong...@gmail.com

   * Removed link to print.lm, summary.lm in ols.Rd
   * Added ntrans argument to plot.anova.rms
   * Fixed handling of intercepts in Rq, validate.Rq
   * Removed residuals.Glm, residuals.rms (also from Rd, NAMESPACE)
   * Removed other .rms methods and other remnants from fooling S+ 
dispatcher
   * Fixed bug in lm.pfit when penalty used (thanks: Yong Hao Pua 
puayong...@gmail.com)

   * Fixed bug in calibrate.default for ols (thanks: Andy Bush)
   * Change print.contrast.rms to insert NA for SE if fun is not the 
identity function
   * Added margin argument to plot.anova.rms to print selected stats in 
right margin of dot chart
   * Added anova argument to plot.Predict to allow overall association 
test statistics to be added to panels
   * Fixed bug in val.prob in which the logistic model was re-fitted 
instead of fixing coefficients at 0,1.  This resulted in model 
statistics (including c-index) to always be favorable even when 
predictions were worse than change.  Thanks: Kirsen Van Hoorde 
kirsten.vanhoo...@esat.kuleuven.be
   * Fixed bug in survdiffplot where conf.int was always overridden by 
value from survfit.  Thanks: Kamil Fijorek kamilfijo...@gmail.com
   * Fixed bug in grid= for survplot.* and survdiffplot.  Thanks: Kamil 
Fijorek
   * Fixed rms.s to account for possible offset in names(nmiss). 
Thanks: Larry Hunsicker
   * Fixed psm.s to not compute Dxy if simple right censoring is not in 
effect.  Thanks: I.M. Nolte
   * rcs: respect system option fractied, passed to rcspline.eval; can 
be used to get old behavior

   * Gls: as nlme 3.1-113 exports more functions, removed nlme:::

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Hmisc package 3.13-0

2013-11-23 Thread Frank Harrell
A significant update to the Hmisc package is now available on CRAN for 
all platforms.  Hmisc source is now on github at 
https://github.com/harrelfe/Hmisc and the full change log may be found 
at https://github.com/harrelfe/Hmisc/commits/master


The most important updates are additions of new graphics functions for 
summarizing and displaying data with an aim of replacing tables.  There 
is also an interface to Duncan Murdoch's tables package for advanced 
LaTeX table making, allowing Hmisc variable labels and units of 
measurement to be automatically incorporated into table row or column 
labels.  New functions are overviewed and exemplified at 
http://biostat.mc.vanderbilt.edu/HmiscNew .  There is also a new 
function latexTherm that writes a LaTeX picture environment for 
including a series of thermometers inside LaTeX text.  This is intended 
especially for figure captions where the analyst wishes to provide a 
snapshot of the fraction of observations that are used in the current 
graphic.


The dotchart3 function is an improvement of dotchart2.

The rcspline.eval function, used by the rms package's rcs function, has 
better default knot placement for restricted cubic splines (natural 
splines) when there are ties in the predictor at the lowest or highest 
value (e.g., clumping at zero).


The most recent updates to Hmisc are

   * Changed n location (nloc argument) in bpplotM
   * Improved dotchart3 to better compute string widths when there is a 
mixture of expressions and regular strings for auxdata/auxtitle
   * Changed rlegend to not take logs if log axes are in effect.  Fixes 
Ecdf(..., log='x', label.curves=list(keys=1:3)).  Thanks: Bayazid Sarker 
sarkarbaya...@gmail.com

 * Extended non-panel (regular) version of plsmo to handle matrix y
   * Likewise for summaryRc
   * Added xlim to bpplotM
   * Added latexTherm function to create LaTeX picture environments to 
add a series of thermometers to LaTeX text
   * Fixed deff to handle the case where R^2 = 1.  Thanks: Matthieu 
Stigler matthieu.stig...@gmail.com

   * Added new test file for wtd.mean, wtd.quantile
   * New test aregImpute3.r for glm Poisson regression
   * Improved describe.vector to format single unique values
   * Took aware warning about var, s.e., t, p in fit.mult.impute
   * Changed wtd.loess.noiter to use loess instead of stats:::simpleLoess

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A problem about nomogram

2013-11-12 Thread Frank Harrell
You did not follow the posting guide, did you use pure ascii email, and 
used illegal characters in your source code.  This caused extra work. 
Once I cleaned up your characters and made the example self-contained, 
the labels worked fine for me.  Here's the cleaned-up code:


library(rms)
x1 - runif(20)
x2 - runif(20)
y - sample(0:1, 20, TRUE)
label(x1) - 'A'
label(x2) - 'B'

ddist - datadist(x1,x2)
options(datadist='ddist')
f - lrm(y~x1+x2)
x-nomogram(f, fun=function(x)1/(1+exp(-x)), 
fun.at=c(.001,.01,.05,seq(0,1,by=.1),.95,.99,.999),vnames=labels)


plot(x)

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Code Book from SPSS Data

2013-10-28 Thread Frank Harrell

Not certain about .por but this works with ordinary SPSS files:

require(Hmisc)
dat - spss.get(...)  # gets variable labels, etc.
contents(dat)
html(contents(dat), ...)

The last command produces a hyperlinked data dictionary, e.g., for each 
variable the number of levels is given and you click on that number to 
see the levels.  Variables having the same levels are combined in the 
latter part.


Frank

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] missclassification in estimating proportions

2013-10-02 Thread Frank Harrell
This is dichotomania and is an inappropriate use of continuous 
variables.  Use an information-preserving approach such as rank 
correlation.  Also, this is not an R question.

Frank

---
Assume that I want to compare two methods, say skinfold measurement and 
BMI, in how they classify subjects into four categories. In a 2x2 table 
I would simply calculate sensitivity and specificity and NPV an PPV, 
using the standard formulas, but can the same be directly applied in the 
4x4 table? Or are other methods preferable?


Thanks in advance

Patrick


--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Interperting results of glmnet and coxph plot, Brier score and Harrel's C-Index - am I doing something wrong ???

2013-09-28 Thread Frank Harrell
), replace =  
 TRUE) ##
 random sampling with replacement
# create a dataframe for trainSet with time, status and  
 selected
 variables in binary representation for evaluation in pec
reformat_trainSet - reformat_dataSet [train,]


# glmnet.cox only with meaningful features selected by stepwise
 bidirectional AIC feature selection
glmnet.cox.meaningful.test - step(coxph(Surv(time,status) ~
 .,data=reformat_dataSet),direction=both)

selectedVarCox   -
 predict_matrix[,attr(glmnet.cox.meaningful.test$terms,term.labels)]
reformat_testSet -  
 as.data.frame(cbind(surv_obj,selectedVarCox))
reformat_testSet - reformat_dataSet [-train,]


 # compute c-index (Harrell's) for cox-glmnet models
if (is.null(glmnet.cox.meaningful)){
  cIndexCoxglmnet - c(cIndexCoxglmnet,NULL)
}else{
  cIndexCoxglmnet - c(cIndexCoxglmnet,
 1-rcorr.cens(predict(glmnet.cox.meaningful,
 reformat_testSet),Surv(reformat_testSet$time,reformat_testSet 
 $status))[1])
}
  }

  #Get average C-Index
  cIndex- mean (unlist(cIndexCoxglmnet),rm.na=TRUE)

  #create a list of all the objects generated

 assign 
 (name,c(eval(parse(text=name)),glmnet.cv=list(glmnet.cv),glmnet.obj=li
 st(glmnet.obj),

 selectedVar=list(colnames(selectedVar)),glmnet.cox=list(glmnet.cox),

 glmnet 
 .cox.meaningful=list(glmnet.cox.meaningful),ipec.coxglmnet=list(ipec.c
 oxglmnet),
cIndex=cIndex))

  # save image of the workspace after each iteration
save.image(final_subgroup_analysissubgroup_analysis.RData)


 __
 

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 David Winsemius, MD
 Alameda, CA, USA
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Interperting-results-of-glmnet-and-coxph-plot-Brier-score-and-Harrel-s-C-Index-am-I-doing-something--tp4677166p4677175.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Regression model for predicting ranks of the dependent variable

2013-09-15 Thread Frank Harrell

require(rms)
?orm# ordinal regression model

For a case study see Handouts in 
http://biostat.mc.vanderbilt.edu/CourseBios330


Since you have lost the original values, one part of the case study will 
not apply: the use of Mean().


Frank
-
I have a dataset which has several predictor variables and a dependent 
variable, score (which is numeric). The score for each row is 
calculated using a formula which uses some of the predictor variables. 
But, the score figures are not explicitly given in the dataset. The 
scores are only arranged in ascending order, and the ranks of the 
numbers are given (like 1, 2, 3, 4, etc.; rank 1 means that the 
particular row had the highest score, 2 means it had the second highest 
score and so on). So, if the data has 100 rows, the output has ranks 
from 1 to 100.
I don't think it would be proper to treat the output column as a numeric 
one, since it is an ordinal variable, and the distance (difference in 
scores) between ranks 1 and 2 may not be the same as that between ranks 
2 and 3. However, most R regression models for ordinal regression are 
made for output such as (high, medium, low), where each level of the 
output does not necessarily correspond to a unique row. In my case, each 
output (rank) corresponds to a unique row.
So please suggest me what models I could use for this problem. Will 
treating the output as numeric instead of ordinal be a reasonable 
approximation? Or will the usual models for ordinal regression work on 
this dataset as well?


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex challenge

2013-08-17 Thread Frank Harrell

Bill I found a workaround:

  f - ff(formula, lab)
  f - as.formula(gsub(`, , as.character(deparse(f

Thanks for your elegant solution.
Frank

--
Thanks Bill.  The problem is one of the results of convertName might be
'Heading(Age in Years)*age'  (this is for the tables package), and
as.name converts this to `Heading(...)*age` and the backticks cause
the final formula to have a mixture of regular elements and ` ` quoted
expression elements, making the formula invalid.
Best,
Frank
---

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex challenge

2013-08-16 Thread Frank Harrell
Thanks Bill.  The problem is one of the results of convertName might be 
'Heading(Age in Years)*age'  (this is for the tables package), and 
as.name converts this to `Heading(...)*age` and the backticks cause 
the final formula to have a mixture of regular elements and ` ` quoted 
expression elements, making the formula invalid.

Best,
Frank
---

The following makes the name converter function an argument to ff (and 
restores the colon operator to the list of formula operators), but I'm 
not sure what you need the converter to do.


ff - function(expr, convertName = function(name)paste0(toupper(name), 
z)) {
if (is.call(expr)  is.name(expr[[1]])  
is.element(as.character(expr[[1]]), c(~,+,-,*,/,%in%,(, 
:))) {

for(i in seq_along(expr)[-1]) {
expr[[i]] - Recall(expr[[i]], convertName = convertName)
}
} else if (is.name(expr)) {
expr - as.name(convertName(expr))
}
expr
}

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: [hidden email] [mailto:[hidden email]] On Behalf
 Of Frank Harrell
 Sent: Thursday, August 15, 2013 7:47 PM
 To: RHELP
 Subject: Re: [R] regex challenge

 Bill that is very impresive.  The only problem I'm having is that I want
 the paste0(toupper(...)) to be a general function that returns a
 character string that is a legal part of a formula object that can't be
 converted to a 'name'.

 Frank


 ---
 Oops, I left ( out of the list of operators.


 ff - function(expr) {
  if (is.call(expr)  is.name(expr[[1]]) 
   is.element(as.character(expr[[1]]),
 c(~,+,-,*,/,%in%,())) {
  for(i in seq_along(expr)[-1]) {
  expr[[i]] - Recall(expr[[i]])
  }
  } else if (is.name(expr)) {
  expr - as.name(paste0(toupper(as.character(expr)), z))
  }
  expr
 }

   ff(a)
 CATz + (AGEz + Heading(Females) * (sex == Female) * SBPz) *
  Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
  COUNTRYz * Heading() * SEXz

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com


   -Original Message-
   From: [hidden email] [mailto:[hidden email]] On Behalf
   Of William Dunlap
   Sent: Thursday, August 15, 2013 6:03 PM
   To: Frank Harrell; RHELP
   Subject: Re: [R] regex challenge
  
   Try this one
  
   ff - function (expr)
   {
   if (is.call(expr)  is.name(expr[[1]]) 
is.element(as.character(expr[[1]]),  c(~, +, -, *,
 /, :, %in%))) {
   # the above list should cover the standard formula operators.
   for (i in seq_along(expr)[-1]) {
   expr[[i]] - Recall(expr[[i]])
   }
   }
   else if (is.name(expr)) {
  # the conversion itself
   expr - as.name(paste0(toupper(as.character(expr)), z))
   }
   expr
   }
  
ff(a)
   CATz + (age + Heading(Females) * (sex == Female) * sbp) *
   Heading() * Gz + (age + sbp) * Heading() * TRIOz ~ Heading() *
   COUNTRYz * Heading() * SEXz
  
   Bill Dunlap
   Spotfire, TIBCO Software
   wdunlap tibco.com
  
  
-Original Message-
From: [hidden email] [mailto:[hidden email]] On Behalf
Of Frank Harrell
Sent: Thursday, August 15, 2013 4:45 PM
To: RHELP
Subject: Re: [R] regex challenge
   
I really appreciate the excellent ideas from Bill Dunlap and Greg
 Snow.
  Both suggestions almost work perfectly.  Greg's recognizes
 expressions
such as sex=='female' but not ones such as age  21, age  21, a 
- b 
0, and possibly other legal R expressions.  Bill's idea is 
similar to

what Duncan Murdoch suggested to me.  Bill's doesn't catch the case
 when
a variable appears both in an expression and as a regular variable
 (sex
in the example below):
   
f - function(formula) {
   trms - terms(formula)
   variables - as.list(attr(trms, variables))[-1]
   ## the 'variables' attribute is stored as a call to list(),
   ## so we changed the call to a list and removed the first 
element

   ## to get the variables themselves.
   if (attr(trms, response) == 1) {
 ## terms does not pull apart right hand side of formula,
 ## so we assume each non-function is to be renamed.
 responseVars - lapply(all.vars(variables[[1]]), as.name)
 variables - variables[-1]
   } else {
 responseVars - list()
   }
   ## omit non-name variables from list of ones to change.
   ## This is where you could expand calls to certain functions.
   variables - variables[vapply(variables, is.name, TRUE)]
   variables - c(responseVars, variables) # all are names now
   names(variables) - vapply(variables, as.character, )
   newVars - lapply(variables, function(v) 
as.name(paste0(toupper(v),

z)))
   formula(do.call(substitute, list(formula, newVars)),
env=environment(formula))
}
   
a - cat

Re: [R] regex challenge

2013-08-15 Thread Frank Harrell
I really appreciate the excellent ideas from Bill Dunlap and Greg Snow. 
 Both suggestions almost work perfectly.  Greg's recognizes expressions 
such as sex=='female' but not ones such as age  21, age  21, a - b  
0, and possibly other legal R expressions.  Bill's idea is similar to 
what Duncan Murdoch suggested to me.  Bill's doesn't catch the case when 
a variable appears both in an expression and as a regular variable (sex 
in the example below):


f - function(formula) {
  trms - terms(formula)
  variables - as.list(attr(trms, variables))[-1]
  ## the 'variables' attribute is stored as a call to list(),
  ## so we changed the call to a list and removed the first element
  ## to get the variables themselves.
  if (attr(trms, response) == 1) {
## terms does not pull apart right hand side of formula,
## so we assume each non-function is to be renamed.
responseVars - lapply(all.vars(variables[[1]]), as.name)
variables - variables[-1]
  } else {
responseVars - list()
  }
  ## omit non-name variables from list of ones to change.
  ## This is where you could expand calls to certain functions.
  variables - variables[vapply(variables, is.name, TRUE)]
  variables - c(responseVars, variables) # all are names now
  names(variables) - vapply(variables, as.character, )
  newVars - lapply(variables, function(v) as.name(paste0(toupper(v), 
z)))
  formula(do.call(substitute, list(formula, newVars)), 
env=environment(formula))

}

a - cat + (age + Heading(Females) * (sex == Female) * sbp) *
Heading() * g + (age + sbp) * Heading() * trio ~ Heading() *
country * Heading() * sex
f(a)

Output:

CATz + (AGEz + Heading(Females) * (SEXz == Female) * SBPz) *
Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
COUNTRYz * Heading() * SEXz

The method also doesn't work if I replace sex == 'Female' with x3  4, 
converting to X3z  4.  I'm not clear on how to code what kind of 
expressions to ignore.


Thanks!
Frank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] regex challenge

2013-08-15 Thread Frank Harrell
Bill that is very impresive.  The only problem I'm having is that I want 
the paste0(toupper(...)) to be a general function that returns a 
character string that is a legal part of a formula object that can't be 
converted to a 'name'.


Frank


---
Oops, I left ( out of the list of operators.


ff - function(expr) {
if (is.call(expr)  is.name(expr[[1]]) 
 is.element(as.character(expr[[1]]), 
c(~,+,-,*,/,%in%,())) {

for(i in seq_along(expr)[-1]) {
expr[[i]] - Recall(expr[[i]])
}
} else if (is.name(expr)) {
expr - as.name(paste0(toupper(as.character(expr)), z))
}
expr
}

 ff(a)
CATz + (AGEz + Heading(Females) * (sex == Female) * SBPz) *
Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
COUNTRYz * Heading() * SEXz

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: [hidden email] [mailto:[hidden email]] On Behalf
 Of William Dunlap
 Sent: Thursday, August 15, 2013 6:03 PM
 To: Frank Harrell; RHELP
 Subject: Re: [R] regex challenge

 Try this one

 ff - function (expr)
 {
 if (is.call(expr)  is.name(expr[[1]]) 
  is.element(as.character(expr[[1]]),  c(~, +, -, *, 
/, :, %in%))) {

 # the above list should cover the standard formula operators.
 for (i in seq_along(expr)[-1]) {
 expr[[i]] - Recall(expr[[i]])
 }
 }
 else if (is.name(expr)) {
# the conversion itself
 expr - as.name(paste0(toupper(as.character(expr)), z))
 }
 expr
 }

  ff(a)
 CATz + (age + Heading(Females) * (sex == Female) * sbp) *
 Heading() * Gz + (age + sbp) * Heading() * TRIOz ~ Heading() *
 COUNTRYz * Heading() * SEXz

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com


  -Original Message-
  From: [hidden email] [mailto:[hidden email]] On Behalf
  Of Frank Harrell
  Sent: Thursday, August 15, 2013 4:45 PM
  To: RHELP
  Subject: Re: [R] regex challenge
 
  I really appreciate the excellent ideas from Bill Dunlap and Greg 
Snow.
Both suggestions almost work perfectly.  Greg's recognizes 
expressions

  such as sex=='female' but not ones such as age  21, age  21, a - b 
  0, and possibly other legal R expressions.  Bill's idea is similar to
  what Duncan Murdoch suggested to me.  Bill's doesn't catch the case 
when
  a variable appears both in an expression and as a regular variable 
(sex

  in the example below):
 
  f - function(formula) {
 trms - terms(formula)
 variables - as.list(attr(trms, variables))[-1]
 ## the 'variables' attribute is stored as a call to list(),
 ## so we changed the call to a list and removed the first element
 ## to get the variables themselves.
 if (attr(trms, response) == 1) {
   ## terms does not pull apart right hand side of formula,
   ## so we assume each non-function is to be renamed.
   responseVars - lapply(all.vars(variables[[1]]), as.name)
   variables - variables[-1]
 } else {
   responseVars - list()
 }
 ## omit non-name variables from list of ones to change.
 ## This is where you could expand calls to certain functions.
 variables - variables[vapply(variables, is.name, TRUE)]
 variables - c(responseVars, variables) # all are names now
 names(variables) - vapply(variables, as.character, )
 newVars - lapply(variables, function(v) as.name(paste0(toupper(v),
  z)))
 formula(do.call(substitute, list(formula, newVars)),
  env=environment(formula))
  }
 
  a - cat + (age + Heading(Females) * (sex == Female) * sbp) *
   Heading() * g + (age + sbp) * Heading() * trio ~ Heading() *
   country * Heading() * sex
  f(a)
 
  Output:
 
  CATz + (AGEz + Heading(Females) * (SEXz == Female) * SBPz) *
   Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
   COUNTRYz * Heading() * SEXz
 
  The method also doesn't work if I replace sex == 'Female' with x3  4,
  converting to X3z  4.  I'm not clear on how to code what kind of
  expressions to ignore.
 
  Thanks!
  Frank
 
  __
  [hidden email] mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

  and provide commented, minimal, self-contained, reproducible code.

 __
 [hidden email] mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

 and provide commented, minimal, self-contained, reproducible code.
... [show rest of quote]

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org

[R] regex challenge

2013-08-14 Thread Frank Harrell
I would like to be able to use gsub or gsubfn to process a formula and 
to translate the variables but to ignore expressions in the formula. 
Supposing that the R formula has already been transformed into a 
character string and that the transformation is to convert variable 
names to upper case and to append z to the names, an example would be to 
convert y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i to 
Y1z + Y2z ~ Az*(Bz + Cz) + Dz + Fz * (h == 3) + (sex == 'male')*Iz.  Any 
expression that is not just a simple variable name would be left alone.


Does anyone want to try their hand at creating a regex that would 
accomplish this?


Thanks
Frank
--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rms plot.Predict when type=p

2013-08-02 Thread Frank Harrell
Mike, 2 out of 3 of your requests are handled simply.  I've tidied up 
your code and show how to include extra parameters into the plot( ) call 
below.  The request to change plotting symbol color for each factor 
level is not consistent with the lattice philosophy for non-superposed 
variables (here on the x-axis) where the labels tell all.  The last 2 
lines in the code uses superpositioning for group but the col argument 
affects the color of confidence bands; I didn't look into that further.


Frank Harrell
--
require(rms)
n = 30
group = factor(sample(c('a','b','c'), n, TRUE))
x1= runif(n)
dat   = data.frame(group, x1,
   y = as.numeric(group) + 0.2*x1 + rnorm(n) )

d - datadist(dat) ; options(datadist=d)

f - ols(y ~ x1 + group, data=dat)
p - Predict(f, group)
plot(p, ~group, nlines=TRUE, type='p', ylab='fitted Y', xlab='Treatment',
 pch=4, lwd=3)
p - Predict(f, x1=seq(0,1,by=.1), group)
plot(p, ~ x1, groups='group', col=3:1)

--

Hi all,

I'm trying to change the pch, color, lty, and lwd in a type=p plot
produced by plot.Predict in the rms package.  What I'm shooting for is a
separate plotting symbol symbol color for each level of a factor (and also
to manage the line width for the CIs).

Here's what I hope is a working example

#make up some data
. . .
Mike Babyak
Department of Psychiatry and Behavioral Sciences

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Does a general latex table-making function exist?

2013-07-31 Thread Frank Harrell
Our Hmisc package summary.formula function and its latex methods can 
make some fairly advanced tables.  But the tables have to be regular. 
For example, all rows of the tables are based on the same data frame. 
I'm thinking that what is needed is a ggplot2-like set of functions for 
building a table row-by-row or row-by-block of rows.  Different row 
blocks could have different denominators, e.g., the first part of the 
table might be on everyone and a latter block of rows be for females, 
with different summary statistics computed.


Has anyone already written functions creating LaTeX markup with such 
functionality?


Thanks
Frank

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Does a general latex table-making function exist?

2013-07-31 Thread Frank Harrell

Duncan,

I had read your excellent tables package vignette at 
http://cran.r-project.org/web/packages/tables/vignettes/tables.pdf when 
it first came out.  It is extremely impressive.  I'm glad to be reminded 
to give it another look.


Is there a way to make the special symbols n and 1 refer to the number 
of non-missing observations rather than the length of a vector?


Do you feel like taking on this challenge?  An example of an irregular 
table I'm thinking of is the following


   Females  Males
 Q1 Med Q3   (n)   Q1 Med Q3   (n)
Age  25  49 63 (1016) 26  50 64  (1767)

Canadians
 Weight (kg) 57  63 74 ( 243) 67  73 90  ( 401)

Canadians could mean country=='Canada'.

Thanks!
Frank

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Jul 26, 2013; 12:34am

2013-07-26 Thread Frank Harrell

Thanks Rich and Jim and apologies for omitting the line

x - c(285, 43.75, 94, 150, 214, 375, 270, 350, 41.5, 210, 30, 37.6,
281, 101, 210)

But the fundamental problem remains that vertical spacing is not correct 
unless I waste a lot of image space at the top.


Frank


--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can't figure out why short figure won't work

2013-07-26 Thread Frank Harrell
[Sorry I can't quote past messages as I get mail on Nabble and have been 
told I can't reply through Nabble].


Thanks Kennel for recommended a narrowing range for usr y-limits.  That 
does help quite a bit.


But I found a disappointing aspect of the graphics system: When you 
change height= on the device call you have to change the y-coordinates 
to keep the same absolute vertical positioning.


Frank

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Can't figure out why short figure won't work

2013-07-25 Thread Frank Harrell
I can't get the points and a and b to render correctly (they are 
squeezed into the axis) unless I make the height of the figure waste a 
lot of space.  I'd appreciate any ideas.  The code is below. Thanks!


png('/tmp/z.png', width=480, height=100)
par(mar=c(3,.5,1,.5))
plot.new()
par(usr=c(-10, 410, -.04, 1.04))
par(mgp=c(1.5,.5,0))
axis(1, at=seq(0, 400, by=50))
axis(1, at=seq(0, 400, by=25), tcl=-.25, labels=FALSE)
title(xlab='Mean Count')
points(x, rep(-.02, length(x)))
lines(rep(.25*1500, 2), c(-.04, .04), col='blue')
lines(rep(.196*1500, 2), c(-.04, .04), col='blue')
text(.25*1500, .07, 'a', col='blue')
text(.196*1500,.07, 'b', col='blue')
dev.off()

It works if I use height=350.

Frank


 version
   _
platform   x86_64-pc-linux-gnu
arch   x86_64
os linux-gnu
system x86_64, linux-gnu
status
major  3
minor  0.1
year   2013
month  05
day16
svn rev62743
language   R
version.string R version 3.0.1 (2013-05-16)
nickname   Good Sport

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] [R-pkgs] Major Update to rms package

2013-07-11 Thread Frank Harrell
The rms (Regression Modeling Strategies) package has undergone a 
massive update.  The entire list of updates is at the bottom of this 
note.  CRAN has the update for linux and will soon have it for Windows 
and Mac - check http://cran.r-project.org/web/packages/rms/ for 
availability.  This rms update relies on a major update of the Hmisc 
package.


The most user-visible changes are:
  * Addition of a major new modeling function orm (ordinal regression 
model) that supports 5 distribution families (one being the logistic, 
for proportional odds models) and is meant for continuous Y.  Sparse 
matrix operations (helped greatly by the SparseM package) allows 
thousands of intercepts to be fitted when there are thousands of unique 
Y values.  New function generators Mean, Quantile, and ExProb 
(exceedance probability distribution) have been added.  A detailed case 
study is in http://biostat.mc.vanderbilt.edu/CourseBios330 Chapter 15. 
Ordinal regression is now a direct competitor to linear models, and far 
more robust, even allowing spikes in the distribution of Y.
   * More generality and ease of obtaining bootstrap confidence limits 
for all estimands (predictions, contrasts).  The basic bootstrap is now 
implemented and tends to work better than the percentile bootstrap in 
terms of confidence coverage.

   * Change of Surv to Srv
   * Added tk/tcl progress bars for bootstrap and other repeated 
calculations.  This can be turned off by specifying 
options(showprogress=FALSE) or options(showprogress='console') to use 
cat().  Use options(showevery=50) to update the progress bar only every 
50 iterations.
   * Made all design matrices stored in model fits exclude any 
intercepts, with columns of ones added as needed with Predict() etc.
   * Dxy and c-index are now calculated with Therneau's survival 
package survConcordance functions which is blazing fast so can be used 
routinely in all model validations that used Dxy
   * plot.summary.rms now produces cleaner output with fewer confidence 
levels

   * Several bug corrections

-

Changes in version 4.0-0 (2013-07-10)
   * Cleaned up label logic in Surv, made it work with interval2 
(thanks:Chris Andrews)
   * Fixed bug in val.prob - wrong denominator for Brier score if obs 
removed for logistic calibration
   * Fixed inconsistency in predictrms where predict() for Cox models 
used a design matrix that was centered on medians and modes rather than 
means (thanks: David van Klaveren d.vanklavere...@erasmusmc.nl)

   * Added mean absolute prediction error to Rq output
   * Made pr argument passed to predab.resample more encompassing
   * Fixed logLik method for ols
   * Made contrast.rms and summary.rms automatically compute bootstrap 
nonparametric confidence limits if fit was run through bootcov
   * Fixed bug in Predict where conf.type='simultaneous' was being 
ignored if bootstrap coefficients were present
   * For plot.Predict made default gray scale shaded confidence bands 
darker

   * For bootcov exposed eps argument to fitters and default to lower value
   * Fixed bug in plot.pentrace regarding effective.df plotting
   * Added setPb function for pop-up progress bars for simulations; 
turn off using options(showprogress=FALSE) or 
options(showprogress='console')
   * Added progress bars for predab.resample (for validate, calibrate) 
and bootcov

   * Added bootBCa function
   * Added seed to bootcov object
   * Added boot.type='bca' to Predict, contrast.rms, summary.rms
   * Improved summary.rms to use t critical values if df.residual defined
   * Added simultaneous contrasts to summary.rms
   * Fixed calculation of Brier score, g, gp in lrm.fit by handling 
special case of computing linear predictor when there are no predictors 
in the model
   * Fixed bug in prModFit preventing successful latex'ing of penalized 
lrms

   * Removed \synopsis from two Rd files
   * Added prmodsel argument to predab.resample
   * Correct Rd files to change Design to rms
   * Restricted NAMESPACE to functions expected to be called by users
   * Improved Fortran code to use better dimensions for array declarations
   * Added the basic bootstrap for confidence limits for bootBCa, 
contrast, Predict, summary

   * Fixed bug in latex.pphsm, neatened pphsm code
   * Neatened code in rms.s
   * Improved code for bootstrapping ranks of variables in anova.rms 
help file
   * Fixed bug in Function.rms - undefined Nam[[i]] if strat.  Thanks: 
douglaswilk...@yahoo.com
   * Made quantreg be loaded at end of search list in Rq so it doesn't 
override latex generic in Hmisc
   * Improved plot.summary.rms to use blue of varying transparency 
instead of polygons to show confidence intervals, and to use only three 
confidence levels by default: 0.9 0.95 0.99
   * Changed Surv to Srv; use of Surv in fitting functions will result 
in lack of time labels and assumption of Day as time unit; no longer 
override Surv in 

[R] Appropriate forum for announcing R package updates

2013-07-10 Thread Frank Harrell
I have been confused about the appropriate e-mail address to use to make 
announcements to r-help for major package update.  In the past I've 
submitted to r-packa...@lists.r-project.org without seeing the 
announcement appear on r-help.


Thanks for any guidance.
Frank

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Appropriate forum for announcing R package updates

2013-07-10 Thread Frank Harrell
Thank you Marc
Frank

Marc Schwartz-3 wrote
 On Jul 10, 2013, at 1:29 PM, Frank Harrell lt;

 f.harrell@

 gt; wrote:
 
 I have been confused about the appropriate e-mail address to use to make
 announcements to r-help for major package update.  In the past I've
 submitted to 

 R-packages@.r-project

  without seeing the announcement appear on r-help.
 
 Thanks for any guidance.
 Frank
 
 
 
 Hi Frank,
 
 R-packages (https://stat.ethz.ch/mailman/listinfo/r-packages) should be
 the correct list for those announcements.
 
 A quick check of a few recent posts shows that they are being forwarded to
 R-Help as well.
 
 It is however a low volume list. 116 posts in 2010, 77 in 2011, 72 in 2012
 and 25 so far in 2013. The trend would seem to be downward.
 
 Regards,
 
 Marc Schwartz
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Appropriate-forum-for-announcing-R-package-updates-tp4671238p4671242.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Quantile Regression/(package (quantreg))

2013-06-28 Thread Frank Harrell

Mike,

Do something like:

require(rms)
dd - datadist(mydatarame); options(datadist='dd')
f - Rq(y ~ rcs(age,4)*sex, tau=.5)  # use rq function in quantreg
summary(f)  # inter-quartile-range differences in medians of y (b/c tau=.5)
plot(Predict(f, age, sex))   # show age effect on median as a continuous 
variable


For more help type ?summary.rms and ?Predict

Frank



When performing quantile regression (r package I used quantreg), the 
value of the quantile refers to the quantile value of the dependent 
variable.
Typically when trying to predict, since the information we have are the 
independent variables, I am interested in trying to estimate the 
coefficients based on the quantile values of the independent variables' 
distribution. So that I can get an understanding, for certain ranges of 
the predictor/independent variable values, the (target/dependent 
variable) has (a certain level of exposure to the 
predictors)/(coefficients).

Is there any way I can achieve that?

Just in case, if I am incorrect about my understanding on the way 
quantiles are interpreted when using the package quantreg, please let me 
know.


Thanks
Mike
--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Nomogram (rms) for model with shrunk coefficients

2013-06-24 Thread Frank Harrell
You are using an informal shrinkage method.  It is much better to use 
penalized maximum likelihood estimation, built in to lrm.


If you really want to go the informal route, compute the linear 
predictor from your final estimates and use ols( ) to predict that from 
the component variables (you'll get an R^2 of 1.0 so specify sigma= to 
ols) and use nomogram on the result.


Frank


--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] stepwise discriminant analysis using wilks lambda

2013-06-11 Thread Frank Harrell
Unless you have detailed simulations to back up the performance of this
method I would avoid it.  It violates several statistical principles.
Frank

Hari wrote
 Hello R geeks, 
 
 Waiting for an reply.
 
 Thanks,
 Hari





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/stepwise-discriminant-analysis-using-wilks-lambda-tp4668817p4669235.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rmean in survfit

2013-05-22 Thread Frank Harrell
One approach is to use the rms package's cph and Mean.cph functions. 
Mean.cph (cph calls coxph and can compute Kaplan-Meier and other survival
estimates) can compute mean restricted life.
Frank

Dinesh W wrote
 I am using survfit to generate a survival curve. My population is such
 that my x axis is in days and i have a starting population of say 10,000
 of which say only 2000 are left as of day 365. When I try to print rmean
 it does not print and in any case I am not interested in that. I actually
 want to sum up all the daily survival values starting at 1.0 (S_0) for day
 1 through 0.2 (S_365) through day 365. I then want to assume r% retained
 each day and compute the remaining integral as sum of a geometric series
 through infinity ( (S_365 * r) * 1/(1-r) ) or through a specific future
 time period (730 for example in which case the summation portion 1/ (1-r)
 would change).  Is there an easy way to do this?





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/rmean-in-survfit-tp4667668p4667708.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Weighted regression in rms/Hmisc

2013-05-19 Thread Frank Harrell
You should be able to pass a weights argument through fit.mult.impute to ols. 
I assume that weights are with respect to variables that you, for some
reason, do not want to have as predictors.  Otherwise you are already
conditioning on the weighting factors and no weights are needed.
Frank

Mullah Abu Shadeque wrote
 Hi all, 
 
 While using  the following package and code: 
  
 library(rms)
 library(Hmisc)
 test.trans-aregImpute(
 ~ NearestWeekGestation + MaternalBMI + MomAge_Years +  
    WtGain + ethnicity , n.impute=10
 ,data=data.1)
  
  test.mod-fit.mult.impute(BirthWeight_g ~ rcs(NearestWeekGestation,4)
 +   
    rcs(MaternalBMI,4) + rcs(MomAge_Years,4)
 + rcs(WtGain,4) + ethnicity   
    ,fitter=ols,xtrans=test.trans,data=data.1)
  
  
 How
 can I specify for weighted OLS here where I have a defined weight vector
 w?
  
 I
 would be highly helped and grateful to you if you kindly reply to my mail.
  
 With
 Regards,
 SADIK  
   [[alternative HTML version deleted]]
 
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Weighted-regression-in-rms-Hmisc-tp4667437p4667446.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to do an external validation with R

2013-05-18 Thread Frank Harrell
Splitting one dataset into training and test is still internal validation,
and it requires an enormous sample size (around 20,000 typically) in order
to be competitive with the bootstrap.

Note that the Design package has been replaced with rms, and rms has two
functions for external validation (val.prob and val.surv, should you have
had external data unlike your case) and many more for internal validation.
Frank

beginner wrote
 I would like to do external validation using R software. So far I have
 used packages like Design and DAAG. However they perform internal
 validation rather than external one. In order to perform external
 validation I would have to split my data beforehand into training and test
 set, leave the test set on the site and use only the training set to
 select a model. I would then test the selected model with the sample set
 left initially on the site. I would like to repeat this process several
 times to make sure that all the samples are included at least once in a
 test set. I thought that I need to use a loop function in R to perform
 this process automatically. As I am new to R I don't know how to make a
 loop. Could you please help me with this or suggest an R package ? I would
 be very very grateful for help with this task !





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-do-an-external-validation-with-R-tp4667404p4667426.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Tiff plot Resolution issues

2013-05-01 Thread Frank Harrell
Why do you need TIFF?  I've never seen a science journal that claims they
want TIFF figure submissions who are really serious about that.  This is a
very wasteful format.  Most journals want PDF.
Frank

Aldo wrote
 I am trying to create a high resolution tiff. It is not working.
 
 I am on Windows XP 32-bit
 R 3.0.0
 
 Code input:
 tiff(file=test.tiff,width=6.83,height=6.83,units=in,  res=1200)
 
 Return Message:
 Error in tiff(file = test.tiff, width = 6.83, height = 6.83, units =
 in,  :
   unable to start tiff() device
 In addition: Warning messages:
 1: In tiff(file = test.tiff, width = 6.83, height = 6.83, units = in, 
 :
   unable to allocate bitmap
 2: In tiff(file = test.tiff, width = 6.83, height = 6.83, units = in, 
 :
   opening device failed
 
 Also
 capabilities(tiff) returns TRUE
 and
 It must be a tiff, that is what the journal requires.
 Thank You
 
 -- 
 
 Michael V. Clawson
 
 Predoctoral Research Associate
 
 School of Environmental and Forest Sciences,
 
 University of Washington
 
   [[alternative HTML version deleted]]
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Tiff-plot-Resolution-issues-tp4665945p4665948.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert continuous variable into discrete variable

2013-04-29 Thread Frank Harrell
It is important to check for lack of fit of the categorized variable.  One
way to do this is to test for the additional predictive ability of the
original continuous variable after adjusting for its categorized version. 
It is very uncommon for a categorized continuous variable to fit well,
because its assumed discontinuities seldom exist in nature and most
relationships are not piecewise flat.
Frank

levanovd wrote
 Or even simpler (no need to specify labels):
 
 x-runif(100,0,100) 
 u - cut(x, breaks = c(0, 3, 4.5, 6, 8, Inf), labels = FALSE)





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Convert-continuous-variable-into-discrete-variable-tp3671032p4665699.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Stepwise regression for multivariate case in R?

2013-04-26 Thread Frank Harrell
Since stepwise methods do not work as advertised in the univariate case I'm
wondering why they should work in the multivariate case.
Frank

Jonathan Jansson wrote
 Hi! I am trying to make a stepwise regression in the multivariate case,
 using Wilks' Lambda test.
 I've tried this: 
  greedy.wilks(cbind(Y1,Y2) ~ . , data=my.data )
 But it only returns:
 Error in model.frame.default(formula = X[, j] ~ grouping,
 drop.unused.levels = TRUE) : 
   variable lengths differ (found for 'grouping') 
 What can be wrong here? I have checked and all variables in my.data is of
 the same length.
 //Jonathan
 
   [[alternative HTML version deleted]]
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Stepwise-regression-for-multivariate-case-in-R-tp4665505p4665526.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Hosmer Lemeshow test

2013-04-23 Thread Frank Harrell
The H-L test is now considered to be obsolete by many.  One replacement is
the Hosmer - le Cessie test as implemented in the rms package residuals.lrm
function.  This is a one degree of freedom test.

One problem with H-L is its use of arbitrary binning and suboptimal power. 
The new test requires no binning at all.  Probably better still are directed
tests such as tests of linearity and additivity.
Frank

David Carlson wrote
 The warning is commented out or it would have been printed:
#   warning(Some expected counts are less than 5. Use smaller number
 of
 groups)
 
 For 23 data points, the default of 10 bins is too many since one of the
 bins
 is 0. Eight bins gives the warning, but prints results. You didn't
 indicate
 how many bins SPSS used. 
 
 -
 David L Carlson
 Associate Professor of Anthropology
 Texas AM University
 College Station, TX 77840-4352
 
 -Original Message-
 From: 

 r-help-bounces@

  [mailto:

 r-help-bounces@

 ] On
 Behalf Of Endy BlackEndy
 Sent: Tuesday, April 23, 2013 10:31 AM
 To: 

 r-help@

 Subject: [R] Hosmer Lemeshow test
 
 Hi to everybody. I use the following routine (i found it in the internet)
 to
 compute the Hosmer-Lemeshow test in the framework of logistic regression.
 
 hosmerlemeshow = function(obj, g=10) {
 # first, check to see if we fed in the right kind of object
stopifnot(family(obj)$family==binomial  family(obj)$link==logit)
y = obj$model[[1]]
# the double bracket (above) gets the index of items within an object
if (is.factor(y))
   y = as.numeric(y)==2
yhat = obj$fitted.values
cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE)
 
obs = xtabs(cbind(1 - y, y) ~ cutyhat)
expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
if (any(expect  5))
#   warning(Some expected counts are less than 5. Use smaller number
 of
 groups)
chisq = sum((obs - expect)^2/expect)
P = 1 - pchisq(chisq, g - 2)
cat(H-L2 statistic,round(chisq,4), and its p value,P,\n)
# by returning an object of class htest, the function will perform
 like
 the
# built-in hypothesis tests
return(structure(list(
   method = c(paste(Hosmer and Lemeshow goodness-of-fit test with, g,
 bins, sep= )),
   data.name = deparse(substitute(obj)),
   statistic = c(X2=chisq),
   parameter = c(df=g-2),
   p.value = P
), class='htest'))
return(list(chisq=chisq,p.value=P))
 }
 
 However when i run it with the data set below i get the value NaN. Using
 the
 same data set with SPSS i get a specific value.
 
 FlightNo Temp ThermalDisast
 1 66 0
 2 70 1
 3 69 0
 4 68 0
 5 67 0
 6 72 0
 7 73 0
 8 70 0
 9 57 1
 10 63 1
 11 70 1
 12 78 0
 13 67 0
 14 53 1
 15 67 0
 16 75 0
 17 70 0
 18 81 0
 19 76 0
 20 79 0
 21 75 1
 22 76 0
 23 58 1
 
 Any suggestions will greatly appreciated.
 With regards
 Endy
 
   [[alternative HTML version deleted]]
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Hosmer-Lemeshow-test-tp4665115p4665154.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NAMESPACE and imports

2013-04-19 Thread Frank Harrell
I am cleaning up the rms package to not export functions not to be called
directly by users.  rms uses generic functions defined in other packages. 
For example there is a latex method in the Hmisc package, and rms has a
latex method for objects of class anova.rms so there are anova.rms and
latex.anova.rms functions in rms.  I use:

export(asis,bj,bjplot,bootBCa,bootcov,bootplot,bplot,calibrate,cph,catg,combineRelatedPredictors,confplot,contrast,coxphFit,cph,cr.setup,datadist,effective.df,fastbw,formatNP,gendata,gIndex,GiniMd,Glm,Gls,groupkm,Hazard,hazard.ratio.plot,histdensity,%ia%,ie.setup,interactions.containing,legend.nomabbrev,lm.pfit,lrm,lrtest,lsp,matinv,matrx,Newlabels,Newlevels,nomogram,num.intercepts,ols,ols.influence,oos.loglik,pantext,Penalty.matrix,Penalty.setup,pentrace,perimeter,perlcode,plot.xmean.ordinaly,pol,pphsm,predab.resample,Predict,psm,rcs,related.predictors,reVector,robcov,Rq,sascode,scored,sensuc,setPb,show.influence,specs,strat,Surv,[.Surv,survdiffplot,survest,Survival,survplot,univarLR,validate,val.prob,val.probg,val.surv,vif,which.influence)

importFrom(Hmisc)
S3method(anova, rms)
S3method(latex, anova.rms, bj, cph, Glm, Gls, lrm, naprint.delete, ols,
pphsm, psm, rms, Rq, summary.rms, validate)

When doing R CMD INSTALL I get: [using R 2.15.3]

** preparing package for lazy loading
Error : c(bad 'S3method' directive: S3method(latex, anova.rms, bj, cph,
Glm, Gls, lrm, naprint.delete, , bad 'S3method' directive: ols, pphsm,
psm, rms, Rq, summary.rms, validate))
ERROR: lazy loading failed for package ‘rms’

Any advice appreciated.
Frank




-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NAMESPACE and imports

2013-04-19 Thread Frank Harrell
Right, I should have said import(Hmisc) instead of importFrom(Hmisc), but
that does not explain the error message.

Blaser Nello wrote
 Not sure this fixes your problem, but as far as I can know (and can tell
 from the manual:
 http://cran.r-project.org/doc/manuals/r-release/R-exts.pdf), importFrom
 needs to know what functions you are importing [e.g. importFrom(Hmisc,
 latex) importFrom(stats, anova)]. 
 
 
 -Original Message-
 From: 

 r-help-bounces@

  [mailto:

 r-help-bounces@

 ] On Behalf Of Frank Harrell
 Sent: Freitag, 19. April 2013 16:26
 To: 

 r-help@

 Subject: [R] NAMESPACE and imports
 
 I am cleaning up the rms package to not export functions not to be called
 directly by users.  rms uses generic functions defined in other packages. 
 For example there is a latex method in the Hmisc package, and rms has a
 latex method for objects of class anova.rms so there are anova.rms and
 latex.anova.rms functions in rms.  I use:
 
 export(asis,bj,bjplot,bootBCa,bootcov,bootplot,bplot,calibrate,cph,catg,combineRelatedPredictors,confplot,contrast,coxphFit,cph,cr.setup,datadist,effective.df,fastbw,formatNP,gendata,gIndex,GiniMd,Glm,Gls,groupkm,Hazard,hazard.ratio.plot,histdensity,%ia%,ie.setup,interactions.containing,legend.nomabbrev,lm.pfit,lrm,lrtest,lsp,matinv,matrx,Newlabels,Newlevels,nomogram,num.intercepts,ols,ols.influence,oos.loglik,pantext,Penalty.matrix,Penalty.setup,pentrace,perimeter,perlcode,plot.xmean.ordinaly,pol,pphsm,predab.resample,Predict,psm,rcs,related.predictors,reVector,robcov,Rq,sascode,scored,sensuc,setPb,show.influence,specs,strat,Surv,[.Surv,survdiffplot,survest,Survival,survplot,univarLR,validate,val.prob,val.probg,val.surv,vif,which.influence)
 
 importFrom(Hmisc)
 S3method(anova, rms)
 S3method(latex, anova.rms, bj, cph, Glm, Gls, lrm, naprint.delete, ols,
 pphsm, psm, rms, Rq, summary.rms, validate)
 
 When doing R CMD INSTALL I get: [using R 2.15.3]
 
 ** preparing package for lazy loading
 Error : c(bad 'S3method' directive: S3method(latex, anova.rms, bj, cph,
 Glm, Gls, lrm, naprint.delete, , bad 'S3method' directive: ols,
 pphsm,
 psm, rms, Rq, summary.rms, validate))
 ERROR: lazy loading failed for package ‘rms’
 
 Any advice appreciated.
 Frank
 
 
 
 
 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718p4664728.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NAMESPACE and imports

2013-04-19 Thread Frank Harrell
Now I see it.  S3method() wants two arguments so I need to create multiple
S3method() statements for each generic.
Frank

Frank Harrell wrote
 Right, I should have said import(Hmisc) instead of importFrom(Hmisc), but
 that does not explain the error message.
 Blaser Nello wrote
 Not sure this fixes your problem, but as far as I can know (and can tell
 from the manual:
 http://cran.r-project.org/doc/manuals/r-release/R-exts.pdf), importFrom
 needs to know what functions you are importing [e.g. importFrom(Hmisc,
 latex) importFrom(stats, anova)]. 
 
 
 -Original Message-
 From: 

 r-help-bounces@

  [mailto:

 r-help-bounces@

 ] On Behalf Of Frank Harrell
 Sent: Freitag, 19. April 2013 16:26
 To: 

 r-help@

 Subject: [R] NAMESPACE and imports
 
 I am cleaning up the rms package to not export functions not to be called
 directly by users.  rms uses generic functions defined in other packages. 
 For example there is a latex method in the Hmisc package, and rms has a
 latex method for objects of class anova.rms so there are anova.rms and
 latex.anova.rms functions in rms.  I use:
 
 export(asis,bj,bjplot,bootBCa,bootcov,bootplot,bplot,calibrate,cph,catg,combineRelatedPredictors,confplot,contrast,coxphFit,cph,cr.setup,datadist,effective.df,fastbw,formatNP,gendata,gIndex,GiniMd,Glm,Gls,groupkm,Hazard,hazard.ratio.plot,histdensity,%ia%,ie.setup,interactions.containing,legend.nomabbrev,lm.pfit,lrm,lrtest,lsp,matinv,matrx,Newlabels,Newlevels,nomogram,num.intercepts,ols,ols.influence,oos.loglik,pantext,Penalty.matrix,Penalty.setup,pentrace,perimeter,perlcode,plot.xmean.ordinaly,pol,pphsm,predab.resample,Predict,psm,rcs,related.predictors,reVector,robcov,Rq,sascode,scored,sensuc,setPb,show.influence,specs,strat,Surv,[.Surv,survdiffplot,survest,Survival,survplot,univarLR,validate,val.prob,val.probg,val.surv,vif,which.influence)
 
 importFrom(Hmisc)
 S3method(anova, rms)
 S3method(latex, anova.rms, bj, cph, Glm, Gls, lrm, naprint.delete, ols,
 pphsm, psm, rms, Rq, summary.rms, validate)
 
 When doing R CMD INSTALL I get: [using R 2.15.3]
 
 ** preparing package for lazy loading
 Error : c(bad 'S3method' directive: S3method(latex, anova.rms, bj, cph,
 Glm, Gls, lrm, naprint.delete, , bad 'S3method' directive: ols,
 pphsm,
 psm, rms, Rq, summary.rms, validate))
 ERROR: lazy loading failed for package ‘rms’
 
 Any advice appreciated.
 Frank
 
 
 
 
 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/NAMESPACE-and-imports-tp4664718p4664744.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] the joy of spreadsheets (off-topic)

2013-04-16 Thread Frank Harrell
What a terrific article.  Thanks for sharing!  The more we critically 
examine how research is actually done the more frightened we become.


Frank

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Composite Quantile Regression

2013-04-05 Thread Frank Harrell
Does anyone know of R functions for doing composite quantile regression (Hou
and Yuan Ann Stat 36:1108, 2008)?  The paper's authors do not talk about
software in their paper or on their web sites.
Thanks
Frank




-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Composite-Quantile-Regression-tp4663463.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Model Selection based on individual t-values with the largest possible number of variables in regression

2013-04-03 Thread Frank Harrell
To say that these strategies represent bad statistical practice is to put it
mildly.
Frank

mister_O wrote
 Dear R-Community,
 
 When writing my master thesis, I faced with difficult issue. Analyzing the
 capital structure determinants I have one dependent variable (Total debt
 ratio = TD) and 15 independent ones. At the first stage  I normalized my
 data by deleting outliers from each variable (Pairwise deletion) and in
 the result I got every variable to be  with different length. Now when
 selecting relevant variables for the best model, neither stepwise nor
 forward or backward procedures don't work perfectly since there are a
 number of other combinations of variables wich have also high t-values.
 Thus, wichever model I pick, you never know whether this model is
 trustworthy. I tried to calculate all possible combinations of independent
 variables, but since I have 15 ones, there are thousands of such
 combinations and R simply refuses to calculate them! (computer crashes) I
 wonder if you can help me to write the code in R in order to find the
 model wich include as many variables as it possible with significant
 t-values? 
 
 cheers, Oleg





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Model-Selection-based-on-individual-t-values-with-the-largest-possible-number-of-variables-in-regresn-tp4663174p4663202.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Re-order variables listed in nomogram?

2013-03-21 Thread Frank Harrell
nomogram does not have an option for specifying a new order for the
variables.  With some exceptions you should be able to reorder the variables
in the model formula to achieve the desired result.
Frank

Eleni Rapsomaniki-3 wrote
 Hi,
 I am using the function nomogram in the rms package for survival
 analysis.
 How is the order in which variables determined and how can I change it? I
 use it with a cph() model.
 
 Many Thanks
 Eleni Rapsomaniki
 
 Clinical Epidemiology Group
 Department of Epidemiology and Public Health 
 University College London
 
 
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Re-order-variables-listed-in-nomogram-tp4662070p4662139.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Export R generated tables and figures to MS Word

2013-03-13 Thread Frank Harrell
The best rendering of advanced tables is done by converting from pdf to Word. 
See http://biostat.mc.vanderbilt.edu/SweaveConvert

Frank

Robert Baer wrote
 R2wd (http://cran.r-project.org/web/packages/R2wd/R2wd.pdf 
 lt;http://cran.r-project.org/web/packages/R2wd/R2wd.pdfgt;) might do
 what 
 you want.
 
 Rob
 
 On 3/12/2013 7:02 PM, Santosh wrote:
 Dear Rxperts,
 I am aware of Sweave that generates reports into a pdf, but do know of
 any
 tools to generate to export to a MS Word document...

 Is there  a way to use R to generate and export report/publication
 quality
 tables and figures and export them to MS word (for reporting purposes)?

 Thanks so much,
 Santosh

  [[alternative HTML version deleted]]

 __
 

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 -- 
 
 Robert W. Baer, Ph.D.
 Professor of Physiology
 Kirksille College of Osteopathic Medicine
 A. T. Still University of Health Sciences
 Kirksville, MO 63501 USA
 
 
   [[alternative HTML version deleted]]
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Export-R-generated-tables-and-figures-to-MS-Word-tp4661132p4661180.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Accuracy some classifiers

2013-03-13 Thread Frank Harrell
Which accuracy score are you using?  If it is proportion correctly
classified, that is a misleading improper scoring rule.
Frank

rkok wrote
 I am using machine learning for one researching.  I am using some
 classifiers with 5-fold CV . I would like to know how it is possible to
 extract the accuracy, for example, for KNN,neural networks and J48,  for
 each one of 5-fold because when I apply CV to my classifier, I obtain the
 mean accuracy of 5-fold  but each accuracy/error of each fold is not
 returned.
 
 Any help is welcome and grateful.  Thanks in advance!
 
 Regards!!





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Accuracy-some-classifiers-tp4661109p4661181.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Bootstrap BCa confidence limits with your own resamples

2013-03-12 Thread Frank Harrell
I like to bootstrap regression models, saving the entire set of bootstrapped
regression coefficients for later use so that I can get confidence limits
for a whole set of contrasts derived from the coefficients.  I'm finding
that ordinary bootstrap percentile confidence limits can provide poor
coverage for odds ratios for binary logistic models with small N.  So I'm
exploring BCa confidence intervals.

Because the matrix of bootstrapped regression coefficients is generated
outside of the boot packages' boot() function, I need to see if it is
possible to compute BCa intervals using boot's boot.ci() function after
constructing a suitable boot object.  The function below almost works, but
it seems to return bootstrap percentile confidence limits for BCa limits. 
The failure is probably due to my being new to BCa limits and not
understanding the inner workings.   Has anyone successfully done something
similar to this?

Thanks very much,
Frank

bootBCa - function(estimate, estimates, n, conf.int=0.95) {
  require(boot)
  if(!exists('.Random.seed')) runif(1)
  w - list(sim= 'ordinary',
stype = 'i',
t0 = estimate,
t  = as.matrix(estimates),
R  = length(estimates),
data= 1:n,
strata  = rep(1, n),
weights = rep(1/n, n),
seed = .Random.seed,
statistic = function(...) 1e10,
call = list('rigged', weights='junk'))
  np - boot.ci(w, type='perc', conf=conf.int)$percent
  m  - length(np)
  np - np[c(m-1, m)]
  bca - tryCatch(boot.ci(w, type='bca', conf=conf.int),
  error=function(...) list(fail=TRUE))
  if(length(bca$fail)  bca$fail) {
bca - NULL
warning('could not obtain BCa bootstrap confidence interval')
  } else {
bca - bca$bca
m - length(bca)
bca - bca[c(m-1, m)]
  }
  list(np=np, bca=bca)
}




-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Bootstrap-BCa-confidence-limits-with-your-own-resamples-tp4661045.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bootstrap BCa confidence limits with your own resamples

2013-03-12 Thread Frank Harrell
That's very helpful John.  Did you happen to run a test to make sure that
boot.ci(..., type='bca') in fact gives the BCa intervals or that they at
least disagree with percentile intervals?

Frank


John Fox wrote
 Dear Frank,
 
 I'm not sure that it will help, but you might look at the bootSem()
 function
 in the sem package, which creates objects that inherit from boot. Here's
 an artificial example:
 
 -- snip --
 
 library(sem)
 for (x in names(CNES)) CNES[[x]] - as.numeric(CNES[[x]])
 model.cnes - specifyModel()
 F - MBSA2, lam1
 F - MBSA7, lam2
 F - MBSA8, lam3
 F - MBSA9, lam4
 F - F, NA, 1
 MBSA2 - MBSA2, the1
 MBSA7 - MBSA7, the2
 MBSA8 - MBSA8, the3
 MBSA9 - MBSA9, the4
 
 sem.cnes - sem(model.cnes, data=CNES)
 summary(sem.cnes)
 
 set.seed(12345) # for reproducibility
 system.time(boot.cnes - bootSem(sem.cnes, R=5000))
 class(boot.cnes)
 boot.ci(boot.cnes)
 
 -- snip --
 
 I hope this helps,
  John
 
 -Original Message-
 From: 

 r-help-bounces@

  [mailto:r-help-bounces@r-
 project.org] On Behalf Of Frank Harrell
 Sent: Tuesday, March 12, 2013 11:27 AM
 To: 

 r-help@

 Subject: [R] Bootstrap BCa confidence limits with your own resamples
 
 I like to bootstrap regression models, saving the entire set of
 bootstrapped
 regression coefficients for later use so that I can get confidence
 limits
 for a whole set of contrasts derived from the coefficients.  I'm
 finding
 that ordinary bootstrap percentile confidence limits can provide poor
 coverage for odds ratios for binary logistic models with small N.  So
 I'm
 exploring BCa confidence intervals.
 
 Because the matrix of bootstrapped regression coefficients is generated
 outside of the boot packages' boot() function, I need to see if it is
 possible to compute BCa intervals using boot's boot.ci() function after
 constructing a suitable boot object.  The function below almost works,
 but
 it seems to return bootstrap percentile confidence limits for BCa
 limits.
 The failure is probably due to my being new to BCa limits and not
 understanding the inner workings.   Has anyone successfully done
 something
 similar to this?
 
 Thanks very much,
 Frank
 
 bootBCa - function(estimate, estimates, n, conf.int=0.95) {
   require(boot)
   if(!exists('.Random.seed')) runif(1)
   w - list(sim= 'ordinary',
 stype = 'i',
 t0 = estimate,
 t  = as.matrix(estimates),
 R  = length(estimates),
 data= 1:n,
 strata  = rep(1, n),
 weights = rep(1/n, n),
 seed = .Random.seed,
 statistic = function(...) 1e10,
 call = list('rigged', weights='junk'))
   np - boot.ci(w, type='perc', conf=conf.int)$percent
   m  - length(np)
   np - np[c(m-1, m)]
   bca - tryCatch(boot.ci(w, type='bca', conf=conf.int),
   error=function(...) list(fail=TRUE))
   if(length(bca$fail)  bca$fail) {
 bca - NULL
 warning('could not obtain BCa bootstrap confidence interval')
   } else {
 bca - bca$bca
 m - length(bca)
 bca - bca[c(m-1, m)]
   }
   list(np=np, bca=bca)
 }
 
 
 
 
 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context: http://r.789695.n4.nabble.com/Bootstrap-
 BCa-confidence-limits-with-your-own-resamples-tp4661045.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Bootstrap-BCa-confidence-limits-with-your-own-resamples-tp4661045p4661077.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SAS and R complement each other

2013-03-03 Thread Frank Harrell
I'm not sure why you posted the original note.  I quit using SAS in 1991 and
haven't needed it yet.
Frank

RogerJDeAngelis wrote
 Sorry about the double post. But I keep getting 'post' rejections, so I
 resubmitted about an hour later.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/SAS-and-R-complement-each-other-tp4660157p4660190.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple left hand side variables in a formula

2013-03-02 Thread Frank Harrell
Achim this is perfect.  I had not seen Formula before.  Thanks for writing
it!
Frank

Achim Zeileis-4 wrote
 On Fri, 1 Mar 2013, Frank Harrell wrote:
 
 Thank you Bill.  A temporary re-arrangement of the formula will allow me
 to
 do the usual subset= na.action= processing afterwards.  Nice idea.  I
 don't
 need the dot notation very often for this application.
 
 That's what the Formula package provides. It allows for multiple parts 
 and multiple responses on both side of the ~. Internally, the formula is 
 decomposed and separate terms are produced. And using an auxiliary formula 
 it is assured that a single model frame (with unified NA processing). And 
 all of this is hidden from the user by providing methods that are as 
 standard as possible, see:
 
 vignette(Formula, package = Formula)
 
 hth,
 Z
 
 Frank

 William Dunlap wrote
 I don't know how much of the information that model.frame supplies you
 need,
 but you could make a data.frame containing all the variables on both
 sides
 of them
 formula by changing lhs~rhs into ~lhs+rsh before calling model.frame.
 E.g.,

 f - function (formula)  {
 if (length(formula) == 3) { # has left hand side
 envir - environment(formula)
 formula - formula(call(~, call(+, formula[[2]],
 formula[[3]])))
 environment(formula) - envir
 }
 formula
 }

 This doesn't quite take care of the wild-card dot in the formula:
 straight
 variables are omitted from dot's expansion but functions of variables
 are
 not:
 colnames(model.frame(f(log(mpg)+hp ~ .), data=mtcars))
  [1] log(mpg) hp   mpg  cyl
  [5] disp drat wt   qsec
  [9] vs   am   gear carb

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com


 -Original Message-
 From:

 r-help-bounces@

  [mailto:

 r-help-bounces@

 ] On Behalf
 Of Frank Harrell
 Sent: Friday, March 01, 2013 4:17 PM
 To:

 r-help@

 Subject: [R] Multiple left hand side variables in a formula

 The lattice package uses special logic to allow for multiple
 left-hand-side
 variables in a formula, e.g. y1 + y2 ~ x.  Is there an elegant way to
 do
 this outside of lattice?  I'm trying to implement a data summarization
 function that logically takes multiple dependent variables.  The usual
 invocation of model.frame( ) causes R to try to do arithmetic addition
 to
 create a single dependent variable.

 Thanks
 Frank



 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Multiple-left-hand-side-
 variables-in-a-formula-tp4660060.html
 Sent from the R help mailing list archive at Nabble.com.

 __


 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660065.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660080.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple left hand side variables in a formula

2013-03-02 Thread Frank Harrell
Hi Gabor,

This is not for a regression function but for a major update I'm working on
for the summary.formula function in the Hmisc package.  So I need to handle
several data types in the formula.

Thanks
Frank

Gabor Grothendieck wrote
 Gabor Grothendieck wrote
 On Fri, Mar 1, 2013 at 7:16 PM, Frank Harrell lt;

 f.harrell@

 gt; wrote:
 The lattice package uses special logic to allow for multiple
 left-hand-side
 variables in a formula, e.g. y1 + y2 ~ x.  Is there an elegant way to
 do
 this outside of lattice?  I'm trying to implement a data summarization
 function that logically takes multiple dependent variables.  The usual
 invocation of model.frame( ) causes R to try to do arithmetic addition
 to
 create a single dependent variable.


 Try:

 lm( cbind(Sepal.Length, Sepal.Width) ~., iris)

 
 On Fri, Mar 1, 2013 at 8:02 PM, Frank Harrell lt;

 f.harrell@

 gt; wrote:
 Thanks for your reply Gabor.  That doesn't handle a mixture of factor and
 numeric variables on the left hand side.
 Frank

 
 It can handle 2 level factors
 
lm(cbind(Sepal.Length, setosa = Species == setosa) ~ ., iris)
 
 and more with some manual effort:
 
lm(cbind(virginica = Species == virginica, setosa = Species ==
 setosa) ~ ., iris)
 
 Typically you don't see more than that as a dependent variable.  Do
 you actually need more?
 
 --
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660081.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Expressions in lattice conditional variables

2013-03-02 Thread Frank Harrell
I would like to have a lattice conditioning ( | var ) variable have
expression() as values because I want panel labels to be able to use
plotmath notation for subscripts, etc.   lattice barks at this.  Does anyone
know of a trick workaround?  An attempted example program is below.  Thanks
-Frank

require(lattice)
set.seed(1)
var - c(rep('A', 100), rep('B', 100))
trt - sample(c('T1','T2'), 200, TRUE)
x - c(runif(100), 10*runif(100))
y - x + c(runif(100)/10, runif(100))
N - tapply(x, llist(var, trt), function(x) sum(!is.na(x)))
print(N)

vn - vector('expression', length(var))
for(v in unique(var)) {
  i - var == v
  n - tapply(!is.na(x[i]), trt[i], sum)
  nam - names(n)
  w - sprintf('paste(%s, (, n[%s]==%g,~~n[%s]==%g,))',
   v, nam[1], n[1], nam[2], n[2])
  cat(w, '\n')
  vn[var == v] - parse(text=w)
  n - sprintf('%s  (n%s=%g, n%s=%g)', v, nam[1],n[1], nam[2],n[2])
  vn[var == v] - n
}
trt - factor(trt)

xyplot(as.integer(trt) ~ x | vn, panel=panel.bpplot, ylim=c(0,3),
   scale=list(y=list(at=1:2, labels=levels(trt)),
 x=list(relation='free', limits=list(c(0,1),c(0,13,
   ylab='Treatment', layout=c(1,2))

Error in unique.default(x) : 
  unimplemented type 'expression' in 'HashTableSetup'




-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Expressions-in-lattice-conditional-variables-tp4660089.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Expressions in lattice conditional variables

2013-03-02 Thread Frank Harrell
Whoops - these 2 lines should have been omitted from the program:

n - sprintf('%s  (n%s=%g, n%s=%g)', v, nam[1],n[1], nam[2],n[2]) 
 vn[var == v] - n 


Frank Harrell wrote
 I would like to have a lattice conditioning ( | var ) variable have
 expression() as values because I want panel labels to be able to use
 plotmath notation for subscripts, etc.   lattice barks at this.  Does
 anyone know of a trick workaround?  An attempted example program is below. 
 Thanks -Frank
 
 require(lattice)
 set.seed(1)
 var - c(rep('A', 100), rep('B', 100))
 trt - sample(c('T1','T2'), 200, TRUE)
 x - c(runif(100), 10*runif(100))
 y - x + c(runif(100)/10, runif(100))
 N - tapply(x, llist(var, trt), function(x) sum(!is.na(x)))
 print(N)
 
 vn - vector('expression', length(var))
 for(v in unique(var)) {
   i - var == v
   n - tapply(!is.na(x[i]), trt[i], sum)
   nam - names(n)
   w - sprintf('paste(%s, (, n[%s]==%g,~~n[%s]==%g,))',
v, nam[1], n[1], nam[2], n[2])
   cat(w, '\n')
   vn[var == v] - parse(text=w)
   n - sprintf('%s  (n%s=%g, n%s=%g)', v, nam[1],n[1], nam[2],n[2])
   vn[var == v] - n
 }
 trt - factor(trt)
 
 xyplot(as.integer(trt) ~ x | vn, panel=panel.bpplot, ylim=c(0,3),
scale=list(y=list(at=1:2, labels=levels(trt)),
  x=list(relation='free', limits=list(c(0,1),c(0,13,
ylab='Treatment', layout=c(1,2))
 
 Error in unique.default(x) : 
   unimplemented type 'expression' in 'HashTableSetup'





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Expressions-in-lattice-conditional-variables-tp4660089p4660090.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] xYplot help

2013-03-02 Thread Frank Harrell
xYplot has many options that are passed to panel.xYplot.  Did you read the
documentation?  You can suppress labels, use an automatically generated
Key() function to control where you want keys, and use several other
options.  Start with label.curve=FALSE and go from there.
Frank

bwr87 wrote
 I'm trying to work the lattice xYplot function to plot confidence
 intervals
 for two groups on the same plot.  The problem is that it automatically
 labels the groups (1 and 2) on the plot itself, and the labels get
 obscured
 by the CI lines and make it look bad.
 
 This is my code:
 
 xYplot(Cbind(Mean,L95,U95) ~ jitter(Time), type='b', data=A,main=Bouts
 Vs.
 Time,col=c('black', 'gray'), groups=Group,title=Bouts,xlab=Time,
 ylab=Bouts, method=bars, lwd=3, ylim=c(0,30))
 
 
 A is the data, with columns Time, Group, Mean, L95, U95.  L95 and U95 are
 the lower and upper confidence interval estimates. Group is either 1 or 2,
 each measured at the same number of time points.
 
 The goal is simply to take the labels off of the plot itself.  Then I can
 use an auto.key to put the labels on the side.
 
 Please help!
 
 -Ben
 
   [[alternative HTML version deleted]]
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/xYplot-help-tp4660102p4660108.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiple left hand side variables in a formula

2013-03-01 Thread Frank Harrell
The lattice package uses special logic to allow for multiple left-hand-side
variables in a formula, e.g. y1 + y2 ~ x.  Is there an elegant way to do
this outside of lattice?  I'm trying to implement a data summarization
function that logically takes multiple dependent variables.  The usual
invocation of model.frame( ) causes R to try to do arithmetic addition to
create a single dependent variable.

Thanks
Frank



-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple left hand side variables in a formula

2013-03-01 Thread Frank Harrell
Thanks for your reply Gabor.  That doesn't handle a mixture of factor and
numeric variables on the left hand side.
Frank

Gabor Grothendieck wrote
 On Fri, Mar 1, 2013 at 7:16 PM, Frank Harrell lt;

 f.harrell@

 gt; wrote:
 The lattice package uses special logic to allow for multiple
 left-hand-side
 variables in a formula, e.g. y1 + y2 ~ x.  Is there an elegant way to do
 this outside of lattice?  I'm trying to implement a data summarization
 function that logically takes multiple dependent variables.  The usual
 invocation of model.frame( ) causes R to try to do arithmetic addition to
 create a single dependent variable.

 
 Try:
 
 lm( cbind(Sepal.Length, Sepal.Width) ~., iris)
 
 --
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660062.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple left hand side variables in a formula

2013-03-01 Thread Frank Harrell
Thank you Bill.  A temporary re-arrangement of the formula will allow me to
do the usual subset= na.action= processing afterwards.  Nice idea.  I don't
need the dot notation very often for this application.
Frank

William Dunlap wrote
 I don't know how much of the information that model.frame supplies you
 need,
 but you could make a data.frame containing all the variables on both sides
 of them
 formula by changing lhs~rhs into ~lhs+rsh before calling model.frame. 
 E.g.,
 
 f - function (formula)  {
 if (length(formula) == 3) { # has left hand side
 envir - environment(formula)
 formula - formula(call(~, call(+, formula[[2]], 
 formula[[3]])))
 environment(formula) - envir
 }
 formula
 }
 
 This doesn't quite take care of the wild-card dot in the formula: straight
 variables are omitted from dot's expansion but functions of variables are
 not:
 colnames(model.frame(f(log(mpg)+hp ~ .), data=mtcars))
  [1] log(mpg) hp   mpg  cyl 
  [5] disp drat wt   qsec
  [9] vs   am   gear carb
 
 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com
 
 
 -Original Message-
 From: 

 r-help-bounces@

  [mailto:

 r-help-bounces@

 ] On Behalf
 Of Frank Harrell
 Sent: Friday, March 01, 2013 4:17 PM
 To: 

 r-help@

 Subject: [R] Multiple left hand side variables in a formula
 
 The lattice package uses special logic to allow for multiple
 left-hand-side
 variables in a formula, e.g. y1 + y2 ~ x.  Is there an elegant way to do
 this outside of lattice?  I'm trying to implement a data summarization
 function that logically takes multiple dependent variables.  The usual
 invocation of model.frame( ) causes R to try to do arithmetic addition to
 create a single dependent variable.
 
 Thanks
 Frank
 
 
 
 -
 Frank Harrell
 Department of Biostatistics, Vanderbilt University
 --
 View this message in context:
 http://r.789695.n4.nabble.com/Multiple-left-hand-side-
 variables-in-a-formula-tp4660060.html
 Sent from the R help mailing list archive at Nabble.com.
 
 __
 

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Multiple-left-hand-side-variables-in-a-formula-tp4660060p4660065.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Selecting First Incidence from Longitudinal Data

2013-02-24 Thread Frank Harrell
=

 ID    COMPL  SEX  HEREDITY
 1    0      1      2
 1    0      1      2
 1    3      1      2
 2    0      0      1
 2    1      0      1
 2    2      0      1
 2    2      0      1
 3    0      0      1
 3    0      0      1
 3    0      0      1
 3    0      0      1
 3    2      0      1
 4    0      1      2
 4    0      1      2
 , header = TRUE)

 aggregate(. ~ ID, data = subset(dat, COMPL != 0), head, 1)


 Hope this helps,

 Rui Barradas

 Em 23-02-2013 14:28, Tasnuva Tabassum escreveu:

  I have a longitudinal competing risk data of the form:

 ID    COMPL  SEX   HEREDITY
 1     0       1      2
 1     0       1      2
 1     3       1      2
 2     0       0      1
 2     1       0      1
 2     2       0      1
 2     2       0      1
 3     0       0      1
 3     0       0      1
 3     0       0      1
 3     0       0      1
 3     2       0      1
 4     0       1      2
 4     0       1      2.

 Where, COMPL= health complication of diabetic patients which has
 value
 labels   as  0= no complication,1=coronary heart disease,
 2=retinopathy,
 3=
 nephropathy.


 I want to select only the first complication that occurred to each
 patient.
 What R function can I use?

         [[alternative HTML version deleted]]

 __**
 

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/**listinfo/r-helplt;https://stat.ethz.ch/mailman/listinfo/r-helpgt;
 PLEASE do read the posting guide http://www.R-project.org/**
 posting-guide.html
 lt;http://www.R-project.org/posting-guide.htmlgt;

 and provide commented, minimal, self-contained, reproducible code.


 __**
 

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/**listinfo/r-helplt;https://stat.ethz.ch/mailman/listinfo/r-helpgt;
 PLEASE do read the posting guide http://www.R-project.org/**
 posting-guide.html lt;http://www.R-project.org/posting-guide.htmlgt;

 and provide commented, minimal, self-contained, reproducible code.




--
==
Xiaogang Su, Ph.D.
Associate Professor  Statistician
School of Nursing, University of Alabama
Birmingham, AL 35294-1210
(205) 934-2355 [Office]


 xgsu@



 xiaogangsu@

https://sites.google.com/site/xgsu00/


    [[alternative HTML version deleted]]

__


 R-help@

  mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





--
==
Xiaogang Su, Ph.D.
Associate Professor  Statistician
School of Nursing, University of Alabama
Birmingham, AL 35294-1210
(205) 934-2355 [Office]


 xgsu@



 xiaogangsu@

  
https://sites.google.com/site/xgsu00/

 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Selecting-First-Incidence-from-Longitudinal-Data-tp4659455p4659530.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plotting survival curves after multiple imputation

2013-02-23 Thread Frank Harrell
I haven't seen anyone solve this.  I think it would be reasonable to do a
time point by time point averaging (over multiple imputations) of the
underlying survival curve, although there is some question about whether to
freeze the centering constant (sum of beta times covariate mean).  What
will be harder to get is confidence intervals for predicted probabilities
after multiple imputation.  I hope someone will take up the challenge.
Frank

Robert Long wrote
 I am working with some survival data with missing values.
 
 I am using the mice package to do multiple imputation.
 
 I have found code in this thread which handles pooling of the MI results:
 https://stat.ethz.ch/pipermail/r-help/2007-May/132180.html
 
 Now I would like to plot a survival curve using the pooled results.
 
 Here is a reproducible example:
 
 require(survival)
 require(mice)
 
 set.seed(2)
 
 dt - colon
 
 fit - coxph(Surv(time,etype)~rx + sex + age, data=colon)
 
 dummy -
 data.frame(sex=c(1,1,1),rx=c(Obs,Lev,Lev+5FU),age=c(40,40,40))
 plot(survfit(fit, newdata=dummy) )
 
 # now create some missing values in the data
 dt - colon
 dt$rx[sample(1:nrow(dt),50)] - NA
 dt$sex [sample(1:nrow(dt),50)] - NA
 dt$age[sample(1:nrow(dt),50)] - NA
 
 imp -mice(dt)
 
 fit.imp - coxph.mids(Surv(time,etype)~rx + sex + age,imp)
 # Note, this function is defined below...
 
 imputed=summary.impute(pool.impute(fit.imp))
 print(imputed)
 
 # now, how to plot a survival curve with the pooled results ?
 
 
 
 
 ## begin code from linked thread above
 
 coxph.mids - function (formula, data, ...) {
 
  call - match.call()
  if (!is.mids(data)) stop(The data must have class mids)
 
  analyses - as.list(1:data$m)
 
  for (i in 1:data$m) {
data.i- complete(data, i)
analyses[[i]] - coxph(formula, data = data.i, ...)
  }
 
  object - list(call = call, call1 = data$call,
 nmis = data$nmis, analyses = analyses)
 
  return(object)
 }
 
 pool.impute - function (object, method = smallsample) {
 
  if ((m - length(object$analyses))  2)
stop(At least two imputations are needed for pooling.\n)
 
  analyses - object$analyses
 
  k - length(coef(analyses[[1]]))
  names - names(coef(analyses[[1]]))
  qhat  - matrix(NA, nrow = m, ncol = k, dimnames = list(1:m,names))
  u - array(NA, dim = c(m, k, k),
 dimnames = list(1:m, names, names))
 
  for (i in 1:m) {
fit   - analyses[[i]]
qhat[i, ] - coef(fit)
u[i, , ]  - vcov(fit)
  }
 
  qbar - apply(qhat, 2, mean)
  ubar - apply(u, c(2, 3), mean)
  e - qhat - matrix(qbar, nrow = m, ncol = k, byrow = TRUE)
  b - (t(e) %*% e)/(m - 1)
  t - ubar + (1 + 1/m) * b
  r - (1 + 1/m) * diag(b/ubar)
  f - (1 + 1/m) * diag(b/t)
  df - (m - 1) * (1 + 1/r)^2
 
  if (method == smallsample) {
 
if( any( class(fit) == coxph ) ){
 
  ### this loop is the hack for survival analysis ###
 
  status   - fit$y[ , 2]
  n.events - sum(status == max(status))
  p- length( coefficients( fit )  )
  dfc  - n.events - p
 
} else {
 
  dfc - fit$df.residual
}
 
df - dfc/((1 - (f/(m + 1)))/(1 - f) + dfc/df)
  }
 
  names(r) - names(df) - names(f) - names
  fit - list(call = call, call1 = object$call, call2 = object$call1,
  nmis = object$nmis, m = m, qhat = qhat, u = u,
  qbar = qbar, ubar = ubar, b = b, t = t, r = r, df = df,
  f = f)
 
  return(fit)
 }
 
 summary.impute - function(object){
 
   if (!is.null(object$call1)){
 cat(Call: )
 dput(object$call1)
   }
 
   est  - object$qbar
   se   - sqrt(diag(object$t))
   tval - est/se
   df   - object$df
   pval - 2 * pt(abs(tval), df, lower.tail = FALSE)
 
   coefmat - cbind(est, se, tval, pval)
   colnames(coefmat) - c(Estimate, Std. Error,
t value, Pr(|t|))
 
   cat(\nCoefficients:\n)
   printCoefmat( coefmat, P.values=T, has.Pvalue=T, signif.legend=T )
 
   cat(\nFraction of information about the coefficients
   missing due to nonresponse:,\n)
   print(object$f)
 
   ans - list( coefficients=coefmat, df=df,
call=object$call1, fracinfo.miss=object$f )
   invisible( ans )
   }
 
 ### end code from linked thread above
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Plotting-survival-curves-after-multiple-imputation

Re: [R] Nomogram after Cox Random Effect (frailty) model

2013-02-14 Thread Frank Harrell
This can't be done directly, but if you can output predicted values from
coxme and fit an rms ols model that predicts these predicted values with an
R^2 of 1.0 you can use nomogram() on the ols fit to get what you want.
Frank

george boul wrote
 Dear R-users,
 I am a novice R-user with some experience in using the RMS package for
 taking nomograms after various survival models.
 This time, I am trying to plot a nomogram after a Random Effects Cox,
 implemented by the coxme package. My questions are:
 1. Is it possible to take a nomogram directly after the coxme survival
 function?
 2. If not is there a way to take the linear predictor results and plug
 them into the RMS cox model and get the nomogram I want?
 Any instructions/suggestions would be highly appreciable? 
 Thank you in advance
 George
 
 Dr. George Bouliotis
 Research Fellow in BiostatisticsMRC-MHTMR (Midland Hub for Trials
 Methodology Research) University of BirminghamPublic Health
 BuildingBirmingham B115 2TTUnited Kingdom
 
   [[alternative HTML version deleted]]
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Nomogram-after-Cox-Random-Effect-frailty-model-tp4658544p4658545.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Relative Risk in logistic regression

2013-01-31 Thread Frank Harrell
I am curious why one would want risk ratios.  Unlike odds ratios, they are
not interpretable without reference to the base risk.  For example a risk
ratio of 2 cannot possibly apply to anyone with a starting risk exceeding
1/2.

I think it is most helpful to use one of the existing nomograms to show
someone how the base risk and odds ratio translate to final risk, for a
range of base risk.

Frank

David Winsemius wrote
 On Jan 30, 2013, at 5:49 AM, aminreza Aamini wrote:
 
 Hi all,
 I am very grateful to all those who write to me
 1) how i  can  obtain relative risk (risk ratio) in logistic  
 regression in R.
 2) how to obtain  the predicted risk for a certain individual using
 fitted regression model in R.
 
 You obtain the predicted probabilities with something like:
 
 predict(model, data.frame(x1=a, x2=30), type = response)
 
 See ?predict.glm
 
 This would give the odds ratios (similar but larger than the risk  
 ratios):
 
 exp(coef(model))
 
 -- 
 David Winsemius, MD
 Alameda, CA, USA
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Relative-Risk-in-logistic-regression-tp4657040p4657163.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Regression Modeling Strategies 3-Day Short Course March 2013

2013-01-31 Thread Frank Harrell

*RMS Short Course 2013*
Frank E. Harrell, Jr., Ph.D., Professor and Chair
Department of Biostatistics, Vanderbilt University School of Medicine

*March 4, 5  6, 2013*
8:00am - 4:30pm
Student Life Center Board of Trust Room
Vanderbilt University
Nashville Tennessee USA

See http://biostat.mc.vanderbilt.edu/2013RMSShortCourse for details.

The course includes statistical methodology, case studies, and use of 
the R rms package.


Please email interest to Eve Anderson { eve.a.ander...@vanderbilt.edu }

--
Frank E Harrell Jr Professor and Chairman  School of Medicine
   Department of Biostatistics Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression

2013-01-28 Thread Frank Harrell

For lrm fits, predict(fit, type='mean') predicts the mean Y, not a
probability.
Frank

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression

2013-01-24 Thread Frank Harrell
lrm does some binning to make the calculations faster.  The exact calculation
is obtained by running

f - lrm(...)
rcorr.cens(predict(f), DA), which results in:

   C IndexDxy   S.D.  nmissing 
0.96814404 0.93628809 0.0380833632. 0. 
uncensored Relevant Pairs Concordant  Uncertain 
   32.   722.   699. 0. 

I.e., C=.968 instead of .963.  But this is even farther away than the value
from SAS you reported.

If you don't believe the rcorr.cens result, create a tiny example and do the
calculations by hand.
Frank


blackscorpio81 wrote
 Dear R users,
 
 Please allow to me ask for your help.
  I am currently using Frank Harrell Jr package rms to model ordinal
 logistic regression with proportional odds. In order to assess model
 predictive ability, C concordance index is displayed and equals to 0.963.
 
 This is the code I used with the data attached 
 data.csv http://r.789695.n4.nabble.com/file/n4656409/data.csv  
  :
 
require(rms)
a-read.csv2(/data.csv,row.names = 1,na.strings = c(, ),dec=.)
lrm(DA~SJ+TJ,data=a)
 
 Logistic Regression Model
 
 lrm(formula = DA~SJ+TJ, data = a)
 
 Frequencies of Responses
 
  1  2  3  4 
  6 13  9  4 
 
   Model Likelihood 
 Discrimination  Rank Discrim.
  Ratio Test   
 Indexes   Indexes   
 Obs32  LR chi2  53.14 R2  
 0.875  C   0.963
 max |deriv| 6e-06 d.f. 2g 
 
 8.690Dxy 0.925
  Pr( chi2) 0.0001 gr   
 5942.469gamma   0.960
   

 gp   0.486  tau-a   0.673
   

 Brier0.022 
 
 Coef  S.E.Wald  Z Pr(|Z|)
 y=2 -0.6161 0.6715-0.92   0.3589  
 y=3 -6.5949 2.3750-2.78  0.0055  
 y=4-16.23585.3737 -3.02 0.0025  
 SJ 1.4341  0.5180  2.77 0.0056  
 TJ  0.5312  0.2483 2.14  0.0324
 
 I wanted to compare the results with SAS. I found the same slopes and
 intercept with opposite signs, which is normal since R models the
 probabilities P(Y=k|X) whereas SAS models the probabilities P(Y=k|X) 
 (see pdf attached, page 2 , table Association des probabilités prédites
 et des réponses observées).
 SAS_Report_-_Logistic_Regression.pdf
 http://r.789695.n4.nabble.com/file/n4656409/SAS_Report_-_Logistic_Regression.pdf
   
 
 I chose the order for levels.
 
 I controlled that the corresponding probabilities P(Y=k|X)  are the same
 with both softwares. But I can't understand why in SAS the C index drops
 from 0.963 down to 0.332.
 
 I read a lot of things about this and it seems to me that both softwares
 use slightly different technique to compute the C index ; it is
 nevertheless surprising to me to observe such a shift in the results.
 
 Does anyone have a clue on this ?
 Thank you very much for you help
 Blackscorpio





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Difference-between-R-and-SAS-in-Corcordance-index-in-ordinal-logistic-regression-tp4656409p4656508.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Difference between R and SAS in Corcordance index in ordinal logistic regression

2013-01-24 Thread Frank Harrell

Please define 'mean probabilities'.

To compute the C-index or Dxy you need anything that is monotonically 
related to the prediction of interest, including the linear combination 
of covariates ignoring all intercepts.   In other words you don't need 
to go to the trouble of computing probabilities unless you are binning, 
as the binning is usually done on a controllable 0-1 scale.   When I bin 
I just choose the middle intercept, I seem to recall.  Also try running 
SAS with a very tiny BINWIDTH and see if you get 1 - .968 as the answer 
for C.  [I wrote the original algorithm SAS uses for this in the old SAS 
PROC LOGIST.  Binning was just for speed.]


You might also re-run SAS after negating the response variable.
Frank

blackscorpio wrote

Dear Dr Harrell,
Thank you very much for your answer. Actually I also tried to found the C
index by hand on these data using the mean probabilities and I found
0.968, as you just showed.
I understand now why I had a slight difference with the outpout of lrm. I
am thus convinced that this result is correct.

I read on the SAS help that the procedure logistic also proceed to some
binning (BINWIDTH option) :

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect010.htm

But I cannot explain why the difference between the two softwares is that
huge, especially since the class probabilities are the same.

Do you think it could be due to the fact that mean probabilities are
computed differently ?

Thank for your help and best regards,
OC



Date: Thu, 24 Jan 2013 05:28:13 -0800
From:



f.harrell@



To:



r-help@



Subject: Re: [R] Difference between R and SAS in Corcordance index in
ordinal logistic regression

lrm does some binning to make the calculations faster.  The exact
calculation
is obtained by running

f - lrm(...)
rcorr.cens(predict(f), DA), which results in:

   C IndexDxy   S.D.  n
missing
0.96814404 0.93628809 0.0380833632.
0.
uncensored Relevant Pairs Concordant  Uncertain
   32.   722.   699. 0.

I.e., C=68 instead of .963.  But this is even farther away than the
value
from SAS you reported.

If you don't believe the rcorr.cens result, create a tiny example and do
the
calculations by hand.
Frank


blackscorpio81 wrote
 Dear R users,

 Please allow to me ask for your help.
  I am currently using Frank Harrell Jr package rms to model ordinal
 logistic regression with proportional odds. In order to assess model
 predictive ability, C concordance index is displayed and equals to
0.963.

 This is the code I used with the data attached
 data.csv lt;http://r.789695.n4.nabble.com/file/n4656409/data.csvgt;
  :

require(rms)
a-read.csv2(/data.csv,row.names =,na.strings = c(, ),dec=.)
lrm(DA~SJ+TJ,data=

 Logistic Regression Model

 lrm(formula =A~SJ+TJ, data = a)

 Frequencies of Responses

  1  2  3  4
  6 13  9  4

   Model Likelihood
 Discrimination  Rank Discrim.
  Ratio Test
 Indexes   Indexes
 Obs32  LR chi2  53.14
R2
 0.875  C   0.963
 max |deriv| 6e-06 d.f. 2g
 8.690Dxy 0.925
  Pr( chi2) 0.0001
gr
 5942.469gamma   0.960

 gp   0.486  tau-a   0.673

 Brier0.022

 Coef  S.E.Wald  Z
Pr(|Z|)
 y=-0.6161 0.6715-0.92   0.3589
 y=-6.5949 2.3750-2.78  0.0055
 y=   -16.23585.3737 -3.02 0.0025
 SJ 1.4341  0.5180  2.77 0.0056
 TJ  0.5312  0.2483 2.14  0.0324

 I wanted to compare the results with SAS. I found the same slopes and
 intercept with opposite signs, which is normal since R models the
 probabilities P(Y=X) whereas SAS models the probabilities P(Y=k|X)
 (see pdf attached, page 2 , table Association des probabilités
prédites
 et des réponses observées).
 SAS_Report_-_Logistic_Regression.pdf

lt;http://r.789695.n4.nabble.com/file/n4656409/SAS_Report_-_Logistic_Regression.pdfgt;

 I chose the order for levels.

 I controlled that the corresponding probabilities P(Y=X)  are the
same
 with both softwares. But I can't understand why in SAS the C index
drops
 from 0.963 down to 0.332.

 I read a lot of things about this and it seems to me that both
softwares
 use slightly different technique to compute the C index ; it is
 nevertheless surprising to me to observe such a shift in the results.

 Does anyone have a clue on this ?
 Thank you very much for you help
 Blackscorpio






__
R-help@r-project.org

Re: [R] Does psm::Surv handle interval2 data?

2013-01-20 Thread Frank Harrell
Chris,

I've fixed Surv in rms.  The fix will be in the next release.  For now you
can do source('http://biostat.mc.vanderbilt.edu/tmp/Surv.s') after issuing
require(rms).
Frank


Frank Harrell wrote
 Chris,
 
 Thanks for sending the specifics.  It appears that I've let Surv in rms
 fall behind recent versions of Surv in survival.  It will take me a few
 days to get this fixed.  I'll send a follow-up note then.
 Frank
 Andrews, Chris wrote
 Does Surv in psm handle interval2 data?  The argument list seems to
 indicate it does but I get an error.
 
 Thanks,
 Chris
 
 # code
 library('survival')
 left - c(1, 3, 5, NA)
 right -c(2, 3, NA, 4)
 Surv(left, right, type='interval2')
 survreg(Surv(left, right, type='interval2') ~ 1)
 
 library('rms')
 Surv(left, right, type='interval2') # error
 args(Surv)
 psm(Surv(left, right, type='interval2') ~ 1) # same error (of course)
 psm(survival::Surv(left, right, type='interval2') ~ 1) # runs
 
 
 # output
 
 R version 2.15.2 (2012-10-26) -- Trick or Treat
 
 Copyright (C) 2012 The R Foundation for Statistical Computing
 
 ISBN 3-900051-07-0
 
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 
 
 
 R is free software and comes with ABSOLUTELY NO WARRANTY.
 
 You are welcome to redistribute it under certain conditions.
 
 Type 'license()' or 'licence()' for distribution details.
 
 
 
   Natural language support but running in an English locale
 
 
 
 R is a collaborative project with many contributors.
 
 Type 'contributors()' for more information and
 
 'citation()' on how to cite R or R packages in publications.
 
 
 
 Type 'demo()' for some demos, 'help()' for on-line help, or
 
 'help.start()' for an HTML browser interface to help.
 
 Type 'q()' to quit R.
 
 
 
 library('survival')
 
 Loading required package: splines
 
 left - c(1, 3, 5, NA)
 
 right -c(2, 3, NA, 4)
 
 Surv(left, right, type='interval2')
 
 [1] [1, 2] 3  5+ 4-
 
 survreg(Surv(left, right, type='interval2') ~ 1)
 
 Call:
 
 survreg(formula = Surv(left, right, type = interval2) ~ 1)
 
 
 
 Coefficients:
 
 (Intercept)
 
1.317943
 
 
 
 Scale= 0.6098782
 
 
 
 Loglik(model)= -5.3   Loglik(intercept only)= -5.3
 
 n= 4
 
 library('rms')
 
 Loading required package: Hmisc
 
 Hmisc library by Frank E Harrell Jr
 
 
 
 Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview')
 
 to see overall documentation.
 
 
 
 NOTE:Hmisc no longer redefines [.factor to drop unused levels when
 
 subsetting.  To get the old behavior of Hmisc type dropUnusedLevels().
 
 
 
 
 
 Attaching package: 'Hmisc'
 
 
 
 The following object(s) are masked from 'package:survival':
 
 
 
 untangle.specials
 
 
 
 The following object(s) are masked from 'package:base':
 
 
 
 format.pval, round.POSIXt, trunc.POSIXt, units
 
 
 
 
 
 Attaching package: 'rms'
 
 
 
 The following object(s) are masked from 'package:survival':
 
 
 
 Surv
 
 
 
 Surv(left, right, type='interval2') # error
 
 Error in Surv(left, right, type = interval2) :
 
   argument event is missing, with no default
 
 args(Surv)
 
 function (time, time2, event, type = c(right, left, interval,
 
 counting, interval2), origin = 0)
 
 NULL
 
 psm(Surv(left, right, type='interval2') ~ 1) # same error (of course)
 
 Error in Surv(left, right, type = interval2) :
 
   argument event is missing, with no default
 
 psm(survival::Surv(left, right, type='interval2') ~ 1) # runs
 
 
 
 Parametric Survival Model: Weibull Distribution
 
 
 
 psm(formula = survival::Surv(left, right, type = interval2) ~
 
 1)
 
 
 
 Model LikelihoodDiscrimination
 
Ratio Test  Indexes
 
 Obs4LR chi2 0.00R2   0.000
 
 Events 6d.f.   0g0.000
 
 sigma 0.6099gr   1.000
 
 
 
 CoefS.E.   Wald Z Pr(|Z|)
 
 (Intercept)  1.3179 0.3598  3.66  0.0002
 
 Log(scale)  -0.4945 0.5977 -0.83  0.4081
 
 
 
 **
 Electronic Mail is not secure, may not be read every day, and should not
 be used for urgent or sensitive issues 
 
  [[alternative HTML version deleted]]
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Does-psm-Surv-handle-interval2-data-tp4655472p4656092.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Does psm::Surv handle interval2 data?

2013-01-14 Thread Frank Harrell
Chris,

Thanks for sending the specifics.  It appears that I've let Surv in rms fall
behind recent versions of Surv in survival.  It will take me a few days to
get this fixed.  I'll send a follow-up note then.
Frank

Andrews, Chris wrote
 Does Surv in psm handle interval2 data?  The argument list seems to
 indicate it does but I get an error.
 
 Thanks,
 Chris
 
 # code
 library('survival')
 left - c(1, 3, 5, NA)
 right -c(2, 3, NA, 4)
 Surv(left, right, type='interval2')
 survreg(Surv(left, right, type='interval2') ~ 1)
 
 library('rms')
 Surv(left, right, type='interval2') # error
 args(Surv)
 psm(Surv(left, right, type='interval2') ~ 1) # same error (of course)
 psm(survival::Surv(left, right, type='interval2') ~ 1) # runs
 
 
 # output
 
 R version 2.15.2 (2012-10-26) -- Trick or Treat
 
 Copyright (C) 2012 The R Foundation for Statistical Computing
 
 ISBN 3-900051-07-0
 
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 
 
 
 R is free software and comes with ABSOLUTELY NO WARRANTY.
 
 You are welcome to redistribute it under certain conditions.
 
 Type 'license()' or 'licence()' for distribution details.
 
 
 
   Natural language support but running in an English locale
 
 
 
 R is a collaborative project with many contributors.
 
 Type 'contributors()' for more information and
 
 'citation()' on how to cite R or R packages in publications.
 
 
 
 Type 'demo()' for some demos, 'help()' for on-line help, or
 
 'help.start()' for an HTML browser interface to help.
 
 Type 'q()' to quit R.
 
 
 
 library('survival')
 
 Loading required package: splines
 
 left - c(1, 3, 5, NA)
 
 right -c(2, 3, NA, 4)
 
 Surv(left, right, type='interval2')
 
 [1] [1, 2] 3  5+ 4-
 
 survreg(Surv(left, right, type='interval2') ~ 1)
 
 Call:
 
 survreg(formula = Surv(left, right, type = interval2) ~ 1)
 
 
 
 Coefficients:
 
 (Intercept)
 
1.317943
 
 
 
 Scale= 0.6098782
 
 
 
 Loglik(model)= -5.3   Loglik(intercept only)= -5.3
 
 n= 4
 
 library('rms')
 
 Loading required package: Hmisc
 
 Hmisc library by Frank E Harrell Jr
 
 
 
 Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview')
 
 to see overall documentation.
 
 
 
 NOTE:Hmisc no longer redefines [.factor to drop unused levels when
 
 subsetting.  To get the old behavior of Hmisc type dropUnusedLevels().
 
 
 
 
 
 Attaching package: 'Hmisc'
 
 
 
 The following object(s) are masked from 'package:survival':
 
 
 
 untangle.specials
 
 
 
 The following object(s) are masked from 'package:base':
 
 
 
 format.pval, round.POSIXt, trunc.POSIXt, units
 
 
 
 
 
 Attaching package: 'rms'
 
 
 
 The following object(s) are masked from 'package:survival':
 
 
 
 Surv
 
 
 
 Surv(left, right, type='interval2') # error
 
 Error in Surv(left, right, type = interval2) :
 
   argument event is missing, with no default
 
 args(Surv)
 
 function (time, time2, event, type = c(right, left, interval,
 
 counting, interval2), origin = 0)
 
 NULL
 
 psm(Surv(left, right, type='interval2') ~ 1) # same error (of course)
 
 Error in Surv(left, right, type = interval2) :
 
   argument event is missing, with no default
 
 psm(survival::Surv(left, right, type='interval2') ~ 1) # runs
 
 
 
 Parametric Survival Model: Weibull Distribution
 
 
 
 psm(formula = survival::Surv(left, right, type = interval2) ~
 
 1)
 
 
 
 Model LikelihoodDiscrimination
 
Ratio Test  Indexes
 
 Obs4LR chi2 0.00R2   0.000
 
 Events 6d.f.   0g0.000
 
 sigma 0.6099gr   1.000
 
 
 
 CoefS.E.   Wald Z Pr(|Z|)
 
 (Intercept)  1.3179 0.3598  3.66  0.0002
 
 Log(scale)  -0.4945 0.5977 -0.83  0.4081
 
 
 
 **
 Electronic Mail is not secure, may not be read every day, and should not
 be used for urgent or sensitive issues 
 
   [[alternative HTML version deleted]]
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Does-psm-Surv-handle-interval2-data-tp4655472p4655475.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] suggestions about import SAS results to R.

2013-01-03 Thread Frank Harrell
SAS' export to csv is buggy (improper quote matching, etc.).  Instead
consider having SAS create a version 5 transport file, then use the Hmisc
package's sasxport.get function to handle labels, etc.
Frank

Ista Zahn wrote
 Well it was worth a try. If you haven't solved the problem some other
 way I suggest exporting from SAS to a plain text format like .csv and
 read into R with read.table.
 
 Best,
 Ista
 
 On Wed, Jan 2, 2013 at 5:44 PM, Yuan, Rebecca
 lt;

 rebecca.yuan@

 gt; wrote:
 Hello Ista,

 Thanks for mention this package, however, it will not support big file,
 where I got the error message:

 --
 Error in read.sas7bdat(C:/Documents and Settings/test.sas7bdat) :
   big endian files are not supported
 --

 Cheers,

 Rebecca

 -Original Message-
 From: Ista Zahn [mailto:

 istazahn@

 ]
 Sent: Wednesday, January 02, 2013 5:25 PM
 To: Yuan, Rebecca
 Cc: R help; 

 sas-l@.uga

 Subject: Re: [R] suggestions about import SAS results to R.

 Hi Rebecca,

 I've had success with the sas7bdat package (
 http://cran.r-project.org/web/packages/sas7bdat/ )

 Best,
 Ista

 On Wed, Jan 2, 2013 at 5:15 PM, Yuan, Rebecca lt;

 rebecca.yuan@

 gt; wrote:
 Hello all,

 Thanks for the suggestions.

 I tried to import data from sas to R using foreign package. In the
 first step, I tried to use read.ssd() to convert the sas file
 sales.sas7bdat to SAS Transport formats. But I got an error message as
 ---
 source(testss.r)
 Error in file.symlink(oldPath, linkPath) :
   symbolic links are not supported on this version of Windows
 ---

 While in testss.r,

 ---
 require(foreign)
 rm(list=ls())
 sashome - C:/Program Files/SAS/SASFoundation/9.2/
 checkfile - read.ssd(C:/Documents and Settings/test,sales,sascmd
 = file.path(sashome, sas.exe))
 ---

 I am using windows xp and R 2.15.2.

 Thanks!

 Rebecca




 -Original Message-
 From: 

 mehmet.suzen@

  [mailto:

 mehmet.suzen@

 ] On Behalf
 Of Suzen, Mehmet
 Sent: Wednesday, January 02, 2013 4:14 PM
 To: David Winsemius
 Cc: Yuan, Rebecca; R help; 

 sas-l@.uga

 Subject: Re: [R] suggestions about import SAS results to R.

 On Jan 2, 2013, at 12:37 PM, Yuan, Rebecca wrote:
 I am wondering if there is an efficient way to read SAS data directly
 in R, or what would be a better connection between SAS and R if I need
 to use R to deal with data achieved from SAS?


 You may try foreign and SASxport packages:
 http://cran.r-project.org/web/packages/foreign/index.html
 http://cran.r-project.org/web/packages/SASxport/index.html

 But I heard that there might be some issues with the newest SAS version
 but give it a try...

 --
 This message, and any attachments, is for the intended
 r...{{dropped:2}}

 __
 

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 --
 This message, and any attachments, is for the intended...{{dropped:6}}
 
 __

 R-help@

  mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/suggestions-about-import-SAS-results-to-R-tp465p4654521.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] suggestions about import SAS results to R.

2013-01-03 Thread Frank Harrell
Not clear why you responded to my note since you were not using the solution
I suggested.
Frank

By the way if you need to read SAS7BDAT FILES frequently it is worth the
licensing fee for STAT/Transfer, which can create binary R .rda objects.  I
have code that finishes the job by putting labels on individual variables.
Frank




-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/suggestions-about-import-SAS-results-to-R-tp465p4654554.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   3   4   5   >