Re: [R] bwplot change whiskers position to percentile 5 and P95
See also the panel.bpplot function in the Hmisc package. This gives you many options for extended box plots. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/bwplot-change-whiskers-position-to-percentile-5-and-P95-tp2993755p2998578.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
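A minimal sketch of the suggestion above (assumes the Hmisc and lattice packages are installed; the data are simulated). panel.bpplot's probs argument controls which quantiles the extended box plot marks; including 0.05 makes the outermost segment run from the 5th to the 95th percentile.

```r
# Sketch, assuming Hmisc and lattice are installed; data are simulated
library(Hmisc)
library(lattice)

set.seed(1)
d <- data.frame(g = rep(c("A", "B"), each = 200),
                y = c(rnorm(200), rnorm(200, mean = 1)))

# probs gives the lower quantiles marked by the extended box plot;
# 0.05 here makes the outermost segment span the 0.05-0.95 interval
bwplot(g ~ y, data = d, panel = panel.bpplot,
       probs = c(0.05, 0.25), means = TRUE)
```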
[R] Frank Harrell's 2011 RMS Short Course-March 9,10 and 11 (fwd)
2011 Regression Modeling Strategies Short Course by Frank E. Harrell, Jr., Ph.D., Professor and Chair, Department of Biostatistics, Vanderbilt University School of Medicine.

2011 Dates: Wednesday, March 9 (8:00AM-11:00AM); Thursday, March 10 (8:00AM-4:00PM); and Friday, March 11, 2011 (8:00AM-4:00PM).
Location: Vanderbilt Campus; Sarratt Student Center (SARR) 216/220.
Requirements: strong competence in multiple regression models.
Target audience: statisticians and related quantitative researchers who want to learn some general model development strategies, including approaches to missing data imputation, data reduction, model validation, and relaxing linearity assumptions.
Registration: please email interest to Audrey Carvajal (audrey.carva...@vanderbilt.edu) no later than December 31, 2010.
Cancellation policy: February 15, 2011; send cancellation notification to Audrey via email.
For more information, including lodging, hotels, course schedules, and fees, please visit http://biostat.mc.vanderbilt.edu/RmS or respond to this email with questions.

Audrey R. Carvajal
Department of Biostatistics
School of Medicine
Vanderbilt University
(615) 322-2001

[Attachment: Flyer_RMS2011.pdf]
Re: [R] Error message in fit.mult.impute (Hmisc package)
I tried your code with the rms package (replacement for the Design package; see http://biostat.mc.vanderbilt.edu/Rrms) and it worked fine. Note that multiple imputation needs the outcome variable in the imputation model. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
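A hedged sketch of the workflow the reply describes (assumes the rms package, which loads Hmisc, is installed; data and variable names here are invented). Note the point made above: the outcome y appears in the aregImpute formula.

```r
# Sketch, assuming rms (and Hmisc) is installed; simulated data
library(rms)   # loads Hmisc as well

set.seed(2)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1 + x2))
x1[sample(n, 30)] <- NA                 # introduce some missing values

d <- data.frame(y, x1, x2)

# The imputation model includes the outcome y, as noted above
a <- aregImpute(~ y + x1 + x2, data = d, n.impute = 5)

# Fit the logistic model across the multiple imputations
f <- fit.mult.impute(y ~ x1 + x2, lrm, a, data = d)
f
```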
Re: [R] Multiple imputation for nominal data
The aregImpute function in the Hmisc package can do this through predictive mean matching and canonical variates (Fisher's optimum scoring algorithm). Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
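A minimal sketch of imputing a nominal variable with aregImpute (assumes Hmisc is installed; the data are simulated).

```r
# Sketch, assuming Hmisc is installed; imputing a nominal (factor) variable
library(Hmisc)

set.seed(3)
n   <- 300
x   <- rnorm(n)
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
grp[sample(n, 40)] <- NA                # 40 missing nominal values

# Categorical targets use Fisher's optimum scoring internally;
# draws are made by predictive mean matching by default
a <- aregImpute(~ x + grp, data = data.frame(x, grp), n.impute = 5)

a$imputed$grp[1:3, ]                    # imputed category codes per imputation
```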
Re: [R] An example for predab.resample in r
predab.resample can be called by users but usually is not. It is called by calibrate.* and validate.* functions in the rms and Design packages (you neglected to say which package you were using; update to rms if using Design). The calibrate and validate functions make it easy to use .632. Note that .632 only seems to help when you are using an improper scoring rule (e.g., proportion classified correctly), not when using smoother scoring rules such as Brier score, ROC area, etc. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
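A sketch of the usual route into predab.resample via validate (assumes rms is installed; the data are simulated). method=".632" selects the variant discussed above.

```r
# Sketch, assuming rms is installed; bootstrap validation of an lrm fit
library(rms)

set.seed(4)
n  <- 300
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1))

# x=TRUE, y=TRUE stores the design matrix and response for resampling
f <- lrm(y ~ x1 + x2, x = TRUE, y = TRUE)

validate(f, method = "boot", B = 50)    # optimism-corrected indexes
# validate(f, method = ".632", B = 50)  # the .632 variant, if desired
```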
Re: [R] Bootstrap confidence intervals using bootcov from the rms package
Unfortunately, bootcov is not meant to operate on fit objects produced by fit.mult.impute. bootcov gets there too late in the process and does not know how to penalize for imputation. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] leaps::regsubsets p-value
P-values have little meaning in this context. Nor do regression coefficient estimates and especially standard errors (they are biased). Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] Fit a distribution to an ECDF
Fitting curves to an ECDF will result in a fit that has the same precision as the ECDF if variances are calculated correctly. So why not stop with the ECDF as your estimator? Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
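A base-R illustration of using the ECDF itself as the estimator, as suggested above (the data are simulated).

```r
# Base-R sketch: the ECDF as the estimator, no curve fitting needed
set.seed(5)
x  <- rexp(500)
Fn <- ecdf(x)                 # step-function estimate of the CDF

Fn(1)                         # estimated P(X <= 1)
quantile(x, c(0.05, 0.95))    # quantiles taken straight from the data
```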
Re: [R] Plotting number of patients at risk below survival curve
Use the rms package, a replacement for Design. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
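A minimal sketch of the rms approach to an at-risk table (assumes the rms and survival packages are installed; the data are simulated). survplot's n.risk argument prints the numbers at risk below the curves.

```r
# Sketch, assuming rms (and survival) is installed; simulated data
library(rms)

set.seed(6)
n  <- 200
tt <- rexp(n)
ev <- rbinom(n, 1, 0.7)
g  <- factor(sample(c("A", "B"), n, replace = TRUE))

f <- npsurv(Surv(tt, ev) ~ g)      # Kaplan-Meier fit usable by survplot
survplot(f, n.risk = TRUE)         # numbers at risk appear below the curves
```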
Re: [R] calculating martingale residual on new data using
The tendency is to use residual-like diagnostics on the entire dataset that was available for model development. For test data we typically run predictive accuracy analyses. For example, one of the strongest validations is to show, in a high-resolution calibration plot, that absolute predictions (e.g., probability of survival at 2 years) are accurate. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] how do remove those predictor which have p value greater than 0.05 in GLM?
What would make you want to delete a variable because P > 0.05? That will invalidate every aspect of statistical inference for the model. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University
Re: [R] xYplot error
If the x-axis variable is really a factor, xYplot will not handle it. You probably need a dot chart instead (see Hmisc's Dotplot). Note that it is unlikely that the confidence intervals are really symmetric.

Frank

On Tue, 27 Jul 2010, Kang Min wrote:

Hi, I'm trying to plot a graph with error bars using xYplot in the Hmisc package. My data looks like this.

mort        stand       site   type
0.042512776 0.017854525 Plot A ST
0.010459803 0.005573305 PF     ST
0.005188321 0.006842107 MSF    ST
0.004276068 0.011592129 YSF    ST
0.044586495 0.035225266 Plot A LD
0.038810662 0.037355408 PF     LD
0.027567430 0.020523820 MSF    LD
0.024698872 0.020320976 YSF    LD

Having read previous posts on xYplot being unable to plot the x-axis as factors, I used numericScale, but I still get this error.

Error in label.default(xv, units = TRUE, plot = TRUE, default = as.character(xvname), :
  the default string cannot be of length greater then one

I used:

xYplot(cbind(mort, mort + stand, mort - stand) ~ numericScale(site) | type, method='bars')

Am I missing something or doing something wrong? Thanks. KM
Re: [R] how to generate a random data from a empirical distribition
Easiest thing is to sample with replacement from the original data. This is the idea behind the bootstrap, which is sampling from the empirical CDF.

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Tue, 27 Jul 2010, Greg Snow wrote:

Another option for fitting a smooth distribution to data (and generating future observations from the smooth distribution) is to use the logspline package.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center, Intermountain Healthcare
greg.s...@imail.org
801.408.8111

-----Original Message-----
From: xin wei
Sent: Monday, July 26, 2010 12:36 PM
To: r-help@r-project.org
Subject: [R] how to generate a random data from a empirical distribition

Hi, this is more a statistical question than an R question, but I do want to know how to implement this in R. I have 10,000 data points. Is there any way to generate an empirical probability distribution from them (the problem is that I do not know what exactly this distribution follows: normal, beta?)? My ultimate goal is to generate an additional 20,000 data points from this empirical distribution created from the existing 10,000 data points. Thank you all in advance.
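A base-R sketch of the advice above: sampling with replacement from the observed data is exactly drawing from its empirical CDF (the "observed data" here are simulated).

```r
# Base-R sketch: sampling with replacement draws from the ECDF of the data
set.seed(7)
x <- rnorm(10000)                        # stand-in for the observed 10,000 points

new_x <- sample(x, 20000, replace = TRUE)  # 20,000 additional draws

# Summaries of the resample track those of the original sample
mean(new_x); mean(x)
```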
Re: [R] Rank ANCOVA
This has been shown to yield unreliable analyses. Use the more formal proportional odds ordinal logistic model. This is a generalization of the Wilcoxon-Mann-Whitney-Kruskal-Wallis statistic, and is implemented in the rms package and elsewhere.

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Tue, 27 Jul 2010, Dennis Fisher wrote:

Colleagues, I am trying to replicate some analyses performed in SAS using a rank ANCOVA model. I assume that this is a non-parametric ANOVA with covariates, but I have not been able to get confirmation from the people who did the original analyses. Can anyone direct me to the comparable function for R? Thanks in advance. Not that it matters in this situation, but the platform is OS X; R version is 2.11.

Dennis
Dennis Fisher MD
P < (The P Less Than Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
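A minimal sketch of the proportional odds model as an alternative to rank ANCOVA. The thread's suggestion is rms::lrm; MASS::polr (which ships with R) is used here as a stand-in, with simulated data and invented variable names.

```r
# Sketch using MASS::polr (ships with R); rms::lrm fits the same model
library(MASS)

set.seed(8)
n   <- 300
trt <- factor(rep(c("ctl", "trt"), each = n / 2))
age <- rnorm(n, 50, 10)                       # a covariate, as in rank ANCOVA
lp  <- 0.8 * (trt == "trt") + 0.03 * (age - 50)
y   <- cut(lp + rlogis(n), breaks = c(-Inf, -1, 0, 1, Inf),
           labels = c("poor", "fair", "good", "excellent"),
           ordered_result = TRUE)

f <- polr(y ~ trt + age)                      # proportional odds model
summary(f)                                    # treatment effect adjusted for age
```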
Re: [R] xYplot error
Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Wed, 28 Jul 2010, Kang Min wrote:

> Hi Frank, thanks for the suggestion. Using numericScale() does work for Dotplot, but there are still a few issues:
> 1. My factor names are Plot A, PF, MSF, and YSF, so numericScale turns that into 3, 2, 1, 4 and the x-axis is plotted 1, 2, 3, 4. Is there any way I can retain the same order on the graph?

Not sure why you are using numericScale. You can use the original factor variable. If you need to re-order its levels for plotting, use reorder.factor.

> 2. I can't get the error bars displayed even after using method=bars, only the mean, lower and upper bounds of the data as points. This is the line I used:
> Dotplot(cbind(mort, mort + stand, mort - stand) ~ numericScale(site) | type, data = mort, method='bands')

That looks OK but I can't test it right now. Please continue to have a look, and if you still don't see the problem provide a tiny reproducible example with self-contained data I can access.

Frank

> Thanks for your help. KM
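A hedged sketch of the dot-chart-with-intervals approach suggested in this thread (assumes Hmisc is installed). Hmisc's Cbind (capital C) attaches the interval limits so Dotplot draws error bars; the numbers are rounded values from the data above, and setting the factor levels explicitly preserves the display order without numericScale.

```r
# Sketch, assuming Hmisc is installed; rounded values from the ST rows above
library(Hmisc)

d <- data.frame(
  mort  = c(0.0425, 0.0105, 0.0052, 0.0043),
  stand = c(0.0179, 0.0056, 0.0068, 0.0116),
  site  = factor(c("Plot A", "PF", "MSF", "YSF"),
                 levels = c("Plot A", "PF", "MSF", "YSF"))  # keep display order
)

# Cbind() carries the lower/upper limits; Dotplot draws them as intervals
Dotplot(site ~ Cbind(mort, mort - stand, mort + stand), data = d)
```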
Re: [R] how to generate a random data from a empirical distribition
This is true by definition. Read about the bootstrap, which may give you some good background information.

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Wed, 28 Jul 2010, xin wei wrote:

Hi Frank: how can we make sure the randomly sampled data follow the same distribution as the original dataset? I assume each data point has the same probability of being selected in a simple random sampling scheme. Thanks.
Re: [R] Problems with normality req. for ANOVA
To add to David's note, the Kruskal-Wallis test is the nonparametric counterpart to one-way ANOVA. You can get a series of K-W tests for several grouping or continuous independent variables (but note these are SEPARATE analyses) using the Hmisc package's spearman2 function. The generalization of K-W to the case of multiple independent variables is the proportional odds ordinal logistic model (see e.g. the rms package's lrm function).

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Mon, 2 Aug 2010, David Winsemius wrote:

On Aug 2, 2010, at 9:33 AM, wwreith wrote:

I am conducting an experiment with four independent variables, each of which has three or more factor levels. The sample size is quite large, i.e. several thousand. The dependent variable data does not pass a normality test but visually looks close to normal, so is there a way to compute the effect this would have on the p-value for ANOVA, or is there a way to perform a nonparametric test in R that will handle this many independent variables? Simply saying ANOVA is robust to small departures from normality is not going to be good enough for my client.

The statistical assumption of normality for linear models does not apply to the distribution of the dependent variable, but rather to the residuals after a model is estimated. Furthermore, it is the homoskedasticity assumption that is more commonly violated and also a greater threat to validity. (And if you don't already know both of these points, then you desperately need to review your basic modeling practices.)

I need to compute an error amount for ANOVA or find a nonparametric equivalent.

You might get a better answer if you expressed the first part of that question in unambiguous terminology. What is "error amount"? For the second part, there is an entire Task View on Robust Statistical Methods.
-- David Winsemius, MD West Hartford, CT
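A base-R sketch of the separate Kruskal-Wallis analyses mentioned above, one per grouping variable (the data are simulated; spearman2 in Hmisc automates this across many variables).

```r
# Base-R sketch: one Kruskal-Wallis test per grouping variable
set.seed(10)
y  <- rnorm(120)
g1 <- factor(rep(1:3, each = 40))
g2 <- factor(rep(1:4, times = 30))

kruskal.test(y ~ g1)   # one independent variable at a time
kruskal.test(y ~ g2)   # a SEPARATE analysis, as noted above
```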
Re: [R] Problems with normality req. for ANOVA
In addition, the poster did not tell us what is wrong with a nonparametric test.

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Mon, 2 Aug 2010, Bert Gunter wrote:

My sympathies, but I don't think it's the business of list contributors to facilitate stupidity. A confidence interval for the p-value is nonsense. You could try sensitivity analyses via simulation, though.

Cheers, Bert Gunter
Genentech Nonclinical Biostatistics

On Mon, Aug 2, 2010 at 11:31 AM, wwreith reith_will...@bah.com wrote:

I am testing normality on the studentized residuals that are generated after performing ANOVA, and yes, I used Levene's test to see if the variances can be assumed equal. They in fact are not, but I have found a formula for determining whether the p-value for ANOVA will become larger or smaller as a result of unequal variances and unequal sample sizes. Fortunately it turns out the p-value is greater. Despite this, the ANOVA test is still significant with p=.000. The problem I have is that I am expected, by my client, to find a similar formula that states which way the p-value would be pushed by a lack of normality. Despite numerous citations that ANOVA is robust to departures from normality, my client does not care. They want numerical proof. This led to looking for a method for estimating the effects non-normality would have on the p-value for ANOVA. In other words, can I build a confidence interval for the p-value? Hence the "error term" I am speaking of would be the margin of error for a p-value confidence interval.

William W. Reith III
Business Analytics J9 SAC
(757)-203-3400 Best Contact From 7:00am-4:00pm
J9 Office (757)-203-3772
Booz Office (757) 466-3253
Mobile (434)-989-7948
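A base-R sketch of the simulation-based sensitivity analysis Bert suggests: compare the null rejection rate of the ANOVA F-test under normal versus skewed errors for the same design (all data simulated).

```r
# Base-R sketch of a sensitivity analysis via simulation:
# how does the ANOVA F-test behave under non-normal errors?
set.seed(9)
g <- factor(rep(1:3, each = 30))

pvals <- function(rerr, nsim = 500) {
  replicate(nsim, {
    y <- rerr(length(g))                    # null case: no group differences
    anova(lm(y ~ g))$`Pr(>F)`[1]
  })
}

p_norm <- pvals(rnorm)                      # normal errors
p_skew <- pvals(function(n) rexp(n) - 1)    # skewed errors, mean 0

# Under the null, rejection rates near 0.05 indicate robustness
mean(p_norm < 0.05)
mean(p_skew < 0.05)
```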
Re: [R] Specifying interactions in rms package... error
Hi Rob,

rms wants symmetry in the sense that the interactions need to use the same number and location of spline knots as the main effects. So if using the * notation, omit the main effects (which are generated automatically) and live with the equal knots. Or use the restricted interaction operator %ia% to not use doubly nonlinear terms in forming the interactions, which is a sensible restriction.

Frank

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Mon, 2 Aug 2010, Rob James wrote:

I am encountering an error I do not know how to debug. The error arises when I try to add an interaction term involving two continuous variables (defined using rcs functions) to an existing (and working) model. The new model reads:

model5 <- lrm( B_fainting ~ gender + rcs(exactage, 7) + rcs(DW_nadler_bv, 7) + rcs(drawtimefrom8am, 7) + DW_firsttime + DW_race_eth + rcs(DW_BPS, 5) + rcs(DW_BPD, 5) + rcs(pulse3, 5) + locat2 + (rcs(exactage, 3) * rcs(drawtimefrom8am, 3)) )

This gives rise to the following error:

Error in dimnames(X)[[2]] <- atr$colnames :
  length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In names(nmiss)[jz] <- fname[asm != 9] :
  number of items to replace is not a multiple of replacement length

The sessionInfo() output is below, but I think relatively unremarkable. The dataset is very large (about 550K observations), although the outcome (B_fainting) is relatively sparse, with only 1500 events in this sample. I do not get this error when the interaction term is replaced by two categorical variables (data not shown). I can replicate this error when there is just one continuous variable and one categorical variable. I'd welcome suggestions. Thank you in advance.
sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_CA.UTF-8   LC_NUMERIC=C   LC_TIME=en_CA.UTF-8
 [4] LC_COLLATE=en_CA.UTF-8 LC_MONETARY=C  LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8   LC_NAME=C      LC_ADDRESS=C
[10] LC_TELEPHONE=C  LC_MEASUREMENT=en_CA.UTF-8  LC_IDENTIFICATION=C

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] gmodels_2.15.0 rms_3.0-0 Hmisc_3.8-2 survival_2.35-8 odfWeave_0.7.14 XML_3.1-0
[7] lattice_0.18-8

loaded via a namespace (and not attached):
[1] cluster_1.13.1 gdata_2.8.0 grid_2.11.0 gtools_2.6.2 MASS_7.3-7
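A hedged sketch of the %ia% form Frank recommends (assumes rms is installed; data and variable names are simulated, not Rob's). %ia% omits the doubly nonlinear spline-by-spline terms from the interaction.

```r
# Sketch, assuming rms is installed; %ia% restricts the interaction so the
# doubly nonlinear spline terms are omitted
library(rms)

set.seed(11)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1 + 0.5 * x2 + 0.3 * x1 * x2))
dd <- datadist(x1, x2); options(datadist = "dd")

# Same knots (3) for main effects and the restricted interaction
f <- lrm(y ~ rcs(x1, 3) + rcs(x2, 3) + rcs(x1, 3) %ia% rcs(x2, 3))
anova(f)    # tests main effects and the restricted interaction
```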
Re: [R] How to extract se(coef) from cph?
In an upcoming release of the rms package, all fit objects can be printed using LaTeX, putting LaTeX code directly to the console (this is optimized for Sweave). You will be able to say print(fit, latex=TRUE).

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Thu, 5 Aug 2010, David Winsemius wrote:

On Aug 5, 2010, at 4:03 PM, Biau David wrote:

Hello, I am modeling some survival data with cph (Design). I have modeled a predictor which showed a nonlinear effect with restricted cubic splines. I would like to retrieve the se(coef) for the other, linear, predictors.

The cph object has a var component. The vcov function is an extractor function. You would probably be using something like:

diag(vcov(fit))^(1/2)

This is just to make nice LaTeX tables automatically.

Are you sure Frank has not already programmed that for you somewhere? Perhaps latex.cph?

I have the coefficients with coef(). How do I do that? Thanks, David Biau.

--
David Winsemius, MD
West Hartford, CT
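A base-R illustration of the extractor David describes: square roots of the diagonal of vcov() are the standard errors. An lm fit on a built-in dataset stands in here for a cph fit, since vcov works the same way for both.

```r
# Base-R sketch: sqrt of the diagonal of vcov() gives the SEs;
# the same extractor idiom applies to rms/Design fits such as cph
fit <- lm(mpg ~ wt + hp, data = mtcars)

se <- sqrt(diag(vcov(fit)))
se

# Cross-check against the SE column of the summary table
all.equal(unname(se), unname(summary(fit)$coefficients[, "Std. Error"]))
```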
Re: [R] plot the dependent variable against one of the predictors with other predictors as constant
There are many ways to do this. Here is one.

install.packages('rms')
require(rms)
dd <- datadist(x, y); options(datadist='dd')
f <- ols(z ~ x + y)
plot(Predict(f))        # plot all partial effects
plot(Predict(f, x))     # plot only the effect of x
plot(Predict(f, y))     # plot only the effect of y
f <- ols(z ~ pol(x,2)*pol(y,2))   # repeat, not assuming linearity

Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Sat, 7 Aug 2010, Yi wrote:

Hi folks, happy work in weekends. My question is how to plot the dependent variable against one of the predictors with the other predictors held constant. Not for the original data, but after prediction; y is the predicted value of the dependent variable. The constant value of the other predictors may be the average or some fixed value.

###
y <- 1:10
x <- 10:1
z <- 2:11
lin_model <- lm(z ~ x + y)
x_new <- 11:2
###

How do I plot the predicted value of z from the regression model with x taking x_new and y held constant (let's say y=1)? I am thinking about using the 'predict' command to generate the prediction of z with the new data.frame, but there should be a better way. Thanks all. Yi
Re: [R] Logistic Regression in R (SAS -like output)
Frank E Harrell Jr
Professor and Chairman, Department of Biostatistics
School of Medicine, Vanderbilt University

On Mon, 9 Aug 2010, Harsh wrote:

> Hello useRs, I have a problem at hand which I'd think is fairly common amongst groups where R is being adopted for Analytics in place of SAS. Users would like to obtain results for logistic regression in R that they have become accustomed to in SAS. Towards this end, I was able to propose the Design package in R which contains many functions to extract the various metrics that SAS reports.

The replacement for Design, rms, has some new indexes.

> If you have suggestions pertaining to other packages, or sample code that replicates some of the SAS outputs for logistic regression, I would be glad to hear of them. Some of the requirements are:
> - Stepwise variable selection for logistic regression

an invalid procedure

> - Choose base level for factor variables

not relevant - get what you need from predicted values and differences in predicted values or contrasts - this automatically takes care of reference cells

> - The Hosmer-Lemeshow statistic

obsolete: low power and sensitive to choice of binning

> - concordant and discordant

see Hmisc's rcorr.cens

> - Tau C statistic

Those are two different statistics. tau and C are obtained by lrm in rms/Design.

Frank

> Thank you for your suggestions. Regards, Harsh Singhal
Re: [R] Logistic Regression in R (SAS -like output)
Note that stepwise variable selection based on AIC has all the problems of stepwise variable selection based on P-values. AIC is just a restatement of the P-value. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Mon, 9 Aug 2010, Gabor Grothendieck wrote:

On Mon, Aug 9, 2010 at 6:43 AM, Harsh singhal...@gmail.com wrote: Hello useRs, I have a problem at hand which I'd think is fairly common among groups where R is being adopted for analytics in place of SAS. Users would like to obtain results for logistic regression in R that they have become accustomed to in SAS. Towards this end, I was able to propose the Design package in R which contains many functions to extract the various metrics that SAS reports. If you have suggestions pertaining to other packages, or sample code that replicates some of the SAS outputs for logistic regression, I would be glad to hear of them. Some of the requirements are:

- Stepwise variable selection for logistic regression
- Choose base level for factor variables
- The Hosmer-Lemeshow statistic
- Concordant and discordant
- Tau, C statistic

For stepwise logistic regression using AIC see: library(MASS); ?stepAIC. For specifying the reference level: ?relevel
Re: [R] Logistic Regression in R (SAS -like output)
In the trivial case where all candidate predictors have one degree of freedom (which is unlikely, as some things will be nonlinear or have more than 2 categories), adding a variable if it improves (lowers) AIC is the same as adding it if its chi-square exceeds 2. This corresponds to an alpha level of 0.157 for a chi-square with 1 d.f. At least AIC leads people to use a more realistic alpha (a small alpha in stepwise regression leads to more bias in the retained regression coefficients). But you still have serious multiplicity problems, and non-replicable models. Things are different if you have a pre-defined group of variables you are thinking of adding. Suppose that this group of 10 variables required 15 d.f. Adding the group if AIC (based on 15 d.f.) improves would not be a bad strategy. This avoids the multiplicities of single-variable looks. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Mon, 9 Aug 2010, Kingsford Jones wrote:

On Mon, Aug 9, 2010 at 10:27 AM, Frank Harrell f.harr...@vanderbilt.edu wrote: Note that stepwise variable selection based on AIC has all the problems of stepwise variable selection based on P-values. AIC is just a restatement of the P-value.

I find the above statement very interesting, particularly because there are common misconceptions in the ecological community that AIC is a panacea for model selection problems and the theory behind P-values is deeply flawed. Can you direct me toward a reference for better understanding the relation? best, Kingsford Jones

Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Mon, 9 Aug 2010, Gabor Grothendieck wrote: On Mon, Aug 9, 2010 at 6:43 AM, Harsh singhal...@gmail.com wrote: Hello useRs, I have a problem at hand which I'd think is fairly common among groups where R is being adopted for analytics in place of SAS.
Users would like to obtain results for logistic regression in R that they have become accustomed to in SAS. Towards this end, I was able to propose the Design package in R which contains many functions to extract the various metrics that SAS reports. If you have suggestions pertaining to other packages, or sample code that replicates some of the SAS outputs for logistic regression, I would be glad to hear of them. Some of the requirements are:

- Stepwise variable selection for logistic regression
- Choose base level for factor variables
- The Hosmer-Lemeshow statistic
- Concordant and discordant
- Tau, C statistic

For stepwise logistic regression using AIC see: library(MASS); ?stepAIC. For specifying the reference level: ?relevel
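Frank's 0.157 figure above can be checked in one line: with AIC, a single 1-d.f. term is added exactly when its chi-square exceeds 2 (the 2*df penalty), and the tail area of a 1-d.f. chi-square beyond 2 is the implied alpha level.

```r
# Implied alpha when AIC is used to decide on a single 1-d.f. term:
# AIC improves iff the term's chi-square exceeds 2.
alpha <- 1 - pchisq(2, df = 1)
round(alpha, 3)   # ~0.157, as stated in the thread
```

The same arithmetic generalizes: for a q-d.f. group of variables, the AIC criterion corresponds to comparing the group chi-square with 2*q.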
Re: [R] Identification of Outliners and Extraction of Samples
On Mon, 9 Aug 2010, Alexander Eggel wrote:

Hello everybody, I need to know which samples (S1-S6) contain a value that is bigger than the median + five standard deviations of the column it is in. This is just an example; the command should be applied to a data frame which is a lot bigger (over 100 columns). Any solutions? Thank you very much for your help!!!

s
  Samples  A  B  C  E
1      S1  1  2  3  7
2      S2  4 NA  6  6
3      S3  7  8  9 NA
4      S4  4  5 NA  6
5      S5  2  5  6  7
6      S6  2  3  4  5

This loop works fine for a column without NA values. However, it doesn't work for the other columns. Ideally I should have a loop that I could apply to all columns in one command.

o <- data.frame()
for (i in 1:nrow(s)) {
  dd <- s[i, ]
  if (dd$A >= median(s$A, na.rm=TRUE) + 5 * sd(s$A, na.rm=TRUE))
    o <- rbind(o, dd)
}

[Frank:] Why not the 70th percentile plus 6 times the difference in the 85th and 75th percentiles :-) Frank

P.S. See
@Article{fin06cal,
  author  = {Finney, David J.},
  title   = {Calibration guidelines challenge outlier practices},
  journal = {The American Statistician},
  year    = 2006,
  volume  = 60,
  pages   = {309-313},
  annote  = {anticoagulant therapy; bias; causation; ethics; objectivity; outliers; guidelines for treatment of outliers; overview of types of outliers; letter to the editor and reply 61:187 May 2007}
}
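A vectorized alternative to the loop handles every numeric column, NAs included, in one pass. This is a sketch only; the data frame is reconstructed from the example in the thread, and with only six rows no value actually exceeds the median + 5 SD threshold.

```r
# Vectorized sketch of the median + 5*SD screen over all numeric columns.
s <- data.frame(Samples = paste0("S", 1:6),
                A = c(1, 4, 7, 4, 2, 2),
                B = c(2, NA, 8, 5, 5, 3),
                C = c(3, 6, 9, NA, 6, 4),
                E = c(7, 6, NA, 6, 7, 5))
num  <- s[, -1]                                  # numeric columns only
flag <- sapply(num, function(col)
  col > median(col, na.rm = TRUE) + 5 * sd(col, na.rm = TRUE))
flag[is.na(flag)] <- FALSE                       # treat NA as not extreme
extreme <- s$Samples[rowSums(flag) > 0]          # samples with any extreme value
```

Because sapply returns a rows-by-columns logical matrix, rowSums(flag) > 0 picks out every sample with at least one flagged cell, regardless of how many columns the real data frame has.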
Re: [R] Multiple imputation, especially in rms/Hmisc packages
On Mon, 9 Aug 2010, Mark Seeto wrote: Hello, I have a general question about combining imputations as well as a question specific to the rms and Hmisc packages. The situation is multiple regression on a data set where multiple imputation has been used to give M imputed data sets. I know how to get the combined estimate of the covariance matrix of the estimated coefficients (average the M covariance matrices from the individual imputations and add (M+1)/M times the between-imputation covariance matrix), and I know how to use this to get p-values and confidence intervals for the individual coefficients. What I don't know is how to combine imputations to obtain the multiple degree-of-freedom tests of whether a set of coefficients are all 0. If there were no missing values, this would be done using the general linear test, which involves fitting the full model and the reduced model and getting an F statistic based on the residual sums of squares and degrees of freedom. How does one correctly do this when there are multiple imputations? In the language of Hmisc, I'm asking how the values in anova(fmi) are calculated if fmi is the result of fit.mult.impute (I've looked at the anova.rms code but couldn't understand it). [Frank:] In ordinary regression, the "leave some variables out and compare R^2" F-test and the quicker contrast tests are equivalent. In general we use the latter, which is expedient since it is easy to estimate the imputation-corrected covariance matrix. In the rms and Hmisc packages, fit.mult.impute creates the needed fit object; then anova, contrast, summary, etc. give the correct Wald tests. [Mark continues:] The second question I have relates to rms/Hmisc specifically. When I use fit.mult.impute, the fitted values don't appear to match the values given by the predict function on the original data. Instead, they match the fitted values of the last imputation.
For example:

library(rms)
set.seed(1)
x1 <- rnorm(40)
x2 <- 0.5*x1 + rnorm(40,0,3)
y  <- x1^2 + x2 + rnorm(40,0,3)
x2[40] <- NA                                       # create 1 missing value in x2
a <- aregImpute(~ y + x1 + x2, n.impute=2, nk=3)   # 2 imputations
x2.imp1 <- x2; x2.imp1[40] <- a$imputed$x2[,1]     # first imputed x2 vector
x2.imp2 <- x2; x2.imp2[40] <- a$imputed$x2[,2]     # second imputed x2 vector
ols.imp1 <- ols(y ~ rcs(x1,3) + x2.imp1)           # model on imputation 1
ols.imp2 <- ols(y ~ rcs(x1,3) + x2.imp2)           # model on imputation 2
d <- data.frame(y, x1, x2)
fmi <- fit.mult.impute(y ~ rcs(x1,3) + x2, ols, a, data=d)
fmi$fitted
predict(fmi, d)

I would have expected the results of fmi$fitted and predict(fmi, d) to be the same except for the final NA, but they are not. Instead, fmi$fitted is the same as ols.imp2$fitted. Also, anova(ols.imp2) and anova(fmi) unexpectedly give the same result in the ERROR line. I would be most grateful if anyone could explain this to me.

[Frank:] I need to add some more warnings. For many of the calculations, the last imputation is used. Frank

Thanks, Mark

--
Mark Seeto
Statistician, National Acoustic Laboratories
http://www.nal.gov.au/
A Division of Australian Hearing
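The combination rule Mark describes (average the within-imputation covariance matrices, then add (M+1)/M times the between-imputation covariance) can be sketched generically. `combine_rubin` is a made-up helper name for illustration, not a function in Hmisc or rms.

```r
# Hypothetical helper implementing Rubin's rules as described in the thread.
combine_rubin <- function(coefs, vcovs) {
  M    <- length(coefs)
  qbar <- Reduce(`+`, coefs) / M               # pooled coefficient estimates
  W    <- Reduce(`+`, vcovs) / M               # mean within-imputation covariance
  B    <- Reduce(`+`, lapply(coefs, function(q)
            tcrossprod(q - qbar))) / (M - 1)   # between-imputation covariance
  list(coef = qbar, var = W + (1 + 1/M) * B)   # total = W + ((M+1)/M) * B
}

# Toy check with two 'imputations':
res <- combine_rubin(list(c(1, 2), c(3, 4)),
                     list(diag(2), diag(2)))
res$coef   # 2 3
```

Wald tests of several coefficients at once can then be formed from res$coef and res$var, which is exactly what fit.mult.impute sets up for anova.rms.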
Re: [R] Plotting confidence bands around regression line
Please give the prescription. The article is not available in our extensive online library. I wonder if the method can compete with the bootstrap. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Tue, 10 Aug 2010, Michal Figurski wrote:

David, I would consider myself intermediate in R, but a beginner in statistics. I need a formula that would allow me to calculate confidence boundaries of the regression line given the slope, intercept and their CIs (and *any* range). Passing-Bablok regression doesn't yet exist in R - I am developing it. Therefore I am sure there is no predict method for it ;) I believe I have provided sufficient data to address this problem, but if that would help anyone, here is more:

# data frame
a <- structure(list(
  x = c(0.1, 1.43, 4.21, 3.67, 3.23, 7.72, 5.99, 9.16, 10.6, 9.84,
        11.94, 12.03, 12.89, 11.26, 15.54, 15.58, 17.32, 17.65, 19.52,
        20.48, 20.44, 20.51, 22.27, 23.58, 25.83, 26.04, 26.92, 28.44,
        30.73, 28.78),
  y = c(1.08, 1.39, 1.84, 0.56, 7.23, 4.91, 3.35, 7.09, 3.16, 8.98,
        16.37, 7.46, 15.46, 23.2, 4.63, 11.13, 15.68, 13.92, 26.44,
        21.65, 21.01, 20.22, 22.69, 22.21, 23.6, 17.24, 45.24, 30.09,
        40, 49.6)),
  .Names = c("x", "y"), row.names = c(NA, -30L), class = "data.frame")

Then I run the regression procedure (in development - now part of the 'MethComp' package):

print(PBreg(a))
# Result of the Passing-Bablok regression on this data frame:
#            Estimate      5%CI     95%CI
# Intercept -4.306197 -9.948438 -1.374663
# Slope      1.257584  1.052696  1.679290

The original Passing-Bablok article on this method has an easy prescription for CIs on the coefficients, so I implemented that. Now I need a way to calculate CI boundaries for individual points - this may be basic handbook stuff - I just don't know it (I'm not a statistician). I would appreciate it if anyone could point me to a handbook or website where it is described.
Regarding 2 - the predict method for the 'nls' class currently *ignores* the interval parameter, as stated in its documentation. Regards

--
Michal J. Figurski, PhD
HUP, Pathology Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

On 2010-08-10 11:38, David Winsemius wrote: On Aug 10, 2010, at 11:23 AM, Michal Figurski wrote: David, I may have stated my problem incorrectly - my problem is to *obtain the coordinates* for the confidence boundary lines. As input data I have only CIs for slope and intercept. [David:] Wouldn't you also need to specify the range over which these estimates might be valid and to offer the means for the X values? What level of R knowledge are you at? You have provided no data or code. Many R methods offer predict methods that return CIs.
Re: [R] Plotting confidence bands around regression line
Thanks Michal, That's the method that Dana Quade taught me in his intro nonparametrics course at UNC in the mid 1970s, at least for a single predictor. His method did not incorporate the shift you mentioned, though. The method looks robust; not sure about efficiency. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Tue, 10 Aug 2010, Michal Figurski wrote:

Frank, I had to order this article through Inter-Library Loan and wait for it for a week! I'll try to make it short. In Passing-Bablok the principle is to calculate the slopes between all possible pairs of points in the dataset, and then to take a shifted median of those slopes, where the offset K is the number of slopes smaller than (-1). Because of this, the bootstrap is out of the question - it would take too much time. Let n be the number of data points, N the number of slopes, and K the offset. The locations of the CI boundaries in the sorted set of N slopes are then calculated with the formulas:

M1 <- (N - qnorm(1 - conf.level/2) * sqrt(n*(n-1)*(2*n+5)/18)) / 2
M2 <- N - M1 + 1

CIs for the intercept are calculated as the medians of y[i] - slopes[M1]*x[i] and y[i] - slopes[M2]*x[i]. I hope I don't confuse anyone. The article has a mathematical derivations section and justification for these formulas. It looks solid to me, although after 25 years the flaws may be apparent to some of you.

--
Michal J. Figurski, PhD
HUP, Pathology Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

On 2010-08-10 12:29, Frank Harrell wrote: Please give the prescription. The article is not available in our extensive online library. I wonder if the method can compete with the bootstrap. Frank On Tue, 10 Aug 2010, Michal Figurski wrote: David, I would consider myself intermediate in R, but a beginner in statistics.
[...]
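The index formulas Michal quotes can be wrapped in a small function. This is a sketch only: here conf.level plays the role of alpha (e.g. 0.05 for 95% limits), which is how the quoted qnorm term reproduces the usual 1.96, and rounding to the nearest rank is an assumption on my part, not taken from the thread.

```r
# Sketch of the Passing-Bablok CI-index formulas from the thread.
# n points give N = n*(n-1)/2 pairwise slopes; M1 and M2 are the positions
# of the lower and upper confidence slopes in the sorted slope set.
pb_ci_index <- function(n, conf.level = 0.05) {
  N  <- n * (n - 1) / 2
  w  <- qnorm(1 - conf.level / 2) *
        sqrt(n * (n - 1) * (2 * n + 5) / 18)
  M1 <- round((N - w) / 2)       # rounding convention assumed here
  M2 <- N - M1 + 1
  c(M1 = M1, M2 = M2)
}

pb_ci_index(30)   # for a 30-point data set like the one in the thread
```

The shift K (the number of pairwise slopes below -1) is then added to these ranks before indexing the sorted slopes, per the description above.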
Re: [R] extracting the standard error in lrm
On Wed, 11 Aug 2010, david dav wrote:

Hi, I would like to extract the coefficients of a logistic regression (estimates and standard errors as well) from lrm, as in glm with summary(fit.glm)$coef. Thanks, David

coef(fit)
sqrt(diag(vcov(fit)))

But these will not be very helpful except in the trivial case where everything is linear, nothing interacts, and factors have two levels. Frank
Re: [R] Multiple imputation, especially in rms/Hmisc packages
Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Tue, 10 Aug 2010, Mark Seeto wrote:

Hi Frank (and others), Thank you for your comments in reply to my questions. I had not encountered contrast tests before. I've looked in a few texts, but the only place I could find anything about contrast tests was your Regression Modeling Strategies book. You wrote that the "leave some variables out" F-test and the contrast tests are equivalent in ordinary regression, so I have tried to verify this. However, I got different results for the two tests, so I'm hoping you can point out what I'm doing incorrectly. The null hypothesis is CB = 0, where C is the contrast matrix and B is the vector of regression coefficients. I'm using the test statistic on p. 189 of RMS: W = (Cb)'(CVC')^{-1}(Cb), with b being the vector of coefficient estimates and V its covariance matrix. The model is y ~ rcs(x1,3) + x2, and I want to test the null hypothesis that the two x1 coefficients are both zero. This is equivalent to CB = 0, with C being the matrix [0 1 0 0 (first row); 0 0 1 0 (second row)].

library(rms)
set.seed(1)
x1 <- rnorm(50)
x2 <- rnorm(50)
y  <- 1 + x1 + x2 + rnorm(50,0,3)
ols.full <- ols(y ~ rcs(x1,3) + x2)        # full model
ols.red  <- ols(y ~ x2)                    # reduced model
RSS.full <- sum((ols.full$fitted - y)^2)   # residual sums of squares
RSS.red  <- sum((ols.red$fitted - y)^2)
F <- ((RSS.red - RSS.full)/(48 - 46))/(RSS.full/46)  # F-test statistic
F
[1] 2.576335
1 - pf(F, 2, 46)
[1] 0.08698798
anova(ols.full)

Analysis of Variance   Response: y

Factor       d.f.  Partial SS    MS           F     P
x1            2     39.95283222  19.97641611  2.58  0.0870
 Nonlinear    1      0.07360774   0.07360774  0.01  0.9228
x2            1     47.17322335  47.17322335  6.08  0.0174
REGRESSION    3     83.79617922  27.93205974  3.60  0.0203
ERROR        46    356.67539200   7.75381287

anova(ols.full)["x1", "F"]
[1] 2.576335   # confirms that anova matches the "leave some variables out" F-test
V <- ols.full$var
C <- matrix(c(0,1,0,0, 0,0,1,0), nrow=2, byrow=TRUE)
b <- ols.full$coef
W <- t(C %*% b) %*% solve(C %*% V %*% t(C)) %*% (C %*% b)
1 - pchisq(W, 2)
           [,1]
[1,] 0.07605226

Have I used the correct distribution for W? Should I expect the p-values from the two tests to be exactly the same?

Mark - you can see the code for this at the bottom of anova.rms. Compute W, divide by the numerator d.f., then compute the P-value using F with the numerator and error d.f. Frank

Thanks in advance; I really appreciate any help you can give. Mark

Frank Harrell wrote: On Mon, 9 Aug 2010, Mark Seeto wrote: Hello, I have a general question about combining imputations as well as a question specific to the rms and Hmisc packages. The situation is multiple regression on a data set where multiple imputation has been used to give M imputed data sets. I know how to get the combined estimate of the covariance matrix of the estimated coefficients (average the M covariance matrices from the individual imputations and add (M+1)/M times the between-imputation covariance matrix), and I know how to use this to get p-values and confidence intervals for the individual coefficients. What I don't know is how to combine imputations to obtain the multiple degree-of-freedom tests of whether a set of coefficients are all 0. If there were no missing values, this would be done using the general linear test, which involves fitting the full model and the reduced model and getting an F statistic based on the residual sums of squares and degrees of freedom. How does one correctly do this when there are multiple imputations? In the language of Hmisc, I'm asking how the values in anova(fmi) are calculated if fmi is the result of fit.mult.impute (I've looked at the anova.rms code but couldn't understand it). In ordinary regression, the "leave some variables out and compare R^2" F-test and the quicker contrast tests are equivalent.
In general we use the latter, which is expedient since it is easy to estimate the imputation-corrected covariance matrix. In the rms and Hmisc packages, fit.mult.impute creates the needed fit object; then anova, contrast, summary, etc. give the correct Wald tests. The second question I have relates to rms/Hmisc specifically. When I use fit.mult.impute, the fitted values don't appear to match the values given by the predict function on the original data. Instead, they match the fitted values of the last imputation. For example:

library(rms)
set.seed(1)
x1 <- rnorm(40)
x2 <- 0.5*x1 + rnorm(40,0,3)
y  <- x1^2 + x2 + rnorm(40,0,3)
x2[40] <- NA                                       # create 1 missing value in x2
a <- aregImpute(~ y + x1 + x2, n.impute=2, nk=3)   # 2 imputations
x2.imp1 <- x2; x2.imp1[40] <- a$imputed$x2[,1]     # first imputed x2 vector
x2.imp2 <- x2; x2.imp2[40] <- a$imputed$x2[,2]     # second imputed x2 vector
ols.imp1 <- ols(y ~ rcs(x1,3) + x2
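Frank's recipe (compute W, divide by the numerator d.f., refer to F with numerator and error d.f.) can be checked outside rms with an ordinary linear model, where the hand-computed contrast test and the model-comparison F test agree exactly. This is a self-contained sketch with simulated data, not rms code.

```r
# Contrast (Wald) test of H0: both slopes are zero, done by hand,
# then compared with the usual model-comparison ("leave variables out") F test.
set.seed(1)
x1 <- rnorm(50); x2 <- rnorm(50)
y  <- 1 + x1 + x2 + rnorm(50, 0, 3)

fit <- lm(y ~ x1 + x2)
b   <- coef(fit)
V   <- vcov(fit)
C   <- rbind(c(0, 1, 0),                    # pick out the x1 slope
             c(0, 0, 1))                    # pick out the x2 slope
W   <- drop(t(C %*% b) %*% solve(C %*% V %*% t(C)) %*% (C %*% b))
df1 <- nrow(C); df2 <- fit$df.residual
p_F <- 1 - pf(W / df1, df1, df2)            # Frank's recipe: W/df1 ~ F(df1, df2)

p_red <- anova(lm(y ~ 1), fit)$"Pr(>F)"[2]  # model-comparison F test
all.equal(p_F, p_red)                       # the two p-values coincide
```

Referring W itself to a chi-square, as in Mark's code, gives a slightly smaller p-value because it ignores the estimation of the residual variance; the F referral above removes that discrepancy.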
Re: [R] Plotting confidence bands around regression line
Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Wed, 11 Aug 2010, Michal Figurski wrote:

Peter, Frank, David and others, Thank you all for your ideas. I understand your lack of trust in the PB method. Setting that aside (it's beyond me anyway), please see below what I have finally come up with to calculate the CI boundaries. Given the slope and intercept with their 5% / 95% CIs, and a range of x = 1:50:

ints = c(-1, 0, 1)     # c(int05%, int, int95%)
slos = c(0.9, 1, 1.1)  # c(slo05%, slo, slo95%)
CIs  = data.frame(x=1:50, lo=NA, hi=NA)
for (i in 1:50) {
  CIs$lo[i] = min(ints + slos * CIs$x[i])
  CIs$hi[i] = max(ints + slos * CIs$x[i])
}

It looks like it works to me. Does it make sense?

[Frank:] It doesn't look like it takes the correlation of slope and intercept into account, but I may not understand.

Now, what about a 4-parameter 'nls' model? Do you guys think I could use the same approach?

[Frank:] This problem seems to cry out for one of the many available robust regression methods in R. Frank

Best regards,
--
Michal J. Figurski, PhD
HUP, Pathology Laboratory Medicine
Biomarker Research Laboratory
3400 Spruce St. 7 Maloney
Philadelphia, PA 19104
tel. (215) 662-3413

On 2010-08-10 13:12, Peter Dalgaard wrote: Michal Figurski wrote:

# Result of the Passing-Bablok regression on this data frame:
#            Estimate      5%CI     95%CI
# Intercept -4.306197 -9.948438 -1.374663
# Slope      1.257584  1.052696  1.679290

The original Passing-Bablok article on this method has an easy prescription for CIs on the coefficients, so I implemented that. Now I need a way to calculate the CI boundaries for individual points - this may be basic handbook stuff - I just don't know it (I'm not a statistician). [Peter:] The answer is that you can't. You can't even do it with ordinary linear regression without knowing the correlation between slope and intercept. However, if you can get a CI for the intercept, then you could subtract x0 from all the x's and get a CI for the value at x0.
(This brings echoes from a distant past. My master's thesis was about some similar median-type estimators. I can't remember whether I looked at the Passing-Bablok paper at the time (1985!!), but my general recollection is that this group of methods is littered with unstated assumptions.)
Re: [R] Plotting confidence bands around regression line
I may be missing something, but I don't see how PB handles errors in variables any differently from other regression methods that ignore this problem. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Wed, 11 Aug 2010, S Ellison wrote:

Frank Harrell f.harr...@vanderbilt.edu 11/08/2010 17:02:03 wrote: "This problem seems to cry out for one of the many available robust regression methods in R." Not sure that would be much more appropriate, although it would _appear_ to work. The PB method is a sort of nonparametric/robust approach to an errors-in-variables problem, intended to provide an indication of consistency of results between two different measurement methods, often with similar error variance. So the aim is to handle the errors-in-variables problem at least consistently, to avoid the bias that results from assuming no error in the predictors. The M-estimator and related robust regression methods in packages like MASS and robustbase don't handle errors in the predictors. Of course, with small errors in predictors that won't matter much; rlm and the like will be pretty much as defensible then as they ever are. But perhaps one could construct a more formal robust equivalent of errors-in-variables regression by using a maximum likelihood functional relationship model with bivariate t errors (choosing arbitrarily low df) instead of bivariate Gaussian errors? Unfortunately I haven't tried that, so no help beyond the thought ... Steve Ellison

*** This email and any attachments are confidential. Any u...{{dropped:9}}
Re: [R] Growth Curves with lmer
Classification accuracy is an improper scoring rule, and one of the problems with it is that the proportion classified correctly can be quite good even if the model uses no predictors. [Hence omitting the intercept is also potentially problematic.]

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Wed, 11 Aug 2010, Michael Scharkow wrote:

Dear all, I have some growth curve data from an experiment that I try to fit using lm and lmer. The curves describe the growth of classification accuracy with the amount of training data t, so basically y ~ 0 + t (there is no intercept because y = 0 at t = 0). Since the growth is somewhat nonlinear *and* in order to estimate the treatment effect on the growth curve, the final model is

y ~ 0 + t + t.squared + t:treat + t.squared:treat

This yields:

   t     t.sq   t:treat  t.sq:treat
1.08   -0.007      0.39      -0.006

This fits the data fairly well, but I have replicated data for 12 different classifiers. First, I tried 12 separate regressions, which yielded results with different positive values for t and t:treat. Finally, I tried to estimate a varying-coefficient model using lmer:

lmer(y ~ 0 + t + t.squared + t:treat + t.squared:treat
       + (0 + t + t.squared + t:treat + t.squared:treat | classifier))

The fixed effects are similar to the pooled regression, but most of the random effects for t and t:treat are implausible (negative). What's wrong with the lmer model? Did I misspecify something? Greetings, Michael
Re: [R] How to compare the effect of a variable across regression models?
David, In the Cox and many other regression models, the effect of a variable is context-dependent. There is an identifiability problem in what you are doing, as discussed by

@Article{for95mod,
  author  = {Ford, Ian and Norrie, John and Ahmadi, Susan},
  year    = 1995,
  title   = {Model inconsistency, illustrated by the {Cox} proportional hazards model},
  journal = {Statistics in Medicine},
  volume  = 14,
  pages   = {735-746},
  annote  = {covariable adjustment; adjusted estimates; baseline imbalances; RCT; model misspecification; model identification}
}

One possible remedy, which may not work for your goals, is to embed all models in a grand model that is used for inference. When coefficients ARE comparable in some sense, you can use the bootstrap to get confidence bands for differences in regressor effects between models. Frank

Frank E Harrell Jr, Professor and Chairman, Department of Biostatistics, Vanderbilt University School of Medicine

On Fri, 13 Aug 2010, Biau David wrote:

Hello, I would like, if it is possible, to compare the effect of a variable across regression models. I have looked around but I haven't found anything. Maybe someone could help? Here is the problem: I am studying the effect of a variable (age) on an outcome (local recurrence: lr). I have built 3 models:

- model 1: lr ~ age; y = \beta_{a1} age
- model 2: lr ~ age + presentation variables (X_p); y = \beta_{a2} age + \BETA_{p2} X_p
- model 3: lr ~ age + presentation variables + treatment variables (X_t); y = \beta_{a3} age + \BETA_{p3} X_p + \BETA_{t3} X_t

Presentation variables include variables such as tumor grade, tumor size, etc.; the physician cannot interfere with these variables. Treatment variables include variables such as chemotherapy, radiation, and surgical margins (a surrogate for adequate surgery). I have used cph for the models and restricted cubic splines (Design library) for age. I have noted that the effect of age decreases from model 1 to 3.
I would like to compare the effect of age on the outcome across the different models. A test of \beta_{a1} = \beta_{a2} = \beta_{a3}, and then two-by-two comparisons or a global trend test maybe? Is that possible? Thank you for your help, David Biau. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
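Frank's bootstrap suggestion can be sketched in base R. A minimal, hedged illustration (simulated data; the logistic models, variable names, and effect sizes are all hypothetical stand-ins for the cph fits discussed above):

```r
## Bootstrap the *difference* between the age coefficient in a crude
## model and in an adjusted model (simulated data, hypothetical names).
set.seed(1)
n    <- 200
age  <- rnorm(n, 60, 10)
pres <- rnorm(n)                                  # stand-in "presentation" covariate
lr   <- rbinom(n, 1, plogis(-2 + 0.04 * age + 0.5 * pres))

boot.diff <- replicate(500, {
  i  <- sample(n, replace = TRUE)                 # resample cases
  b1 <- coef(glm(lr[i] ~ age[i], family = binomial))[2]
  b2 <- coef(glm(lr[i] ~ age[i] + pres[i], family = binomial))[2]
  b1 - b2                                         # crude minus adjusted age effect
})
quantile(boot.diff, c(0.025, 0.975))              # percentile confidence interval
```

If the interval excludes 0 there is evidence that adjustment changes the age effect; as Frank notes, this only makes sense when the coefficients are comparable across models.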
Re: [R] val.prob in the Design package - Calibrated Brier Score
Please check the code. I hope that Brier is on the uncalibrated probabilities. Calibrated probabilities are from 1/(1 + exp(-[a + b logit(uncalibrated prob)])), where a and b are maximum likelihood estimates (they will be 0 and 1 in training data). Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Fri, 13 Aug 2010, Kim Fernandes wrote: Hello, I am using the val.prob function in the Design package. I understand how the Brier quadratic error score is calculated, but I do not know how the Brier score computed on the calibrated rather than raw predicted probabilities (B cal) is calculated. My question is: how are the calibrated probabilities calculated? Any explanation of this, or references to explanations of this, would be most greatly appreciated. Thanks for your help, Kim
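The recalibration Frank describes can be sketched in base R (simulated data; this follows the formula as stated above, not necessarily val.prob's exact internals, so check the Design source as advised):

```r
## Logistic recalibration: fit a and b by maximum likelihood on the
## logit of the uncalibrated probabilities, then recompute Brier scores.
set.seed(2)
p   <- runif(200, 0.05, 0.95)                          # uncalibrated predicted probs
y   <- rbinom(200, 1, plogis(-0.3 + 1.4 * qlogis(p)))  # deliberately miscalibrated truth
lp  <- qlogis(p)                                       # logit(uncalibrated probs)
fit <- glm(y ~ lp, family = binomial)                  # intercept a, slope b
p.cal <- plogis(coef(fit)[1] + coef(fit)[2] * lp)      # 1/(1+exp(-(a + b*logit(p))))
c(Brier = mean((p - y)^2), Brier.cal = mean((p.cal - y)^2))
```

On the data used to fit the original model, a and b come out 0 and 1, so the two Brier scores coincide there; the difference only shows up on new data.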
Re: [R] Re : Re : How to compare the effect of a variable across regression models?
Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Fri, 13 Aug 2010, Biau David wrote: This is all very interesting indeed. So I appreciate that the effect of a variable will depend on the presence of other variables and that the effect of this variable has a statistical meaning only in a specific model, with the particularity of inconsistency if data arise from a non-normal distribution, as for the Cox model. It's worse than that. In the Cox PH model the degree of non-proportional hazards depends on which variables are adjusted for. I have two comments though; the first is statistical and the second more methodological: - first, either I am strict on the inconsistency phenomenon and, given the fact that it is very unlikely that I even come close to finding all relevant variables for a model, all models I may build are necessarily biased, OR I have to be less strict. How to use tests (score, Wald, LR) to compare models when, at most, only one of them is unbiased? - second (relevance depends on the answer to the comment above, I guess), I study the effect of age without adjustment (model 1), after adjusting for presentation variables (model 2), and after adjusting for presentation and treatment variables (model 3). I certainly do not pretend that I am truly looking at the effect of age, and I know that what I call age across the different models is not the same (otherwise I would not compare it, or I would hope not to find any difference). I am merely looking at the effect of something that is measured by 'age' and that encompasses imbalances at presentation, treatment, and remaining unknown confounders (first model), imbalances in treatment and remaining unknown confounders (second model), and imbalances in the remaining confounders (third model).
The remaining unknown confounders potentially include a true effect of age, if such a thing exists, and of other variables I don't know about (which in fact are still presentation and treatment variables since there are no other possibilities!). As everything fits in a cohesive interpretation: older patients present with worse tumors and are undertreated (as shown by descriptive statistics), AND as this is shown by the modelling of the effect by an erosion of the effect of 'age' from model 1 to 3, I was tempted, with all due precautions, to conclude that the increased risk for older patients was due in part to the fact that they present with worse tumors, in part to the fact that they are undertreated, and in part to something else. Would that be overinterpreting the data? Should I even drop the idea of building and presenting these models? I'm not sure. But often we learn by predicting a predictor from the other predictors. Frank Thanks again, David Biau. _ From: Bert Gunter gunter.ber...@gene.com To: Biau David djmb...@yahoo.fr Cc: Frank Harrell f.harr...@vanderbilt.edu; r help list r-help@r-project.org Sent: Fri, 13 Aug 2010, 18:22:58 Subject: Re: [R] Re : How to compare the effect of a variable across regression models? Just to amplify a bit on what Frank said... Except in special circumstances (orthogonal designs, say), regression models are only guaranteed to produce useful predictions -- they may not tell you anything meaningful about the relative effects of the regressors because, as Frank said, that depends on both the study design AND the true effects. So, for example, typically if one removes a regressor from a model and refits, the values of the remaining coefficients will change. Moreover, it is difficult to understand even conceptually what "the effect of a variable" should mean if, as is typical, it interacts with other variables: the effect of the variable then depends on what the values of the other variables are.
A very simple example of this that I used to use in teaching was the effect of barbiturates and alcohol on sleepiness. In the absence of barbiturates, the effect of a few drinks is to make you modestly sleepy; in the presence of barbiturates, the effect of a few drinks is to make you modestly dead! The point is that we live in a multivariate, interactive world. Terms like "the effect of a variable" derive from univariate thinking and need to be very carefully defined to make sense in such a world -- and then studies need to be carefully designed -- happenstance data are rarely sufficient -- to quantify them. Not what scientists like to hear, I think, but this is the reality. Further comments and criticisms welcome, of course. Cheers, Bert Gunter, Genentech Nonclinical Statistics On Fri, Aug 13, 2010 at 6:59 AM, Biau David djmb...@yahoo.fr wrote: OK, thank you very much for the answer. I will look into that. Hopefully I'll find something that will work out
Re: [R] Stepwise Regression + entry/exit significance level
The values of slentry and slstay that will avoid ruining the statistical properties of the result are slentry=1.0 and slstay=1.0. Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Sat, 14 Aug 2010, Shubha Vishwanath Karanth wrote: Hi R, Does the step function used to perform stepwise regression have the option to specify the entry/exit significance levels for the independent variables? (This is similar to the 'slentry' and 'slstay' options in 'Proc reg' of SAS.) Or do we have any other package which does the above? Thanks. Thanks and Regards, Shubha This e-mail may contain confidential and/or privileged i...{{dropped:13}}
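To answer the original question concretely: base R's step() selects by AIC and has no slentry/slstay analogue. A minimal sketch (simulated data with predictors unrelated to y, so the final model may well be intercept-only):

```r
## AIC-based stepwise selection with step(); there are no p-value
## entry/exit thresholds as in SAS PROC REG.
set.seed(3)
d <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
full <- lm(y ~ x1 + x2 + x3, data = d)
sel  <- step(full, direction = "both", trace = 0)   # silent stepwise search
formula(sel)                                        # whatever survives AIC
```

Frank's answer stands regardless of the criterion: any automatic selection distorts the statistical properties of the result, which is why he recommends slentry = slstay = 1.0 (i.e., keep everything).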
Re: [R] How to add lines to lattice plot produced by rms::bplot
Once you guys figure all this out, I'm glad to modify bplot to pass more arguments to lattice if needed. Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Fri, 13 Aug 2010, David Winsemius wrote: On Aug 13, 2010, at 11:25 PM, Duncan Mackay wrote: Hi David I do not know if you have done something like this. I had tried a few efforts like that, starting with an examination of str(bp.plot) as you demonstrate. I tried str(bp.plot), which gave the section about the regions (for colours) as:

$ panel.args.common:List of 8
 ..$ x      : num [1:2500] 27 28 29 29.9 30.9 ...
 ..$ y      : num [1:2500] 141 141 141 141 141 ...
 ..$ z      : num [1:2500] -1.43 -1.41 -1.39 -1.36 -1.34 ...
 ..$ at     : num [1:10] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5
 ..$ region : logi FALSE
 ..$ zlab   :List of 3
 .. ..$ label: chr "log odds"
 .. ..$ rot  : num 90
 .. ..$ cex  : num 1
 ..$ labels : logi TRUE
 ..$ contour: logi TRUE

I tried (with a bplot object named bldLT40):

bldLT40$legend$right$args$key$at <- c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10)

... and then tried

bldLT40$panel.args$at <- c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10)

Neither of these efforts changed the boundaries between colors in the plot area. The first effort changed the legend scale, but that just created a misalignment of the colors of the plot area and the legend. I would be interested in either a strategy that lets one alter the color level changes of the z variable (which in bplot-created objects is zhat), or one that lets me specify the values at which contour lines are drawn in contourplot. Thanks for your efforts. -- David. I added the col.regions and colours from a levelplot that I had done to see what would occur to the trellis parameters. No colours were produced when plotted.
bp.plot <- bplot(p, lfun=contourplot, color.key = TRUE, col.regions = c(#FF, #00, #A9E2FF, #8080FF, #FF, #FFD18F, #FF))

$ panel.args.common:List of 10
 ..$ x          : num [1:2500] 27 28 29 29.9 30.9 ...
 ..$ y          : num [1:2500] 141 141 141 141 141 ...
 ..$ z          : num [1:2500] -1.43 -1.41 -1.39 -1.36 -1.34 ...
 ..$ at         : num [1:10] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5
 ..$ region     : logi FALSE
 ..$ color.key  : logi TRUE
 ..$ zlab       :List of 3
 .. ..$ label: chr "log odds"
 .. ..$ rot  : num 90
 .. ..$ cex  : num 1
 ..$ labels     : logi TRUE
 ..$ contour    : logi TRUE
 ..$ col.regions: chr [1:7] #FF #00 #A9E2FF #8080FF ...

So it has been added to panel.args.common; whether you can access these is another matter. I then tried

bp.plot <- bplot(p, lfun=contourplot, par.settings = list(axis.text = list(cex = 0.65)), color.key = TRUE, col.regions = c(#FF, #00, #A9E2FF, #8080FF, #FF, #FFD18F, #FF))

which changed the size of the axis text, so it may be a case of having to add col.regions etc. to the appropriate list in par.settings. I'll leave you to amend and access the colours. You may have to add values for the wireframe/levelplot arguments like at etc. and col.regions (I think that is the function) to produce an appropriate colour range of your choice. It is a while since I have delved into these sorts of plots. Need some sustenance. Regards Duncan Duncan Mackay, Department of Agronomy and Soil Science, University of New England, ARMIDALE NSW 2351. Email home: mac...@northnet.com.au At 10:33 14/08/2010, you wrote: I have a plot produced by function bplot (package = rms) that is really a lattice plot (class = trellis).
It is similar to this plot produced by a very minor modification of the first example on the bplot help page:

require(rms)
n <- 1000                        # define sample size
set.seed(17)                     # so can reproduce the results
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n, TRUE))
label(age)            <- 'Age'   # label is in Hmisc
label(cholesterol)    <- 'Total Cholesterol'
label(blood.pressure) <- 'Systolic Blood Pressure'
label(sex)            <- 'Sex'
units(cholesterol)    <- 'mg/dl' # uses units.default in Hmisc
units(blood.pressure) <- 'mmHg'
# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
     (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist='ddist')
fit <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)), x=TRUE, y=TRUE)
p <- Predict(fit, age, cholesterol, sex='male', np=50)   # vary sex last
bp.plot <- bplot(p, lfun=contourplot)
bp.plot

I have tried a variety of efforts at using update (which I assume is a lattice function, although I can find no help page for it). It does appear in some of the lattice help pages and my
Re: [R] confidence Intervals for predictions in GLS
install.packages('rms')
require(rms)
?Gls
?plot.Predict

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Sat, 14 Aug 2010, Camilo Mora wrote: Hi everyone: Is there a function in R to calculate the confidence intervals for the predictions of a GLS (Generalized Least Squares) model? The function predict gives confidence intervals for the predictions of other types of models (lm, glm, etc.) but not gls. Any input will be much appreciated, Best, Camilo Camilo Mora, Ph.D., Department of Biology, Dalhousie University, Halifax, Canada. Phone: (902) 494-7720 http://as01.ucis.dal.ca/fmap/people.php?pid=53
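Following those pointers, a hedged sketch of the rms route (function and argument names per ?Gls and ?plot.Predict; the simulated AR(1) data are an illustrative assumption, and this is untested here):

```r
## Confidence intervals for GLS predictions via rms::Gls + Predict.
library(rms)    # also brings in Hmisc
library(nlme)   # corAR1 correlation structure
set.seed(4)
x <- runif(60)
y <- 2 + 3 * x + as.numeric(arima.sim(list(ar = 0.5), 60))  # AR(1) errors
d <- data.frame(x, y)
dd <- datadist(d); options(datadist = "dd")
f <- Gls(y ~ x, data = d, correlation = corAR1(form = ~ 1))
head(Predict(f, x))      # yhat with lower/upper confidence limits
plot(Predict(f, x))      # fitted line with confidence bands
```

Predict() returns the point estimates together with lower and upper limits, which is what plain predict.gls does not provide.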
Re: [R] How to add lines to lattice plot produced by rms::bplot
Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Sat, 14 Aug 2010, David Winsemius wrote: On Aug 14, 2010, at 9:59 AM, Frank Harrell wrote: Once you guys figure all this out, I'm glad to modify bplot to pass more arguments to lattice if needed. As always, Frank, I appreciate your support. In this case I think it's not needed. What seems to be needed is simply the correct use of the at argument. (I thought I had tried this before but apparently mucked it up somehow.) This gives the desired color-level separation and labeling of the default levelplot version of bplot output:

bldLT40 <- bplot(mdl.pred, perim=boundaries, at=c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10))
bldLT40

And this produces the expected output with its contourplot version:

bldLT40c <- bplot(mdl.pred, perim=boundaries, lfun=contourplot, at=c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10))
bldLT40c

My only quibble with that last one was that the labels had three digits to the right of the decimal point, but that happily went away when I changed the low end to 0.25. All is good here. -- David. Excellent. Thanks David. If you think of anything that would be a good addition to the examples or other parts of the help file, shoot me a few lines and I'll paste them in. Frank
Re: [R] HMisc/rms package questions
Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Tue, 17 Aug 2010, Rob James wrote: 1) How does one capture the plots from the plsmo procedure? Simply inserting a routing call to a graphical device (such as jpeg, png, etc.) and then running the plsmo procedure (and then dev.off()) does not route the output to the file system. 1b) Related to the above, has anyone thought of revising the plsmo procedure to use ggplot? I'd like to capture several such graphs into a faceted arrangement. Hi Rob, plsmo in Hmisc uses base graphics, and I have captured its output many times using pdf() or postscript(). I'll bet that Hadley Wickham has an example that will help. For lattice there is panel.plsmo. 2) The 2nd issue is more about communications than software. I have developed a model using lrm() and am using plot to display the model. All that is fairly easy. However, my coauthors are used to traditional methods, where baseline categories are rather broadly defined (e.g. males, age 25-40, height 170-180cm, BP 120-140, etc.) and results are reported as odds ratios, not as probabilities of outcomes. Therefore, and understandably, they are finding the graphs which arise from lrm-Predict-plot difficult to interpret. Specifically, in one graph, the adjusted-to population is defined one way, and in another graph of the same model (displaying new predictors) there will be a new adjusted-to population. Sometimes the adjusted populations are substantially distinct, giving rise to event rates that vary dramatically across graphs. This can prove challenging when trying to present the set of graphs as parts of a whole. It all makes sense; it just adds complexity to introducing these new methods. A very simple example might help us with this one. But odds ratios resulting from categorizing continuous variables are invalid. They do not have the claimed interpretation.
In fact they have no interpretation, in the sense that their interpretation is a function of the entire set of sample values. You can get whatever odds ratios you need (with exact interpretations) using summary or contrast. You can also modify plot to plot relative odds, relative to something of your choosing. Frank One strategy might be to manually define the baseline population across graphs; this way I could attempt to impose some content-specific coherence on the graphs by selecting the baseline populations. Clearly this is do-able, but I have yet to see it done. I'd welcome suggestions and comments. Thanks, Rob
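Frank's point about capturing plsmo's base-graphics output can be sketched as follows (simulated data; requires the Hmisc package):

```r
## Route plsmo output to a file: open the device first, plot, then close.
library(Hmisc)
set.seed(5)
x <- runif(100)
y <- rbinom(100, 1, plogis(2 * x - 1))
pdf("plsmo-example.pdf")   # or postscript(), png(), ...
plsmo(x, y)                # lowess-smoothed plot of y against x
dev.off()                  # closing the device flushes the file to disk
```

If the file comes out empty, the usual cause is plotting before the device is open, or forgetting dev.off().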
Re: [R] R reports
What do low-level proc print and proc report have over Sweave or http://biostat.mc.vanderbilt.edu/wiki/pub/Main/StatReport/summary.pdf? If proc print and proc report are 4G, let's move back a generation. Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Thu, 19 Aug 2010, Donald Paul Winston wrote: I don't see much in the way of an ability to write reports in R the way you can with SAS. You basically have to write a program with R in a 3G way, unlike SAS with its 4G proc print and proc report. Are there similar R functions and packages? -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2330733.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] ROCR predictions
At the heart of this you have a problem of incomplete conditioning. You are computing things like Prob(X > x) when you know X = x. Working with a statistician who is well versed in probability models will undoubtedly help. Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Thu, 19 Aug 2010, Assa Yeroslaviz wrote: Hello everybody, yes I'm sorry. I can see it is not so easy to understand. I'll try to explain a bit more. The experiment was used to compare two (protein domain) databases and find out whether or not the results found in one are comparable to the second DB. The first column shows the list of the various inputs in the DB, the second lists the various domains for each gene. The p-value column gives the probability that the domain found in column four (Expected) was found by chance. In column five the expected values are listed. The calculation of TP, TN, FP, FN was made many times, each time with a different p-value (from p=1, ..., p=10E-12) as a threshold to calculate the various values of TP, TN, etc. The goal of this calculation was to find the optimal p-value with a maximum of TP and a minimum of FP. To do so I thought about making the column of p-values my predictions and the values in the column Is.Expected (TRUE, FALSE) my labels. This is how I calculated my first ROC curve:

pValue <- read.delim(file = "p=1.txt", as.is = TRUE)
desc1  <- pValue[["p.value"]]
label1 <- pValue[["Is.Expected"]]  # after changing the values to TRUE = 0, FALSE = 1
pred <- prediction(desc1, label1)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize = TRUE)

My questions are as follows: 1. Am I right in my way of thinking, that the p-values here are predictions? I know you said I need to decide it for myself, but I'm not sure. If they are, then I will have the same predictions for each and every calculation of ROCR. Will it make any difference to the prediction? 2. How can I calculate the other p-value thresholds?
Do I need to do each separately, or is there a way of combining them? I hope you can still help me with some hints or further advice. Thanks Assa On Wed, Aug 18, 2010 at 07:55, Claudia Beleites cbelei...@units.it wrote: Dear Assa, you need to call prediction with continuous predictions and a _binary_ true class label. You are the only one who can tell whether the p-values are actually predictions and what the class labels are. For the list readers p is just the name of whatever variable, and you didn't even vaguely say what you try to classify, nor did you offer any explanation of what the columns are. The only information we get from your table is that p-value has small and continuous values. From what I see the p-values could also be fitting errors of the predictions (e.g. expressed as a probability that the similarity to the predicted class is random). Claudia Assa Yeroslaviz wrote: Dear Claudia, thank you for your fast answer. I add again the table of the data as an example (TP/FN/FP/TN = true positive, false negative, false positive, true negative):

Protein ID    Pfam Domain      p-value      Expected         Is Expected  TP  FN  FP  TN
NP_11.2       APH              1.15E-05     APH              TRUE          1   0   0   0
NP_11.2       MutS_V           0.0173       APH              FALSE         0   0   1   0
NP_62.1       CBS              9.40E-08     CBS              TRUE          1   0   0   0
NP_66.1       APH              3.83E-06     APH              TRUE          1   0   0   0
NP_66.1       CobU             0.009        APH              FALSE         0   0   1   0
NP_66.1       FeoA             0.3975       APH              FALSE         0   0   1   0
NP_66.1       Phage_integr_N   0.0219       APH              FALSE         0   0   1   0
NP_000161.2   Beta_elim_lyase  6.25E-12     Beta_elim_lyase  TRUE          1   0   0   0
NP_000161.2   Glyco_hydro_6    0.002        Beta_elim_lyase  FALSE         0   0   1   0
NP_000161.2   SurE             0.0059       Beta_elim_lyase  FALSE         0   0   1   0
NP_000161.2   SapB_2           0.0547       Beta_elim_lyase  FALSE         0   0   1   0
NP_000161.2   Runt             0.1034       Beta_elim_lyase  FALSE         0   0   1   0
NP_000204.3   EGF              0.004666118  EGF              TRUE          1   0   0   0
NP_000229.1   PAS              3.13E-06     PAS              TRUE          1   0   0   0
NP_000229.1   zf-CCCH          0.2067       PAS              FALSE         0   1   1   0
NP_000229.1   E_raikovi_mat    0.0206       PAS              FALSE         0   0   0   0
NP_000388.2   NAD_binding_1    8.21E-24     NAD_binding_1    TRUE          1   0   0   0
NP_000388.2   ABM              1.40E-08     NAD_binding_1    FALSE         0   0   1   0
NP_000483.3   MMR_HSR1         1.98E-05     MMR_HSR1         TRUE          1   0   0   0
NP_000483.3   DEAD             2.30E-05     MMR_HSR1         FALSE         0   0   1   0
NP_000483.3   APS_kinase       1.80E-09     MMR_HSR1
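A self-contained ROCR sketch along the lines discussed (simulated stand-in for the p-value table; note the plotting argument is colorize):

```r
## ROC curve from p-value-like scores with ROCR. Smaller p-values should
## indicate positives, so reverse the score before calling prediction().
library(ROCR)
set.seed(6)
pv    <- runif(300)                      # p-values acting as scores
truth <- rbinom(300, 1, 1 - pv)          # small p-value -> likely a true hit
pred  <- prediction(1 - pv, truth)       # larger score = more positive
perf  <- performance(pred, "tpr", "fpr")
plot(perf, colorize = TRUE)              # colour the curve by cutoff
performance(pred, "auc")@y.values[[1]]   # area under the curve
```

The whole range of thresholds is swept in one prediction() call; there is no need to recompute TP/FP counts separately for each p-value cutoff.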
Re: [R] logistic regression tree
It would be good to tell us the frequency of observations in each category of Y, and the number of continuous X's. Recursive partitioning will require perhaps 50,000 observations in the less frequent Y category for its structure and predicted values to validate, depending on X and the signal:noise ratio. Hence the use of combinations of trees nowadays as opposed to single trees. Or logistic regression. Frank

Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University

On Fri, 20 Aug 2010, Kay Cichini wrote: Hello Gavin, Achim, thanks for responding. By logistic regression tree I meant a regression tree for a binary response variable, but as you say I could also use a classification tree - in my case with only two outcomes. I'm not aware whether there are substantial differences to expect between the two approaches (logistic regression tree vs. classification tree with two outcomes). As I'm new to trees / boosting / etc. I also might be advised to use the more comprehensible method, i.e. a function whose arguments can be understood without having to climb a steep learning ladder. At the moment I don't know which this would be. Regarding the meaning of absences at stands: as these species are frequent in the area and hence there is no limitation by propagules, I guess absence is really due to unfavourable conditions. Thanks a lot, Kay - Kay Cichini, Postgraduate student, Institute of Botany, Univ. of Innsbruck -- View this message in context: http://r.789695.n4.nabble.com/logistic-regression-tree-tp2331847p2332447.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] logistic regression tree
On Fri, 20 Aug 2010, Kay Cichini wrote: Hello, my data collection is not yet finished, but I have already started investigating possible analysis methods. Below I give a very close simulation of my future data set; however, there might be more nominal explanatory variables - there will be no continuous ones at all (maybe some ordered nominal). I tried several packages today, but the one I fancied most was ctree of the party package. I can't see why the given no. of data points (n=100) might pose a problem here - but please teach me better, as I might be naive. See http://biostat.mc.vanderbilt.edu/wiki/Main/ComplexDataJournalClub#Sebastiani_et_al_Nature_Genetics The recursive partitioning simulation there will give you an idea - you can modify the R code to simulate a situation more like yours. When you simulate the true patterns and see how far the tree is from discovering the true patterns, you'll be surprised. Frank I'd be very glad about comments on the use of ctree on such a dataset, and whether I overlook possible pitfalls. Thank you all, Kay

##
# an example with 3 nominal explanatory variables:
# Y is presence of a certain invasive plant species
# introduced effect for fac1 and fac3, fac2 without effect.
# presence with prob. 0.75 in factor combination fac1=I (say fac1 is geogr. region) and
# fac3 = a|b|c (say all richer substrates).
# presence is not influenced by fac2, which might be vegetation type, i.e.
##
library(party)
dat <- cbind(expand.grid(fac1=c("I","II"), fac2=LETTERS[1:5], fac3=letters[1:10]))
print(dat <- dat[order(dat$fac1, dat$fac2, dat$fac3), ])
dat$fac13 <- paste(dat$fac1, dat$fac3, sep="")
for (i in 1:nrow(dat)) {
  ifelse(dat$fac13[i]=="Ia" | dat$fac13[i]=="Ib" | dat$fac13[i]=="Ic",
         dat$Y[i] <- rbinom(1,1,0.75),
         dat$Y[i] <- rbinom(1,1,0))
}
dat$Y <- as.factor(dat$Y)
tr <- ctree(Y ~ fac1 + fac2 + fac3, data=dat)
plot(tr)
##
- Kay Cichini, Postgraduate student, Institute of Botany, Univ.
of Innsbruck -- View this message in context: http://r.789695.n4.nabble.com/logistic-regression-tree-tp2331847p2333073.html Sent from the R help mailing list archive at Nabble.com.
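Frank's suggestion to simulate and check recovery can be sketched against the setup quoted above (requires the party package; the recovery criterion here, "did the tree split at all", is a deliberately crude assumption):

```r
## Repeat the simulation and ask how often ctree finds *any* structure.
library(party)
recovered <- replicate(100, {
  dat  <- expand.grid(fac1 = c("I","II"), fac2 = LETTERS[1:5], fac3 = letters[1:10])
  prob <- with(dat, ifelse(fac1 == "I" & fac3 %in% c("a","b","c"), 0.75, 0))
  dat$Y <- factor(rbinom(nrow(dat), 1, prob))
  tr <- ctree(Y ~ fac1 + fac2 + fac3, data = dat)
  length(unique(where(tr))) > 1          # more than one terminal node?
})
mean(recovered)                          # proportion of runs with any split
```

Comparing the splits actually found against the known fac1/fac3 structure (and how often fac2 is wrongly selected) makes Frank's point about the instability of single trees at n = 100 concrete.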
Re: [R] R reports
Frank E Harrell Jr, Professor and Chairman, School of Medicine, Department of Biostatistics, Vanderbilt University On Sat, 21 Aug 2010, Donald Paul Winston wrote: Sweave and LaTeX are way too much overhead to deal with. There should be a built-in standard report() function analogous to plot(). Something like the following is necessary if you want real people to take R seriously: report(data=aDataFrame, vars=vectorOfColumnNames, label=vectorOfColumnNames, by=vectorOfColumnNames, sum=vectorOfColumnNames, title=vectorOfStrings, footer=vectorOfStrings, pageBy=vectorOfColumnNames, sumBy=vectorOfColumnNames, filename=string, fileType=text|csv|pdf.etc) Did I say real people? I've been Palinized. -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2333264.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] R reports
Your notes are not well thought out. You'll find that r-help is a friendly place for new users that do not come in with an attitude. I once used SAS (for 23 years) and know it very well. I wrote the first SAS procedures for a graphics device, percentiles, logistic regression, and Cox regression. Its reporting for the first 30 years of SAS' existence was quite primitive, and since then it is not what I'd call advanced and certainly not esthetically pleasing. Considering that SAS has had tens of billions of $ at its disposal for research and development, the result is quite unimpressive. Look at the fake PROC EXPORT if you want to get a real laugh - it can't even put out valid csv files if there are any unmatched quotes inside of character variable values. It is not even a procedure, just a front end to a trivial macro. The report function you outlined is in many ways more primitive than many functions already available in R. See the summary.formula function, for example, in the Hmisc package. Among other things, some of the functions give you flexibility in specifying the criteria by which a variable is considered continuous vs. discrete numeric. They also allow you to override statistical tests to include in the table with your own functions. Now one of the functions even automatically creates micrographics inside of table cells. You are welcome to write any R functions you'd like. The language for doing so is richer than the SAS language by a significant margin. Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Sat, 21 Aug 2010, Donald Paul Winston wrote: People have been generating reports with a computer for many years. R is supposed to be an analytical engine. Report writing is fundamental to any kind of analysis tool. SAS has had several report procedures/functions since the very beginning (1960s?). SAS stands for Statistical Analysis System.
Do you really expect users to have to piece together a half dozen or so bits of R code to create a report? It's not like it's difficult to do! I see this new company called Revolution Analytics who thinks R is the next big thing. Good grief. Maybe they can rescue it from the ghettoized academic world. -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2333267.html Sent from the R help mailing list archive at Nabble.com.
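Frank's reply above points to Hmisc's summary.formula as one of the existing reporting tools. A minimal sketch of the kind of table it produces, on simulated data (the data frame and its column names are invented for illustration; argument choices are one option among several):

```r
library(Hmisc)
set.seed(1)
d <- data.frame(treat = sample(c("A", "B"), 200, replace = TRUE),
                age   = rnorm(200, 50, 10),
                sex   = factor(sample(c("F", "M"), 200, replace = TRUE)))
## method = "reverse" summarizes each right-hand variable stratified by treat
s <- summary(treat ~ age + sex, data = d, method = "reverse")
print(s)      # plain-text descriptive table
latex(s)      # the same table typeset for inclusion in a LaTeX report
```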
Re: [R] R reports
On Sat, 21 Aug 2010, Donald Winston wrote: The point is SAS has had simple reporting for 30 years. R apparently doesn't have any. Why is it so hard to accept that a report function analogous to a plot function would be a good thing? R has had more advanced reporting features than SAS since not long after R's first production release in the 1990s. Users who are unwilling to study R documentation and examples will not see that, of course. Let's see if you want to put your money where your mouth is. How much are you willing to pay every year for SAS for its rigid approach to reporting? What is the current licensing fee to your organization? You are wrong on yet another piece of this discussion. SAS legally removed the phrase Statistical Analysis System from the company more than 15 years ago. SAS doesn't stand for anything. Frank On Aug 21, 2010, at 8:38 AM, Frank Harrell wrote: [quoted message repeated above omitted] -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2333267.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] R reports
On Sat, 21 Aug 2010, Donald Paul Winston wrote: Good grief. Adding a report function is not going to make R less flexible. Don't you want to use a tool that's relevant to the rest of the world? That world is much bigger than your world. This is ridiculous. Looks like some people are complaining about me criticizing R and the people who defend it. Good grief again. I think your philosophy is now more clear, Donald. People who don't like learning new things place themselves at a competitive disadvantage. The R community, where learning is highly valued, may be fundamentally incompatible with your philosophy. You may do well to stay with SAS. Frank From: David Hajage-2 [via R] ml-node+248-1227050197-138...@n4.nabble.com Sent: Sat, August 21, 2010 4:54:12 AM Subject: Re: R reports Just show us what kind of report you want to do, and you will perhaps get a solution to reproduce it. Then, if you don't like the way to do that, write your own code or don't use R; no one forces you. The majority of R users are satisfied with the way to generate reports, because it is flexible. There is ABSOLUTELY NO WARRANTY with R; this also means you have no warranty of finding exactly what you want, or what you can find in SAS. Just deal with it. 2010/8/21 Donald Paul Winston [hidden email] [quoted message repeated above omitted] -- View this message in context: http://r.789695.n4.nabble.com/R-reports-tp2330733p2333264.html?by-user=t Sent from the R help mailing list archive at Nabble.com.
Re: [R] AUC
Samuel, Since the difference in AUCs has insufficient power and doesn't really take into account the pairing of predictions, I recommend the Hmisc package's rcorrp.cens function. Its method has good power and asks the question "is one predictor more concordant than the other in the same pairs of observations?". Frank Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Mon, 23 Aug 2010, Samuel Okoye wrote: Hello, Is there any R function that computes the AUC for paired data? Many thanks, Samuel [[alternative HTML version deleted]]
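A hedged sketch of the call shape for rcorrp.cens, on simulated data. The variable names are invented; I am assuming (based on the convention of Hmisc's rcorr.cens) that an uncensored numeric outcome may be passed in place of a Surv object and is treated as event times with no censoring — check the Hmisc help page before relying on this.

```r
library(Hmisc)
set.seed(2)
n  <- 200
x1 <- rnorm(n)                        # first predictor
y  <- rbinom(n, 1, plogis(x1))        # binary outcome generated from x1
x2 <- x1 + rnorm(n, sd = 2)           # second, noisier predictor of the same y
## tests whether x1 is more concordant with y than x2 within the same pairs
rcorrp.cens(x1, x2, y)
```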
Re: [R] modify a nomogram
Update to the rms package, which is the version now being actively supported. New features will not be added to Design. The nomogram function in rms separates the plotting into a plot method for easier understanding. You can control all axes - some experimentation can tell you if you can do what you want, as I haven't tried exactly what you are doing. Frank Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Wed, 25 Aug 2010, david dav wrote: Hi, I would like to emphasize (zoom) the zone of a nomogram where the probabilities are < 0.01 (nomogram built with nomogram, Design). As a consequence, I don't need to draw the part of the Total points axis with score < 60, equivalent in my case to a linear predictor < -4.5. - As far as I know, this is not possible with the arguments of the function. - Changing the code of the function is beyond my abilities - I cannot even create a nomo.f function with the same body:
body(nomo) <- expression({ conf.lp <- match.arg(conf.lp) . rest of the function body
this does not even work - I am not sure to find the piece of code responsible for defining the axis.
nomogram(logistic.fit, fun = plogis, fun.at = c(seq(0.01, 1, by = .2)), lmgp = .2, lp = T, maxscale = 100, total.sep.page = T)
Thanks David
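The separation Frank describes - building the nomogram object, then controlling its appearance in plot() - can be sketched as follows on a made-up logistic model (data, variable names, and the xfrac choice are all illustrative):

```r
library(rms)
set.seed(3)
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
d$y <- rbinom(300, 1, plogis(-2 + d$x1))
dd <- datadist(d); options(datadist = "dd")

f <- lrm(y ~ x1 + x2, data = d)
## construction step: choose the probability axis tick locations here
nom <- nomogram(f, fun = plogis, fun.at = c(.01, .05, .1, .25, .5, .75))
## plotting step: layout is controlled separately from construction
plot(nom, xfrac = .2)
```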
Re: [R] Speeding up prediction of survival estimates when using `survifit'
Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Mon, 30 Aug 2010, Ravi Varadhan wrote: Hi, I fit a Cox PH model to estimate the cause-specific hazards (in a competing risks setting). Then, I compute the survival estimates for all the individuals in my data set using the `survfit' function. I am currently playing with a data set that has about 6000 observations and 12 covariates. I am finding that the survfit function is very slow. Here is a simple simulation example (modified from Frank Harrell's example for `cph') that illustrates the problem:
n <- 500
set.seed(4321)
age <- 50 + 12*rnorm(n)
sex <- factor(sample(c('Male','Female'), n, rep=TRUE, prob=c(.6, .4)))
cens <- 5 * runif(n)
h <- 0.02 * exp(0.04 * (age-50) + 0.8 * (sex=='Female'))
dt <- -log(runif(n))/h
e <- ifelse(dt <= cens, 1, 0)
dt <- pmin(dt, cens)
Srv <- Surv(dt, e)
f <- coxph(Srv ~ age + sex, x=TRUE, y=TRUE)
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
When I run the above code with sample sizes, n, taking on values of 500, 1000, 2000, and 4000, the times it takes for survfit to run are as follows:
# n <- 500
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
0.16 0.00 0.15
# n <- 1000
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
1.45 0.00 1.48
# n <- 2000
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
10.19 0.00 10.25
# n <- 4000
system.time(ans <- survfit(f, type="aalen", se.fit=FALSE, newdata=f$x))
user system elapsed
72.87 0.05 74.87
I eventually want to use `survfit' on a data set with roughly 50K observations, which I am afraid is going to be painfully slow. I would much appreciate hints/suggestions on how to make `survfit' faster or any other faster alternatives.
Ravi, If you don't need standard errors/confidence limits, the rms package's survest and related functions can speed things up greatly if you fit the model using cph(, surv=TRUE). [cph calls coxph, and calls survfit once to estimate the underlying survival curve]. Frank Thanks. Best, Ravi. Ravi Varadhan, Ph.D. Assistant Professor, Division of Geriatric Medicine and Gerontology School of Medicine Johns Hopkins University Ph. (410) 502-2619 email: rvarad...@jhmi.edu
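Frank's suggestion can be sketched by reusing Ravi's simulated data: fitting with cph(..., surv=TRUE) stores the underlying survival curve in the fit, so survest() does not need to rerun survfit for each request. The times chosen and the newdata values are illustrative.

```r
library(rms)   # also loads survival
set.seed(4321)
n    <- 500
age  <- 50 + 12 * rnorm(n)
sex  <- factor(sample(c('Male', 'Female'), n, replace = TRUE, prob = c(.6, .4)))
cens <- 5 * runif(n)
h    <- 0.02 * exp(0.04 * (age - 50) + 0.8 * (sex == 'Female'))
dt   <- -log(runif(n)) / h
e    <- ifelse(dt <= cens, 1, 0)
dt   <- pmin(dt, cens)

## surv=TRUE stores the estimated underlying survival curve in the fit
f <- cph(Surv(dt, e) ~ age + sex, x = TRUE, y = TRUE, surv = TRUE)

## fast survival estimates at chosen times, skipping standard errors
survest(f, newdata = data.frame(age = 60, sex = 'Female'),
        times = c(1, 3), se.fit = FALSE)
```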
Re: [R] [Q] Goodness-of-fit test of a logistic regression model using rms package
On Wed, 1 Sep 2010, GMail (KU) wrote: Hello, I was looking for a way to evaluate the goodness-of-fit of a logistic regression model. After googling, I found that I could use the resid(fit, 'gof') method implemented in the rms package. However, since I am not used to the le Cessie-van Houwelingen normal test statistic, I do not know which of the statistics returned from the resid(fit, 'gof') call I could use to evaluate the goodness of fit. When I ran resid(fit, 'gof'), I got the following results:
##
## Sum of squared errors   Expected value|H0          SD
##           6844.684594         6805.672315    2.790969
##                     Z                   P
##             13.978043                0.00
##
I tried to read le Cessie and van Houwelingen's original paper, but I found that it required prerequisite knowledge I don't currently have. Could someone explain how to interpret the results from the resid(fit, 'gof') call? Any help would be much appreciated. Young-Jin Lee Young-Jin, I think everyone has trouble interpreting omnibus tests of lack of fit, so don't feel bad. You just know that something somewhere is probably wrong with the model. I focus on directed tests such as allowing all continuous variables to have nonlinear effects or allowing selected interactions, and finding out how important the complex model terms are. Frank Harrell
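The "directed test" approach Frank describes can be sketched as follows: fit a model that relaxes linearity and allows an interaction, then let anova() report how much the complex terms matter. The data, variable names, and knot count are invented for illustration.

```r
library(rms)
set.seed(5)
d <- data.frame(age = rnorm(400, 50, 10),
                sex = factor(sample(c("F", "M"), 400, replace = TRUE)))
## true effect of age is mildly nonlinear
d$y <- rbinom(400, 1, plogis(-4 + .08 * d$age - .0005 * (d$age - 50)^2))
dd <- datadist(d); options(datadist = "dd")

## 4-knot restricted cubic spline in age, interacting with sex
f <- lrm(y ~ rcs(age, 4) * sex, data = d)
anova(f)   # the Nonlinear and interaction rows are the directed tests
```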
Re: [R] Is there any package or function perform stepwise variable selection under GEE method?
The absence of stepwise methods works to your advantage, as these yield invalid statistical inference and inflated regression coefficients. Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Thu, 2 Sep 2010, 黃敏慈 wrote: Hi, I use library(gee), library(geepack), and library(yags) to perform GEE data analysis, but none of them can do variable selection! Both step and stepAIC can do variable selection based on the AIC criterion under linear regression and glm, but they do not work when the model is based on GEE. I want to ask whether any variable selection function or package under the GEE model is available now? Thanks! Best, [[alternative HTML version deleted]]
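Frank's claim about invalid inference is easy to demonstrate with a small base-R simulation (not GEE-specific, and my own illustration, not his): run AIC-based stepwise selection on pure noise and look at the retained coefficients.

```r
## Stepwise selection on pure noise still "finds" predictors, and the
## reported p-values for the retained terms are anti-conservative.
set.seed(6)
n <- 100; p <- 10
X <- as.data.frame(matrix(rnorm(n * p), n, p))
X$y <- rnorm(n)                       # outcome unrelated to every predictor
sel <- step(lm(y ~ ., data = X), trace = 0)   # AIC-based backward selection
summary(sel)$coefficients             # retained "effects" are spurious
```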
Re: [R] Comparing COXPH models, one with age as a continuous variable, one with age as a three-level factor
On Thu, 2 Sep 2010, stephenb wrote: sorry to bump in late, but I am doing similar things now and was browsing. IMHO anova is not appropriate here. it applies when the richer model has p more variables than the simpler model. this is not the case here. the competing models use different variables. A simple approach is to have the factor variable in the model and to formally test for added information given by the continuous variable (linear, quadratic, spline, etc). AIC could also be used. you are left with information criteria. by transforming a continuous variable into a categorical one you are smoothing, which is the idea of GAM. if you look at what is offered in GAMs you may find better approximations f(age) as well as tools for testing among different f(age) transformations. I don't follow that comment. Smoothing uses the full continuous variable to begin with. A restricted cubic spline function in age is a simple approach. E.g.:
require(rms)
dd <- datadist(mydata); options(datadist='dd')
f <- cph(Surv(dtime,death) ~ rcs(age,4) + sex, data=mydata)
plot(Predict(f, age))
Note that you can always expect the categorized version of age not to fit the data except sometimes when behavior is dictated by law (driving, drinking, military service, medicare). Frank regards. S.
Re: [R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
On Sun, 5 Sep 2010, st...@wittongilbert.free-online.co.uk wrote: David Winsemius wrote: 1. is glm the right thing to use before I waste my time Yes, but if your outcome variable is binomial then the family argument should be binomial. (And if you thought it should be poisson, then why below did you use gaussian?) I used gaussian below because it was the example from the docs. That's not my data; it's example data, which was not binomial. and 2. how do I interpret the result! Result? What result? I don't see any description of your data, nor any code. I didn't provide MY DATA because I thought that would complicate things even further. So I was hoping for some advice on how to interpret the result of the example data so that I could then apply that to my data. I haven't even tried to run my data as I couldn't see what the output of the examples was trying to tell me. However, as you've snipped it because it was not relevant, that's useful to know. I often find this problem with the examples in the R docs: they suddenly take a dataset that I have no knowledge of, play with it, and produce an 'answer'. The examples are presumably provided to enable me to work through how the code works etc. So what I was hoping for was someone to point to somewhere on-line that documents how to use the function for logistic regression and to explain what all that table of data it spits out actually means. Someone has VERY KINDLY posted me something off list which I believe helps. I think you need to consult a statistician or someone who has taken the time to read that statistical mumbo jumbo you don't want to learn. This mailing list is not set up to be a tutorial site.
I have access to stats advice, but I don't (a) want to turn up to them with a pile of paper from R and have them say glm() may be the wrong analysis, (b) they don't do R so they can't tell me if I've used R wrongly, and (c) I completely expect they'd ask which of the values in the table matter, since no paper I've ever seen published showed a logistic regression with a table of numbers. Clearly the time to consult a statistician is before you have done any statistical analysis. Frank Harrell I have a couple of Kleinbaum's (et al) other texts and find them to be well written and reasoned, so I suspect the citation above would be as accessible as any. Thank you, that is useful. There is a real problem when buying R text books. None of the bookshops round here stock any, which means you can't tell if they are much good. I've looked at some and they seem to be re-writes of the help files.
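For readers in the same position as the original poster, a minimal self-contained glm() logistic example (simulated data, made-up coefficients) showing the two pieces of output most papers actually report: coefficients on the log-odds scale, and the corresponding odds ratios with Wald confidence intervals.

```r
set.seed(7)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-1 + 0.8 * x))     # true log-odds: -1 + 0.8*x

fit <- glm(y ~ x, family = binomial)
summary(fit)$coefficients                     # Estimate column = log-odds scale
exp(cbind(OR = coef(fit), confint.default(fit)))  # odds ratios with Wald CIs
```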
Re: [R] some questions about longitudinal study with baseline
Baseline should appear only as a baseline and should be removed from the set of longitudinal responses. This is often done with a merge( ) operation. Frank Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University On Tue, 7 Sep 2010, array chip wrote: Hi all, I asked this before the holiday and didn't get any response, so I would like to resend the message and hope to get some fresh attention. Since this is not a purely lme technical question, I also cc-ed the R general mailing list, hoping to get some suggestions from there as well. I asked some questions previously on how to analyze a longitudinal study with only 2 time points (baseline and a follow-up). I appreciate many useful comments from some members, especially Dennis Murphy and Marc Schwartz, who referred the following paper addressing specifically this type of study with only 2 time points: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1121605/ Basically, with only 2 time points (baseline and one follow-up), ANCOVA with follow-up as dependent variable and baseline as covariate should be used: follow-up = a + b*baseline + treatment. Now I have a regular longitudinal study with 6 time points, 7 treatments (vehicle, A, B, C, D, F, G), measuring a response variable y. The dataset is attached. I have some questions, and appreciate any suggestions on how to analyze the dataset.
dat <- read.table("dat.txt", sep='\t', header=T, row.names=NULL)
library(MASS)
library(lattice)   # for xyplot
dat$trt <- relevel(dat$trt, 'vehicle')
xyplot(y ~ time, groups=trt, data=dat, ylim=c(3,10), col=c(1:6,8), lwd=2,
       type=c('g','a'), xlab='Days', ylab='response',
       key = list(lines=list(col=c(1:6,8), lty=1, lwd=2),
                  text = list(lab = levels(dat$trt)),
                  columns = 3, title = "Treatment"))
So as you can see, there is some curvature between glucose level and time, so a quadratic fit might be needed.
dat$time2 <- dat$time * dat$time
A straight fit like below seems reasonable:
library(lme4)   # for lmer
fit <- lmer(y ~ trt*time + trt*time2 + (time|id), dat)
Checking on random effects, it appears that the variance component for the random slope is very small, so a simpler model with random intercept only may be sufficient:
fit <- lmer(y ~ trt*time + trt*time2 + (1|id), dat)
Now, I want to incorporate the baseline response into the model in order to account for any baseline imbalance. I need to generate a new variable baseline based on glucose levels at time=0:
dat <- merge(dat, dat[dat$time==0, c('id','y')], by.x='id', by.y='id', all.x=T)
colnames(dat)[c(4,6)] <- c('y','baseline')
so the new fit adding baseline into the mixed model is:
fit <- lmer(y ~ baseline + trt*time + trt*time2 + (1|id), dat)
Now my questions are: 1) Is the above model a reasonable thing to do? 2) When baseline is included as a covariate, should I remove the data points at baseline from the dataset? I am kind of unsure whether it's reasonable to use the baseline both as a covariate and as part of the dependent variable values. The next thing I want to do with this dataset is multiple comparisons between each treatment (A, B, C, D, F, G) vs. vehicle at a given time point, say time=56 (the last time point), after adjusting for the baseline imbalance. This seems to be done using the Dunnett test. When I say "after adjusting baseline imbalance", I mean the comparisons should be done based on the difference between time=56 and time=0 (baseline), i.e. "is there any difference in the change from baseline for treatment A (or B, C, D, F, G) vs. vehicle?". How can we test this? Will glht() in multcomp work for an lmer fit? If yes, how can I specify the syntax? Finally, with the above model, how to estimate the difference (and the standard error) between time=56 and time=0 (baseline) for each treatment group? Thank you all for your attention.
John
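Frank's advice - pull the baseline response out as a covariate and keep only post-baseline rows as the longitudinal outcome - can be sketched with merge() on a toy dataset (six subjects, three times; column names follow the thread, but the data are invented):

```r
set.seed(8)
dat <- expand.grid(id = 1:6, time = c(0, 7, 14))
dat$y <- rnorm(nrow(dat), 5)

## extract the baseline measurement as its own column ...
base <- dat[dat$time == 0, c("id", "y")]
names(base)[2] <- "baseline"

## ... and merge it onto the post-baseline rows only, so the baseline
## value appears as a covariate, not among the longitudinal responses
long <- merge(dat[dat$time > 0, ], base, by = "id")
head(long)
```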
[R] Where to find R-help options web page
Dear Group: I must be missing something obvious but I can't find from www.r-project.org a link to a page for managing my r-help subscription. Due to getting a new smart phone I'm changing to nabble to manage r-help traffic. I need to turn off receiving mail directly to my e-mail address. Nabble sent a link to turn off e-mail but the r-help mail service rejected nabble's command. Thanks Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Where-to-find-R-help-options-web-page-tp2535123p2535123.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Where to find R-help options web page
Thanks for your note Ista. I had seen that page. I don't see on it where you can modify your subscription characteristics. I'd like to remain subscribed but not have e-mail delivered; that way I can just see messages in nabble. It may be that I'm misunderstanding how to use nabble (and to be able to post on it). Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Where-to-find-R-help-options-web-page-tp2535123p2535148.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Where to find R-help options web page
Aha. Thanks David. I somehow thought that that whole section was for administrators only. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Where-to-find-R-help-options-web-page-tp2535123p2535172.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] SF-8 (not 36) questionnaire scoring for R?
I know someone who has R code for SF-36 and perhaps SF-12. Aren't there copyright issues relating to SF-* even if it is reprogrammed? Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/SF-36-questionnaire-scoring-for-R-tp2537198p2537404.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] SF-8 (not 36) questionnaire scoring for R?
Yes the company behind that probably received federal funds for some of the research and has been very careful to minimize their contribution to the community. I didn't understand your parenthetical remark. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/SF-36-questionnaire-scoring-for-R-tp2537198p2537480.html Sent from the R help mailing list archive at Nabble.com.
[R] [R-pkgs] New version of rms package on CRAN
CRAN has a significant update to rms. Windows and unix/linux versions are available and I expect the Mac version to be available soon. The most significant improvement is the addition of latex=TRUE arguments to model-fit print methods, made especially for use with Sweave. Here is a summary of changes since the previous version.
Changes in version 3.1-0 (2010-09-12)
* Fixed gIndex to not use scale for labeling unless character
* Changed default na.action in Gls to na.omit and added a note in the help file that na.delete does not work with Gls
* Added terms component to Gls fit object (latex was not working)
* Added examples in robcov help file testing sandwich covariance estimator
* Added reference related to the effects package under help file for plot.Predict
* Added more examples and simulations to gIndex
* Fixed ancient bug in lrm.fit Fortran code to handle case where initial estimates are nearly perfect (was trying to step-halve); thanks: Dan Hogan
* Changed survdiffplot to use gray(.85) for bands instead of gray(.95)
* Fixed formatting problem in print.psm
* Added prStats and reVector functions to rmsMisc.s
* Changed formatting of all print.* functions for model fits to use new prStats function
* Added latex=TRUE option to all model fit print methods; requires LaTeX package needspace
* Re-wrote printing routines to make use of more general model
* Removed long and scale options from cph printing-related routines
* Prepared for version 2.36-1 of the survival package by adding censor=FALSE argument to survfit.coxph
* Added type="ccterms" to various predict methods
* Made type="ccterms" the default for partial g-indexes in gIndex, i.e., combine all indirectly related (through interactions) terms
* Added Spiegelhalter calibration test to val.prob
* Added a check in cph to trigger an error if strata() is used in formula
* Fixed drawing of polygon for shaded confidence bands for survplot.survfit (thanks to Patrick Breheny patrick.breh...@uky.edu)
* Changed default adjust.subtitle in bplot to depend on ref.zero, thanks to David Winsemius dwinsem...@comcast.net
* Used a namespace and simplified references to a few survival package functions that survival actually exports
Frank E Harrell Jr Professor and Chairman, School of Medicine Department of Biostatistics Vanderbilt University ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] predict.lrm ( Design package)
You sent a private note about this, which I just took the time to answer. Please send only one note, and please post my reply to you on r-help. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
View this message in context: http://r.789695.n4.nabble.com/predict-lrm-Design-package-tp2546894p2547011.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] predict.lrm ( Design package)
Thanks Peter. I think you're right. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] predict.lrm ( Design package) poor performance?
% correct is an improper scoring rule, and a discontinuous one to boot, so it will not always agree with more proper scoring rules. When you have a more difficult task, e.g., discriminating among more categories, indexes such as the generalized c-index that utilize all the categories will recognize the difficulty of the task and give a lower value. No cause for alarm. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
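For reference, the generalized c-index and the related rank correlation Dxy mentioned above can be computed from predicted probabilities with the Hmisc package (a sketch on simulated binary data; somers2 covers the binary case, and rcorr.cens the more general one):

```r
## Sketch: discrimination indexes that use all the information (simulated data)
library(Hmisc)
set.seed(2)
x <- rnorm(200)
y <- ifelse(runif(200) < plogis(x), 1, 0)
p <- plogis(x)     # predicted probabilities from some model
somers2(p, y)      # C (ROC area) and Somers' Dxy = 2*(C - 0.5)
```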
Re: [R] Validation / Training - test data
Split-sample validation is highly unstable with your sample size. The rms package can help with bootstrapping or cross-validation, assuming you have all modeling steps repeated for each resample. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
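The bootstrapping and cross-validation mentioned here look roughly like the following in rms (a sketch; the data are simulated, and the fit must be stored with x=TRUE, y=TRUE for validate to work):

```r
## Sketch: resampling validation with rms (simulated data)
library(rms)
set.seed(3)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
d$y <- ifelse(runif(100) < plogis(d$x1), 1, 0)
f <- lrm(y ~ x1 + x2, data = d, x = TRUE, y = TRUE)
validate(f, B = 200)                              # bootstrap-corrected Dxy, slope, etc.
# validate(f, method = "crossvalidation", B = 10) # 10-fold cross-validation instead
```

Note that validate repeats the fitting for each resample; any variable selection done outside the call is not accounted for.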
Re: [R] Validation / Training - test data
It all depends on the ultimate use of the results. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
John, don't conclude that one category is the most probable when its probability of being equaled or exceeded is a maximum. The first category would always be the winner if that were the case. When you say y=best, remember that you are dealing with a probability model. Nothing is forcing you to classify an observation, and unless the category's probability is high, doing so may be dangerous. You might do well to consider a smoother approach such as using the generalized ROC area (c-index) or its related rank correlation measure Dxy. There are also odds ratios. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Why assign them at all? Is this a forced-choice-at-gunpoint problem? Remember what probabilities mean. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
Well put Greg. The job of the statistician is to produce good estimates (probabilities in this case). Those cannot be translated into action without subject-specific utility functions. Classification during the analysis or publication stage is not necessary. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
You still seem to be hung up on making arbitrary classifications. Instead, look at tendencies using odds ratios or rank correlation measures. My book Regression Modeling Strategies covers this. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] A problem about nomogram--thank you for you help
Please take the time to study the subject matter, and note that a nomogram is just a graphical method. It is not a statistical model or a process. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
I may be missing a point, but the proportional odds model easily gives you odds ratios for Y >= j (independent of j by the PO assumption). Other options include examining a rank correlation between the linear predictor and Y, or (if Y is numeric and the spacings between categories are meaningful) you can get the predicted mean Y (see Mean.lrm in the R rms package, a replacement for the Design package). Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
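The Mean.lrm route mentioned here can be sketched as follows (simulated ordinal data; Mean and lrm are rms functions):

```r
## Sketch: predicted mean Y from a proportional odds fit (simulated data)
library(rms)
set.seed(4)
x <- rnorm(150)
y <- cut(x + rnorm(150), breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)  # ordinal 1..4
f <- lrm(y ~ x)
M <- Mean(f)          # Mean.lrm returns a function of the linear predictor
head(M(predict(f)))   # predicted mean of Y for each observation
```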
Re: [R] multiple moderated regression steps
How does it help to center the variables? What statistical principles are you using to fit your model, and how are you ensuring statistical validity of inferential estimates? Why do it in multiple steps? Why use a mixture of software tools? What is the utility of standardized regression coefficients? Frank

stamkiral wrote: Hi, I am studying the moderating effects of perceived social support and just-world belief on the relationship between stress coping strategies and depression level. I have never run this analysis before, so I want to check whether my steps are correct. First I ran the regression: in step 1, centered independent variables and centered moderators; in step 2, two-way interactions; in step 3, three-way interactions. The results showed significant two-way and three-way interactions. After this, for the two-way interactions, I ran 2 slopes following Winnifred's suggestion (http://www.docstoc.com/docs/21151269/Moderated-Multiple-Regression-v5 p.6); my criterion for significance was the centered independent variable (the one with a significant interaction) in Block 2. Is this correct? Second, to plot this slope I used Dawson's 2-way unstandardised Excel spreadsheet (http://www.jeremydawson.co.uk/slopes.htm). The spreadsheet requires the unstandardised regression coefficients of the IV, the moderator, and the interaction. From which block should I take the IV and moderator coefficients: block 1 (main effects) or block 2 (interactions)? One more question: the three-way spreadsheet requires the variance of the IV*Moderator coefficient. In the covariance matrix, which block is correct for this value, the second or the third? Likewise, for the covariance of the IV*Mod1 and IV*Mod2 coefficients, which covariance block should be used? Thanks so much. Note: excuse my English.

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] multiple moderated regression steps
That concern has nothing to do with centering variables before including them in a model. Your multiple significance testing strategy is not based on statistical principles and will distort all inferences you obtain from the final model. Frank

stamkiral wrote: The variables were centered (except the DV) because in the social sciences zero is rarely a meaningful point on a scale (Cohen, Cohen, West, Aiken, 2006). For example, in the perceived social support questionnaire there is no value of zero; it is a Likert-type questionnaire where 1 = strongly yes and 7 = strongly no. So Aiken and West (1991) suggested centering predictors and moderators to give zero a meaningful value in the regression equation. Also, the multilevel steps were run to avoid multicollinearity. In the results, .05 was taken as significant (in the coefficient table), and slope tests were done for the interactions. For plotting the slopes, unstandardised regression coefficients were used.

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Speed Advice for R --- avoid data frames
Data frame access is about 5,000 times slower than matrix column access, and 300 times slower than matrix row access. R's data frame operational speed is an amazing 40 data accesses per second. I have not seen access numbers this low for decades. How to avoid it? Not easy. One way is to create multiple matrices and group them as an object; of course, this loses a lot of features of R. Another way is to copy all data used in calculations out of the data frame into a matrix, do the operations, and then copy them back. Not ideal, either. In my opinion, this is an R design flaw. Data frames are the fundamental unit of much statistical analysis and should be fast. I think R lacks any indexing into data frames; turning on indexing of data frames should at least be an optional feature. I hope this post helps others. /iaw

Ivo Welch (ivo.we...@gmail.com) http://www.ivo-welch.info/

Frank Harrell, Department of Biostatistics, Vanderbilt University
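The data-frame-versus-matrix gap described above is easy to check with base R alone (a sketch; the exact ratios depend heavily on R version and hardware, so treat the numbers quoted in the post as indicative rather than universal):

```r
## Compare repeated element-wise access: data frame vs. matrix (base R only)
n  <- 1e4
df <- data.frame(a = numeric(n), b = numeric(n))
m  <- as.matrix(df)
system.time(for (i in 1:n) df[i, 1] <- i)  # element-wise data frame writes: slow
system.time(for (i in 1:n) m[i, 1]  <- i)  # the same writes on a matrix: much faster
```

Vectorized operations (df$a <- 1:n) largely avoid the penalty; it is the scalar access pattern above that is pathological.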
Re: [R] GLS - Plotting Graphs with 95% conf interval
The Gls function in the rms package is a front end to gls that allows you to use all the graphics and other methods available in rms. Frank

SamiC wrote: Hi, I am trying to plot the original data with the line of the model using the predict function. I want to add SEs to the graph, but am not sure how to get them out, as the predict function for gls does not appear to allow for an SE=TRUE argument. Here is my code so far:

f1 <- formula(MaxNASC40_50 ~ hu3 + flcmax + TidalFlag)
vf1Exp <- varExp(form = ~ hu3)
B1D <- gls(f1, correlation = corGaus(form = Lat ~ Lon, nugget = TRUE), weights = vf1Exp, data = ocean)
ochu3 <- seq(from = 2.9, to = 4, length = 120)
ocflc <- seq(from = 0, to = 0.8, length = 120)
tidal1 <- rep(c(1), length = 120)
mydata1 <- data.frame(TidalFlag = factor(tidal1), hu3 = ochu3, flcmax = ocflc)
lineNASC1 <- predict(B1D, newdata = mydata1, type = "response")
lineNASC1 <- as.data.frame(lineNASC1)
plot(ocean$MaxNASC40_50[ocean$TidalFlag == 1] ~ ocean$flcmax[ocean$TidalFlag == 1])
lines(lineNASC1$lineNASC1 ~ mydata1$flcmax)

TidalFlag is a factor (so I assume I have to plot separate graphs for each level). When I have tried to use the effects package I get the error: Error in x$formula : object of type 'symbol' is not subsettable. Also, when I have been trying to predict values from a zero-inflated negative binomial, I get the same line of fit regardless of what is on my x-axis (despite different variables having positive and negative relationships). Any input on any of these problems would be appreciated. Thanks.

Frank Harrell, Department of Biostatistics, Vanderbilt University
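A minimal sketch of the rms route suggested above (Gls wraps nlme::gls; the formula and data here are invented placeholders, not the poster's dataset):

```r
## Sketch: Gls fit with rms graphics and confidence bands (simulated data)
library(rms)    # Gls, datadist, Predict
library(nlme)   # corGaus
set.seed(5)
d <- data.frame(x = runif(80), lat = runif(80), lon = runif(80))
d$y <- 2 * d$x + rnorm(80)
dd <- datadist(d); options(datadist = "dd")
f <- Gls(y ~ x, data = d, correlation = corGaus(form = ~ lat + lon, nugget = TRUE))
plot(Predict(f, x))   # fitted line with pointwise confidence bands
```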
Re: [R] Named numeric vectors with the same value but different names return different results when used as thresholds for calculating true positives
Also note that the statistical method you are using does not seem in line with decision theory, and you are assuming that the threshold actually exists. It is seldom the case that the relationship of a predictor with the response is flat on at least one side of a threshold. A smooth prediction model may be in order. Frank

Eik Vettorazzi wrote: Hi, Am 11.07.2011 22:57, schrieb Lyndon Estes:

ctch[ctch$threshold == 3.5, ]
# [1] threshold val tp fp tn fn tpr fpr tnr fnr
# 0 rows (or 0-length row.names)

This is the very effective FAQ 7.31 trap: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f Welcome to the first circle of Patrick Burns' R Inferno! Also, unname() is a more intuitive way of removing names. And I think your code is quite inefficient, because you calculate quantiles many times, which involves repeated ordering of x, and you may be using an inefficient bin size (either too small, and therefore calculating the same split many times, or too large, and then missing some splits). I'm a bit puzzled about what x and y are in your code, so any further advice is vague, but you might have a look at any package that calculates ROC curves, such as ROCR or pROC (and many more). Hth

Eik Vettorazzi, Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg

Frank Harrell, Department of Biostatistics, Vanderbilt University
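The FAQ 7.31 trap Eik refers to can be reproduced in a few lines of base R (a sketch):

```r
## Floating point equality: why ctch$threshold == 3.5 can match nothing
x <- seq(0, 3.5, by = 0.7)
x[6] == 3.5                    # FALSE: 5 * 0.7 is not exactly 3.5 in binary floating point
isTRUE(all.equal(x[6], 3.5))   # TRUE: comparison with a numerical tolerance
abs(x[6] - 3.5) < 1e-8         # the usual hand-rolled alternative for subsetting
```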
Re: [R] Sum weights of independent variables across models (AIC)
Why go to so much trouble? Why not fit a single full model and use it? Even better, why not use a quadratic penalty on the full model to get optimum cross-validation? Frank

nofunsally wrote: Hello, I'd like to sum the weights of each independent variable across linear models that have been evaluated using AIC. For example:

library(MuMIn)
data(Cement)
lm1 <- lm(y ~ ., data = Cement)
dd <- dredge(lm1, beta = TRUE, eval = TRUE, rank = "AICc")
get.models(dd, subset = delta < 4)

There are 5 models with a delta AIC score of less than 4. I would like to sum the weights for each of the independent variables across the five models. How can I do that? Thanks, Mike

Frank Harrell, Department of Biostatistics, Vanderbilt University
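The quadratic-penalty route Frank suggests is available in rms via pentrace (a sketch on simulated data; the penalty grid is arbitrary):

```r
## Sketch: penalized full model instead of model averaging (simulated data)
library(rms)
set.seed(6)
d <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
d$y <- d$x1 + 0.5 * d$x2 + rnorm(100)
f <- ols(y ~ x1 + x2 + x3, data = d, x = TRUE, y = TRUE)
p <- pentrace(f, seq(0, 20, by = 2))  # search for the penalty maximizing effective AIC
update(f, penalty = p$penalty)        # refit the full model with the optimum penalty
```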
Re: [R] validate survival with val.surv
The documentation for val.surv tried to be clear about that. Note that val.surv is only for the case where the predictions were developed on a separate dataset, i.e., where the validation is truly 'external'. Frank

yz wrote: Dear R users: I want to externally validate a model with val.surv. Can I use only the calculated survival (at 1 year) and actual survival? Or do I need the survival function and actual survival? Thanks. Yao Zhu, Department of Urology, Fudan University Shanghai Cancer Center, Shanghai, China

Frank Harrell, Department of Biostatistics, Vanderbilt University
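For a fixed-time external validation like the 1-year question above, the call is roughly as follows (a heavily hedged sketch: est1yr, futime, and event are placeholder names, and the exact argument set should be checked against ?val.surv):

```r
## Sketch: external validation of 1-year survival predictions with rms::val.surv
library(rms)       # val.surv
library(survival)  # Surv
# est1yr: predicted 1-year survival probabilities computed from the
#         previously developed (external) model -- placeholder name
# futime, event: observed follow-up time and event indicator in the
#                validation sample -- placeholder names
# v <- val.surv(est.surv = est1yr, S = Surv(futime, event), u = 1)
# plot(v)
```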
Re: [R] Design Survplot performance
Replace the soon-to-be-retired Design package with rms. Specify surv=TRUE to cph so that approximate rather than fully correct standard errors will be computed by survplot/survest. Frank

christopher.a.hane wrote: Hi, I have a Cox PH model that's large for my server: 120K rows, ~300 factors with 3 levels each, so about 1000 columns. The 300 factors all pass a preliminary test of association with the outcome. Solving this with cph from Design takes about 3 hours. I have created the fit with x=T, y=T to save the model data. I want to validate the PH assumption by calling survplot(fit, gender=NA, logt=TRUE, loglog=TRUE) for many of the factors (here gender is one column name). Just creating this one plot takes 40 minutes. I'd be happy to sample from the fitted model to create these tests, or figure out another way to check assumptions in the model. Has anyone done something similar, or have other suggestions for tests that scale better? Thanks.

Frank Harrell, Department of Biostatistics, Vanderbilt University
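In rms terms, the suggested setup looks like this (a sketch; the data and variable names are invented, and surv=TRUE is the option Frank recommends for fast approximate standard errors):

```r
## Sketch: cph with surv=TRUE and a log-log proportional hazards check
library(rms)
library(survival)
set.seed(7)
d <- data.frame(time   = rexp(200),
                status = rbinom(200, 1, 0.7),
                gender = factor(sample(c("F", "M"), 200, replace = TRUE)))
dd <- datadist(d); options(datadist = "dd")
f <- cph(Surv(time, status) ~ gender, data = d,
         x = TRUE, y = TRUE, surv = TRUE)          # surv=TRUE: approximate SEs, fast
survplot(f, gender = NA, logt = TRUE, loglog = TRUE) # parallel curves suggest PH holds
```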
[R] Error in plotmath
Under R version 2.13.1 (2011-07-08), platform x86_64-pc-linux-gnu (svn rev 56322), I get a double quote mark in place of <= in the y-axis label when I run the following.

set.seed(6)
x <- rnorm(30, 100, 20)
xs <- seq(50, 150, length=150)
cdf <- pnorm(xs, 100, 20)
plot(xs, cdf, type='l', ylim=c(0,1),
     xlab=expression(x),
     ylab=expression(paste("Prob[", X <= x, "]")))
lines(ecdf(x), cex=.5)

The problem also occurs if I use instead ylab=expression(prob(X <= x)). All is well if I remove <= but I need <=. Frank

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] How to make a nomogam and Calibration plot
Kindly do not attach questions in a separate document. Install and read the documentation for the R rms package, and see the handouts at http://biostat.mc.vanderbilt.edu/rms Frank

sytangping wrote: Dear R users, I am a new R user and something stops me when I try to write an academic article. I want to make a nomogram to predict the risk of prostate cancer (PCa) using several factors selected from a logistic regression run in SPSS. A calibration plot is also needed to validate the prediction accuracy of the nomogram. I have tried many times and read a lot of posts on this topic, but I still couldn't figure out how to draw the nomogram and the calibration plot. My dataset and detailed questions are in the two attached files. I would be very grateful if someone could spare the time to help with my questions. Warmest regards! Ping Tang http://r.789695.n4.nabble.com/file/n3710068/Dataset.xls Dataset.xls http://r.789695.n4.nabble.com/file/n3710068/R_help.doc R_help.doc

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] How to make a nomogam and Calibration plot
The nomogram you included was produced by the Design package, the precursor to the rms package. You will have to take the time to read the rms package documentation intensively. Note that how you developed the model (e.g., allowing for non-linearity in log PSA, not using stepwise regression, which invalidates the results, making sure all clinically relevant predictors are in the model, ...) is the most important step. The process you are going through generally requires an M.S. in biostatistics. Frank

sytangping wrote: Dear Harrell, Many thanks for your quick response! However, after trying and trying, I still have difficulty solving my questions. I post my questions again. I hope someone can help me run the data and draw the nomogram and calibration plot for me. I know that is not good, but indeed I have no other way to go. The problems have almost driven me mad! Best regards! Ping Tang

Logistic Regression Classification Table:

  Observed              Predicted PCa-YN     Percentage Correct
                        0        1
  Step 0  PCa-YN  0     295      0           100.0
                  1     218      0           .0
  Overall Percentage                         57.5

Variables in the Equation:

            B        S.E.    Wald     df   Sig.    Exp(B)    95.0% C.I. for Exp(B)
  Age       .031     .015    4.491    1    .034    1.032     1.002 - 1.062
  DRE       1.173    .266    19.492   1    .000    3.233     1.920 - 5.443
  LogPV     -2.857   .509    31.532   1    .000    .057      .021 - .156
  LogPSA    2.316    .246    88.416   1    .000    10.132    6.253 - 16.419
  Constant  -1.024   1.273   .648     1    .421    .359

The equation:

  Probability = exp(-1.024 + 0.031*Age + 1.173*DRE - 2.857*LogPV + 2.316*LogPSA) /
                (1 + exp(-1.024 + 0.031*Age + 1.173*DRE - 2.857*LogPV + 2.316*LogPSA))

My questions are: 1. How to draw a nomogram (similar to figure 1 below) to predict the probability of cancer using R? 2. How to make the calibration plot (similar to figure 2 below) used to validate the prediction accuracy of the nomogram using R? And how to calculate the concordance index (c-index)?
http://r.789695.n4.nabble.com/file/n3714477/untitled.jpg
http://r.789695.n4.nabble.com/file/n3714477/%E9%99%84%E4%BB%B62.jpg
http://r.789695.n4.nabble.com/file/n3714477/Dataset.xls Dataset.xls

Frank Harrell, Department of Biostatistics, Vanderbilt University
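For the record, the rms workflow for the requested nomogram, calibration plot, and c-index looks roughly like this (a sketch using simulated stand-ins for the poster's variables, with coefficients loosely echoing the SPSS table; not the poster's actual dataset):

```r
## Sketch: nomogram, calibration plot, and c-index with rms (simulated data)
library(rms)
set.seed(8)
d <- data.frame(Age = rnorm(500, 65, 8), DRE = rbinom(500, 1, 0.4),
                LogPV = rnorm(500, 1.5, 0.2), LogPSA = rnorm(500, 1, 0.4))
d$PCa <- ifelse(runif(500) < plogis(-1 + 0.03 * d$Age + 1.2 * d$DRE
                                    - 2.9 * d$LogPV + 2.3 * d$LogPSA), 1, 0)
dd <- datadist(d); options(datadist = "dd")
f <- lrm(PCa ~ Age + DRE + LogPV + LogPSA, data = d, x = TRUE, y = TRUE)
plot(nomogram(f, fun = plogis, funlabel = "Probability of PCa"))
plot(calibrate(f, B = 200))  # bootstrap overfitting-corrected calibration curve
f$stats["C"]                 # apparent c-index; validate(f, B = 200) gives corrected Dxy
```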
Re: [R] How to make a nomogam and Calibration plot
I want to add a comment related to the calibration plot presented in a previous post (which probably cannot be done optimally in SPSS). The plot lacks sufficient resolution in the x-axis values that are calibrated. It is far better to use loess to estimate a smooth nonparametric calibration curve, with no (arbitrary) binning of x. And it is not adequate to validate only the 3-4 points that were plotted. Frank

yz wrote: A nomogram is user-friendly, but the equation is also acceptable. It should be kept in mind that the process of model development really counts. BTW: You can calculate the c-index (AUC) in SPSS. A calibration plot can also be made (perhaps manually) from the SPSS results. Yao Zhu, Department of Urology, Fudan University Shanghai Cancer Center, Shanghai, China

2011/8/1 sytangping <surgeon666...@yahoo.com.cn>

Frank Harrell, Department of Biostatistics, Vanderbilt University
Re: [R] Goodness of fit of binary logistic model
Please provide the data or, better, the R code for simulating data that shows the problem. Then we can look further into this. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Goodness-of-fit-of-binary-logistic-model-tp3721242p3721997.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Goodness of fit of binary logistic model
Exactly right Peter. Thanks. There should be some way for me to detect such situations so that they do not produce an impressive P-value. Ideas welcomed! This is a great example of why users should post a toy example in the first posting, as we can immediately see that this model MUST fit the data, so that any evidence for lack of fit has to be misleading. Frank

Peter Dalgaard-2 wrote: On Aug 5, 2011, at 23:16, Paul Smith wrote: Thanks, Frank. The following piece of code generates data that exhibit the problem I reported:

set.seed(123)
intercept <- -1.32
beta <- 1.36
xtest <- rbinom(1000, 1, 0.5)
linpred <- intercept + xtest * beta
prob <- exp(linpred) / (1 + exp(linpred))
runis <- runif(1000, 0, 1)
ytest <- ifelse(runis < prob, 1, 0)
xtest <- as.factor(xtest)
ytest <- as.factor(ytest)
require(rms)
model <- lrm(ytest ~ xtest, x = TRUE, y = TRUE)
model
residuals(model, 'gof')

Basically, what you have is zero divided by zero, except that floating-point inaccuracy turns it into the ratio of two small numbers. So the Z statistic is effectively rubbish. This comes about because the SSE minus its expectation has effectively zero variance, which makes it rather useless for testing whether the model fits. Since the model is basically a full model for a 2x2 table, it is not surprising to me that goodness-of-fit tests behave poorly. In fact, I would conjecture that no sensible g.o.f. test exists for that case. Paul On Fri, Aug 5, 2011 at 7:58 PM, Frank Harrell <f.harr...@vanderbilt.edu> wrote: Please provide the data or, better, the R code for simulating data that shows the problem. Then we can look further into this. Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Goodness-of-fit-of-binary-logistic-model-tp3721242p3721997.html Sent from the R help mailing list archive at Nabble.com.
-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com Døden skal tape! --- Nordahl Grieg - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Goodness-of-fit-of-binary-logistic-model-tp3721242p3723388.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] help with predict for cr model using rms package
The help files do not make this clear, but predict and Mean in rms do not support computation of predicted means for the continuation ratio model. The help file for cr.setup shows you how to compute the unconditional probabilities you would need for doing that. Frank

apeer wrote: Dear list, I'm currently trying to use the rms package to get predicted ordinal responses from a continuation ratio model. As you will see below, my model seems to fit the data well; however, I'm having trouble getting predicted mean (or fitted) ordinal response values using the predict function. I have a feeling I'm missing something simple, but I haven't been able to determine what that is. Thanks in advance for your help. Adam

dd <- datadist(all.data2.stand)
options(datadist = 'dd')
bp.cat2 <- all.data2.stand$bp.cat2
u <- cr.setup(bp.cat2)
u
b.mean <- rep(all.data2.stand$b.mean, u$reps)
r.mean <- rep(all.data2.stand$r.mean, u$reps)
mean.ova.energy <- rep(all.data2.stand$mean.ova.energy, u$reps)
y <- u$y            # constructed binary response
cohort <- u$cohort
attach(all.data2.stand[u$subs, ])
dd <- datadist(dd, cohort)
ord.cr <- lrm(y ~ cohort + mean.ova.energy + b.mean + r.mean,
              x = TRUE, y = TRUE, na.action = na.delete)
summary(ord.cr)
Effects   Response : y

Factor                    Low       High     Diff.     Effect    S.E.
mean.ova.energy           0.36902   1.00810  0.63906   -27.32    11.74
b.mean                   -0.98219   0.18109  1.16330    -6.76     3.14
r.mean                   -0.50416   0.89758  1.40170    11.75     4.84
cohort - bp.cat2=2:all    1.0       2.0      NA         43.07    18.37
cohort - bp.cat2=3:all    1.0       3.0      NA         55.38    23.52

ord.cr
Logistic Regression Model
lrm(formula = y ~ cohort + mean.ova.energy + b.mean + r.mean,
    na.action = na.delete, x = TRUE, y = TRUE)
Obs 182, LR chi2 174.09, d.f. 5, Pr(> chi2) < 0.0001, max |deriv| 6e-07,
R2 0.953, g 33.065, gp 0.338, Brier 0.013, C 0.998, Dxy 0.996,
gamma 0.996, tau-a 0.337

                   Coef      S.E.     Wald Z  Pr(>|Z|)
Intercept         -20.6064    8.5979  -2.40   0.0165
cohort=bp.cat2=2   43.0670   18.3684   2.34   0.0190
cohort=bp.cat2=3   55.3845   23.5159   2.36   0.0185
mean.ova.energy   -42.7469   18.3663  -2.33   0.0199
b.mean             -5.8150    2.6984  -2.16   0.0312
r.mean              8.3840    3.4523   2.43   0.0152

p.cr <- predict(ord.cr, all.data2.stand, type = 'mean', codes = TRUE)
Error in model.frame.default(Terms, newdata, na.action = na.action, ...) :
  variable lengths differ (found for 'mean.ova.energy')
In addition: Warning message:
'newdata' had 72 rows but variable(s) found have 182 rows
pred.mean2 <- data.frame(p.cr)
Error in data.frame(p.cr) : object 'p.cr' not found
pred.mean2
- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/help-with-predict-for-cr-model-using-rms-package-tp3723763.html Sent from the R help mailing list archive at Nabble.com.
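The unconditional probabilities Frank refers to follow from the continuation ratio structure: for an ordered outcome, P(Y = j) is the fitted conditional probability P(Y = j | Y >= j) times the probability of "surviving" all earlier categories. A hedged arithmetic sketch with hypothetical conditional probabilities for a three-level outcome:

```r
## Hedged sketch: converting continuation-ratio conditional probabilities
## (as fitted on cr.setup()-expanded data) to unconditional category
## probabilities for one subject. The values in p_cond are hypothetical.
p_cond <- c(0.30, 0.55)       # P(Y=1 | Y>=1), P(Y=2 | Y>=2)
p_uncond <- numeric(3)
p_uncond[1] <- p_cond[1]
p_uncond[2] <- (1 - p_cond[1]) * p_cond[2]
p_uncond[3] <- (1 - p_cond[1]) * (1 - p_cond[2])  # remainder category
sum(p_uncond)                 # == 1 by construction
expected_y <- sum(1:3 * p_uncond)  # predicted mean ordinal response
```

This is the computation that predict(..., type = 'mean') cannot do for you for the CR model; you apply it per subject using the cohort-specific fitted probabilities.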
Re: [R] help annotating plot.xmean.ordinaly in rms package
Chris, for my purposes I need the output tied to the model's variable names, so I'm not sure I want to add new naming options to the function. You might just add a line after nam <- nam[-1] to specify your vector of names, starting with the response variable, e.g., nam <- c('velocity m/s', 'age', 'blood pressure'). Frank

Christopher DiBari wrote: Hello List: Does anyone know if there's a way to annotate the plots produced by the plot.xmean.ordinaly function in the rms package? I'd like to produce publication-quality figures, but first need to annotate the axis labels. Is there a way to do that? I appreciate any advice. Chris

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/help-annotating-plot-xmean-ordinaly-in-rms-package-tp3724864p3724937.html Sent from the R help mailing list archive at Nabble.com.
[R] useR!2012 Nashville TN: Your Input Needed
Dear R and S-Plus Community: The 2012 R User Conference - useR! 2012 - will be held in Nashville, Tennessee, USA, June 12-15, 2012 on the campus of Vanderbilt University. We would like to begin estimating the number of attendees, their areas of interest, and the number seeking hotel vs. lower-cost housing. If you are likely to attend useR! 2012, please go to the following link to answer two questions: https://redcap.vanderbilt.edu/surveys/?s=2wiqWo There are many fun things to do in Nashville -- Music City USA -- around the time of the meeting. Vanderbilt is 2.2 miles (3.5 km) from the center of the action. I hope to see many of you at useR! 2011 at the University of Warwick in Coventry, UK in just over a week. If anyone knows of another site to which I should send this announcement, please e-mail me your suggestion. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] Revolutions Blog: July Roundup
David, as always this is a terrific roundup. I note in passing that the big-data logistic function rxLogit used on the 1B-observation dataset (impressive run time!) inappropriately used the t distribution for testing the coefficients in the logistic model [at least if the notation used is any clue]. It should have used the normal distribution. Cheers, Frank - Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Revolutions-Blog-July-Roundup-tp3731426p3731597.html Sent from the R help mailing list archive at Nabble.com.
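The point Frank is making can be illustrated with base R: Wald statistics for logistic regression coefficients are referred to the standard normal distribution, which is why glm() reports "z value" rather than a t statistic.

```r
## Illustration of the normal-vs-t point above, using a built-in dataset.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
z <- coef(summary(fit))[, "z value"]
p <- 2 * pnorm(-abs(z))   # normal-based P-values, as glm() reports
all.equal(unname(p), unname(coef(summary(fit))[, "Pr(>|z|)"]))  # TRUE
```

Using pt() with some finite degrees of freedom here would give systematically larger P-values, which is the error being flagged.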
[R] useR! 2012 Nashville TN: Your Input Needed
So far we have received over 70 responses to our survey. If you have NOT responded already and are likely to attend useR! 2012, please take the extremely short survey today. The link is below. More information about Nashville may be seen at http://biostat.mc.vanderbilt.edu/UseR-2012 In a day or two the conference web address will be http://www.r-project.org/useR-2012/ Thanks! -- The 2012 R User Conference - useR! 2012 - will be held in Nashville, Tennessee, USA, June 12-15, 2012 on the campus of Vanderbilt University. We would like to begin estimating the number of attendees, their areas of interest, and the number seeking hotel vs. lower-cost housing. If you are likely to attend useR! 2012, please go to the following link to answer two questions: https://redcap.vanderbilt.edu/surveys/?s=2wiqWo There are many fun things to do in Nashville -- Music City USA -- around the time of the meeting. Vanderbilt is 2.2 miles (3.5 km) from the center of the action. I hope to see many of you at useR! 2011 at the University of Warwick in Coventry, UK in just over a week. If anyone knows of another site to which I should send this announcement, please e-mail me your suggestion. Frank -- Frank E Harrell Jr Professor and Chairman School of Medicine Department of Biostatistics Vanderbilt University
Re: [R] fit.mult.impute() in Hmisc
For your approach, how do you know that either summary or vcov used multiple imputation? You are using a non-rms fitting function, so be careful. Compare with using the lrm fitting function. Also, replace Design with the rms package. Please omit confidentiality notices from your e-mails. Frank

I tried multiple imputation with aregImpute() and fit.mult.impute() in Hmisc 3.8-3 (June 2010) and R-2.12.1. The warning message below suggests that summary(f) of fit.mult.impute() would only use the last imputed data set. Thus, the whole imputation process is ignored: "Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use vcov(fit) to get the correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e." But the standard errors in summary(f) agree with the values from sqrt(diag(vcov(f))) to the 4th decimal place. It would seem that summary(f) actually adjusts for multiple imputation? Does summary(f) in Hmisc 3.8-3 actually adjust for MI? If it does not adjust for MI, then how do I get the MI-adjusted coefficients and standard errors? I can't seem to find answers in the documentation, including rereading section 8.10 of the Harrell (2001) book. Googling located a thread in R-help back in 2003, which seemed dated. Many thanks in advance for the help, Yuelin.
http://idecide.mskcc.org
---
library(Hmisc)
Loading required package: survival
Loading required package: splines
data(kyphosis, package = "rpart")
kp <- lapply(kyphosis, function(x)
  { is.na(x) <- sample(1:length(x), size = 10); x })
kp <- data.frame(kp)
kp$kyp <- kp$Kyphosis == "present"
set.seed(7)
imp <- aregImpute( ~ kyp + Age + Start + Number, data = kp, n.impute = 10,
                   type = "pmm", match = "closest")
Iteration 13
f <- fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp,
                     family = binomial, data = kp)
Variance Inflation Factors Due to Imputation:
(Intercept)   Age  Start  Number
       1.06  1.28   1.17    1.12
Rate of Missing Information:
(Intercept)   Age  Start  Number
       0.06  0.22   0.14    0.10
d.f. for t-distribution for Tests of Single Coefficients:
(Intercept)      Age   Start  Number
    2533.47   193.45  435.79  830.08
The following fit components were averaged over the 10 model fits:
fitted.values linear.predictors
Warning message: In fit.mult.impute(kyp ~ Age + Start + Number, fitter = glm, xtrans = imp, : Not using a Design fitting function; summary(fit) will use standard errors, t, P from last imputation only. Use vcov(fit) to get the correct covariance matrix, sqrt(diag(vcov(fit))) to get s.e.
f
Call: fitter(formula = formula, family = binomial, data = completed.data)
Coefficients:
(Intercept)      Age    Start  Number
    -3.6971   0.0118  -0.1979  0.6937
Degrees of Freedom: 80 Total (i.e. Null); 77 Residual
Null Deviance: 80.5  Residual Deviance: 58  AIC: 66
sqrt(diag(vcov(f)))
(Intercept)        Age      Start     Number
  1.5444782  0.0063984  0.0652068  0.2454408
-0.1979/0.0652068
[1] -3.0350
summary(f)
Call: fitter(formula = formula, family = binomial, data = completed.data)
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.240  -0.618  -0.288  -0.109   2.409
Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)   -3.6971      1.5445    -2.39    0.0167
Age            0.0118      0.0064     1.85    0.0649
Start         -0.1979      0.0652    -3.03    0.0024
Number         0.6937      0.2454     2.83    0.0047
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 80.508 on 80 degrees of freedom
Residual deviance: 57.965 on 77 degrees of freedom
AIC: 65.97
Number of Fisher Scoring iterations: 5
- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/fit-mult-impute-in-Hmisc-tp3419037p3741881.html Sent from the R help mailing list archive at Nabble.com.
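Frank's suggestion (use the rms fitter lrm rather than glm) can be sketched directly against the kyphosis toy example above. This is a hedged sketch, not a tested analysis; with an rms fitting function, the printed fit and vcov() both use the imputation-adjusted covariance matrix, so the warning disappears:

```r
## Hedged sketch of the advice above: fit.mult.impute with fitter = lrm
## so that the printed Wald tests use MI-adjusted variances.
library(rms)   # loads Hmisc
data(kyphosis, package = "rpart")
set.seed(7)
kp <- data.frame(lapply(kyphosis,
        function(x) { is.na(x) <- sample(seq_along(x), 10); x }))
kp$kyp <- kp$Kyphosis == "present"
imp <- aregImpute(~ kyp + Age + Start + Number, data = kp, n.impute = 10)
f <- fit.mult.impute(kyp ~ Age + Start + Number, fitter = lrm,
                     xtrans = imp, data = kp)
f                    # Wald tests based on the adjusted covariance matrix
sqrt(diag(vcov(f)))  # same MI-adjusted standard errors
```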
Re: [R] Labelling all variables at once (using Hmisc label)
Do require(Hmisc); ?label to see the help file for label. It will show you how to do this.

Monsieur Do wrote: The labels can contain much more than just names. In my case, they are variable descriptions (items from a questionnaire). I need to keep the names as they are, hence the need for Hmisc's labels.

On Mon, Aug 15, 2011 at 3:53 PM, Monsieur Do <nonaupourr...@yahoo.ca> wrote: I have a dataset and a list of labels. I simply want to apply the labels to the variables, all at once. The only way I was able to do it was using a loop:

for (i in 1:length(data)) label(data[, i]) <- data.labels[i]

I'd like to find the non-loop way to do it, using apply or the like... Any help appreciated.

Would you not be better off with names()?

data <- 1:10
data.labels <- letters[1:10]
names(data) <- data.labels
data
 a  b  c  d  e  f  g  h  i  j
 1  2  3  4  5  6  7  8  9 10

What are you trying to do with label() that names() doesn't accomplish? -- Sarah Goslee http://www.functionaldiversity.org

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Labelling-all-variables-at-once-using-Hmisc-label-tp3745660p3746928.html Sent from the R help mailing list archive at Nabble.com.
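The vectorized form Frank is pointing to in the label help file can be sketched as follows (hedged; check ?label in your Hmisc version for the exact assignment interface):

```r
## Hedged sketch: assign a vector of labels to all columns of a data
## frame in one call, with no loop, via Hmisc's label<- method.
library(Hmisc)
d <- data.frame(x = 1:3, y = letters[1:3])
label(d, self = FALSE) <- c("first variable", "second variable")
label(d$x)   # "first variable"
```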
Re: [R] calibration curve for cph()
David Winsemius wrote: A combination of Predict (your newdata), cut2, and the plotting function of your choice ought to suffice. But I thought that cross-validation was an option. Not at a console at the moment (just off an airplane). Sent from my iPhone

On Aug 15, 2011, at 5:26 PM, array chip <arrayprof...@yahoo.com> wrote: Is there an R function that produces a calibration curve on an independent dataset automatically, just like what calibrate() does on the training data itself? Thanks, John

From: Comcast <dwinsem...@comcast.net> To: array chip <arrayprof...@yahoo.com> Cc: r-help@r-project.org Sent: Monday, August 15, 2011 2:04 PM Subject: Re: [R] calibration curve for cph() Build a prediction function using 'Function' that gets applied to set2. Calibrate and validate. -- David Sent from my iPhone

On Aug 15, 2011, at 11:31 AM, array chip <arrayprof...@yahoo.com> wrote: Hi, the calibrate.cph() function in the rms package generates a calibration curve for a Cox model on the same dataset where the model was derived, using bootstrapping or cross-validation. If I have the model built on dataset 1 and now want to produce a calibration curve for this model on an independent dataset 2, how can I do that? Thanks, John
- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/calibration-curve-for-cph-tp3745328p3746931.html Sent from the R help mailing list archive at Nabble.com.
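David's "Predict + cut2 + plot" suggestion can be sketched manually. This is a hedged sketch, not a tested recipe: `set1`, `set2`, and the predictors are hypothetical, and it assumes complete follow-up through the chosen time point in each group.

```r
## Hedged sketch: external calibration of a cph() model at t = 365 days.
## Fit on set1, predict survival on independent set2, group predictions
## with cut2(), and compare with Kaplan-Meier estimates per group.
library(rms)   # also attaches survival
f <- cph(Surv(time, event) ~ age + sex, data = set1,
         x = TRUE, y = TRUE, surv = TRUE)
p <- as.numeric(survest(f, newdata = set2, times = 365)$surv)
g <- cut2(p, g = 4)                          # quartiles of predicted S(365)
km <- survfit(Surv(time, event) ~ g, data = set2)
obs  <- summary(km, times = 365)$surv        # observed S(365) per group
pred <- tapply(p, g, mean)                   # mean predicted S(365) per group
plot(pred, obs, xlab = "Mean predicted S(365)",
     ylab = "Observed (KM) S(365)")
abline(0, 1)                                 # ideal calibration line
```

Grouping is used here only for display; a loess smooth of observed vs. predicted (in the spirit of Frank's earlier comment) avoids the arbitrary binning.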
Re: [R] contrast package with interactions in gls model
You did not follow the posting guide: you did not specify which packages you were using. It appears that you are mixing the rms package with some other functions such as gls. If you want to use rms, use the Gls function instead of gls, and type ?contrast.rms to see examples of the use of contrast(). Frank

Marylin Bejarano wrote: Hi! I am trying to explain the effects of (1) the forest where I took the soil samples (*Lugar*: categorical variable with three levels) and (2) nitrogen addition treatments (*Tra*: categorical variable with two levels) on the total carbon concentration of the soil samples (*C*: continuous variable) over four months of sampling (*Time*: categorical, ordered variable with four levels). I fitted the following final model with the gls function:

var1 <- varIdent(form = ~ 1 | Lugar * factor(Time))
FINAL <- gls(C ~ Tra + Lugar + Time + Time*Tra + Tra*Lugar,
             data = datos, weights = var1, method = "REML")

The summary function produced this output (I omit the correlation matrix):

Generalized least squares fit by REML
  Model: C ~ Tra + Lugar + Time + Time * Tra + Tra * Lugar
  Data: datos
       AIC       BIC     logLik
  1129.458  1191.982  -540.7291

Variance function: different standard deviations per stratum (~1 | Lugar * factor(Time)); parameter estimates:
Chixchulub*0 1.0000, Xmatkuil*0 0.7759, Hobonil*0 0.5301,
Chixchulub*2 0.9641, Xmatkuil*2 0.8201, Hobonil*2 0.2967,
Chixchulub*3 0.9553, Xmatkuil*3 1.7350, Hobonil*3 0.3430,
Chixchulub*4 0.6242, Xmatkuil*4 0.9574, Hobonil*4 0.4652

Coefficients:
                        Value     Std.Error    t-value  p-value
(Intercept)           260.48540   16.48991    15.797    0.0000
Tra0                   -9.38703   23.74893    -0.395    0.6935
LugarChixchulub        -0.15377   19.60260    -0.008    0.9938
LugarHobonil         -173.21354   15.89736   -10.896    0.0000
Time2                 -14.74999   14.55909    -1.013    0.3135
Time3                  14.42177   15.64594     0.922    0.3589
Time4                  14.77803   16.72367     0.884    0.3790
Tra0:Time2             17.93859   20.78257     0.863    0.3901
Tra0:Time3            -48.77118   22.17628    -2.199    0.0302
Tra0:Time4            -52.63611   23.20192    -2.269    0.0254
Tra0:LugarChixchulub   74.43956   28.11275     2.648    0.0094
Tra0:LugarHobonil      43.03416   23.32391     1.845    0.0680

The anova function generated this table (Denom. DF: 100):
             numDF    F-value  p-value
(Intercept)      1  1693.1234   <.0001
Tra              1     5.3225   0.0231
Lugar            2   247.7047   <.0001
Time             3     0.4767   0.6992
Tra:Time         3     6.0531   0.0008
Tra:Lugar        2     3.5061   0.0338

I want to detect differences between levels of the Tra:Lugar interaction. For example: 1. Tra0:LugarChixchulub vs Tra1:LugarChixchulub (between treatment levels within the same forest), or 2. Tra0:LugarChixchulub vs Tra0:LugarHobonil (same treatment among forest levels). I used the contrast function (contrast package) with the following script to test hypothesis 1:

con <- contrast(FINAL, list(Lugar = 'Xmatkuil', Tra = 1),
                       list(Lugar = 'Xmatkuil', Tra = 0))

but I found this error message:

Error en gendata.default(fit = list(modelStruct = list(varStruct = c(-0.253689933940456, : not enough factors

I would be grateful if somebody could tell me what I'm doing wrong in my contrast call. Thanks in advance, Marylin Bejarano, PhD candidate in Ecology, Institute of Mexico's National Autonomous University

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/contrast-package-with-interactions-in-gls-model-tp3751255p3751269.html Sent from the R help mailing list archive at Nabble.com.
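Frank's advice (refit with rms::Gls so that rms::contrast() works) can be sketched against the model in the post. This is a hedged sketch, not a tested analysis; it reuses the poster's data and variable names, and assumes Tra is a factor with levels "0" and "1". Predictors not mentioned in contrast() are set to their datadist adjust-to values.

```r
## Hedged sketch of Frank's suggestion: use rms::Gls instead of nlme::gls
## so that rms::contrast() can form the Tra-within-Lugar comparison.
library(rms)
library(nlme)   # for varIdent()
dd <- datadist(datos); options(datadist = "dd")
var1 <- varIdent(form = ~ 1 | Lugar * factor(Time))
FINAL <- Gls(C ~ Tra + Lugar + Time + Time*Tra + Tra*Lugar,
             data = datos, weights = var1, method = "REML")
## Hypothesis 1: Tra = 1 vs Tra = 0 within forest Xmatkuil
con <- contrast(FINAL, list(Tra = "1", Lugar = "Xmatkuil"),
                       list(Tra = "0", Lugar = "Xmatkuil"))
print(con)
```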
Re: [R] Labelling all variables at once (using Hmisc label)
I'm puzzled. I provided a solution that did not require looping. Frank

Monsieur Do wrote: I did read the help page before posting, but didn't find the direct way... My function here works fine, but just for learning purposes I'd like to be able to avoid the loop...

with.labels <- function(x, labels = NULL, csvfile = NULL) {
  if (!is.null(csvfile))
    labels <- read.csv(csvfile, sep = "\t", header = FALSE,
                       stringsAsFactors = FALSE)[, 1]
  for (i in 1:length(x)) label(x[, i]) <- labels[i]
  if (length(labels) != length(x))
    cat("Warning: data and labels are not of same length\n")
  return(x)
}

Thanks

Date: Tue, 16 Aug 2011 04:22:07 -0700 (PDT) From: Frank Harrell <f.harr...@vanderbilt.edu> To: r-help@r-project.org Subject: Re: [R] Labelling all variables at once (using Hmisc label)

Do require(Hmisc); ?label to see the help file for label. It will show you how to do this.

Monsieur Do wrote: I have a dataset and a list of labels. I simply want to apply the labels to the variables, all at once. The only way I was able to do it was using a loop: for (i in 1:length(data)) label(data[, i]) <- data.labels[i] I'd like to find the non-loop way to do it, using apply or the like... Any help appreciated.

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Labelling-all-variables-at-once-using-Hmisc-label-tp3745660p3751273.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] contrast package with interactions in gls model
Thanks. The reason I responded that way is that the error message mentioned a function gendata that is in the rms and Design packages. I guess it's in the contrast package too. Frank

MarylinBejarano wrote: Hi, sorry! I fitted the model with gls from the nlme package, and I am trying to contrast the interaction with the contrast function from the contrast package. Thanks for your reply. Regards, Marylin Bejarano

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/contrast-package-with-interactions-in-gls-model-tp3751255p3753135.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Labelling all variables at once (using Hmisc label)
Sorry about the Nabble problem. At any rate, do require(Hmisc) then ?label to see how to associate a vector of labels with all the variables in a data frame at once. Frank

do999 wrote: Indeed, as David pointed out, the portion that used a courier font (all the good stuff) was absent from the email posting. Thanks for your answers.

- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Labelling-all-variables-at-once-using-Hmisc-label-tp3745660p3756459.html Sent from the R help mailing list archive at Nabble.com.