Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-04 Thread John Haart
Dear List and Frank,

I have calculated the log-odds for my models, but maybe I am missing something:
I don't understand how this helps for a categorical factor. All the examples I
have seen relate to continuous factors, where moving from one value to another
shows either an increase or a decrease, not, as in my case, a change of
category.

Furthermore, this gives the values for each factor independently of the others.
How do I get the log-odds for the entire model? I appreciate I may be trying to
put things in boxes again, but I am not: I am happy to report the log-odds of
moving from one response level to the next, but would like it for all the
factors together, not independently.

John

                                          Low High Diff. Effect  S.E.  Lower  Upper
WO  Woody:Non_woody                          1    2    NA   0.28  0.16  -0.04    0.6
    Odds Ratio                               1    2    NA   1.32    NA   0.96   1.82
PD  Abiotic:Biotic                           2    1    NA  -1.21  0.13  -1.47  -0.96
    Odds Ratio                               2    1    NA    0.3    NA   0.23   0.38
ALT All:Low                                  3    1    NA   0.47  0.19   0.11   0.84
    Odds Ratio                               3    1    NA    1.6    NA   1.11   2.31
ALT High:Low                                 3    2    NA  -0.07  0.14  -0.35   0.21
    Odds Ratio                               3    2    NA   0.93    NA    0.7   1.24
ALT Mid:Low                                  3    4    NA   0.39  0.15    0.1   0.67
    Odds Ratio                               3    4    NA   1.48    NA   1.11   1.96
REG Two_plus:One                             1    2    NA  -0.59  0.13  -0.84  -0.34
    Odds Ratio                               1    2    NA   0.55    NA   0.43   0.72
BIO Arctic:Subtropical/Tropical              4    1    NA  -1.02  0.81  -2.61   0.58
    Odds Ratio                               4    1    NA   0.36    NA   0.07   1.78
BIO Boreal:Subtropical/Tropical              4    2    NA  -1.21  0.81  -2.79   0.37
    Odds Ratio                               4    2    NA    0.3    NA   0.06   1.44
BIO Mediterranean:Subtropical/Tropical       4    3    NA  -1.89  0.48  -2.83  -0.95
    Odds Ratio                               4    3    NA   0.15    NA   0.06   0.39
BIO Temperate:Subtropical/Tropical           4    5    NA  -0.09  0.16  -0.41   0.23
    Odds Ratio                               4    5    NA   0.91    NA   0.66   1.26
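For a categorical factor, each Effect row in the table is a log odds ratio contrasting the two categories named in its label (for example Woody versus Non_woody), and the Odds Ratio row beneath it is that effect exponentiated, limits included. A minimal base-R sketch, using three rows copied from the table; because the printed effects are rounded to two decimals, recomputing some other rows can differ from the table in the last digit. Adding effects to describe a species that differs in several factors at once assumes the model has no interaction terms; that is an assumption here, not something stated in the thread.

```r
# Log odds ratios ("Effect") and confidence limits copied from three rows of the table
eff <- data.frame(
  contrast = c("WO Woody:Non_woody", "PD Abiotic:Biotic",
               "BIO Mediterranean:Subtropical/Tropical"),
  effect = c(0.28, -1.21, -1.89),
  lower  = c(-0.04, -1.47, -2.83),
  upper  = c(0.60, -0.96, -0.95)
)

# Odds ratio = exp(log odds ratio); the confidence limits transform the same way
or_tab <- data.frame(
  contrast = eff$contrast,
  OR    = round(exp(eff$effect), 2),
  lower = round(exp(eff$lower), 2),
  upper = round(exp(eff$upper), 2)
)
print(or_tab)

# Assuming an additive model (no interactions), log odds ratios add.
# Species at WO = Woody and PD = Abiotic versus one at WO = Non_woody and
# PD = Biotic, all other factors held fixed:
combined_log_or <- sum(eff$effect[1:2])
combined_or <- exp(combined_log_or)
print(round(combined_or, 2))
```

Working on the log-odds scale and exponentiating only at the end is what lets "all the factors together" be expressed as a single odds ratio for a given pair of species profiles.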
On 3 Oct 2010, at 15:29, Frank Harrell wrote:


You still seem to be hung up on making arbitrary classifications.  Instead,
look at tendencies using odds ratios or rank correlation measures.  My book
Regression Modeling Strategies covers this.

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2953220.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-04 Thread Frank Harrell

I may be missing a point, but the proportional odds model easily gives you
odds ratios for Y>=j (independent of j by the PO assumption).  Other options
include examining a rank correlation between the linear predictor and Y, or
(if Y is numeric and the spacings between categories are meaningful) getting
the predicted mean Y (see Mean.lrm in the R rms package, a replacement for
the Design package).
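A base-R sketch of the proportional-odds point, with made-up intercepts and a single made-up coefficient (none of these numbers come from John's model). If P(Y >= j | x) = plogis(alpha_j + beta * x), the odds of Y >= j change by the same factor exp(beta) at every cutoff j, so one odds ratio per contrast describes the shift along the whole ordinal scale. The linear predictor alpha_j + X beta is also where all the factors enter together.

```r
# Hypothetical proportional-odds model:
#   P(Y >= j | x) = plogis(alpha[j] + beta * x),  j = 2, ..., 6
alpha <- c(1.0, 0.2, -0.6, -1.3, -2.0)  # hypothetical intercepts, decreasing in j
beta  <- 0.47                            # hypothetical log odds ratio for x = 1 vs x = 0

odds <- function(p) p / (1 - p)

p_x0 <- plogis(alpha + beta * 0)  # P(Y >= j) when x = 0
p_x1 <- plogis(alpha + beta * 1)  # P(Y >= j) when x = 1

# Odds ratio at each cutoff j: the same at every j (up to rounding error)
or_by_cutoff <- odds(p_x1) / odds(p_x0)
print(round(or_by_cutoff, 4))
print(round(exp(beta), 4))
```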

Frank 

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2954274.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-03 Thread John Haart
Thanks Frank and Greg, 

This makes a lot more sense to me now. I appreciate you are both very busy, but
I was wondering if I could trouble you for one last piece of advice, as my data
are a little complicated for a first effort at R, let alone modelling!

The response is on a scale from 1 to 6, which indicates extinction risk (1 being
least concern and 6 being critical), hence an ordinal model.

The six factors are all categorical, for example:

 FRUIT TYPE - fleshy, dry
 HABITAT - terrestrial, aquatic, epiphyte
 etc.

I am asking the question: how do different combinations of factors affect
extinction risk?

Based on what you have both said, I have called

 predict(model1, type="fitted")

Would this be the best way of predicting the probability of falling into each
response category?


          y>=2        y>=3         y>=4         y>=5         y>=6
1  0.502220616 0.410236021 0.2892270912 0.2191420568 0.1774250519
2  0.745221699 0.668501579 0.5412223837 0.4486151612 0.3847379442
3  0.720381333 0.639796647 0.5095814746 0.4174618165 0.3551631876
4  0.752321112 0.676811675 0.5505781183 0.4579680710 0.3937100283
5  0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
6  0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
7  0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
8  0.824388319 0.763956402 0.6543788296 0.5663098186 0.5008981585
9  0.526291649 0.433739868 0.3094355120 0.2360800803 0.1919312111
10 0.751069260 0.675342871 0.5489179746 0.4563036514 0.3921102030

I have 100 species for which I know the factors and I want to predict their
response. So should I do the above using the newdata argument, and present the
probabilities as above rather than trying to classify them?

I tried polr and that classified each response as either 1 or 6, i.e. no 2, 3,
4 or 5, as did calling predict(model1, type="fitted.ind"), where the
probabilities of being 1 or 6 far outweighed those of 2, 3, 4 and 5 (below).
Is this just because my model is not powerful enough to discriminate
effectively, as I know that result is incorrect (Brier score 2.01, AUC 66.9)?

   EXTINCTION=1 EXTINCTION=2 EXTINCTION=3 EXTINCTION=4 EXTINCTION=5 EXTINCTION=6
1     0.4977794 0.0919845942  0.121008930  0.070085034 0.0417170048 0.1774250519
2     0.2547783 0.0767201200  0.127279196  0.092607223 0.0638772170 0.3847379442
3     0.2796187 0.0805846862  0.130215173  0.092119658 0.0622986289 0.3551631876
4     0.2476789 0.0755094367  0.126233557  0.092610047 0.0642580427 0.3937100283
5     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
6     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
7     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
8     0.1756117 0.0604319173  0.109577572  0.088069011 0.0654116601 0.5008981585
9     0.4737084 0.0925517814  0.124304356  0.073355432 0.0441488692 0.1919312111
10    0.2489307 0.0757263892  0.126424896  0.092614323 0.0641934484 0.3921102030

Thanks very much for any advice given,

John

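The two tables in this message are two views of the same model. A base-R sketch using the first row of each as posted, assuming (as the predict.lrm help describes) that type="fitted" returns P(Y >= j) and type="fitted.ind" returns P(Y = j): the individual probabilities are successive differences of the cumulative ones, padding with P(Y >= 1) = 1 and P(Y >= 7) = 0. It also shows why picking the largest cumulative probability would always select the lowest category.

```r
# P(Y >= j), j = 2..6, for species 1, copied from the type = "fitted" table
cum <- c(0.502220616, 0.410236021, 0.2892270912, 0.2191420568, 0.1774250519)

# Pad with P(Y >= 1) = 1 and P(Y >= 7) = 0, then take successive differences:
# P(Y = j) = P(Y >= j) - P(Y >= j + 1)
cum_full <- c(1, cum, 0)
p_ind <- -diff(cum_full)
names(p_ind) <- paste0("EXTINCTION=", 1:6)
print(round(p_ind, 7))   # compare with row 1 of the type = "fitted.ind" table
print(sum(p_ind))        # the six probabilities sum to 1

# Cumulative probabilities always peak at the first category,
# so "largest P(Y >= j)" is not a usable classification rule.
print(which.max(cum_full))
```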
On 1 Oct 2010, at 23:13, Frank Harrell wrote:


Well put Greg.  The job of the statistician is to produce good estimates
(probabilities in this case).  Those cannot be translated into action
without subject-specific utility functions.  Classification during the
analysis or publication stage is not necessary.

Frank



Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-03 Thread Frank Harrell

You still seem to be hung up on making arbitrary classifications.  Instead,
look at tendencies using odds ratios or rank correlation measures.  My book
Regression Modeling Strategies covers this.

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2953220.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-01 Thread Frank Harrell

John,

Don't conclude that one category is the most probable when its probability
of being equaled or exceeded is a maximum.  The first category would always
be the winner if that were the case.

When you say y=best remember that you are dealing with a probability model. 
Nothing is forcing you to classify an observation, and unless the category's
probability is high, this may be dangerous.  You might do well to consider a
smoother approach such as using the generalized ROC area (C-index) or its
related rank correlation measure Dxy.  Also there are odds ratios.
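For readers unfamiliar with the C-index and Dxy, here is a brute-force base-R sketch for an ordinal outcome without censoring: among all pairs of observations with different Y, C is the proportion in which the observation with the higher Y also has the higher linear predictor (ties in the predictor count one half), and Dxy = 2(C - 0.5). The data are invented for illustration. In practice the lrm fit already prints C and Dxy, and Hmisc's rcorr.cens computes them; this loop only shows what those numbers mean.

```r
# Brute-force concordance (generalized C-index) and Somers' Dxy between a
# linear predictor `lp` and an ordinal outcome `y` (no censoring).
concordance <- function(lp, y) {
  n <- length(y)
  concordant <- 0
  usable <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next                # pairs tied on y are not usable
      usable <- usable + 1
      hi <- if (y[i] > y[j]) i else j       # member of the pair with the higher y
      lo <- if (y[i] > y[j]) j else i
      if (lp[hi] > lp[lo]) {
        concordant <- concordant + 1        # predictor ordering agrees with y
      } else if (lp[hi] == lp[lo]) {
        concordant <- concordant + 0.5      # tie on the predictor counts one half
      }
    }
  }
  C <- concordant / usable
  c(C = C, Dxy = 2 * (C - 0.5))
}

# Invented example: 5 observations on a 3-level ordinal scale
lp <- c(-1.2, 0.3, 0.8, 0.5, 1.5)
y  <- c(1, 2, 3, 1, 3)
res <- concordance(lp, y)
print(res)   # C = 0.875, Dxy = 0.75 (7 of 8 usable pairs concordant)
```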

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2891623.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-01 Thread John Haart
Frank,

That's great, thanks for the advice. I appreciate that the Brier score, AUC,
etc. are better methods of validation and discrimination, but when it comes to
predicting new data:

 d <- data.frame(x1=c(.1,.5), x2=c(.5,.15))

 predict(f, d, type="fitted.ind")

      y=good  y=better    y=best
 1 0.3199710 0.3560355 0.3239935
 2 0.4153257 0.3437086 0.2409657

 # predict mean(y) using codes 1,2,3
 predict(f, d, type='mean', codes=TRUE)

        1        2
 2.004022 1.825640
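The mean prediction is a probability-weighted average of the category codes. A short base-R sketch reproduces the two means above from the fitted.ind probabilities shown (codes 1, 2, 3 for good, better, best, as in the help example). As noted elsewhere in the thread, this summary is only meaningful if the spacing between the codes is meaningful.

```r
# P(Y = j) for the two new observations, copied from the fitted.ind output above
p <- rbind(c(0.3199710, 0.3560355, 0.3239935),
           c(0.4153257, 0.3437086, 0.2409657))
colnames(p) <- c("good", "better", "best")
codes <- 1:3   # good = 1, better = 2, best = 3

# Predicted mean of Y: sum over categories of code * P(Y = code)
mean_y <- as.vector(p %*% codes)
print(round(mean_y, 6))
```

For the first observation all three probabilities are close to one third, so a mean of about 2.0 does not say that "better" is likely; assigning it to a single category would hide exactly that.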

How do I use this information to assign x1 and x2 to a category on the
response scale (good, better, best)?

Thanks

John




On 1 Oct 2010, at 12:14, Frank Harrell wrote:


John,

Don't conclude that one category is the most probable when its probability
of being equaled or exceeded is a maximum.  The first category would always
be the winner if that were the case.

When you say y=best remember that you are dealing with a probability model. 
Nothing is forcing you to classify an observation, and unless the category's
probability is high, this may be dangerous.  You might do well to consider a
smoother approach such as using the generalized ROC area (C-index) or its
related rank correlation measure Dxy.  Also there are odds ratios.

Frank



Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-01 Thread Frank Harrell

Why assign them at all?  Is this a forced choice at gunpoint problem? 
Remember what probabilities mean.

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2909713.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-01 Thread peterfrancis
The reason I am trying to assign them is that I have a data set for which I
have arrived at the most likely model that describes the data, and now I have
another data set where I know the factors but not the response.

Therefore, surely I need to assign the predicted values to a response in order
to say something like:

Based on the model, I believe unknown 1 is good, whereas unknown 2 is very
good, etc.?

Maybe I am missing something or using the wrong approach, but I thought the
main purpose of using the predict function on new data was to predict the
response?

Peter

On 1 Oct 2010, at 14:51, Frank Harrell f.harr...@vanderbilt.edu wrote:

 
 Why assign them at all?  Is this a forced choice at gunpoint problem? 
 Remember what probabilities mean.
 
 Frank
 


Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-01 Thread Greg Snow
I have this discussion fairly often with doctors that I work with.  The issue 
is that you can certainly predict from a model, but you can predict on 
different scales.  Let's consider the simpler case of just 2 outcomes (disease 
yes/no):

Let's say you have 4 patients whose disease status you want to predict from
their symptoms using a model. On the probability scale, patient A is predicted
to have a 5% chance of yes, patient B 49%, patient C 51% and patient D 95%.
If we collapse this to just a yes/no prediction, then we will treat A and B the
same, with a prediction of NO, and C and D the same, with a prediction of YES.
But does it really make sense to treat B and C so differently (they are only 2
percentage points apart) while treating them the same as A or D?

If I were one of the patients I would want to know whether my probability of 
disease was 51% or 95%, not just a yes. 

With 3 groups wouldn't you want to know the difference between 33%, 33%, 34% 
and 2%, 8%, 90%?
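Greg's examples can be checked in a few lines of base R. The probabilities are the ones in his message; the 0.5 cutoff for yes/no is the usual implicit choice and is assumed here. Both the cutoff rule and a "most probable category" rule discard exactly the differences he describes.

```r
# Greg's four patients: predicted probability of disease = yes
p <- c(A = 0.05, B = 0.49, C = 0.51, D = 0.95)

# Collapsing to yes/no with a 0.5 cutoff groups A with B and C with D,
# even though B and C differ by only 2 percentage points
pred_class <- ifelse(p >= 0.5, "yes", "no")
print(data.frame(prob = p, class = pred_class))

# Three categories: a nearly flat and a sharply peaked probability vector
probs3 <- rbind(flat  = c(0.33, 0.33, 0.34),
                sharp = c(0.02, 0.08, 0.90))
# The most probable category is 3 in both rows, hiding the difference
most_probable <- max.col(probs3, ties.method = "first")
print(most_probable)
```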

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

 -----Original Message-----
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of peterfran...@me.com
 Sent: Friday, October 01, 2010 8:23 AM
 To: Frank Harrell
 Cc: r-help@r-project.org
 Subject: Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help
 
 The reason I am trying to assign them is that I have a data set for which
 I have arrived at the most likely model that describes the data, and now
 I have another data set where I know the factors but not the response.
 
 Therefore, surely I need to assign the predicted values to a response
 in order to say something like:
 
 Based on the model, I believe unknown 1 is good, whereas unknown 2 is
 very good, etc.?
 
 Maybe I am missing something or using the wrong approach, but I thought
 the main purpose of using the predict function on new data was to
 predict the response?
 
 Peter
 
 On 1 Oct 2010, at 14:51, Frank Harrell f.harr...@vanderbilt.edu wrote:
 
 
  Why assign them at all?  Is this a forced choice at gunpoint problem?
  Remember what probabilities mean.
 
  Frank
 


Re: [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

2010-10-01 Thread Frank Harrell

Well put Greg.  The job of the statistician is to produce good estimates
(probabilities in this case).  Those cannot be translated into action
without subject-specific utility functions.  Classification during the
analysis or publication stage is not necessary.
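Frank's point that estimated probabilities need a subject-specific utility (or loss) function before they can drive a decision can be sketched as an expected-loss calculation in base R. The probabilities are rows 1 and 5 of John's fitted.ind output posted above in this thread, rounded to seven decimals; the two actions and every number in the loss matrix are hypothetical, invented only to show the mechanics.

```r
# P(Y = j), j = 1..6, for species 1 and 5 from John's fitted.ind output
probs <- rbind(
  sp1 = c(0.4977794, 0.0919846, 0.1210089, 0.0700850, 0.0417170, 0.1774251),
  sp5 = c(0.1756117, 0.0604319, 0.1095776, 0.0880690, 0.0654117, 0.5008982)
)

# Hypothetical loss matrix: rows = actions, columns = true extinction category 1..6.
# "no_action" costs more the higher the true risk; "protect" has a flat cost.
loss <- rbind(
  no_action = c(0, 1, 2, 4, 6, 10),
  protect   = c(3, 3, 3, 3, 3, 3)
)

# Expected loss of each action for each species: sum_j P(Y = j) * loss[action, j]
expected_loss <- probs %*% t(loss)
print(round(expected_loss, 3))

# Choose the action with the smallest expected loss; Y itself is never classified
best <- colnames(expected_loss)[apply(expected_loss, 1, which.min)]
print(best)
```

Changing the loss matrix can change the decisions while the probabilities stay fixed, which is why the probabilities, not a classification, are the thing to report.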

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Frank-Harrell-in-the-predict-lrm-Design-help-tp2883311p2951976.html
Sent from the R help mailing list archive at Nabble.com.
