Re: [R] variable selection in logistic

milton ruser Thu, 03 Sep 2009 13:32:28 -0700

Hi Annie,

What kind of data (response and explanatory variables) do you have?
As an ecological modeller, I always start by asking what
my (our) mental models are. After that I analyze a set of
competing models, *each with ecological meaning*.
This helps me test hypotheses as well as discuss
the results. If you just lump all your data together, they will tell you
anything, and you will find a `why` for every result (positive or
negative, good or bad)...
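
To make the "they will tell you anything" point concrete, here is a rough
sketch in Python (not R, and not the method of any package mentioned in this
thread). It assumes 35 cases and 35 predictors that are pure noise, picks the
5 predictors most correlated with a random binary response, and then repeats
the pick on bootstrap resamples. The screening rule, k = 5, the 200 resamples
and the seed are all assumptions chosen only for illustration.

```python
import random

def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def select_top(rows, y, k):
    """Indices of the k predictors most correlated with y (a crude screen)."""
    p = len(rows[0])
    scores = [abs_corr([r[j] for r in rows], y) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])

rng = random.Random(2009)          # arbitrary seed, for repeatability
n, p, k, n_boot = 35, 35, 5, 200   # n, p from the thread; k, n_boot assumed

# Pure noise: no predictor has any real relationship with the response.
rows = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [rng.randint(0, 1) for _ in range(n)]

full_selection = select_top(rows, y, k)

# Bootstrap: resample the 35 cases with replacement and redo the selection.
counts = [0] * p
for _ in range(n_boot):
    idx = [rng.randrange(n) for _ in range(n)]
    chosen = select_top([rows[i] for i in idx], [y[i] for i in idx], k)
    for j in chosen:
        counts[j] += 1

freq = [c / n_boot for c in counts]
print("selected on the full data:", sorted(full_selection))
print("how often each was re-selected:",
      {j: freq[j] for j in sorted(full_selection)})
print("variables selected at least once:", sum(c > 0 for c in counts))
```

Since nothing here is truly related to the response, any "important"
variable it reports is an artifact, and one would expect the re-selection
frequencies to fall well short of 1. The same resampling check could be
wrapped around whatever selection procedure you actually use.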


good luck

miltinho
On Thu, Sep 3, 2009 at 4:20 PM, annie Zhang <annie.zhang2...@gmail.com> wrote:

> Hi, Frank,
>
> If I want to do prediction as well as to select important predictors, which
> may be the best function to use when I have 35 samples and 35 predictors
> (penalized logistic with variable selection)? I saw there is a 'fastbw'
> function in the Design package. And there is a 'step.plr' function in the
> 'stepPlr' package.
>
> Thank you,
>
> Annie
>
> On Thu, Sep 3, 2009 at 10:11 AM, Frank E Harrell Jr <f.harr...@vanderbilt.edu> wrote:
>
> > annie Zhang wrote:
> >
> >> Thank you for all your replies.
> >> Actually, as Bert said, besides prediction, I also need variable selection
> >> (I need to know which variables are important). As for the sample size
> >> and number of variables, both are small, around 35. How can I get
> >> accurate prediction as well as good predictors?
> >> Annie
> >>
> >
> > It is next to impossible to find a unique list of 'important' variables
> > without having 50 times as many subjects as potential predictors, unless
> > your signal:noise ratio is stunning.
> >
> > Frank
> >
> >
> >> On Thu, Sep 3, 2009 at 8:28 AM, Bert Gunter <gunter.ber...@gene.com> wrote:
> >>
> >>    But let's be clear here folks:
> >>
> >>    Ben's comment is apropos: "'As many variables as samples' is
> >>    particularly scary."
> >>
> >>    (Aside -- how much scarier then are -omics analyses in which the
> >>    number of variables is thousands of times the number of samples?)
> >>
> >>    Sensible penalization (it's usually not too sensitive to the details)
> >>    is only another way of obtaining a parsimonious model with good (in
> >>    the sense of minimizing overall prediction error: bias + variance)
> >>    prediction properties. Alas, this is often not what scientists want:
> >>    they use variable selection to find the "right" covariates, the "most
> >>    important" variables affecting the response. But this is beyond the
> >>    power of empirical modeling here: "as many variables as samples"
> >>    almost guarantees that there will be many different and even
> >>    nonoverlapping subsets of variables that are, within statistical
> >>    noise, equally "optimal" predictors. That is, variable selection in
> >>    such circumstances is just a pretty sophisticated random number
> >>    generator -- ergo Frank's Draconian warnings. Penalization produces
> >>    better prediction engines with better properties, but it cannot
> >>    overcome the "as many variables as samples" problem either. Entropy
> >>    rules. If what is sought is a way to determine the "truly important"
> >>    variables, then the study must be designed to provide the information
> >>    to do so. You don't get something for nothing.
> >>
> >>    Cheers,
> >>
> >>    Bert Gunter
> >>    Genentech Nonclinical Biostatistics
> >>
> >>
> >>    -----Original Message-----
> >>    From: r-help-boun...@r-project.org
> >>    [mailto:r-help-boun...@r-project.org] On Behalf Of Frank E Harrell Jr
> >>    Sent: Wednesday, September 02, 2009 9:07 PM
> >>    To: annie Zhang
> >>    Cc: r-help@r-project.org
> >>    Subject: Re: [R] variable selection in logistic
> >>
> >>    annie Zhang wrote:
> >>     > Hi, Frank,
> >>     >
> >>     > You mean the backward and forward stepwise selection is bad? You also
> >>     > suggest the penalized logistic regression is the best choice? Is there
> >>     > any function to do it as well as selecting the best penalty?
> >>     >
> >>     > Annie
> >>
> >>    All variable selection is bad unless it's in the context of
> >>    penalization. You'll need penalized logistic regression, not
> >>    necessarily with variable selection, for example a quadratic penalty
> >>    as in a case study in my book, or an L1 penalty (lasso) using other
> >>    packages.
> >>
> >>    Frank
> >>
> >>     >
> >>     > On Wed, Sep 2, 2009 at 7:41 PM, Frank E Harrell Jr
> >>     > <f.harr...@vanderbilt.edu> wrote:
> >>     >
> >>     >     David Winsemius wrote:
> >>     >
> >>     >
> >>     >         On Sep 2, 2009, at 9:36 PM, annie Zhang wrote:
> >>     >
> >>     >             Hi, R users,
> >>     >
> >>     >             What may be the best function in R to do variable
> >>     >             selection in logistic regression?
> >>     >
> >>     >
> >>     >         PhD theses and books by famous statisticians have been
> >>     >         pursuing the answer to that question for decades.
> >>     >
> >>     >             I have the same number of variables as the number of
> >>     >             samples, and I want to select the best variables for
> >>     >             prediction. Is there any function doing forward
> >>     >             selection followed by backward elimination in stepwise
> >>     >             logistic regression?
> >>     >
> >>     >
> >>     >         You should probably be reading up on penalized regression
> >>     >         methods. The stepwise procedures reporting unadjusted
> >>     >         "significance" made available by SAS and SPSS to the unwary
> >>     >         neophyte user have very poor statistical properties.
> >>     >
> >>     >         --
> >>     >
> >>     >         David Winsemius, MD
> >>     >
> >>     >
> >>     >     Amen to that.
> >>     >
> >>     >     Annie, resist the temptation.  These methods bite.
> >>     >
> >>     >     Frank
> >>     >
> >>     >
> >>     >         Heritage Laboratories
> >>     >         West Hartford, CT
> >>     >
> >>     >         ______________________________________________
> >>     >         R-help@r-project.org mailing list
> >>     >         https://stat.ethz.ch/mailman/listinfo/r-help
> >>     >         PLEASE do read the posting guide
> >>     >         http://www.R-project.org/posting-guide.html
> >>     >         and provide commented, minimal, self-contained,
> >>     >         reproducible code.
> >>     >
> >>     >
> >>     >
> >>     >     --
> >>     >     Frank E Harrell Jr   Professor and Chair           School of Medicine
> >>     >                         Department of Biostatistics   Vanderbilt University
> >>     >
> >>     >
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
>

