Hi Annie,

What kind of data (response and explanatory variables) do you have? As an ecological modeller, I always think first about what my (our) mental models are. After that, I analyze a set of competing models, *each with an ecological meaning*. This helps me both to test hypotheses and to discuss the results. If you lump all your data together, they will tell you anything, and you will find a `why` for every result (positive or negative, good or bad)...
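miltinho does not say how the competing models should be compared. One common choice in ecology, assumed here rather than taken from his message, is to fit each a priori candidate model to the same data, compute AIC = 2k - 2 log L (k parameters, maximized log-likelihood log L), and turn the AIC differences into Akaike weights. This is a minimal stdlib-only sketch; the model names and AIC values in the demo are hypothetical placeholders, not results from Annie's data.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: AIC = 2k - 2 log L."""
    return 2 * k - 2 * log_lik


def akaike_weights(aics):
    """Turn {model_name: AIC} into Akaike weights that sum to 1."""
    best = min(aics.values())
    # Relative likelihood of each model given the data: exp(-delta_AIC / 2)
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


if __name__ == "__main__":
    # Hypothetical AIC values for three a priori candidate models;
    # in practice each comes from fitting that model to the same data.
    candidates = {"null": 110.8, "habitat": 102.3, "habitat+climate": 100.1}
    ranked = sorted(akaike_weights(candidates).items(), key=lambda kv: -kv[1])
    for name, w in ranked:
        print(f"{name:16s} weight = {w:.3f}")
```

The weights are only relative to the candidate set, so the set itself has to be chosen on subject-matter grounds, which is miltinho's point.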
Good luck,
miltinho

On Thu, Sep 3, 2009 at 4:20 PM, annie Zhang <annie.zhang2...@gmail.com> wrote:
> Hi, Frank,
>
> If I want to do prediction as well as select important predictors, which
> function may be best to use when I have 35 samples and 35 predictors
> (penalized logistic regression with variable selection)? I saw there is a
> 'fastbw' function in the Design package, and there is a 'step.plr' function
> in the 'stepPlr' package.
>
> Thank you,
>
> Annie
>
> On Thu, Sep 3, 2009 at 10:11 AM, Frank E Harrell Jr <f.harr...@vanderbilt.edu> wrote:
>> annie Zhang wrote:
>>> Thank you for all your replies.
>>> Actually, as Bert said, besides prediction I also need variable selection
>>> (I need to know which variables are important). As for the sample size
>>> and the number of variables, both are small, around 35. How can I get
>>> accurate prediction as well as good predictors?
>>> Annie
>>
>> It is next to impossible to find a unique list of 'important' variables
>> without having 50 times as many subjects as potential predictors, unless
>> your signal:noise ratio is stunning.
>>
>> Frank
>>
>>> On Thu, Sep 3, 2009 at 8:28 AM, Bert Gunter <gunter.ber...@gene.com> wrote:
>>>
>>> But let's be clear here, folks:
>>>
>>> Ben's comment is apropos: "'As many variables as samples' is particularly
>>> scary."
>>>
>>> (Aside: how much scarier, then, are -omics analyses in which the number of
>>> variables is thousands of times the number of samples?)
>>>
>>> Sensible penalization (it's usually not too sensitive to the details) is
>>> only another way of obtaining a parsimonious model with good (in the sense
>>> of minimizing overall prediction error: bias + variance) prediction
>>> properties.
>>> Alas, this is often not what scientists want: they use variable selection
>>> to find the "right" covariates, the "most important" variables affecting
>>> the response. But this is beyond the power of empirical modeling here:
>>> "as many variables as samples" almost guarantees that there will be many
>>> different and even nonoverlapping subsets of variables that are, within
>>> statistical noise, equally "optimal" predictors. That is, variable
>>> selection in such circumstances is just a pretty sophisticated random
>>> number generator; ergo Frank's Draconian warnings. Penalization produces
>>> better prediction engines with better properties, but it cannot overcome
>>> the "as many variables as samples" problem either. Entropy rules. If what
>>> is sought is a way to determine the "truly important" variables, then the
>>> study must be designed to provide the information to do so. You don't get
>>> something for nothing.
>>>
>>> Cheers,
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> -----Original Message-----
>>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Frank E Harrell Jr
>>> Sent: Wednesday, September 02, 2009 9:07 PM
>>> To: annie Zhang
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] variable selection in logistic
>>>
>>> annie Zhang wrote:
>>>> Hi, Frank,
>>>>
>>>> You mean the backward and forward stepwise selection is bad? You also
>>>> suggest the penalized logistic regression is the best choice? Is there
>>>> any function to do it as well as selecting the best penalty?
>>>>
>>>> Annie
>>>
>>> All variable selection is bad unless it's in the context of penalization.
>>> You'll need penalized logistic regression, not necessarily with variable
>>> selection: for example, a quadratic penalty as in a case study in my book,
>>> or an L1 penalty (lasso) using other packages.
>>>
>>> Frank
>>>
>>>> On Wed, Sep 2, 2009 at 7:41 PM, Frank E Harrell Jr <f.harr...@vanderbilt.edu> wrote:
>>>>> David Winsemius wrote:
>>>>>> On Sep 2, 2009, at 9:36 PM, annie Zhang wrote:
>>>>>>> Hi, R users,
>>>>>>>
>>>>>>> What may be the best function in R to do variable selection in
>>>>>>> logistic regression?
>>>>>>
>>>>>> PhD theses and books by famous statisticians have been pursuing the
>>>>>> answer to that question for decades.
>>>>>>
>>>>>>> I have the same number of variables as the number of samples, and I
>>>>>>> want to select the best variables for prediction. Is there any
>>>>>>> function doing forward selection followed by backward elimination in
>>>>>>> stepwise logistic regression?
>>>>>>
>>>>>> You should probably be reading up on penalized regression methods. The
>>>>>> stepwise procedures reporting unadjusted "significance" made available
>>>>>> by SAS and SPSS to the unwary neophyte user have very poor statistical
>>>>>> properties.
>>>>>>
>>>>>> --
>>>>>> David Winsemius, MD
>>>>>
>>>>> Amen to that.
>>>>>
>>>>> Annie, resist the temptation. These methods bite.
>>>>>
>>>>> Frank
>>>>>
>>>>>> Heritage Laboratories
>>>>>> West Hartford, CT
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>> --
>>>>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>>>>                      Department of Biostatistics   Vanderbilt University
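Frank's and David's practical advice in this thread is penalized logistic regression instead of stepwise selection, for example a quadratic penalty. As an illustration only, here is a minimal stdlib-only Python sketch of a quadratic (ridge) penalty fitted by batch gradient descent. Assumptions: the predictors are already standardized, the intercept is not penalized, and the step size `lr` and iteration count `n_iter` are arbitrary defaults. This is not the code from Frank's book or the API of any R package, and the simulated data in the demo are hypothetical. Choosing `lam` (Annie's "selecting the best penalty") is left out; cross-validation or an information criterion would usually be used for that.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam, lr=0.1, n_iter=500):
    """Quadratic-penalty (ridge) logistic regression by batch gradient descent.

    Minimizes  mean log-loss + (lam / 2) * sum(beta_j ** 2).
    The intercept b0 is not penalized. No predictor is dropped;
    the penalty only shrinks the coefficients toward zero.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: fitted probability minus the observed 0/1 outcome
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        # Gradient step; lam * beta is the gradient of the quadratic penalty
        b0 -= lr * g0 / n
        beta = [bj - lr * (gj / n + lam * bj) for bj, gj in zip(beta, g)]
    return b0, beta


def predict_prob(b0, beta, x):
    """Predicted probability of the outcome for one row x."""
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, x)))


if __name__ == "__main__":
    # Hypothetical simulated data in the thread's setting:
    # 35 subjects, 35 standardized predictors, only the first two matter.
    rng = random.Random(1)
    n, p = 35, 35
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(x[0] - x[1]) else 0 for x in X]
    for lam in (0.01, 0.1, 1.0):
        b0, beta = fit_ridge_logistic(X, y, lam)
        norm = math.sqrt(sum(b * b for b in beta))
        print(f"lam={lam:<5} ||beta||={norm:.3f}")
```

Every predictor stays in the model; a larger `lam` only shrinks the coefficient vector more. As Bert points out, this gives a better prediction engine but does not reveal which of the 35 predictors are the "truly important" ones.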