Russell,

Please read this paper: https://pubmed.ncbi.nlm.nih.gov/10718731/

Cheers,
Ted

On Wed, Jun 30, 2021, 9:21 PM Russell Engelman <neovenatori...@gmail.com> wrote:

> Dear All,
>
>> What you see is the large uncertainty in “ancestral” states, which is part of the intercept here. The linear relationship that you overlaid on top of your data is the relationship predicted at the root of the tree (as if such a thing existed!). There is a lot of uncertainty about the intercept, but much less uncertainty in the slope. It looks like the slope is not affected by the inclusion or exclusion of monotremes. (For one possible reference on the greater precision in the slope versus the intercept, there’s this: http://dx.doi.org/10.1214/13-AOS1105 for the BM.)
>
> Yes, that sounds right from the other data I have. The line approximates what would be expected for the root of Mammalia, and the signal in the PGLS is due more to shifts in the y-intercept than to shifts in slope, which in turn is supported by the anatomy of the proxy.
>
>> My second cent is that the phylogenetic predictions should be stable. The uncertainty in the intercept (and the large effect of including monotremes on the intercept) should not affect predictions, so long as you know for which species you want to make a prediction. If you want to make a prediction for a species in a small clade “far” from monotremes, say, then the prediction is probably quite stable, even if you include monotremes: this is because the phylogenetic prediction should use the phylogenetic relationships for the species to be predicted. A prediction that uses the linear relationship at the root and ignores the placement of the species would be the worst-case scenario: for a mammal species with a completely unknown placement within mammals.
>
> This is what I'm a bit confused about.
> I was always told (and some of the PGLS literature I have read, such as Rohlf 2011 and Smaers and Rohlf 2016, seemingly implies this) that it isn't possible to include phylogenetic data from the new data points in the prediction in order to improve predictions. I'm a little confused as to whether it's possible or not (see below).
>
>> There are probably a number of software packages that do phylogenetic prediction. I know of Rphylopars and PhyloNetworks.
>
> I will take a look into those.
>
>> I think that Cécile's and Theodore's point is important and too often overlooked. Using GLS models, the BLUP (Best Linear Unbiased Prediction) is not simply obtained from the fitted line but should incorporate information from the (evolutionary here) model.
>
> There’s a way to impute phylogenetic signal back into a PGLS model? I am super surprised at that. I’ve talked to at least three different colleagues who use PGLS about this issue, and all of them told me that there is no way to input phylogenetic signal back into the model for new data points and that I should just go with the single regression line the model gives me (i.e., the regression line for the ancestral node).
>
> I tried looking around to see what previous researchers did when using PCM on body mass (Esteban-Trivigno and Köhler 2011, Campione and Evans 2012, Yapuncich 2017 thesis), and it looks like all of them just went with the best-fit line for the ancestral node; i.e., judging from their reported results, they give a simple trait~predictor equation that does not include phylogeny when calculating new data. Campione and Evans 2012 used PIC rather than PGLS, which I know are technically equivalent, but it doesn't seem like they included phylogenetic information when they predicted new data: they used their equations on dinosaurs, but there are no dinosaurs in the tree they used.
> I know that it’s possible to incorporate phylogenetic signal into the new data using PVR, but PVR has been criticized for other reasons.
>
> This is something that seems really, really concerning, because if there is a method of using phylogenetic covariance to adjust the position of new data points, it seems like a lot of workers don’t know these methods exist, to the point that even published papers overlook them. This was something I was hoping to highlight in a later paper on the data, but it sounds like people might have discussed it already. I remember talking with my colleagues a lot about "isn't there some way to incorporate phylogenetic information back into the model to improve accuracy of the prediction if we know where the taxon is positioned?" and they just thought there wasn't a way.
>
>> Regarding the model comparison, I would simply avoid it (or limit it) by fitting models flexible enough to accommodate between your BM and OLS case and summarize the results obtained across all the trees…
>
> I am not entirely sure what is meant here. Do you mean fitting both an OLS and a BM model and comparing the two? I am reporting both, but my concern is about which model I report as the best one to use going forward, since the BM model is seemingly less accurate (though I am just taking the fitted values from the PGLS model, which I don't think include phylogenetic information). The two models I use produce dramatically different results; for example, the BM model produces body mass estimates that are 25% larger than OLS.
>
> Right now PGLS is something I would avoid if I had the option (if for no other reason than to avoid putting all of the analyses in a single, overloaded manuscript [the manuscript is already about 90 pages] and deviating from the scope of the study), but I'm sure you know that most regression analyses nowadays require some sort of preliminary PCM to be acceptable.
>
> Sincerely,
> Russell
>
> On Wed, Jun 30, 2021 at 10:24 AM Julien Clavel <julien.cla...@hotmail.fr> wrote:
>
>> I think that Cécile's and Theodore's point is important and too often overlooked. Using GLS models, the BLUP (Best Linear Unbiased Prediction) is not simply obtained from the fitted line but should incorporate information from the (evolutionary here) model.
>>
>> For multivariate linear models you can also do it by specifying a tree including both the species used to build the model and the ones you want to predict, using the “predict” function in mvMORPH (I think that Rphylopars can deal with multivariate phylogenetic regression too).
>>
>> Regarding the model comparison, I would simply avoid it (or limit it) by fitting models flexible enough to accommodate between your BM and OLS case and summarize the results obtained across all the trees…
>>
>> Julien
>>
>> From: R-sig-phylo <r-sig-phylo-boun...@r-project.org> on behalf of Theodore Garland <theodore.garl...@ucr.edu>
>> Sent: Wednesday, June 30, 2021 03:26
>> To: Cecile Ane <cecile....@wisc.edu>
>> Cc: mailman, r-sig-phylo <r-sig-phylo@r-project.org>; neovenatori...@gmail.com <neovenatori...@gmail.com>
>> Subject: Re: [R-sig-phylo] Model Selection and PGLS
>>
>> All true. I would just add two things. First, always graph your data and do ordinary OLS analyses as a reality check.
>>
>> Second, I think this is the original paper for phylogenetic prediction:
>> Garland, Jr., T., and A. R. Ives. 2000. Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist 155:346–364.
>> There, we talk about the Equivalency of the Independent-Contrasts and Generalized Least Squares Approaches.
>>
>> Cheers,
>> Ted
>>
>> On Tue, Jun 29, 2021 at 5:01 PM Cecile Ane <cecile....@wisc.edu> wrote:
>>
>> > Hi Russell,
>> >
>> > What you see is the large uncertainty in “ancestral” states, which is part of the intercept here. The linear relationship that you overlaid on top of your data is the relationship predicted at the root of the tree (as if such a thing existed!). There is a lot of uncertainty about the intercept, but much less uncertainty in the slope. It looks like the slope is not affected by the inclusion or exclusion of monotremes. (For one possible reference on the greater precision in the slope versus the intercept, there’s this: http://dx.doi.org/10.1214/13-AOS1105 for the BM.)
>> >
>> > My second cent is that the phylogenetic predictions should be stable. The uncertainty in the intercept (and the large effect of including monotremes on the intercept) should not affect predictions, so long as you know for which species you want to make a prediction. If you want to make a prediction for a species in a small clade “far” from monotremes, say, then the prediction is probably quite stable, even if you include monotremes: this is because the phylogenetic prediction should use the phylogenetic relationships for the species to be predicted. A prediction that uses the linear relationship at the root and ignores the placement of the species would be the worst-case scenario: for a mammal species with a completely unknown placement within mammals.
>> >
>> > There are probably a number of software packages that do phylogenetic prediction. I know of Rphylopars and PhyloNetworks.
>> >
>> > my 2 cents…
>> > Cecile
>> >
>> > ---
>> > Cécile Ané, Professor (she/her)
>> > H. I. Romnes Faculty Fellow
>> > Departments of Statistics and of Botany
>> > University of Wisconsin - Madison
>> > http://www.stat.wisc.edu/~ane/
>> >
>> > CALS statistical consulting lab:
>> > https://calslab.cals.wisc.edu/stat-consulting/
>> >
>> > On Jun 29, 2021, at 5:37 PM, neovenatori...@gmail.com wrote:
>> >
>> > Dear All,
>> >
>> > So this is the main problem I'm facing (see attached figure, which should be small enough to post). When I calculate the best-fit line under a Brownian model, this produces a best-fit line that more or less bypasses the distribution of the data altogether. I did some testing and found that this result was driven solely by the presence of Monotremata, resulting in the model heavily downweighting all of the phylogenetic variation within Theria in favor of the deep divergence between Monotremata and Theria. Excluding Monotremata produces a PGLS fit that's comparable enough to the OLS and OU model fits to be justifiable (though I can't just throw out Monotremata for the sake of throwing it out).
>> >
>> > I am planning to do a more theoretical investigation into the effect of Monotremata on the PGLS fit in a future study, but right now what I am trying to do is perform a study in which I use this data to construct a regression model that can be used to predict new data. This is why I am trying to use AIC to potentially justify going with OLS or an OU model over a Brownian model. From a practical perspective the Brownian model is almost unusable because it produces systematically biased estimates with high error rates when applied to new data (error rate is roughly double that of both the OLS and OU model).
>> > This is especially the case because the data must be back-transformed to an arithmetic scale to be useable, and thus a seemingly minor difference in regression models results in a massive difference in predicted values. However, I need some objective test to show that OLS fits the data better than the Brownian model, which is why I was going with AIC. Overall, OLS does seem to outperform the Brownian model on average, but the variation in AIC is so high that it is hard to interpret.
>> >
>> > This is kind of why I am leery of assuming a null Brownian model. A Brownian model, if anything, does not seem to accurately model the relationship between variables.
>> >
>> > This is why I am having trouble figuring out how to do model selection. Just going by accuracy statistics like percent error or standard error of the estimate, OLS is better in a purely practical sense (it doesn't work for the monotreme taxa, but it turns out that estimate error in the monotremes is only decreased by 10% in a Brownian model when it overestimates mass by nearly 75%, so the improvement really isn't worth it and using this for monotremes isn't recommended in the first place), but the reviewers are expressing skepticism over the fact that the Brownian model produces less useable results. And I'm not entirely sure of the best way to go about the PGLS if using one of the birth-death trees isn't ideal; perhaps what Dr. Upham says about using the DNA tree might work better.
>> >
>> > Ironically, an OU model might be argued to better fit the data, despite the concerns that Dr. Bapst mentioned.
>> > Looking at the distribution of signal, even though signal is not random, it is more accurately described as most taxa hewing to a stable equilibrium with rapid, high-magnitude shifts at certain evolutionary nodes, rather than the covariation between the two traits evolving in a Brownian fashion. I did some experiments with a PSR curve and the results seem to favor an OU model or other models with uneven rates of evolution rather than a pure Brownian model.
>> >
>> > Of course, the broader issue I am facing is trying to deal with PGLS succinctly; the scope of the study isn't necessarily an in-depth comparison between different regression models, it's more looking at how this variable correlates with body mass for practical purposes (of which considering phylogeny is one part). It's definitely something to consider but I am trying to avoid manuscript bloat.
>> >
>> > Sincerely,
>> > Russell
>> >
>> > _______________________________________________
>> > R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> > Searchable archive at
>> > http://www.mail-archive.com/r-sig-phylo@r-project.org/
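
The phylogenetic prediction (BLUP) discussed in the thread above can be illustrated with a small sketch. This is not taken from any of the packages mentioned (Rphylopars, PhyloNetworks, mvMORPH); it is a pure-Python toy under stated assumptions: a Brownian-motion covariance V built from shared branch lengths on a made-up four-species tree, made-up trait values, and hypothetical helper names (solve, gls_fit, blup_predict). The prediction for a new species is the fitted root-level line plus a correction c' V^-1 (y - X beta), where c holds the covariances between the new species and the species in the tree.

```python
# Toy sketch of phylogenetic (BLUP) prediction under Brownian motion.
# The tree, the data, and all helper names are hypothetical illustrations.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gls_fit(x, y, V):
    """GLS estimate of (intercept, slope) for y = a + b*x with covariance V."""
    ones = [1.0] * len(x)
    Vi_1 = solve(V, ones)   # V^-1 1
    Vi_x = solve(V, x)      # V^-1 x
    # Normal equations: (X' V^-1 X) beta = X' V^-1 y  (V is symmetric)
    A = [[dot(ones, Vi_1), dot(ones, Vi_x)],
         [dot(x, Vi_1), dot(x, Vi_x)]]
    rhs = [dot(Vi_1, y), dot(Vi_x, y)]
    return solve(A, rhs)

def blup_predict(x_new, c_new, x, y, V):
    """Return (root-line prediction, BLUP) for a new tip.

    c_new[i] is the BM covariance (shared branch length) between the new
    tip and observed tip i.
    """
    a, b = gls_fit(x, y, V)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    weights = solve(V, c_new)            # V^-1 c
    root_line = a + b * x_new
    return root_line, root_line + dot(weights, resid)

# Two clades that split at the root; within each clade the two tips share
# 0.8 of a total tree height of 1.0 (assumed values).
V = [[1.0, 0.8, 0.0, 0.0],
     [0.8, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.8],
     [0.0, 0.0, 0.8, 1.0]]
x = [1.0, 2.0, 1.0, 2.0]
y = [2.0, 3.0, 0.0, 1.0]   # clade 1 sits above the common line, clade 2 below

beta = gls_fit(x, y, V)
# New species placed inside clade 1, sharing 0.8 with both of its tips.
root_line, blup = blup_predict(1.5, [0.8, 0.8, 0.0, 0.0], x, y, V)
print("intercept, slope:", beta)
print("root-line prediction:", root_line)
print("BLUP prediction:", blup)
```

In this toy case the root-level line ignores the clade offset, while the BLUP pulls the prediction toward the relatives of the new species. A species with no shared history with any tip (c all zeros) gets the root-line prediction, which matches the "worst-case scenario" Cécile describes.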