All true. I would just add two things. First, always graph your data and run ordinary least squares (OLS) analyses as a reality check.
Second, I think this is the original paper for phylogenetic prediction:

Garland, Jr., T., and A. R. Ives. 2000. Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. American Naturalist 155:346–364.

There, we talk about the Equivalency of the Independent-Contrasts and Generalized Least Squares Approaches.

Cheers,
Ted

On Tue, Jun 29, 2021 at 5:01 PM Cecile Ane <cecile....@wisc.edu> wrote:

> Hi Russell,
>
> What you see is the large uncertainty in “ancestral” states, which is
> part of the intercept here. The linear relationship that you overlaid
> on top of your data is the relationship predicted at the root of the
> tree (as if such a thing existed!). There is a lot of uncertainty about
> the intercept, but much less uncertainty in the slope. It looks like
> the slope is not affected by the inclusion or exclusion of monotremes.
> (For one possible reference on the greater precision in the slope
> versus the intercept under BM, see
> http://dx.doi.org/10.1214/13-AOS1105.)
>
> My second cent is that the phylogenetic predictions should be stable.
> The uncertainty in the intercept, and the large effect of including
> monotremes on the intercept, should not affect predictions, so long as
> you know for which species you want to make a prediction. If you want
> to make a prediction for a species in a small clade “far” from
> monotremes, say, then the prediction is probably quite stable, even if
> you include monotremes: this is because the phylogenetic prediction
> should use the phylogenetic relationships for the species to be
> predicted. A prediction that uses the linear relationship at the root
> and ignores the placement of the species would be the worst-case
> scenario: for a mammal species with a completely unknown placement
> within mammals.
>
> There are probably a number of software packages that do phylogenetic
> prediction. I know of Rphylopars and PhyloNetworks.
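To make the point about phylogenetic prediction concrete, here is a minimal Python sketch. It is not taken from the thread and is not the implementation used by Rphylopars or PhyloNetworks. It assumes a Brownian-motion covariance matrix C of shared branch lengths among the observed species, and a vector c_new of shared branch lengths between the new species and each observed species. The toy tree, the data values, and the function names (solve, gls_fit, phylo_predict) are all hypothetical. The prediction is the line at the root, a + b*x_new, plus a correction c_new' C^-1 (y - fitted) that depends on where the species sits in the tree.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gls_fit(C, x, y):
    """Intercept and slope of y = a + b*x with residual covariance proportional to C."""
    ones = [1.0] * len(y)
    Ci1 = solve(C, ones)  # C^-1 * 1
    Cix = solve(C, x)     # C^-1 * x
    # Normal equations: (X' C^-1 X) beta = X' C^-1 y, with X = [1, x]
    A = [[dot(ones, Ci1), dot(ones, Cix)],
         [dot(x, Ci1), dot(x, Cix)]]
    rhs = [dot(Ci1, y), dot(Cix, y)]
    a, b = solve(A, rhs)
    return a, b


def phylo_predict(C, c_new, x, y, x_new):
    """Return (root-line prediction, placement-aware prediction) for a new species."""
    a, b = gls_fit(C, x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    root_line = a + b * x_new
    correction = dot(c_new, solve(C, resid))  # c_new' C^-1 (y - X beta)
    return root_line, root_line + correction


# Hypothetical toy tree of depth 1: one outgroup (index 0) and three ingroup species.
C = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.9, 0.5],
     [0.0, 0.9, 1.0, 0.5],
     [0.0, 0.5, 0.5, 1.0]]
c_new = [0.0, 0.5, 0.5, 0.8]  # new species assumed to be close to species 3
x = [1.0, 2.0, 2.2, 3.0]      # hypothetical predictor values (log scale)
y = [4.0, 3.1, 3.4, 4.6]      # hypothetical response values (log scale)

root_line, placed = phylo_predict(C, c_new, x, y, x_new=2.8)
print("root-line prediction:", root_line)
print("placement-aware prediction:", placed)
```

With c_new set to all zeros (a species with unknown placement, effectively attached at the root), the correction vanishes and the prediction falls back to the root line, which is the worst-case scenario described above.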
>
> my 2 cents…
> Cecile
>
> ---
> Cécile Ané, Professor (she/her)
> H. I. Romnes Faculty Fellow
> Departments of Statistics and of Botany
> University of Wisconsin - Madison
> http://www.stat.wisc.edu/~ane/
>
> CALS statistical consulting lab:
> https://calslab.cals.wisc.edu/stat-consulting/
>
> On Jun 29, 2021, at 5:37 PM, neovenatori...@gmail.com wrote:
>
> Dear All,
>
> So this is the main problem I'm facing (see attached figure, which
> should be small enough to post). When I calculate the best-fit line
> under a Brownian model, this produces a best-fit line that more or less
> bypasses the distribution of the data altogether. I did some testing
> and found that this result was driven solely by the presence of
> Monotremata, resulting in the model heavily downweighting all of the
> phylogenetic variation within Theria in favor of the deep divergence
> between Monotremata and Theria. Excluding Monotremata produces a PGLS
> fit that's comparable enough to the OLS and OU model fit to be
> justifiable (though I can't just throw out Monotremata for the sake of
> throwing it out).
>
> I am planning to do a more theoretical investigation into the effect of
> Monotremata on the PGLS fit in a future study, but right now what I am
> trying to do is perform a study in which I use this data to construct a
> regression model that can be used to predict new data. Which is why I
> am trying to use AIC to potentially justify going with OLS or an OU
> model over a Brownian model. From a practical perspective the Brownian
> model is almost unusable because it produces systematically biased
> estimates with high error rates when applied to new data (error rate is
> roughly double that of both the OLS and OU model).
> This is especially the case because the data must be back-transformed
> to an arithmetic scale to be useable, and thus a seemingly minor
> difference in regression models results in a massive difference in
> predicted values. However, I need some objective test to show that OLS
> fits the data better than the Brownian model, which is why I was going
> with AIC. Overall, OLS does seem to outperform the Brownian model on
> average, but the variation in AIC is so high it is hard to interpret
> this.
>
> This is kind of why I am leery of assuming a null Brownian model. A
> Brownian model, if anything, does not seem to accurately model the
> relationship between variables.
>
> This is why I am having trouble figuring out how to do model selection.
> Going just by accuracy statistics like percent error or standard error
> of the estimate, OLS is better in a purely practical sense (it doesn't
> work for the monotreme taxa, but it turns out that estimate error in
> the monotremes is only decreased by 10% in a Brownian model when it
> overestimates mass by nearly 75%, so the improvement really isn't worth
> it and using this for monotremes isn't recommended in the first place),
> but the reviewers are expressing skepticism over the fact that the
> Brownian model produces less useable results. And I'm not entirely sure
> of the best way to go about the PGLS if using one of the birth-death
> trees isn't ideal; perhaps what Dr. Upham says about using the DNA tree
> might work better.
>
> Ironically, an OU model might be argued to better fit the data, despite
> the concerns that Dr. Bapst mentioned. Looking at the distribution of
> signal, even though the signal is not random, it is more accurately
> described as most taxa hewing to a stable equilibrium with rapid,
> high-magnitude shifts at certain evolutionary nodes, rather than the
> covariation between the two traits evolving in a Brownian fashion.
> I did some experiments with a PSR curve and the results seem to favor
> an OU model or other models with uneven rates of evolution rather than
> a pure Brownian model.
>
> Of course, the broader issue I am facing is trying to deal with PGLS
> succinctly; the scope of the study isn't necessarily an in-depth
> comparison between different regression models, it's more looking at
> how this variable correlates with body mass for practical purposes (of
> which considering phylogeny is one part). It's definitely something to
> consider but I am trying to avoid manuscript bloat.
>
> Sincerely,
> Russell
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
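As a footnote to the AIC question raised in Russell's message, the following Python sketch shows one way the AIC of an OLS fit and of a Brownian-model GLS fit can be computed on the same data so that the two are comparable. It is only an illustration under stated assumptions: Gaussian residuals with covariance sigma^2 * C, the maximum-likelihood estimate of sigma^2, and k = 3 parameters (intercept, slope, sigma^2). The covariance matrix, the data, and the function names (cholesky, whiten, gaussian_aic) are hypothetical. An OU fit would need its own C built from an extra parameter (so k = 4), which is not shown here.

```python
import math


def cholesky(A):
    """Lower-triangular L with A = L L' (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def whiten(L, v):
    """Solve L z = v by forward substitution, i.e. z = L^-1 v."""
    z = []
    for i in range(len(v)):
        z.append((v[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gaussian_aic(C, x, y):
    """AIC of y = a + b*x with residuals ~ N(0, sigma^2 * C), sigma^2 at its ML value."""
    n = len(y)
    L = cholesky(C)
    # Whitening turns the GLS problem into an ordinary least-squares problem.
    z1, zx, zy = whiten(L, [1.0] * n), whiten(L, x), whiten(L, y)
    s11, s1x, sxx = dot(z1, z1), dot(z1, zx), dot(zx, zx)
    s1y, sxy = dot(z1, zy), dot(zx, zy)
    det = s11 * sxx - s1x ** 2
    a = (s1y * sxx - s1x * sxy) / det
    b = (s11 * sxy - s1x * s1y) / det
    rss = sum((r - a * u - b * v) ** 2 for u, v, r in zip(z1, zx, zy))
    sigma2 = rss / n
    logdet_C = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    loglik = -0.5 * (n * math.log(2.0 * math.pi * sigma2) + logdet_C + n)
    k = 3  # intercept, slope, sigma^2
    return 2 * k - 2 * loglik


# Hypothetical toy data: one outgroup (index 0) and three ingroup species.
C_bm = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.9, 0.5],
        [0.0, 0.9, 1.0, 0.5],
        [0.0, 0.5, 0.5, 1.0]]
C_ols = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
x = [1.0, 2.0, 2.2, 3.0]
y = [4.0, 3.1, 3.4, 4.6]

print("AIC, OLS (C = identity):", gaussian_aic(C_ols, x, y))
print("AIC, Brownian model:    ", gaussian_aic(C_bm, x, y))
```

Both AIC values here come from the same likelihood on the same response variable, which is what makes them comparable. With only four species the numbers mean little; the toy data are there only to make the sketch runnable.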