Re: [R-sig-phylo] Model Selection and PGLS

Theodore Garland Tue, 29 Jun 2021 18:26:44 -0700

All true.  I would just add two things.  First, always graph your data and
run ordinary least-squares (OLS) analyses as a reality check.


Second, I think this is the original paper for phylogenetic prediction:
Garland, Jr., T., and A. R. Ives. 2000. Using the past to predict the
present: confidence intervals for regression equations in phylogenetic
comparative methods. American Naturalist 155:346–364.
There, we talk about the Equivalency of the Independent-Contrasts and
Generalized Least Squares Approaches.
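
A minimal sketch of that equivalence, under stated assumptions: the four-tip
tree ((A:1,B:1):1,(C:1,D:1):1), the trait values, and the helper names
(solve, dot, gls_slope, contrasts) are all hypothetical and only for
illustration. It compares the GLS slope under a Brownian-motion covariance
matrix with the slope of a through-the-origin regression on standardized
independent contrasts.

```python
import math

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Made-up trait values for tips A, B, C, D (not from the thread).
x = [1.0, 2.0, 4.0, 5.5]
y = [2.1, 2.9, 6.2, 7.0]

# Brownian-motion covariance: entry (i, j) is the branch length that
# tips i and j share on the path from the root.
C = [[2, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 2, 1],
     [0, 0, 1, 2]]

def gls_slope(x, y, C):
    """Slope of y ~ 1 + x estimated by GLS with covariance C."""
    ones = [1.0] * len(x)
    Ci1, Cix = solve(C, ones), solve(C, x)
    s11, s1x, sxx = dot(ones, Ci1), dot(ones, Cix), dot(x, Cix)
    s1y, sxy = dot(y, Ci1), dot(y, Cix)
    return (s11 * sxy - s1x * s1y) / (s11 * sxx - s1x ** 2)

def contrasts(v):
    """Standardized contrasts for the fixed tree above (all branches = 1)."""
    a, b, c, d = v
    c1 = (a - b) / math.sqrt(1 + 1)
    c2 = (c - d) / math.sqrt(1 + 1)
    # Each cherry's ancestor is the mean of its two tips (equal branches);
    # its stem is lengthened by v1*v2/(v1+v2) = 0.5.
    ab, cd = (a + b) / 2, (c + d) / 2
    stem = 1 + (1 * 1) / (1 + 1)
    c3 = (ab - cd) / math.sqrt(stem + stem)
    return [c1, c2, c3]

cx, cy = contrasts(x), contrasts(y)
pic_slope = dot(cx, cy) / dot(cx, cx)  # regression through the origin

print(gls_slope(x, y, C), pic_slope)
```

The two printed slopes agree to floating-point precision for this toy tree,
which is the point of the equivalence; on real data one would use an
established package rather than hand-coded contrasts.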

Cheers,
Ted


On Tue, Jun 29, 2021 at 5:01 PM Cecile Ane <cecile....@wisc.edu> wrote:

> Hi Russell,
>
> What you see is the large uncertainty in “ancestral” states, which is part
> of the intercept here. The linear relationship that you overlaid on top of
> your data is the relationship predicted at the root of the tree (as if such
> a thing existed!). There is a lot of uncertainty about the intercept, but
> much less uncertainty in the slope. It looks like the slope is not affected
> by the inclusion or exclusion of monotremes. (For one possible reference on
> the greater precision in the slope versus the intercept under BM, see
> http://dx.doi.org/10.1214/13-AOS1105.)
>
> My second cent is that the phylogenetic predictions should be stable. The
> uncertainty in the intercept —and the large effect of including monotremes
> on the intercept— should not affect predictions, so long as you know for
> which species you want to make a prediction. If you want to make a prediction
> for a species in a small clade “far” from monotremes, say, then the
> prediction is probably quite stable, even if you include monotremes: this
> is because the phylogenetic prediction should use the phylogenetic
> relationships for the species to be predicted. A prediction that uses the
> linear relationship at the root and ignores the placement of the species
> would be the worst-case scenario: for a mammal species with a completely
> unknown placement within mammals.
>
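
A minimal sketch of the distinction drawn above, assuming a Brownian-motion
model and a known predictor value for the new species. The five-tip tree
(one deep-diverging tip M plus a four-species clade), the data, and the names
solve, dot, gls_fit and predict are hypothetical. The prediction adds to the
root-level line a correction c' C^-1 r, where r holds the GLS residuals of
the observed species and c holds the branch lengths the new species shares
with each of them.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical tree (M:3,((A:1,B:1):1,(C:1,D:1):1):1); tips M, A, B, C, D.
C = [[3, 0, 0, 0, 0],
     [0, 3, 2, 1, 1],
     [0, 2, 3, 1, 1],
     [0, 1, 1, 3, 2],
     [0, 1, 1, 2, 3]]
x = [2.0, 1.0, 1.5, 3.0, 3.5]   # made-up predictor values
y = [6.0, 1.2, 1.6, 3.4, 3.7]   # made-up responses

def gls_fit(x, y, C):
    """Return [intercept, slope] of y ~ 1 + x under covariance C."""
    ones = [1.0] * len(x)
    Ci1, Cix = solve(C, ones), solve(C, x)
    A = [[dot(ones, Ci1), dot(ones, Cix)],
         [dot(x, Ci1), dot(x, Cix)]]
    return solve(A, [dot(y, Ci1), dot(y, Cix)])

def predict(x_new, c, x, y, C):
    """Root-level line plus the phylogenetic correction c' C^-1 r."""
    b0, b1 = gls_fit(x, y, C)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0 + b1 * x_new + dot(c, solve(C, resid))

# New species placed as sister to D, diverging 0.5 before the present.
c_sister_of_D = [0.0, 1.0, 1.0, 2.0, 2.5]
# New species with unknown placement: no shared history with any tip.
c_unknown = [0.0] * 5

print(predict(3.2, c_sister_of_D, x, y, C))
print(predict(3.2, c_unknown, x, y, C))
```

With c set to all zeros the correction vanishes and the prediction falls back
on the relationship at the root, which is the worst-case scenario described
above; with a placement close to D, the prediction is pulled toward D's
residual instead.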
> There are probably a number of software packages that do phylogenetic
> prediction. I know of Rphylopars and PhyloNetworks.
>
> my 2 cents…
> Cecile
>
> ---
> Cécile Ané, Professor (she/her)
> H. I. Romnes Faculty Fellow
> Departments of Statistics and of Botany
> University of Wisconsin - Madison
> http://www.stat.wisc.edu/~ane/
>
> CALS statistical consulting lab:
> https://calslab.cals.wisc.edu/stat-consulting/
>
>
>
> On Jun 29, 2021, at 5:37 PM, neovenatori...@gmail.com wrote:
>
> Dear All,
>
> So this is the main problem I'm facing (see attached figure, which should
> be small enough to post). When I calculate the best-fit line under a
> Brownian model, the resulting line more or less bypasses the distribution
> of the data altogether. I did some testing and found that
> this result was driven solely by the presence of Monotremata, resulting in
> the model heavily downweighting all of the phylogenetic variation within
> Theria in favor of the deep divergence between Monotremata and Theria.
> Excluding Monotremata produces a PGLS fit that's comparable enough to the
> OLS and OU model fit to be justifiable (though I can't just throw out
> Monotremata for the sake of throwing it out).
>
> I am planning to do a more theoretical investigation into the effect of
> Monotremata on the PGLS fit in a future study, but right now what I am
> trying to do is perform a study in which I use this data to construct a
> regression model that can be used to predict new data. Which is why I am
> trying to use AIC to potentially justify going with OLS or an OU model over
> a Brownian model. From a practical perspective the Brownian model is almost
> unusable because it produces systematically biased estimates with high
> error rates when applied to new data (the error rate is roughly double that
> of both the OLS and OU models). This is especially the case because the data
> must be back-transformed to an arithmetic scale to be usable, and thus a
> seemingly minor difference in regression models results in a massive
> difference in predicted values. However, I need some objective test to show
> that OLS fits the data better than the Brownian model, which is why I was
> going with AIC. Overall, OLS does seem to outperform the Brownian model on
> average, but the variation in AIC is so high that this is hard to interpret.
>
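
A minimal sketch of the AIC comparison described above, assuming
maximum-likelihood (not REML) fits and counting three parameters (intercept,
slope, variance) for both models. The tree, the data, and the names cholesky,
whiten and aic_gls are hypothetical. OLS is treated as GLS with an identity
covariance matrix, so both AIC values come from the same likelihood function.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cholesky(C):
    """Lower-triangular L with L L' = C (C must be positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def whiten(L, v):
    """Solve L z = v by forward substitution."""
    z = []
    for i in range(len(v)):
        z.append((v[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def aic_gls(x, y, C):
    """AIC of y ~ 1 + x with errors ~ N(0, sigma^2 C), fitted by ML."""
    n = len(y)
    L = cholesky(C)
    logdet = 2 * sum(math.log(L[i][i]) for i in range(n))
    # Whitening by L turns the GLS fit into OLS on transformed variables.
    u, wx, wy = whiten(L, [1.0] * n), whiten(L, x), whiten(L, y)
    suu, sux, sxx = dot(u, u), dot(u, wx), dot(wx, wx)
    suy, sxy = dot(u, wy), dot(wx, wy)
    det = suu * sxx - sux ** 2
    b0 = (sxx * suy - sux * sxy) / det
    b1 = (suu * sxy - sux * suy) / det
    rss = sum((c - b0 * a - b1 * b) ** 2 for a, b, c in zip(u, wx, wy))
    sigma2 = rss / n                      # ML estimate of the variance
    loglik = -0.5 * (n * math.log(2 * math.pi * sigma2) + logdet + n)
    k = 3                                 # intercept, slope, sigma^2
    return 2 * k - 2 * loglik

# Hypothetical tree (M:3,((A:1,B:1):1,(C:1,D:1):1):1) and made-up data.
C = [[3, 0, 0, 0, 0],
     [0, 3, 2, 1, 1],
     [0, 2, 3, 1, 1],
     [0, 1, 1, 3, 2],
     [0, 1, 1, 2, 3]]
identity = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
x = [2.0, 1.0, 1.5, 3.0, 3.5]
y = [6.0, 1.2, 1.6, 3.4, 3.7]

aic_ols = aic_gls(x, y, identity)
aic_bm = aic_gls(x, y, C)
print(aic_ols, aic_bm, aic_bm - aic_ols)
```

Because both fits share the same response and the same fixed effects, the
difference in AIC reflects only the assumed error structure. An OU fit would
need one more parameter and a numerical optimization, which this sketch does
not attempt.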
> This is kind of why I am leery of assuming a null Brownian model. A
> Brownian model, if anything, does not seem to accurately model the
> relationship between variables.
>
> This is why I am having trouble figuring out how to do model selection.
> Going just by accuracy statistics like percent error or standard error of
> the estimate, OLS is better in a purely practical sense. (It doesn't work
> for the monotreme taxa, but it turns out that estimate error in the
> monotremes is only decreased by 10% in a Brownian model, when it
> overestimates mass by nearly 75%, so the improvement really isn't worth it,
> and using this for monotremes isn't recommended in the first place.)
> However, the reviewers are expressing skepticism over the fact that the
> Brownian model produces less usable results. And I'm not entirely sure of
> the best way to go about the PGLS if using one of the birth-death trees
> isn't ideal; perhaps what Dr. Upham says about using the DNA tree might
> work better.
>
> Ironically, an OU model might be argued to better fit the data, despite
> the concerns that Dr. Bapst mentioned. Looking at the distribution of
> signal, even though the signal is not random, it is more accurately
> described as most taxa hewing to a stable equilibrium with rapid,
> high-magnitude shifts at certain evolutionary nodes, rather than as the
> covariation between the two traits evolving in a Brownian fashion. I did
> some experiments with a PSR curve and the results seem to favor an OU model
> or other models with uneven rates of evolution rather than a pure Brownian
> model.
>
> Of course, the broader issue I am facing is trying to deal with PGLS
> succinctly; the scope of the study isn't necessarily an in-depth comparison
> between different regression models; it's more looking at how this variable
> correlates with body mass for practical purposes (of which considering
> phylogeny is one part). It's definitely something to consider, but I am
> trying to avoid manuscript bloat.
>
> Sincerely,
> Russell
>
>
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>

