Dear Russell,
I was not talking about the OLS residuals (OLS is indeed expected to behave
better than GLS when there are errors in variables; see the reference I cited
last time) but about the residuals of your GLS fit, since it is this fit that
apparently has a suspect slope.
Note also that if there are trends driven by two clades in your residuals, this
is likely a model design issue (e.g., a single regression line is not
sufficient to model your data; it seems, for instance, that you have a
different relationship within Cetacea in your plot). Using (P)GLS instead of
OLS will not solve the problem: both approaches offer an unbiased estimate of
the slope, provided there are no observation errors. When there are such
errors, you can try to correct the bias using, for instance, the reliability
ratio.
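
To make the reliability-ratio idea concrete, here is a minimal sketch for the
simple, non-phylogenetic OLS case only (Hansen & Bartoszek 2012 treat the
phylogenetic case, which is not reproduced here). It is written in Python
rather than R, all numbers are simulated and made up, and the function names
are hypothetical. It assumes the measurement-error variance of the predictor
is known and that the errors are independent of the true values.

```python
import random
import statistics


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def reliability_ratio(x_obs, error_var):
    """K = var(true x) / var(observed x), with var(true x) estimated as
    var(observed x) minus the (assumed known) measurement-error variance."""
    var_obs = statistics.variance(x_obs)
    return (var_obs - error_var) / var_obs


# Simulated data (illustrative values only, not from any real dataset).
random.seed(1)
true_slope = 0.75
error_sd = 0.5   # assumed known measurement-error SD of the predictor
n = 5000

x_true = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + true_slope * x + random.gauss(0.0, 0.2) for x in x_true]
x_obs = [x + random.gauss(0.0, error_sd) for x in x_true]

naive = ols_slope(x_obs, y)               # attenuated towards zero
k = reliability_ratio(x_obs, error_sd ** 2)
corrected = naive / k                     # bias-corrected slope

print(f"naive slope:       {naive:.3f}")
print(f"reliability ratio: {k:.3f}")
print(f"corrected slope:   {corrected:.3f}")
```

The naive slope is pulled towards zero by roughly the factor K, and dividing
by K undoes that attenuation. With real data the error variance has to be
estimated, so the correction carries its own uncertainty.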
Yes, the sampling error can have mixed sources (some can be “biological”). If
you only have information about the variance for some species, maybe you can
still approximate the value for the others by using a pooled estimate?
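
A pooled estimate could be computed along the following lines. This is only a
sketch in Python with made-up sample sizes and variances, and the function
names are hypothetical. It assumes the within-species variance (e.g., of
log-transformed body mass) is roughly the same across species, which may not
hold for your data.

```python
def pooled_variance(samples):
    """Pooled within-species variance from (n, variance) pairs.

    Only species with n >= 2 can contribute, because a single specimen
    gives no variance estimate.
    """
    num = sum((n - 1) * v for n, v in samples)
    den = sum(n - 1 for n, _ in samples)
    return num / den


def mean_sampling_variance(n, pooled):
    """Approximate sampling variance of a species mean based on n specimens."""
    return pooled / n


# Made-up (sample size, within-species variance) pairs for species where
# the variance was reported.
known = [(10, 0.020), (5, 0.030), (3, 0.010)]
pooled = pooled_variance(known)

# Species known from a single specimen: use the pooled variance directly.
single_specimen_var = mean_sampling_variance(1, pooled)
# Species with four specimens but no reported variance.
four_specimen_var = mean_sampling_variance(4, pooled)

print(pooled, single_specimen_var, four_specimen_var)
```

The resulting per-species sampling variances are the kind of input an
errors-in-variables correction needs when not every source reports a
standard deviation.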
Best wishes,
Julien
From: Russell Engelman
Sent: Friday, 22 October 2021 00:54
To: Julien Clavel; mailman, r-sig-phylo
Subject: Re: [R-sig-phylo] Irregularity in PGLS Slope Driven By Scope of Taxon
Selection
Dear Dr. Clavel,
> If you plot the residuals against your predictor they will likely be
> correlated in this case.
I'm not sure this is the case. For the OLS fit, when I make a
residuals-versus-fits plot, the results are mostly linear and suggestive of
normality. There is some non-random distribution of the residuals, but this is
driven by two clades that end up biasing the fit, which is part of the reason I
am trying to see if PGLS methods produce more reasonable results.
The scale-location plot suggests increasing variance in residuals with
increasing size, but this also appears to be driven by the two clades that were
biasing the fit under OLS and overall show reduced correlation between brain
and body size. Thus the heteroskedasticity in this plot is driven by biological
variation rather than measurement error. Excluding these two groups produces a
scale-location plot where the log residuals are homoskedastic.
> I would guess that there’s likely less or as much uncertainty in the estimate
> of brain size than for body size across mammals if both were independently
> estimated.
This seems to be what Pagel and Harvey (1988) were suggesting, that somehow
error variation in body size was driving shallower slopes in body size among
mammals (within-genus regressions had shallow slopes, then within-family, then
within-order). However, it wasn't quite clear what they meant by sampling error
(e.g., the imprecision in the actual measurement, or the intraspecific
variation in body mass due to body condition). I think it sounds reasonable
that this is probably the case.
> Assuming you can obtain an estimate for this error, it’s usually possible to
> correct this bias. An alternative is to include another “instrumental
> variable” as covariate.
How would one go about doing this? I ask because most of this data comes from
prior literature sources and many times standard deviations in the variables
are not reported. Some of the data come from single individuals due to limited
availability of specimens in the parent study(/ies). I saw that Hansen &
Bartoszek 2012 mention a "reliability ratio" that they used to correct the
data, but I'm not exactly sure if this is the same thing.
Sincerely,
Russell
On Wed, Oct 20, 2021 at 10:21 AM Julien Clavel wrote:
Hi Russell,
Just a hint, but this type of bias (assuming there are no formatting issues
with the data) often shows up when there are considerable (non-random) errors
in the predictors (we talk about "errors-in-variables models"). If you plot the
residuals against your predictor they will likely be correlated in this case. I
would guess that there’s likely less or as much uncertainty in the estimate of
brain size than for body size across mammals if both were independently
estimated. You can see for instance Morton-Jones & Henderson 2000
(Technometrics) for GLS in general, and Hansen & Bartoszek 2012 (Systematic
Biology) for the (P)GLS case.
Assuming you can obtain an estimate for this error, it’s usually possible to
correct this bias. An alternative is to include another “instrumental variable”
as covariate.
Best wishes,
Julien
From: R-sig-phylo on behalf of Russell Engelman
Sent: Wednesday, 20 October 2021 04:29
To: mailman, r-sig-phylo
Subject: [R-sig-phylo] Irregularity in PGLS Slope Driven By Scope of Taxon
Selection
Dear R-Sig-Phylo,
I'm having a very strange issue with PGLS in R and I was wondering if anyone
had seen this before.
I've been doing some work with brain size in mammals, using the dataset of
Burger et al. 2019 as a base. The results described here use that dataset, but
the same thing happens with my own data.
I have been trying to calculate a PGLS fit