Hi Vladimir.
I'm not clear on the source of your misunderstanding, so let me do my best.
1) I think that's right is casual shorthand in English. In this case I
just mean that I think that this method will give us the correct
variance on beta (but I'm not sure - hence 'I think'). It certainly
seems better than computing the variance on the estimator from the
single best tree or consensus tree, which ignores phylogenetic
uncertainty; or from the variance in beta across trees, which ignores
sampling error in the estimation of beta on any tree. It seems unlikely
that this would dramatically underestimate the variance in beta.
2) The null hypothesis here is the slope of our fitted model (beta) is
zero, but I don't think we need to focus specifically on null hypothesis
testing. What we're really interested in is getting a true (or at
least better) estimate of the uncertainty in the model coefficient that
takes into account the different sources of error in our data tree. I
just suggested a t-test because people are often interested in getting a
p-value from their fitted model and the p-value from a single consensus
or best tree will surely be too small.
I hope that seems reasonable to you.
All the best, Liam
Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org
On 4/2/2013 7:27 PM, Vladimir Minin wrote:
Hi Liam,
When you say I think that's right, I am not sure I understand what you
mean. In particular, what is your null hypothesis stated mathematically
and what frequentist guarantees do you have for the p-value you are
computing? Typically, we desire the uniform distribution of p-values
under the null. This would be easy to check by simulation, but this
simulation would require to generate both sequence and non-sequence
trait data, in other words we would need to decide what exactly the null
hypothesis is here.
I am suspicious of your approach, because, in general, Neyman-Pearson
hypothesis testing framework has hard time accommodating uncertainty in
nuisance parameters (phylogenies in this case).
Unfortunately, I don't know if a rigorous answer to the original
question posed by John exists, so I cannot suggest a practical
alternative other than building a large Bayesian model that encompasses
uncertainty about all its parameters. In that case, by the way, testing
hypotheses still remains infeasible, because Bayesian framework lacks
tools for testing model fit in the absence of an alternative (posterior
predictive model checks are exceptions, but their
interpretation/calibration is questionable anyway), meaning that
Bayesians can only compare models, but cannot examine the fit of one
(null) model in isolation.
-Vladimir
Vladimir Minin
Assistant Professor
Department of Statistics
University of Washington, Seattle
Padelford Hall C-315, Box 354322
Seattle, WA 98195-4322
telephone: 206-543-4302
web: www.stat.washington.edu/vminin http://www.stat.washington.edu/vminin
On Apr 1, 2013, at 9:04 AM, Liam J. Revell wrote:
Hi John list.
Can I add to this by suggesting (in case it is not clear from Joe's
comment) that we should probably combine the variance among estimates
obtained from different trees in the posterior sample with the
sampling variance of any single estimate?
One fairly sensible way to do this is to compute the variance due to
phylogenetic uncertainty as the variance among estimates obtained from
the trees of the posterior sample; and then to compute the sampling
variance as the mean variance of the estimator from each tree; and
then add the two variances them. The standard error of our estimate
(computed as the mean across trees) is the square-root of this
variance. To conduct a hypothesis test on the regression coefficient,
then, you would compute the mean across trees and then the ratio of
the parameter and its standard error should have a t-distribution with
n-2 degrees of freedom for n taxa (not contrasts).
To do this from a practical perspective from trees in 'multiPhylo'
object (here trees) and data in x y for a simple bivariate
regression, we do the following:
# first define the following custom function
ff-function(tree,x,y){
pic.x-pic(x,tree)
pic.y-pic(y,tree)
fit-lm(pic.y~pic.x-1)
setNames(c(coef(fit),vcov(fit)),c(beta,var(beta)))
}
# now apply to all trees in your sample
BB-t(sapply(trees,ff,x,y))
# total variance in beta estimated by
varBeta-var(BB[,beta])+mean(BB[,var(beta)])
t.beta-mean(BB[,beta])/sqrt(varBeta)
P.beta-2*pt(abs(t.beta),df=length(trees[[1]]$tip)-2,lower.tail=FALSE)
I think that's right.
All the best, Liam
Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu
blog: http://blog.phytools.org
On 4/1/2013 11:10 AM, Joe Felsenstein wrote:
John D asked:
Given that things have been quiet in