Diego Bilski --

> I'm wondering if it would be statistically/philosophically correct to use
> n-fold cross-validation to evaluate a linear regression with independent
> contrasts. My doubt comes from the fact that when simply dividing the IC
> dataset into, let's say, 10 folds, some folds will remove the contrasts of
> internal nodes without necessarily removing an entire clade above that
> point, producing what can be viewed as two independent clades (a graphical
> example would be, in Felsenstein's seminal paper, fig. 8, remove the
> contrast at node 13, while keeping those at nodes 9 and/or 10).

and Ted Garland wrote:

> Couldn't you also just do this back at the level of the original tree and
> tip data, creating subsets by pruning the tree before you compute contrasts?

Under the model of multivariate normality with Brownian Motion
change along the phylogeny, the contrasts are i.i.d., so one can
certainly use them as points for cross-validation.  But unless
the regression is nonlinear, there is already a parametric
framework for distributions of regression coefficients (and other
associated phenomena) in that i.i.d. MVN framework.

The issue of what entities should be sampled in cross-validation
depends on how, at what level, you expect the model to depart from
multivariate normality with Brownian Motion.  Diego and Ted seem to
have some such expectation but I can't see what that alternative
model would be.

Joe Felsenstein         j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

R-sig-phylo mailing list - R-sig-phylo@r-project.org
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
