Diego Bilski --

> I'm wondering if it would be statistically/philosophically correct to use
> n-fold cross-validation to evaluate a linear regression with independent
> contrasts. My doubt comes from the fact that when simply dividing the IC
> dataset into, let's say, 10 folds, some folds will remove the contrasts of
> internal nodes without necessarily removing an entire clade above that
> point, producing what can be viewed as two independent clades (a graphical
> example would be, in Felsenstein's seminal paper, fig. 8, remove the
> contrast at node 13, while keeping those at nodes 9 and/or 10).

and Ted Garland wrote:

> Couldn't you also just do this back at the level of the original tree and
> tip data, creating subsets by pruning the tree before you compute contrasts?
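Ted's suggestion of working at the level of the tree (prune, then recompute contrasts on each subset) can be sketched as below. This is a minimal illustration, not anyone's actual code: the nested-tuple tree format is a hypothetical stand-in for a real tree object, and the contrast recursion is Felsenstein's (1985) standard algorithm for a single trait. Pruning a tip merges its sibling's branch with the parent branch, so the remaining contrasts are recomputed on the correct branch lengths. In R, the ape package's `drop.tip()` and `pic()` do the equivalent on a `phylo` object.

```python
import math

# Hypothetical tree format: a tip is (label, branch_length);
# an internal node is ((left_child, right_child), branch_length).

def pic(node, data):
    """Felsenstein's independent contrasts for one trait.

    Returns (node_value, adjusted_branch_length, standardized_contrasts).
    """
    content, length = node
    if isinstance(content, str):                      # tip: observed value
        return data[content], length, []
    left, right = content
    x1, v1, c1 = pic(left, data)
    x2, v2, c2 = pic(right, data)
    contrast = (x1 - x2) / math.sqrt(v1 + v2)         # standardized contrast
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)   # weighted ancestral value
    extra = v1 * v2 / (v1 + v2)                       # branch-length correction
    return value, length + extra, c1 + c2 + [contrast]

def prune(node, drop):
    """Remove the tips named in `drop`, merging single-child branches."""
    content, length = node
    if isinstance(content, str):
        return None if content in drop else node
    kept = [k for k in (prune(c, drop) for c in content) if k is not None]
    if not kept:
        return None
    if len(kept) == 1:                                # collapse the now-unary node
        child_content, child_length = kept[0]
        return (child_content, child_length + length)
    return (tuple(kept), length)

# ((A:1,B:1):1,(C:2,D:2):0.5) with made-up tip values
tree = (
    (
        ((("A", 1.0), ("B", 1.0)), 1.0),
        ((("C", 2.0), ("D", 2.0)), 0.5),
    ),
    0.0,
)
data = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 6.0}

full = pic(tree, data)[2]                  # 3 contrasts for 4 tips
subset = pic(prune(tree, {"B"}), data)[2]  # 2 contrasts for 3 tips
print(full, subset)
```

Each cross-validation subset would then be a set of dropped tips, with contrasts recomputed from scratch rather than deleted from the full list.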

Under the model of multivariate normality with Brownian Motion
change along the phylogeny, the contrasts are i.i.d., so of course
one can use them as points for cross-validation.  But unless the
regression is nonlinear, there is already a parametric framework
for the distributions of regression coefficients (and other
associated quantities) in that i.i.d. MVN framework.
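To make that concrete, here is a minimal sketch, assuming the contrasts are already standardized and the regression is fitted through the origin, as is usual for contrasts. The data are simulated (the true slope of 0.5 and unit residual variance are arbitrary choices), and the helper names are hypothetical. Because the contrasts are treated as i.i.d., any random partition of them is a legitimate fold assignment, and the k-fold prediction error should land close to the residual variance that the parametric fit already reports.

```python
import math
import random

def fit_through_origin(xs, ys):
    """Least-squares slope of y on x with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def kfold_mse(xs, ys, k=10, seed=1):
    """k-fold CV mean squared prediction error, treating contrasts as i.i.d. points."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)     # random fold assignment
    folds = [idx[f::k] for f in range(k)]
    sq_err = 0.0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        b = fit_through_origin([xs[i] for i in train], [ys[i] for i in train])
        sq_err += sum((ys[i] - b * xs[i]) ** 2 for i in fold)
    return sq_err / n

def parametric_summary(xs, ys):
    """Slope, its standard error, and residual variance (n - 1 df, no intercept)."""
    b = fit_through_origin(xs, ys)
    n = len(xs)
    s2 = sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 1)
    se = math.sqrt(s2 / sum(x * x for x in xs))
    return b, se, s2

# Simulated standardized contrasts (hypothetical: slope 0.5, residual sd 1)
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

b, se, s2 = parametric_summary(xs, ys)
cv = kfold_mse(xs, ys, k=10)
print(f"slope {b:.3f} +/- {se:.3f}; residual var {s2:.3f}; 10-fold CV MSE {cv:.3f}")
```

If the two error estimates disagreed badly on real contrasts, that would suggest some departure from the i.i.d. Brownian Motion model, which is the question raised next.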

The issue of what entities should be sampled in cross-validation
depends on how, and at what level, you expect the model to depart from
multivariate normality with Brownian Motion.  Diego and Ted seem to
have some such expectation, but I can't see what that alternative
model would be.

Joe
----
Joe Felsenstein         j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA


_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
