Diego Bilski wrote:

> I'm wondering if it would be statistically/philosophically correct to use
> n-fold cross-validation to evaluate a linear regression with independent
> contrasts. My doubt comes from the fact that when simply dividing the IC
> dataset into, let's say, 10 folds, some folds will remove the contrasts of
> internal nodes without necessarily removing an entire clade above that
> point, producing what can be viewed as two independent clades (a graphical
> example would be, in Felsenstein's seminal paper, fig. 8, removing the
> contrast at node 13 while keeping those at nodes 9 and/or 10).

and Ted Garland wrote:

> Couldn't you also just do this back at the level of the original tree and
> tip data, creating subsets by pruning the tree before you compute contrasts?

Under the model of multivariate normality with Brownian Motion change along the phylogeny, the contrasts are i.i.d., so one can use them as points for cross-validation. But unless the regression is nonlinear, there is already a parametric framework for the distributions of regression coefficients (and other associated phenomena) in that i.i.d. MVN framework.

The issue of what entities should be sampled in cross-validation depends on how, and at what level, you expect the model to depart from multivariate normality with Brownian Motion. Diego and Ted seem to have some such expectation, but I can't see what that alternative model would be.

Joe
----
Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
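
To make the point about treating contrasts as i.i.d. points concrete, here is a minimal sketch in Python of k-fold cross-validation applied directly to a set of contrasts. It assumes the standardized contrasts have already been computed elsewhere; the data below are simulated placeholders, not real contrasts, and the names `slope_through_origin` and `kfold_mse` are hypothetical helpers written for this illustration. The regression is fitted through the origin, as is usual for contrasts.

```python
import random


def slope_through_origin(xs, ys):
    """Least-squares slope for y = b * x with no intercept."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def kfold_mse(xs, ys, k=10, seed=0):
    """Mean squared prediction error from k-fold cross-validation.

    Each contrast is treated as an exchangeable data point, which is
    the i.i.d. assumption under Brownian motion.
    """
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    sq_err = 0.0
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in idx if i not in held]
        train_y = [ys[i] for i in idx if i not in held]
        b = slope_through_origin(train_x, train_y)
        sq_err += sum((ys[i] - b * xs[i]) ** 2 for i in held_out)
    return sq_err / n


# Hypothetical stand-in for standardized contrasts (assumption: simulated,
# i.i.d. normal, with a linear relationship and slope 0.5).
rng = random.Random(1)
true_slope = 0.5
x_contrasts = [rng.gauss(0.0, 1.0) for _ in range(40)]
y_contrasts = [true_slope * x + rng.gauss(0.0, 0.3) for x in x_contrasts]

b_hat = slope_through_origin(x_contrasts, y_contrasts)
cv_mse = kfold_mse(x_contrasts, y_contrasts, k=10)
print("slope estimate:", b_hat)
print("10-fold CV mean squared error:", cv_mse)
```

For a linear regression like this one, the cross-validated error adds little beyond what the usual parametric theory for the slope already provides; the sketch only shows that, under the i.i.d. model, nothing special needs to be done about which nodes the held-out contrasts come from.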