On Mon, 10 May 2004 15:22:14 +0000 (UTC), [EMAIL PROTECTED] wrote:

> A simulation paper by Steyerberg a few years ago showed that the split
> half approach is probably too conservative. You're better off estimating
> an a priori model and then using the bootstrap for validation. If you've
> "always" had success before with the split half method, I'd say you've
> been a very lucky fellow till now :-)
>
> Mike Babyak
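[A minimal sketch of what "using the bootstrap for validation" can look like for a prespecified linear model. It assumes optimism-corrected R^2 as the performance measure, which the post does not specify. The function names, the simulated data, and the choice of 200 resamples are all illustrative and not from the thread.]

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def r_squared(X, y, beta):
    """R^2 of the predictions X @ beta against y."""
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def bootstrap_validated_r2(X, y, n_boot=200, seed=0):
    """Return (apparent R^2, optimism-corrected R^2) for a fixed model.

    Each bootstrap model is fitted on a resample, then scored both on that
    resample and on the original data; the mean difference estimates how
    much the apparent R^2 overstates performance.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = r_squared(X, y, fit_ols(X, y))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        beta_b = fit_ols(X[idx], y[idx])
        r2_resample = r_squared(X[idx], y[idx], beta_b)
        r2_original = r_squared(X, y, beta_b)
        optimism.append(r2_resample - r2_original)
    return apparent, apparent - float(np.mean(optimism))


# Hypothetical data: 60 cases, 8 prespecified predictors, one real effect.
data_rng = np.random.default_rng(1)
n, p = 60, 8
X = np.column_stack([np.ones(n), data_rng.normal(size=(n, p))])
y = 0.5 * X[:, 1] + data_rng.normal(size=n)

apparent, corrected = bootstrap_validated_r2(X, y)
print(f"apparent R^2:  {apparent:.3f}")
print(f"corrected R^2: {corrected:.3f}")
```

[Every case is used both to fit and to validate, which is the usual argument for this approach over a single split-half. The sketch is only valid if the predictor set is fixed in advance; if any selection step such as stepwise is used, it would have to be repeated inside each resample.]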
Straightforward: the failure of replication means that the apparent
success in half the sample was due to overfitting.

Anyone can check my stats-FAQ for some comments posted years ago
about stepwise selection, which is the popular error. Here is
something that I posted June 6, 2003, relevant to stepwise.

===== from sci.stat.consult
I was impressed by an argument in this:
"Linear model selection by cross-validation", Jun Shao,
JASA, vol 88, issue 422 (June 1993), 486-494.
Available online through JSTOR if you subscribe.

It seemed to make a *certain* amount of sense -- he argues, as I
understand it, that as N (sample size) gets larger, the
training-fraction should approach zero. [He paints some logic, and
he satisfies my prejudice, that you need a lot more replication than
most folks figure.]

I like that conclusion, that replication is tough; I know that I
haven't followed all the reasoning.
======

I never did study that paper more, or hear more about it.

It seems to me that the approach in decision-trees that was called
'random forests' is perhaps implicitly using tiny samples and
searching for multiple replications.

 - My doubts about tiny samples-plus-replication are summed up by
this observation: If you have evidence apparent in multiple, small
sections of the sample, then you will have the same effect measured
at, say, p = 0.0001 (or better) in the full sample. Isn't that just
another way to achieve 'Bonferroni correction' for doing multiple
tests?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
