No matter how the performance of the model is measured (precision, recall, MSE, correlation), we always need to measure it on the test set, not on the training set. Performance on the training set only tells us that the model has learned what it was supposed to learn; it is not a good indicator of performance on unseen data. The test set can be obtained from an independent sample or with resampling/holdout techniques (cross-validation, leave-one-out). To meaningfully compare two algorithms on a given type of data, we also need to test whether the difference in performance is statistically significant, and we need to compare performance against a baseline (chance level, or the frequency of the most common class).
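As a minimal sketch of these points in Python with scikit-learn and SciPy (the data, classifiers, and fold count below are purely illustrative, not a prescription):

# Holdout, cross-validation, a frequency baseline, and a significance check.
import numpy as np
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for your own sample.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 1) Holdout: fit on one part of the data, measure on data the model never saw.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("holdout accuracy:", tree.score(X_test, y_test))

# 2) Cross-validation: every case serves as test data exactly once.
cv_tree = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
cv_nb = cross_val_score(GaussianNB(), X, y, cv=10)

# 3) Baseline: always predict the most frequent class.
cv_base = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=10)
print("tree: %.3f  nb: %.3f  baseline: %.3f"
      % (cv_tree.mean(), cv_nb.mean(), cv_base.mean()))

# 4) Is the difference between the two learners significant?
#    A paired t-test over the fold scores is a simple (if approximate) check.
t, p = stats.ttest_rel(cv_tree, cv_nb)
print("paired t-test: t=%.2f, p=%.3f" % (t, p))

The paired test over fold scores is only approximate (the folds are not fully independent), but it is a reasonable first check before reaching for more careful procedures discussed in the references below.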
References:

http://www.mccombs.utexas.edu/faculty/Maytal.Saar-Tsechansky/Teaching/MIS_373/Fall2004/Model%20Evaluation.ppt
http://research.cs.tamu.edu/prism/lectures/iss/iss_l13.pdf
http://homepages.inf.ed.ac.uk/keller/teaching/internet/lecture_evaluation.pdf

Mitchell, Tom M. 1997. Machine Learning. New York: McGraw-Hill.
Witten, Ian H., and Eibe Frank. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Diego, CA: Morgan Kaufmann.

----- Original Message -----
From: "Henry Bulley" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Saturday, November 27, 2004 12:28 PM
Subject: A Classification validation question

> Hello,
>
> I recently read that:
> "you can't validate the classification model with the data used to develop
> the model. You must use completely independent data, otherwise you bias the
> results."
>
> Is there any resampling approach to address this issue?
> I would be grateful if any of you can point me to some good references or
> studies.
>
> Thanks for your help
>
> Henry
>
