On Wed, 26 Nov 2003 13:24:39 -0500, Rajarshi Guha <[EMAIL PROTECTED]> wrote:
> Hello,
> I was wondering whether anybody would be able to help with this query.
>
> I have some neural network models which make predictions for a dataset.
> When comparing various models we evaluate their effectiveness by looking
> at the RMS error and the value of R^2 between the predicted and actual
> values.
>
> However, I seem to have read somewhere that R^2 is not always a 'good
> indicator' - in that a data set can be randomly generated yet show a
> good R^2. Is this true? And if so, does anybody know how I can
> reference this (paper/book)?

In a simple OLS regression, where you have not done any preselection of
variables, the expected value of R^2 is equal to the number of (random)
variables divided by (N-1). This is what the correction is for when you
read about the "adjusted R-squared" that a regression program gives you.
So 10 random variables predicting 21 cases give you an expected R-squared,
by chance alone, of 0.50.

Now, if you screened out some variables beforehand, there are papers
arguing that you should consider the *starting* number of variables as
the source of bias.

See any book on regression; this is one of those facts that should not
need much specific mention or defense in your own use of it.

Does this deal with what you had in mind?

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."
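
A quick way to see the k/(N-1) figure above is to simulate it. The sketch
below is not part of the original exchange; it assumes Python with numpy
is available. It fits an OLS regression of a random response on k = 10
purely random predictors with N = 21 cases, many times over, and averages
the resulting R^2 and adjusted R^2: the plain R^2 comes out near
10/20 = 0.50, while the adjusted R^2 averages near zero.

import numpy as np

rng = np.random.default_rng(0)
N, k, reps = 21, 10, 2000          # cases, random predictors, replications
r2s, adj_r2s = [], []

for _ in range(reps):
    X = rng.standard_normal((N, k))        # predictors: pure noise
    y = rng.standard_normal(N)             # response: unrelated noise
    X1 = np.column_stack([np.ones(N), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    r2s.append(r2)
    # adjusted R^2 penalizes for the k fitted predictors
    adj_r2s.append(1.0 - (1.0 - r2) * (N - 1) / (N - k - 1))

print("mean R^2          :", round(np.mean(r2s), 3))      # about 0.50 = k/(N-1)
print("mean adjusted R^2 :", round(np.mean(adj_r2s), 3))  # about 0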
