Max R-sq for Binary Data

Milo Schield Sun, 6 Feb 2000 11:11:27 -0800
QUESTION: What is the theoretical maximum value of R-sq** when binary data
(Y) are generated from a simple linear model?

The Y data are binary, with Y values drawn from a linear model that rises from
0 to 1 over the range of X.

The binary Y values are arranged to minimize* the standard deviation around
the model.

    REGRESSION TYPE    DISTRIBUTION OF X VALUES
a.  OLS                linear
b.  OLS                normal [width truncated at 6 sigma?]
c.  Logistic           linear
d.  Logistic           normal [width truncated at 6 sigma?]
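A minimal Python sketch of the two X distributions in the table, under stated assumptions: "linear" is read as evenly spaced X values, and "normal [width truncated at 6 sigma?]" is read as a standard normal cut at plus or minus 3 sigma (6 sigma total width), with points placed at evenly spaced quantiles instead of drawn at random. The linear model is rescaled so it runs from 0 at the smallest X to 1 at the largest. The function names are hypothetical.

```python
from statistics import NormalDist

def linear_x(n):
    """n evenly spaced X values from 0 to 1 (the 'linear' design)."""
    return [i / (n - 1) for i in range(n)]

def truncated_normal_x(n, half_width=3.0):
    """n X values at evenly spaced quantiles of a standard normal truncated
    to [-half_width, +half_width]; half_width=3 gives a 6-sigma total width.
    Deterministic, so there is no sampling noise."""
    nd = NormalDist()
    lo, hi = nd.cdf(-half_width), nd.cdf(half_width)
    # midpoint quantiles (i + 0.5)/n of the truncated distribution
    return [nd.inv_cdf(lo + (hi - lo) * (i + 0.5) / n) for i in range(n)]

def model_probs(xs):
    """Linear model rising from 0 at min(X) to 1 at max(X)."""
    x_min, x_max = min(xs), max(xs)
    return [(x - x_min) / (x_max - x_min) for x in xs]

p_linear = model_probs(linear_x(100))
p_normal = model_probs(truncated_normal_x(100))
```

Because most normal X values lie near the middle of their range, the model values in `p_normal` cluster near 0.5, whereas those in `p_linear` are spread evenly between 0 and 1.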


Based on some discrete trials, I get the following estimates for R-sq:
a.  99%
b.  16%
c.  96%
d.  16%
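The post does not say how R-sq was computed for the logistic fits (cases c and d). One possible reading, sketched here as an assumption, is to fit the logistic model by maximum likelihood and then apply the footnote R^2 formula with the fitted probabilities as the model values. The fit uses plain Newton-Raphson with no safeguard, so it will fail on perfectly separable data, where no finite maximum-likelihood estimate exists. The data and function names are illustrative only.

```python
import math

def logistic_fit(x, y, iters=50, tol=1e-10):
    """Maximum-likelihood logistic regression of binary y on a single x,
    found by Newton-Raphson. Returns (b0, b1) for p = 1/(1+exp(-(b0+b1*x)))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score: gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: solve (X'WX) d = score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def footnote_r2(y, fitted):
    """(SS around mean - SS around model) / SS around mean."""
    mean = sum(y) / len(y)
    ss_mean = sum((v - mean) ** 2 for v in y)
    ss_model = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return (ss_mean - ss_model) / ss_mean

x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]      # illustrative data, not from the post
b0, b1 = logistic_fit(x, y)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
r2 = footnote_r2(y, fitted)
```

At convergence the score equations hold, so the fitted probabilities sum to the number of ones in Y.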

* On the selection of binary Y values: Suppose the X values are linearly
distributed from 0 to 1 and the model is Y = X. In the discrete case with 100
points, the first 5 would all be zeros and the last 5 would all be ones. At
the center, half the points would be zeros and the other half would be
ones.
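One way to realize this selection rule, offered only as a sketch: the post gives no algorithm, so this assumes the goal is to keep the running count of ones as close as possible to the running sum of the model values. Cumulative rounding (a simple form of error diffusion) keeps that gap within half a unit at every point. The name `binary_y` is hypothetical.

```python
import math

def binary_y(probs):
    """Binary Y values whose running count of ones tracks the running sum
    of the model values (cumulative rounding). Each model value must lie
    in [0, 1], which keeps every emitted Y at 0 or 1."""
    ys, cum, ones = [], 0.0, 0
    for p in probs:
        cum += p
        target = math.floor(cum + 0.5)  # nearest whole number of ones so far
        y = target - ones
        ys.append(y)
        ones += y
    return ys

xs = [i / 99 for i in range(100)]   # model Y = X at 100 evenly spaced points
ys = binary_y(xs)
```

For Y = X with 100 points this rule gives a leading run of ten zeros, near-alternation of zeros and ones around the center, and a trailing run of ones. The lengths of the end runs depend on the rounding rule chosen, so they need not match the five used in the illustration above.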

** R^2 = (S^2 around mean   -   S^2 around model) / (S^2 around mean)
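The footnote formula can be computed directly. This Python sketch assumes both S^2 terms use the same divisor, so it works with plain sums of squares (the divisor cancels in the ratio); if the S^2 around the model used n - 2 instead, the result would differ slightly. The helper names are hypothetical. When the model values are the fitted OLS line, this R^2 equals the squared correlation between X and Y.

```python
def r_squared(y, model_values):
    """R^2 = (S^2 around mean - S^2 around model) / S^2 around mean,
    using sums of squares (a common divisor cancels in the ratio)."""
    mean = sum(y) / len(y)
    ss_mean = sum((v - mean) ** 2 for v in y)
    ss_model = sum((v - m) ** 2 for v, m in zip(y, model_values))
    return (ss_mean - ss_model) / ss_mean

def ols_fit(x, y):
    """Intercept and slope of the ordinary least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

x = [0, 1, 2, 3]
y = [0, 0, 1, 1]
b0, b1 = ols_fit(x, y)
r2 = r_squared(y, [b0 + b1 * xi for xi in x])
```

Here the fitted line is y = -0.1 + 0.4x and R^2 = 0.8.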






===========================================================================
  This list is open to everyone. Occasionally, people lacking respect
  for other members of the list send messages that are inappropriate
  or unrelated to the list's discussion topics. Please just delete the
  offensive email.

  For information concerning the list, please see the following web page:
  http://jse.stat.ncsu.edu/
===========================================================================