Dear all,
please allow me to repost this message, as it has remained unanswered on
sci.stat.consult for a week. To the best of my knowledge, the question posed
is neither so trivial or banal as to deserve no reply at all, nor too
difficult or complex for the numerous experts of the sci.stat.* forums to
comment on. Please provide any kind of answer, guideline or opinion, and the
interested research group and I will be profoundly grateful!
--- Original message ---
This is about rewriting the basic formula for the hat elements ("leverages")
in multiple linear regression (see the note at the end of the message for
context). I am familiar with the expression for the matrix whose diagonal
contains the hat elements (Stevens, 1996, p. 109):
H = X (X' X)^-1 X'
where X is the raw-scores matrix (with 1's in the 1st column, corresponding
to the constant term).
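(Not part of the original question, but for concreteness: the matrix formula
above is easy to check numerically. The following is a minimal sketch in
plain Python; the toy data are invented purely for illustration.)

```python
# Sketch: compute the diagonal of H = X (X'X)^-1 X' with plain Python lists.
# The data matrix X below is a made-up toy example (constant column of 1's
# plus one predictor); a real application would supply its own X.

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    # Gauss-Jordan elimination with partial pivoting.
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def hat_diagonal(X):
    XtX_inv = inverse(matmul(transpose(X), X))
    H = matmul(matmul(X, XtX_inv), transpose(X))
    return [H[i][i] for i in range(len(X))]

# toy data: 1's in the first column, one predictor in the second
X = [[1.0, 2.0], [1.0, 4.0], [1.0, 5.0], [1.0, 7.0], [1.0, 9.0]]
leverages = hat_diagonal(X)
```

(The leverages sum to the number of columns of X, which is a quick built-in
consistency check.)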
Knowing the simplification for the bivariate case (Myers & Well, 1991)
h_sub_ii = 1/N + (x_sub_i - x_bar)^2 / SS_sub_x ,
it is this kind of non-matrix expression that I am looking for in the general
(p-variate) case. In other words, I do not want to carry around the entire
data matrix from which the model was derived when predicting Y (with CI) for
a single <new> case. I would like to perform the calculation using only a set
of <constants>, e.g., the sum of values for every predictor and the sums of
squared values, or the means plus variances, or the variance-covariance
matrix plus some of the former (see below).
Now, for uncorrelated predictors (Darlington, 1990, p. 354)
h_sub_i = (MD_sub_i + 1) / N
where MD stands for Mahalanobis distance (which, in the no-correlations case,
is simply the standardised Euclidean distance from the means-vector). What I
wonder is whether I can use this formula when the predictors are correlated
to a non-negligible extent, in which case, of course, I would use (Stevens,
1996, p. 111)
MD_sub_i^2 = (x_sub_i - x_bar)' S^-1 (x_sub_i - x_bar)
where x_sub_i and x_bar are the case-vector and the centroid, respectively,
and S is the covariance matrix.
Is this approach reasonable? If it is, I would also very much appreciate
opinions on whether the application should issue a warning (about the
trustworthiness of the prediction) when the leverage exceeds, say, 3p/N (a
sound limit according to Stevens, 1996, p. 108).
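(Again not from the original question, but the Mahalanobis route can be
probed numerically: with S the usual sample covariance matrix (divisor N-1),
the leverages can be recovered from only the means, S^-1 and N, even for
correlated predictors, via h_sub_ii = 1/N + MD_sub_i^2 / (N-1). The sketch
below checks this against the matrix formula on a small, made-up dataset
with two correlated predictors.)

```python
# Sketch: compare leverages from H = X (X'X)^-1 X' with the
# Mahalanobis-based form 1/N + MD_i^2 / (N - 1), where MD_i^2 uses the
# sample covariance matrix S (divisor N - 1). The raw data are invented.

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    # Gauss-Jordan elimination with partial pivoting.
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# toy data: two (correlated) predictors, no constant column yet
raw = [[2.0, 5.0], [4.0, 3.0], [5.0, 8.0], [7.0, 4.0], [9.0, 9.0]]
N, p = len(raw), len(raw[0])

means = [sum(col) / N for col in transpose(raw)]
centered = [[v - m for v, m in zip(row, means)] for row in raw]

# sample covariance matrix S (divisor N - 1) and its inverse
S = [[sum(centered[k][i] * centered[k][j] for k in range(N)) / (N - 1)
      for j in range(p)] for i in range(p)]
S_inv = inverse(S)

def mahalanobis_sq(case):
    d = [[v - m for v, m in zip(case, means)]]
    return matmul(matmul(d, S_inv), transpose(d))[0][0]

# leverages via the Mahalanobis route (only means, S^-1 and N needed)
lev_md = [1.0 / N + mahalanobis_sq(row) / (N - 1) for row in raw]

# leverages via the full hat matrix, with the constant column of 1's
X = [[1.0] + row for row in raw]
XtX_inv = inverse(matmul(transpose(X), X))
H = matmul(matmul(X, XtX_inv), transpose(X))
lev_hat = [H[i][i] for i in range(N)]
```

(Note the constants in this identity differ from the uncorrelated-case
formula quoted above; whether they agree depends on which divisor the cited
source uses in its covariance/distance definitions.)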
Thanking everyone in advance for replying and, admittedly, eagerly awaiting
hints on <the final piece of the puzzle>,
Gaj Vidmar
Univ. of Ljubljana, Dept. of Psychology
--- Note ---
All this is related to a previous question of mine about confidence
intervals for logistic regression, to which I received a kind and most
helpful answer. To summarise, I should treat the logit-transformed P's as a
linear model as far as the "inference apparatus" is concerned, i.e., obtain
the 95% CI for the logit of P via
E(Y) +- 1.96 * sqrt (MSE * [1 + h_sub_ii])
as in the linear model (Myers & Well, 1991), and then transform the limits
back to the P-scale. However, as the application of the logistic model is
bound to run on a small embedded device, the calculations should be as
economical as possible in every sense.
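(For illustration only, my reading of the summarised advice as code: build
the interval on the logit scale and map the limits back to the P-scale with
the inverse logit. The numbers passed in below are placeholders; a real
application would supply its own fitted logit, MSE and leverage.)

```python
# Sketch: 95% interval on the logit scale, transformed back to the P-scale.
# logit_hat, mse and h_ii are placeholder values for illustration.
import math

def inv_logit(z):
    # maps a logit back to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def prediction_interval_p(logit_hat, mse, h_ii, z=1.96):
    half = z * math.sqrt(mse * (1.0 + h_ii))
    return inv_logit(logit_hat - half), inv_logit(logit_hat + half)

lo, hi = prediction_interval_p(logit_hat=0.8, mse=0.25, h_ii=0.15)
```

(Only a handful of floating-point operations per new case, which suits the
embedded-device constraint.)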
--- References ---
Darlington, R.B. (1990). Regression and Linear Models. New York:
McGraw-Hill.
Myers, J.L., & Well, A.D. (1991). Research Design and Statistical Analysis.
New York: Harper Collins.
Stevens, J.C. (1996). Applied Multivariate Statistics for the Social
Sciences (3rd Ed.). Mahwah: Lawrence Erlbaum.