Hi Peter,

there is no problem if the missing cell is not in the first row or column: the corresponding interaction parameter is simply omitted. In my case the data in the (1,4) cell is missing. What results is clear to me now: the (3,4) interaction parameter is dropped, so that Biv is now fixed by the (3,4) cell rather than by the (1,4) cell. That is, "(Intercept) + Athree + Biv" is the mean of the (3,4) cell (3.755 + 3.330 - 1.635 = 5.450 in the output quoted below), which makes the (3,4) cell a sort of 'honorary' member of the first row. This could equally have been done to the (2,4) cell, but I guess the rule is to drop the cell with the highest sum of row and column number.
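A minimal sketch for checking this numerically, rebuilding the example from the quoted post. The seed is my assumption (the original post did not set one), so the numbers will not match the posted output; only the identities matter. Both differences at the end should be zero up to rounding error.

```r
# Assumption: a fixed seed, which the original post did not use
set.seed(1)
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA                     # empty the (1,4) cell: A = one, B = iv
nonadd <- lm(y ~ A * B)

b <- coef(nonadd)
cell_means <- tapply(y, list(A, B), mean)  # 3 x 4 table, NA in the (1,4) cell

# Name of the coefficient that lm() reports as NA
names(b)[is.na(b)]

# Biv is fixed by the (3,4) cell ...
unname(b["(Intercept)"] + b["Athree"] + b["Biv"]) - cell_means["three", "iv"]

# ... and Atwo:Biv picks up the rest of the (2,4) cell mean
unname(b["(Intercept)"] + b["Atwo"] + b["Biv"] + b["Atwo:Biv"]) -
  cell_means["two", "iv"]
```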

Murray Jorgensen

Peter Dalgaard wrote:
Murray Jorgensen wrote:
I am wondering how to interpret the parameter estimates that lm()
reports in this sort of situation:

y = round(rnorm(n=24,mean=5,sd=2),2)
A = gl(3,2,24,labels=c("one","two","three"))
B = gl(4,6,24,labels=c("i","ii","iii","iv"))
# Make both observations for A=1, B=4 missing
y[19] = NA
y[20] = NA
data.frame(y,A,B)
nonadd = lm(y ~ A * B)


summary(nonadd)

Call:
lm(formula = y ~ A * B)

Residuals:
       Min         1Q     Median         3Q        Max
-3.555e+00 -7.675e-01 -6.939e-17  7.675e-01  3.555e+00

Coefficients: (1 not defined because of singularities)
             Estimate Std. Error t value Pr(>|t|)
(Intercept)     3.755      1.667   2.252   0.0457 *
Atwo            1.655      2.358   0.702   0.4974
Athree          3.330      2.358   1.412   0.1856
Bii             1.435      2.358   0.609   0.5552
Biii            2.055      2.358   0.871   0.4021
Biv            -1.635      2.358  -0.693   0.5025
Atwo:Bii       -1.145      3.335  -0.343   0.7378
Athree:Bii     -4.535      3.335  -1.360   0.2011
Atwo:Biii      -3.230      3.335  -0.969   0.3536
Athree:Biii    -2.105      3.335  -0.631   0.5408
Atwo:Biv        1.655      3.335   0.496   0.6295
Athree:Biv         NA         NA      NA       NA
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.358 on 11 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.2797, Adjusted R-squared: -0.3752
F-statistic: 0.4271 on 10 and 11 DF, p-value: 0.9044

fitted(nonadd)
    1     2     3     4     5     6     7     8     9    10    11    12
3.755 3.755 5.410 5.410 7.085 7.085 5.190 5.190 5.700 5.700 3.985 3.985
   13    14    15    16    17    18    21    22    23    24
5.810 5.810 4.235 4.235 7.035 7.035 5.430 5.430 5.450 5.450
t(model.matrix(nonadd)%*%coef(nonadd))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 21 22 23 24
[1,] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
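
The product above is NA in every position because coef() contains one NA and NA * 0 is still NA in R. A minimal sketch of one way around this is to drop the aliased column before multiplying. The seed and the names keep and by_hand are my own, not part of the original post.

```r
# Assumption: a fixed seed, which the original post did not use
set.seed(1)
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA
nonadd <- lm(y ~ A * B)

X <- model.matrix(nonadd)   # 22 rows: the two NA observations are omitted
b <- coef(nonadd)
keep <- !is.na(b)           # FALSE only for the aliased coefficient

# Multiply using the estimable coefficients only
by_hand <- drop(X[, keep] %*% b[keep])

# Compare with what lm() itself reports
all.equal(by_hand, fitted(nonadd))
```

Setting the NA coefficient to zero before multiplying would give the same fitted values.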

I guess that the parameter estimates reported are linear combinations of
the cell means, but which linear combinations and how does lm() decide
what parameters to report?

Cheers, Murray


What's the problem? The parameters are defined as usual for the two-way layout:

The intercept is the fitted value in the top left corner
The A coefficients are the fitted values in the first column minus the intercept.
The B coefficients vice versa.
The interaction coefficients are the fitted values minus the sum of the intercept and the corresponding A and B coefficients.

One interaction coefficient is set missing because you have no data, but except for that, the fitted values equal the cell means.
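
A rough sketch of how these definitions can be checked against the cell means, for cells where the first row and column have data. The seed and the helper names m, ok, checks and fit_means are assumptions of mine; the entries of checks should all be zero up to rounding error.

```r
# Assumption: a fixed seed, which the original post did not use
set.seed(1)
y <- round(rnorm(n = 24, mean = 5, sd = 2), 2)
A <- gl(3, 2, 24, labels = c("one", "two", "three"))
B <- gl(4, 6, 24, labels = c("i", "ii", "iii", "iv"))
y[19:20] <- NA
nonadd <- lm(y ~ A * B)

b <- coef(nonadd)
m <- tapply(y, list(A, B), mean)   # observed cell means, NA where no data

checks <- c(
  # intercept: top left corner
  intercept  = b[["(Intercept)"]] - m["one", "i"],
  # A coefficient: first column minus the intercept
  Atwo       = b[["Atwo"]] - (m["two", "i"] - m["one", "i"]),
  # B coefficient: first row minus the intercept
  Bii        = b[["Bii"]] - (m["one", "ii"] - m["one", "i"]),
  # interaction: cell mean minus intercept and main effects
  `Atwo:Bii` = b[["Atwo:Bii"]] -
    (m["two", "ii"] - (b[["(Intercept)"]] + b[["Atwo"]] + b[["Bii"]]))
)
round(checks, 10)

# Fitted values averaged by cell, for comparison with the observed cell means
ok <- !is.na(y)
fit_means <- tapply(fitted(nonadd), list(A[ok], B[ok]), mean)
max(abs(fit_means - m), na.rm = TRUE)
```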


--
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: m...@waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441   Mobile 021 0200 8350

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
