Re: [R] regression with categorial variables

Peter Ehlers Sat, 30 Jan 2010 08:44:33 -0800

kayj wrote:

Hi All,


I am working on an example where an electric utility is investigating the
effect of household size and type of air conditioning on electricity
consumption. I fit a multiple linear regression:

Electricity consumption = size of the household + air conditioning type

There are 3 air conditioning types, so I modeled them with dummy variables:
Type A
Type B
Type C

where Type A is the reference level.

Below are the results

Electricity consumption = 0.4 size of the household + 0.95 type B - 0.95 type C

But when I look at the mean of the predicted values of electricity
consumption by air conditioning type, this is what I get

Type A  29.86
Type B  25.94
Type C  30.1

I calculated the above means by fitting a linear model as Electricity
consumption = size of the household, without including the air conditioning
type. I then looked at the predicted values of the response variable and
calculated the mean of the predicted values for each category. But you can
see that the mean response for Type B is lower than for Type A (25.94 for
Type B and 29.86 for Type A).


My question is that the signs of the betas in the regression model are not
consistent with the means: for Type B the beta is positive (0.95).

Is this possible? Under what circumstances can this happen?


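Certainly, this is possible. Your simpler model is a
'coincident straight lines' model (a single line for all
three types), which may not be at all reasonable. Its
predictions depend only on household size, so the mean
prediction for a type reflects the sizes of the households
using that type, not any effect of the type itself. The
model including type of a.c. is a 'parallel straight lines'
model. What if type B is used primarily in small
households? Have you plotted the data?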

Here's an example:

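# Household sizes 1 to 8, with sizes 3 and 4 most common
x1 <- rep(1:8, c(10,30,70,60,12,12,3,3))
# Type B gets the 40 smallest households, A the next 130, C the 30 largest
x2 <- factor(rep(LETTERS[c(2,1,3)], c(40,130,30)))
set.seed(1234)
# True model: slope 0.4, B shifted up by 1 and C down by 1 relative to A
y <- .4*x1 + 1*(x2=='B') - 1*(x2=='C') + .2*rnorm(200)
model1 <- lm(y ~ x1)        # coincident lines: one line for all types
model2 <- lm(y ~ x1 + x2)   # parallel lines: one intercept per type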
round(coef(model2), 4)
#(Intercept)          x1         x2B         x2C
#     0.0005      0.4013      0.9143     -0.9977

yp <- predict(model1)
tapply(yp, x2, mean)
#        A        B        C
# 1.431830 1.386754 1.496052
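
One way to see where the ordering of these means comes from, as a rough
sketch that rebuilds the simulated data above: model1 uses only x1, so the
mean of its predictions within a type is the fitted line evaluated at that
type's mean household size. The names size_by_type and line_at_mean are
introduced here purely for illustration.

```r
# Rebuild the simulated data from the example
x1 <- rep(1:8, c(10,30,70,60,12,12,3,3))
x2 <- factor(rep(LETTERS[c(2,1,3)], c(40,130,30)))
set.seed(1234)
y <- .4*x1 + 1*(x2=='B') - 1*(x2=='C') + .2*rnorm(200)
model1 <- lm(y ~ x1)

# Mean household size within each a.c. type
# (by construction: B = 1.75, A is about 3.46, C = 5.9)
size_by_type <- tapply(x1, x2, mean)
size_by_type

# model1's fitted line evaluated at each type's mean size ...
b <- unname(coef(model1))
line_at_mean <- b[1] + b[2] * size_by_type

# ... agrees with the per-type mean of model1's predictions
all.equal(as.vector(line_at_mean),
          as.vector(tapply(predict(model1), x2, mean)))
```

So type B has the lowest mean prediction under model1 simply because it sits
at the small-household end of x1, whatever its own effect is.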

# Have a look at the data with:

plot(y ~ x1, type='n')
points(y ~ x1, subset={x2=='A'})
points(y ~ x1, subset={x2=='B'}, col=4)
points(y ~ x1, subset={x2=='C'}, col=2)
abline(lm(y ~ x1, subset={x2=='A'}))
abline(lm(y ~ x1, subset={x2=='B'}), col=4)
abline(lm(y ~ x1, subset={x2=='C'}), col=2)
abline(model1, lwd=2)
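
To compare the types on an equal footing, one option (a sketch only, not
part of the original example) is to predict from model2 for each type at a
common household size. The names newdat and adj are hypothetical, and the
overall mean of x1 is an arbitrary choice of common size; in these simulated
data it lies outside the sizes observed for B and C, so the B and C values
are extrapolations along the parallel lines.

```r
# Rebuild the simulated data from the example
x1 <- rep(1:8, c(10,30,70,60,12,12,3,3))
x2 <- factor(rep(LETTERS[c(2,1,3)], c(40,130,30)))
set.seed(1234)
y <- .4*x1 + 1*(x2=='B') - 1*(x2=='C') + .2*rnorm(200)
model2 <- lm(y ~ x1 + x2)

# One row per a.c. type, all at the same household size
newdat <- data.frame(x1 = mean(x1),
                     x2 = factor(levels(x2), levels = levels(x2)))
adj <- predict(model2, newdata = newdat)
names(adj) <- levels(x2)
adj

# Differences from type A reproduce the x2B and x2C coefficients
adj - adj["A"]
```

At a fixed household size the ordering follows the signs of the
coefficients: B above A, and C below A.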

 -Peter Ehlers


I appreciate your input.


--
Peter Ehlers
University of Calgary

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
