0 1 72.5
1 0 1 0 74.5
0 1 1 0 65.2
0 1 1 0 70.7
1 0 1 0 77.5
which would fit the equation
Weight = Intercept + b.S.F*Sex.F + b.D.V*Diet.V + error
with the same absorption of a base-level of each factor into the
Intercept (since now we have 2 redundancies: for each factor,
the two dummy variables add up to 1). The coefficient of Sex.F
will represent a difference between Males and Females, the
coefficient of Diet.V will represent a difference between
meat-eaters and vegetarians. Because of the redundancies, an
equivalent representation of the data used in the calculations is
  Sex.F  Diet.V  Weight
    0      1      69.5
    1      1      60.2
    1      1      65.7
    0      1      72.5
    0      0      74.5
    1      0      65.2
    1      0      70.7
    0      0      77.5
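As a rough sketch of how this looks in R, model.matrix() shows the
dummy variables that R constructs for the additive model. The data
frame name d and the explicit level ordering are my own choices for
illustration, made so that "M" (male, meat-eater) is the base level
of each factor; the weights are the ones in the table above.

```r
# Data from the table above; levels ordered so that "M" is the base level
d <- data.frame(
  Sex    = factor(c("M", "F", "F", "M", "M", "F", "F", "M"), levels = c("M", "F")),
  Diet   = factor(c("V", "V", "V", "V", "M", "M", "M", "M"), levels = c("M", "V")),
  Weight = c(69.5, 60.2, 65.7, 72.5, 74.5, 65.2, 70.7, 77.5)
)

# Design matrix: an intercept column plus one dummy per factor
X <- model.matrix(Weight ~ Sex + Diet, data = d)
print(X)

# Fit the additive model; the coefficients are the intercept (base
# levels of both factors) and the two differences from the base level
fit <- lm(Weight ~ Sex + Diet, data = d)
print(coef(fit))
```

The columns SexF and DietV of X should match the Sex.F and Diet.V
columns of the table.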
But now we have the opportunity to ask: Is the difference
between meat-eater and vegetarian Males the same as the
difference between meat-eater and vegetarian Females? Now we
need the Interaction -- the difference, between Males and
Females, of the two differences between the two diets: one
difference evaluated for Males, the other for Females. This
leads to the regression model
Weight ~ Sex * Diet, equivalent to Weight ~ Sex + Diet + Sex:Diet
and we now need a further dummy variable for the different
combinations of levels of the two factors:
  Sex.F  Diet.V  Sex.F:Diet.V  Weight
    0      1          0         69.5
    1      1          1         60.2
    1      1          1         65.7
    0      1          0         72.5
    0      0          0         74.5
    1      0          0         65.2
    1      0          0         70.7
    0      0          0         77.5
where the variable Sex.F:Diet.V has the value 1 when Sex.F=1
and Diet.V=1, and the value 0 otherwise.
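The interaction column can be checked in R along the following lines.
Again this is only a sketch: the names d, Xi, Xi2 and fit_int are
mine, and the levels are ordered so that "M" is the base level of
both factors.

```r
# Same data as in the tables above
d <- data.frame(
  Sex    = factor(c("M", "F", "F", "M", "M", "F", "F", "M"), levels = c("M", "F")),
  Diet   = factor(c("V", "V", "V", "V", "M", "M", "M", "M"), levels = c("M", "V")),
  Weight = c(69.5, 60.2, 65.7, 72.5, 74.5, 65.2, 70.7, 77.5)
)

# Sex * Diet expands to Sex + Diet + Sex:Diet
Xi  <- model.matrix(Weight ~ Sex * Diet, data = d)
Xi2 <- model.matrix(Weight ~ Sex + Diet + Sex:Diet, data = d)
print(Xi)

# The interaction dummy is the product of the two main-effect dummies
prod_col <- Xi[, "SexF"] * Xi[, "DietV"]

# Fit the model with interaction
fit_int <- lm(Weight ~ Sex * Diet, data = d)
print(coef(fit_int))
```

For these particular numbers the diet difference happens to be the
same for Males and Females, so the estimated interaction coefficient
comes out as (numerically) zero.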
This is all very basic and straightforward (though it can appear
more complicated in richer problems). But the point about using
a variable of factor type in R is beginning to emerge. When
there is a factor with k levels, you need (k-1) dummy variables
as quantitative variables for the regression. Interactions
introduce further dummy variables. For all this to happen, a
variable which is going to be used as a factor needs a special
representation inside R, so that R knows how to set about
constructing all that stuff. So, in R, a factor is not a simple
list of levels (like c("M","F","F","M","M","F","F","M")), but
a more elaborate encoding, and a more complex structure.
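A quick way to see that encoding is to take a factor apart. In this
small sketch the names sex_chr and sex are mine; the functions are
all standard base R.

```r
# A plain character vector versus the factor built from it
sex_chr <- c("M", "F", "F", "M", "M", "F", "F", "M")
sex     <- factor(sex_chr, levels = c("M", "F"))

print(levels(sex))      # the distinct categories, in order
print(as.integer(sex))  # integer codes that index into the levels
print(typeof(sex))      # storage type of the underlying vector
print(class(sex))       # the class that tells R to treat it as a factor
print(unclass(sex))     # the codes together with the "levels" attribute
```

So a factor is stored as integer codes plus a "levels" attribute and
a class, and it is this extra information that the modelling
functions use to build the dummy variables.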
Once past this stage, there is then the question of what
system of *contrasts* is going to be used. For 2-level factors
(as above) there are not many issues which arise -- the effect
of a factor corresponds to a simple difference between the
results corresponding to its two levels. But, say, for the
Terrain factor (G,F,S) there are several ways in which differences
can be formulated. For example:
G, F-G, S-G (treatment contrasts)
Or, for Social Class (ordered, ABCDE)
D-E, C-D, B-C, A-B (successive difference contrasts)
E, D-E, C-(mean of DE), B-(mean of CDE), A-(mean of BCDE)
(Helmert contrasts)
and so on. What system of contrasts you use will depend on what
aspects of the differences between categories you are interested in.
And then the contrast specification also has to be part of the
specification of a factor (since it determines how to compute
the dummy variables which will represent it in the regression).
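To illustrate how a contrast specification is attached to a factor,
here is a small sketch with a made-up terrain vector (the six
observations are hypothetical, not data from this thread). It uses
contr.treatment() and contr.helmert() from base R. Note that R's
contr.helmert() compares each level with the mean of the preceding
levels, so its orientation need not match the Helmert scheme written
out above.

```r
# Hypothetical observations of a 3-level factor, base level "G"
terrain <- factor(c("G", "F", "S", "G", "S", "F"), levels = c("G", "F", "S"))

# Two standard contrast matrices for the same three levels
treat <- contr.treatment(levels(terrain))
helm  <- contr.helmert(levels(terrain))
print(treat)
print(helm)

# Attach the Helmert coding to the factor itself
contrasts(terrain) <- helm
print(contrasts(terrain))

# The dummy variables in the design matrix now follow that coding
Xt <- model.matrix(~ terrain)
print(Xt)
```

With k = 3 levels each coding yields k - 1 = 2 columns, and changing
the contrasts changes what the fitted coefficients mean, not the
fitted values.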
See John Maindonald's on-line book.
Hoping this helps!
Ted.
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On
Behalf Of [EMAIL PROTECTED]
Sent: Tuesday, October 07, 2008 2:29 PM
To: r-help@r-project.org
Subject: [R] Factor tutorial?
This is probably a very basic question. I want to understand factors but I
am not sure where to turn. Looking up factor in the Chambers book doesn't
even show up in the index. Maybe I am just slow but ?factor doesn't help
either. Would someone please point me to a very basic tutorial where I can
see what the usefulness of factors is (so far they have just gotten in the
way).
Thank you.
Kevin
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 08