Hi all,

I have a fairly basic question about categorical variables, but I can't seem to find an answer, so I am hoping someone here can help. I found that if the factor level names are all numbers, fitting the model with lm() returns coefficient labels that are not very recognizable.

# Example: let's just assume that we want to fit this model
fit <- lm(height ~ age + Seed, data=Loblolly)

# Note that the category names are all mangled here
fit


Call:
lm(formula = height ~ age + Seed, data = Loblolly)

Coefficients:
(Intercept)          age       Seed.L       Seed.Q       Seed.C       Seed^4
   -1.31240      2.59052      4.86941      0.87307      0.37894     -0.46853
     Seed^5       Seed^6       Seed^7       Seed^8       Seed^9      Seed^10
    0.55237      0.39659     -0.06507      0.35074     -0.83442      0.42085
    Seed^11      Seed^12      Seed^13
    0.53906     -0.29803     -0.77254



One possible solution I found is to rename the factor levels:

seed.str <- paste("S", Loblolly$Seed, sep="")
seed.str <- factor(seed.str)
fit <- lm(height ~ age + seed.str, data=Loblolly)
fit



Call:
lm(formula = height ~ age + seed.str, data = Loblolly)

Coefficients:
 (Intercept)           age  seed.strS303  seed.strS305  seed.strS307
     -0.4301        2.5905        0.8600        1.8683       -1.9183
seed.strS309  seed.strS311  seed.strS315  seed.strS319  seed.strS321
      0.5350       -1.5933       -0.8867       -0.3650       -2.0350
seed.strS323  seed.strS325  seed.strS327  seed.strS329  seed.strS331
      0.3067       -1.3233       -2.6400       -2.9333       -2.2267


Now it is actually possible to see which one is which, but it is kind of lame. Can someone point me to a more elegant solution? Thank you so much.
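
One direction that may be worth trying, offered only as a sketch: as far as I can tell, Seed in Loblolly is stored as an ordered factor, and lm() uses polynomial contrasts for ordered factors by default, which would explain the .L, .Q, .C and ^4 labels. Under that assumption, converting Seed to an unordered factor should keep the original level names in the output. The names lob and fit2 below are made up for this illustration.

```r
# Loblolly ships with base R (package 'datasets').
# Check the assumption: is Seed an ordered factor?
is.ordered(Loblolly$Seed)

# Hypothetical working copy as a plain data frame, with Seed converted
# to an unordered factor. The level labels themselves are not changed.
lob <- as.data.frame(Loblolly)
lob$Seed <- factor(lob$Seed, ordered = FALSE)

# Same model as before; an unordered factor gets treatment (dummy)
# contrasts by default, so each coefficient is named "Seed" plus its level.
fit2 <- lm(height ~ age + Seed, data = lob)
names(coef(fit2))
```

With this approach the first level of Seed acts as the reference category, so the individual estimates differ from the polynomial-contrast ones even though it is the same model fit. If a different baseline is wanted, relevel() can change it.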

Saiwing Yeung

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
