Re: [R] Dealing with data

2010-08-14 Thread Jonathan Christensen
Your second fit makes no sense, as you can easily tell if you look at the
regression summaries. Fitting with spray as a categorical variable gives you
an overall p-value of less than 2.2e-16, while fitting with
as.numeric(spray) gives an overall p-value of .2118. The fit you've done
with as.numeric induces a completely invalid model, as others have tried to
point out.
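A minimal sketch of the comparison described above, assuming the stock InsectSprays data set from the datasets package. The recoded model is a special case of the factor model, so anova() can test that restriction directly. No p-values are claimed here; run the code to see them.

```r
# Compare spray as a factor with spray recoded as the integers 1..6
data(InsectSprays)

fit_factor  <- lm(count ~ spray, data = InsectSprays)              # intercept + 5 indicators
fit_numeric <- lm(count ~ as.numeric(spray), data = InsectSprays)  # intercept + 1 slope

# Overall F tests, as reported at the bottom of each summary()
summary(fit_factor)$fstatistic
summary(fit_numeric)$fstatistic

# The numeric model is nested in the factor model, so this tests whether
# forcing the six spray effects onto a straight line in the codes is adequate
cmp <- anova(fit_numeric, fit_factor)
cmp
```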

Jonathan


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
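PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.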

Re: [R] Dealing with data

2010-08-13 Thread David Winsemius


On Aug 13, 2010, at 1:03 PM, TGS wrote:


# how would I code in R to look at the letter of the alphabet
# in the second column and create an indicator column for the
# corresponding letter?

data(InsectSprays)
InsectSprays$spray


It's already what most people mean when they say indicator column,
i.e., a factor variable (and not a character vector), so what do
_you_ mean?
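One way to see what David means is to inspect the column directly. A short sketch, assuming the stock InsectSprays data set:

```r
data(InsectSprays)
spray <- InsectSprays$spray

class(spray)    # "factor", not "character"
levels(spray)   # the six letters, one level per spray
table(spray)    # 12 observations per spray
```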





--

David Winsemius, MD
West Hartford, CT



Re: [R] Dealing with data

2010-08-13 Thread TGS
To clarify, I'd like to create a column of indicators for the respective 
letters so that I could maybe do regression on indicators, etc.

For instance, A gets 1, B gets 2, and so on.



Re: [R] Dealing with data

2010-08-13 Thread Greg Snow
R/S does all of that automatically for you; you do not need to create
the indicator variables manually.

If you do something like:

 fit <- lm( Sepal.Width ~ Species, data=iris, x=TRUE)

Then look at the matrix actually used:

 fit$x

Or the output:

 summary(fit)

You will see that Species was automatically converted into indicator variables 
and those were used in the regression.
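Continuing the iris example, the column names of the stored model matrix show which indicators were built. This sketch assumes R's default treatment contrasts (options("contrasts") unchanged), under which the first level, setosa, is absorbed into the intercept.

```r
fit <- lm(Sepal.Width ~ Species, data = iris, x = TRUE)

# Intercept plus one 0/1 indicator for each non-baseline species
colnames(fit$x)
head(fit$x[, -1])

# Each indicator is 1 exactly where Species takes that level
all(fit$x[, "Speciesversicolor"] == (iris$Species == "versicolor"))
```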

If you really need the indicator variables yourself, look at the model.matrix 
function, e.g.:

 model.matrix( ~Species, data=iris )

Or

 model.matrix( ~Species - 1, data=iris )
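The two model.matrix() calls differ only in how the levels are coded. A short sketch comparing them, again assuming default treatment contrasts:

```r
mm_int  <- model.matrix(~ Species, data = iris)      # intercept + 2 indicators
mm_cell <- model.matrix(~ Species - 1, data = iris)  # one indicator per level

colnames(mm_int)
colnames(mm_cell)

# Without an intercept every row has exactly one 1 (its own species)
all(rowSums(mm_cell) == 1)
```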

If you really want 1 for A, 2 for B, etc. then look at as.numeric on a factor 
variable (e.g. as.numeric(iris$Species) ).

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111




Re: [R] Dealing with data

2010-08-13 Thread Erik Iverson



TGS wrote:

To clarify, I'd like to create a column of indicators for the
respective letters so that I could maybe do regression on indicators,
etc.

For instance, A gets 1, B gets 2, and so on.


That's precisely how factors are handled by modeling functions in R!
No need to reinvent the wheel.  You should probably read a document
or book describing introductory regression methods in R.
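One way to see the coding Erik refers to is the contrast matrix that modeling functions attach to a factor. A sketch, assuming the default treatment contrasts are in effect:

```r
data(InsectSprays)

# Rows are the levels A-F; columns are the indicator variables lm() will use.
# Level A is all zeros: it is the baseline absorbed by the intercept.
cm <- contrasts(InsectSprays$spray)
cm

# The global setting that produces this coding (unordered, ordered factors)
getOption("contrasts")
```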





Re: [R] Dealing with data

2010-08-13 Thread David Winsemius


On Aug 13, 2010, at 1:22 PM, TGS wrote:

To clarify, I'd like to create a column of indicators for the  
respective letters so that I could maybe do regression on  
indicators, etc.


You can just enter that column name in a regression formula. No need  
to create a separate variable. Try:


lm(count ~ spray, data=InsectSprays)
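A sketch of what that call estimates, assuming default treatment contrasts: the intercept is the mean count for spray A, and each sprayX coefficient is that spray's mean minus the mean for A.

```r
data(InsectSprays)
fit <- lm(count ~ spray, data = InsectSprays)

coef(fit)   # (Intercept), sprayB, ..., sprayF

# Raw group means, as a named numeric vector
grp_means <- sapply(split(InsectSprays$count, InsectSprays$spray), mean)
grp_means[["A"]]                # equals the intercept
grp_means[-1] - grp_means[["A"]]  # equal the sprayB..sprayF coefficients
```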



For instance, A gets 1, B gets 2, and so on.


That happens to be exactly the manner in which factor variables are  
stored internally. Try this:


str(InsectSprays)
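To look at the internal representation more directly than str() does, a short sketch:

```r
data(InsectSprays)
spray <- InsectSprays$spray

typeof(spray)                # "integer": the codes are stored as integers
codes <- as.integer(spray)   # 1 for A, 2 for B, and so on
head(codes, 15)

# The levels act as the lookup table from code back to letter
head(levels(spray)[codes], 15)
```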

If, for some better reason than what you have stated so far, you
still needed to get at the internal values of the factor
variables, you can just use:


as.numeric(InsectSprays$spray)


This question makes me think you have not yet worked through much
of the Introduction to R.


http://cran.r-project.org/doc/manuals/R-intro.pdf

Admittedly it is long, but I think you said you were strong on CS and
weaker in statistics? If you are in a real hurry and have a solid stats
background, you could look at other contributed introductions. One
that kept me up at night when I was starting R (about 5 years ago) was
Faraway's Practical Regression and ANOVA Using R:


http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf

I also thought that Kuhnert and Venables' offering was scintillating:
http://cran.r-project.org/doc/contrib/Kuhnert+Venables-R_Course_Notes.zip

Others:
http://cran.r-project.org/other-docs.html

Faraway gets to factor object types by page 11, whereas you would need  
to be several chapters into the Introduction to R to get that  
information.


--
David.





Re: [R] Dealing with data

2010-08-13 Thread TGS
# Greg, if R automatically does that then I don't know why it's treating each
# indicator as a different regressor. In other words, I am interested in
# treating 'spray' as one independent variable.
#
# Erik, which book do you suggest I read? Thanks.

data(InsectSprays)
lm(InsectSprays$count ~ 0 + InsectSprays$spray)
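The behaviour described here follows from the `0 +`: with no intercept, lm() keeps one indicator per spray, and each coefficient is simply that spray's mean count. A sketch using the data= form of the same call to check this:

```r
data(InsectSprays)

# "Cell means" coding: one indicator column and one coefficient per spray
fit0 <- lm(count ~ 0 + spray, data = InsectSprays)
coef(fit0)

# The six group means, for comparison
grp_means <- sapply(split(InsectSprays$count, InsectSprays$spray), mean)
grp_means

# spray is still a single term: anova() reports it on one line. Without an
# intercept it carries 6 df, since the test is against all means being zero.
anova(fit0)
```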



Re: [R] Dealing with data

2010-08-13 Thread Greg Snow
So you want 1 degree of freedom for InsectSprays?  You believe that the
difference between A and B is exactly the same as between B and C, which is
exactly the same as between D and E (etc.)?  That seems an odd assumption, but
you can get that by using as.numeric (as I and others have already stated).
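To make the assumption concrete: with the codes 1 to 6 used as a numeric predictor, the fitted mean for each spray lies on a straight line, so every adjacent pair of letters is separated by the same single slope. A sketch:

```r
data(InsectSprays)
fit_num <- lm(count ~ as.numeric(spray), data = InsectSprays)

# One row per spray, in level order A..F
lv   <- levels(InsectSprays$spray)
grid <- data.frame(spray = factor(lv, levels = lv))
fitted_means <- predict(fit_num, newdata = grid)

# Every adjacent difference equals the one estimated slope
diff(fitted_means)
coef(fit_num)[2]
```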

If on the other hand you want InsectSprays to be treated correctly with the 
correct number of degrees of freedom, but have the output on a single line 
testing the overall effect, then you want to use the aov function rather than 
lm (internally they do the same thing, but the default summary output for aov 
is 1 line per term).
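A sketch of the aov() route, assuming the same data: its summary has one line for spray with 5 degrees of freedom, and the F statistic matches the one from the equivalent lm() fit.

```r
data(InsectSprays)

a <- aov(count ~ spray, data = InsectSprays)
summary(a)    # one row for spray (5 df) plus Residuals (66 df)

# The same test from lm(): anova() also prints one line per term
fit <- lm(count ~ spray, data = InsectSprays)
anova(fit)
```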

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111




Re: [R] Dealing with data

2010-08-13 Thread TGS
# I wasn't trying to do ANOVA. I was simply trying to figure out how to regress
# count on sprays (this is after I saw another poster asking an unrelated
# question with the InsectSprays dataset).
#
# Anyhow, David clarified this, but thanks for your explanation as well.

rm(list = ls()); sprays <- as.numeric(InsectSprays$spray)

lm(formula = count ~ 0 + spray, data = InsectSprays)
lm(formula = count ~ 0 + sprays, data = InsectSprays)

# Beside the point, in the ANOVA problem the degrees of freedom would be 5,
# not 1.



Re: [R] Dealing with data

2010-08-13 Thread TGS
But in your comment, it sounded like you were in the realm of ANOVA when you
made the degrees-of-freedom comment. I'm not going to get into the theory of
statistics with you :) I'm just trying to learn R, so take it easy. Yes, I
understand that in the regression problem the degrees of freedom for
regression is 1, and in ANOVA the degrees of freedom for sprays is 5. Thanks.
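The degrees-of-freedom point both posters make can be checked by fitting both codings and reading the Df column of each ANOVA table. A sketch, assuming the stock InsectSprays data:

```r
data(InsectSprays)

fit_numeric <- lm(count ~ as.numeric(spray), data = InsectSprays)
fit_factor  <- lm(count ~ spray, data = InsectSprays)

anova(fit_numeric)   # recoded spray uses 1 df (a single slope)
anova(fit_factor)    # factor spray uses 5 df (six levels minus one)
```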

On Aug 13, 2010, at 12:54 PM, Greg Snow wrote:

If you do as.numeric on InsectSprays and use the result as a predictor in lm, 
then it will only fit 1 degree of freedom, not 5, try it and see.  That is why 
I was asking and giving an alternative that would still use 5 degrees of 
freedom.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: TGS [mailto:cran.questi...@gmail.com]
 Sent: Friday, August 13, 2010 1:52 PM
 To: Greg Snow
 Subject: Re: [R] Dealing with data
 
 P.S. The degrees of freedom for sprays would be 5 and not 1.
 