Re: [R] Factor variables with GAM models
On Fri, 2010-03-19 at 20:37 -0700, Steven McKinney wrote:
> Hi Noah
>
> GAM models were developed to assess the functional form of the
> relationship of continuous predictor variables to the response, so
> weren't really meant to handle factor variables as predictor variables.
> GAMs are of the form
>
>   E(Y | X1, X2, ...) = S0 + S(X1) + S(X2) + ...
>
> where S(X) is a smooth function of X.

But there is absolutely nothing wrong with including factors in mgcv::gam - they get expanded into the usual dummy variables, depending on the current contrasts, as part of the model set-up routines, just like they do in lm(). Perhaps "semiparametric" might be a better description of such a model, but at least one implementation of GAMs in R can certainly handle factors. I haven't used gam::gam so can't comment on that, and the OP doesn't say which gam he is using.

HTH

G

> Hence you might want to rethink why you'd want a factor variable as a
> predictor variable in a GAM. This is why the gam machinery doesn't just
> do the factor conversion to indicator variables as is done in lm.
>
> HTH
>
> Steven McKinney

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
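The claim that mgcv::gam expands factors just as lm() does can be checked with a small sketch. Everything about the data here is an assumption made for illustration (the data frame `d`, its columns `color`, `x` and `y`, and the seed); only `lm()` and `mgcv::gam()` come from the thread.

```r
library(mgcv)  # provides gam() and s()

set.seed(1)
n <- 200
# Hypothetical data: a three-level factor, a continuous covariate, a response
d <- data.frame(
  color = factor(sample(c("red", "green", "blue"), n, replace = TRUE)),
  x     = runif(n)
)
d$y <- as.numeric(d$color) + sin(2 * pi * d$x) + rnorm(n, sd = 0.3)

# lm() expands the factor into indicator columns using the current contrasts
fit_lm <- lm(y ~ color + x, data = d)

# gam() accepts the same factor directly; only x is smoothed
fit_gam <- gam(y ~ color + s(x), data = d)

# Names of the coefficients that belong to the factor in each fit
lm_factor_terms  <- grep("^color", names(coef(fit_lm)), value = TRUE)
gam_factor_terms <- grep("^color", names(coef(fit_gam)), value = TRUE)
print(lm_factor_terms)
print(gam_factor_terms)
```

With the default treatment contrasts, both fits should report two indicator terms rather than three, because the first factor level is absorbed into the intercept.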
Re: [R] Factor variables with GAM models
It doesn't usually make much sense to *smooth* over a factor variable (in the cases where it does you should treat the factor as a random effect), but there is no problem in including factor variables in a GAM. `gam' lets you mix factor and continuous variables in a bunch of ways.

Suppose that `a' is a factor, `x' is a continuous (or just metric) variable and `y' is a response.

  y ~ a + s(x)

will fit a model where `a' is treated exactly as a factor variable is treated by `lm', while `x' is smoothed over. In mgcv::gam,

  y ~ s(x,by=a)

would create a `smooth-factor interaction' --- a separate smooth of `x' for each level of `a'.

  y ~ s(x,by=a,id=1)

would do the same, but would insist on each of the smooths of `x' having the same smoothing parameter. ?gam.models gives some more detail.

best,
Simon

On Friday 19 March 2010 19:54, Noah Silverman wrote:
> I'm just starting to learn about GAM models.
>
> When using the lm function in R, any factors I have in my data set are
> automatically converted into a series of binomial variables.
>
> For example, if I have a data.frame with a column named color and values
> red, green, blue, the lm function automatically replaces it with 3
> variables colorred, colorgreen, colorblue which are binomial {0,1}.
>
> When I use the gam function, R doesn't do this so I get an error.
>
> 1) Is there a way to ask the gam function to do this conversion for me?
> 2) If not, is there some other tool or utility to make this data
>    transformation easy?
> 3) Last option - can I use lm to transform the data and then extract it
>    into a new data.frame to then pass to gam?
>
> Thanks!!!
--
Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603 www.maths.bath.ac.uk/~sw283
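The three model forms described in the message above can be tried on simulated data. This is only a sketch: the data, the seed and the object names `m1`, `m2`, `m3` are invented for illustration. One assumption worth flagging: in the `by=` models the factor `a` is also kept as a parametric term, because, as I read ?gam.models, the per-level smooths are centred and so do not carry the level means themselves.

```r
library(mgcv)

set.seed(2)
n <- 300
# Hypothetical data: `a` is a factor, `x` is continuous, `y` is the response
a <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
x <- runif(n)
level_shift <- c(0, 1, 2)[as.integer(a)]  # a different mean for each level
y <- level_shift +
  sin(2 * pi * x) * (a == "A") +          # curved effect of x in level A
  x^2 * (a == "B") +                      # a different curve in level B
  rnorm(n, sd = 0.2)                      # level C has no effect of x
d <- data.frame(y = y, a = a, x = x)

# 1) `a` handled as in lm(), one common smooth of x
m1 <- gam(y ~ a + s(x), data = d)

# 2) smooth-factor interaction: a separate smooth of x for each level of a
m2 <- gam(y ~ a + s(x, by = a), data = d)

# 3) as (2), but the smooths are tied to one smoothing parameter via id
m3 <- gam(y ~ a + s(x, by = a, id = 1), data = d)

# Number of smooth terms and of estimated smoothing parameters in each model
cat("smooths:", length(m1$smooth), length(m2$smooth), length(m3$smooth), "\n")
cat("smoothing parameters:", length(m1$sp), length(m2$sp), length(m3$sp), "\n")
```

Calling `plot(m2, pages = 1)` afterwards draws one panel per level of `a`, which is a quick way to see whether the separate smooths really differ.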
Re: [R] Factor variables with GAM models
You can sometimes manually substitute a categorical variable with a set of continuous variables. For example, suppose you have a variable like landcover.class with 3 values: class A, class B, class C. You can transform it into 3 continuous variables landcover.class.A, landcover.class.B, landcover.class.C and assign a value of 1 (or 100%) to elements belonging to that class and 0 to elements not belonging to it. That sometimes helps.

Regards

Noah Silverman wrote:
> Steve,
>
> I get that. What you wrote makes sense.
>
> My challenge is the data I'm attempting to model. Some of the variables
> are continuous, some are factors. Both linear and Poisson models work.
> (Poisson doing a much more accurate job.)
>
> However, some of the numerical variables are clearly non-linear. Hence
> my interest in GAM.
>
> I suppose one alternative would be to try some polynomial transformation
> on the variable as part of a Poisson model.
>
> Any other suggestions would be welcome.
>
> Thanks!
>
> -N

--
Corrado Topi
PhD Researcher
Global Climate Change and Biodiversity
Area 18, Department of Biology
University of York, York, YO10 5YW, UK
Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk
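The manual substitution described above can be done with base R's model.matrix(). A minimal sketch, assuming a made-up data frame `d` with an invented `elevation` column; the indicator column names follow the ones given in the message.

```r
# Hypothetical data frame holding the three-level variable from the message
d <- data.frame(
  landcover.class = factor(c("A", "B", "C", "A", "C", "B")),
  elevation       = c(10, 25, 40, 15, 35, 20)
)

# One 0/1 column per class; "- 1" drops the intercept so no level is omitted
ind <- model.matrix(~ landcover.class - 1, data = d)
colnames(ind) <- paste0("landcover.class.", levels(d$landcover.class))

# Append the indicator columns to the original data
d_wide <- cbind(d, ind)
print(d_wide)
```

Note that the three indicator columns always sum to 1 in every row, so putting all of them into a model that also has an intercept makes the columns linearly dependent. The other replies in this thread suggest passing the factor to gam directly, which avoids the issue.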
[R] Factor variables with GAM models
I'm just starting to learn about GAM models.

When using the lm function in R, any factors I have in my data set are automatically converted into a series of binomial variables.

For example, if I have a data.frame with a column named color and values red, green, blue, the lm function automatically replaces it with 3 variables colorred, colorgreen, colorblue which are binomial {0,1}.

When I use the gam function, R doesn't do this so I get an error.

1) Is there a way to ask the gam function to do this conversion for me?
2) If not, is there some other tool or utility to make this data transformation easy?
3) Last option - can I use lm to transform the data and then extract it into a new data.frame to then pass to gam?

Thanks!!!
Re: [R] Factor variables with GAM models
Hi Noah

GAM models were developed to assess the functional form of the relationship of continuous predictor variables to the response, so weren't really meant to handle factor variables as predictor variables. GAMs are of the form

  E(Y | X1, X2, ...) = S0 + S(X1) + S(X2) + ...

where S(X) is a smooth function of X.

Hence you might want to rethink why you'd want a factor variable as a predictor variable in a GAM. This is why the gam machinery doesn't just do the factor conversion to indicator variables as is done in lm.

HTH

Steven McKinney

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Noah Silverman [n...@smartmediacorp.com]
Sent: March 19, 2010 12:54 PM
To: r-help@r-project.org
Subject: [R] Factor variables with GAM models

I'm just starting to learn about GAM models.

When using the lm function in R, any factors I have in my data set are automatically converted into a series of binomial variables.

For example, if I have a data.frame with a column named color and values red, green, blue, the lm function automatically replaces it with 3 variables colorred, colorgreen, colorblue which are binomial {0,1}.

When I use the gam function, R doesn't do this so I get an error.

1) Is there a way to ask the gam function to do this conversion for me?
2) If not, is there some other tool or utility to make this data transformation easy?
3) Last option - can I use lm to transform the data and then extract it into a new data.frame to then pass to gam?

Thanks!!!
Re: [R] Factor variables with GAM models
Steve,

I get that. What you wrote makes sense.

My challenge is the data I'm attempting to model. Some of the variables are continuous, some are factors. Both linear and Poisson models work. (Poisson doing a much more accurate job.)

However, some of the numerical variables are clearly non-linear. Hence my interest in GAM.

I suppose one alternative would be to try some polynomial transformation on the variable as part of a Poisson model.

Any other suggestions would be welcome.

Thanks!

-N

On 3/19/10 8:37 PM, Steven McKinney wrote:
> Hi Noah
>
> GAM models were developed to assess the functional form of the
> relationship of continuous predictor variables to the response, so
> weren't really meant to handle factor variables as predictor variables.
> GAMs are of the form
>
>   E(Y | X1, X2, ...) = S0 + S(X1) + S(X2) + ...
>
> where S(X) is a smooth function of X.
>
> Hence you might want to rethink why you'd want a factor variable as a
> predictor variable in a GAM. This is why the gam machinery doesn't just
> do the factor conversion to indicator variables as is done in lm.
>
> HTH
>
> Steven McKinney
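The two options weighed in the message above, a polynomial term inside a Poisson model versus a GAM, can be compared side by side. This is a rough sketch on simulated counts: the data frame `d`, the factor `group`, the covariate `x`, the quadratic degree and the seed are all assumptions made for illustration, not details of the poster's data.

```r
library(mgcv)

set.seed(3)
n <- 400
# Hypothetical count data: one factor and one covariate with a non-linear effect
d <- data.frame(
  group = factor(sample(c("g1", "g2", "g3"), n, replace = TRUE)),
  x     = runif(n)
)
eta <- 0.5 + 0.4 * (d$group == "g2") - 0.3 * (d$group == "g3") +
  sin(2 * pi * d$x)
d$count <- rpois(n, lambda = exp(eta))

# Option 1: quadratic polynomial in x inside a Poisson GLM
fit_poly <- glm(count ~ group + poly(x, 2), family = poisson, data = d)

# Option 2: Poisson GAM, factor entered parametrically and x smoothed
fit_gam <- gam(count ~ group + s(x), family = poisson, data = d)

# Compare the two fits; a lower AIC indicates the better-supported model
aic_table <- AIC(fit_poly, fit_gam)
print(aic_table)
```

The GAM does not require choosing the polynomial degree in advance, since the wiggliness of `s(x)` is estimated from the data. `summary(fit_gam)` reports the effective degrees of freedom of the smooth alongside the factor coefficients.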