Re: [R] AIC in MuMIn

2010-08-19 Thread Gavin Simpson
On Thu, 2010-08-19 at 04:42 +0800, elaine kuo wrote:
 Yes, I tried the example in the ?dredge and agreed that something else
 caused the mistake.
 
 
 Aside from the cause which takes time to clarify (16 explanatory
 variables in the model), 
 I would like to ask another question.
 
 
 Please kindly advise if it is possible to show the singular model with
 only one certain variable using the command subset. (or maybe others)
 I tried the command subset=X3 but it returned multiple models
 including X3.
 
 
 The above demand might look unnecessary when visual inspection offers
 the solution with few exp. variables. 
 However, it is of great importance for recognizing AIC for each
 variable in the model selection with more than 10 variables.

No, there isn't. The object returned by dredge is a data frame so you
can subset it to your hearts content, but you'll need to select rows
where you specify that all your variables (except the one you are
interested in) are missing (NA). Without writing this all out, I don't
know of a quick way of doing this sort of subsetting.

In effect, you want

data(Cement)
lm1 - lm(y ~ ., data = Cement)
dd - dredge(lm1, subset = X1)

want - with(dd, is.na(X)  is.na(X2)  is.na(X3)  is.na(X4))
want
## how many models selected?
sum(want)
## OK selected just 1, show it
dd[want, , drop = FALSE]

Oh, actually, I suppose you could automate this, so it will return all
models with single variable:

dd - dredge(lm1)
parms - !is.na(dd[, -c(1, (ncol(dd) - c(0:7)))])
want - which(rowSums(parms) == 1)
dd[want, ]

Having said all this, I don't think this is a good way to do model
selection.

G

 
 
 Please kindly help and thank you in advance.
 
 
 Elaine
 
 
 code
 library(MuMIn)
 
 
 data(Cement)
 lm1  -  lm(y  ~  .,  data  =  Cement)
 dd  -  dredge(lm1,  beta  = FALSE,  eval  =  TRUE,  rank  =  AIC)
 X3-dredge(lm1,  rank  =  AIC, subset=X3)
 
 On Wed, Aug 18, 2010 at 9:18 PM, Gavin Simpson
 gavin.simp...@ucl.ac.uk wrote:
 On Wed, 2010-08-18 at 21:11 +0800, elaine kuo wrote:
  A cause other than data based on standardized regression
  was identified.
  It is that the manual command added with target - at the
 left hand
  side.
 
  C1 did not work but C2 did.
  C1 target-dredge(mig.stds,  subset  =  temp_max)
 
 
  C2 dredge(mig.stds,  subset  =  temp_max)
 
 
 Glad you have this working, but that can't possibly be the
 reason for
 the error. There must be something else going on. The only
 difference
 between the two calls is that you are storing the result
 somewhere, yet
 the error was coming from within the running of the dredge
 function.
 
 If everything works from within a new, clean R session then
 great.
 
 
 G
 
 
 
  Elaine
 
  On Wed, Aug 18, 2010 at 5:37 PM, elaine kuo
 elaine.kuo...@gmail.com
  wrote:
 
 
  
  Please suggest how to define subset in my
 case
 
 
  How would I know? I still haven't seen your
 data. You
  seem to be
  mistaken on what is and is not included in
 your model
  and you fitted it.
  What hope do we have...? However, given the
 model
  'mig.stds' from above
  in this email:
 
   mig.stds -lm(SummerM_ratio ~ temp_max +
 evi_mean +
  topo_var +
topo_mean + coast +
 Iso_index_0808,
## now tell R were to find
 the
  variables in formula
data = datum.std)
   ## If you are fitting a Gaussian GLM it is
 better
  fitted with lm()
 
 
  If you want to consider dredged models
 containing
  temp_max, then you
  would do
 
  dredge(mig.stds, subset = temp_max)
 
  If you want models that contain temp_max and
 coast,
  then you'd do
 
  dredge(mig.stds, subset = temp_max  coast)
 
  or
 
  dredge(mig.stds, fixed = ~ temp_max + coast)
 
  The bits you include in subset or fixed are
 the names
  of your variables
  that you want in or out of the models. In
 your 

Re: [R] AIC in MuMIn

2010-08-19 Thread elaine kuo
 
  Please kindly advise if it is possible to show the singular model with
  only one certain variable using the command subset. (or maybe others)
  I tried the command subset=X3 but it returned multiple models
  including X3.
 
 
  The above demand might look unnecessary when visual inspection offers
  the solution with few exp. variables.
  However, it is of great importance for recognizing AIC for each
  variable in the model selection with more than 10 variables.

 No, there isn't. The object returned by dredge is a data frame so you
 can subset it to your hearts content, but you'll need to select rows
 where you specify that all your variables (except the one you are
 interested in) are missing (NA). Without writing this all out, I don't
 know of a quick way of doing this sort of subsetting.

 In effect, you want

 data(Cement)
 lm1 - lm(y ~ ., data = Cement)
 dd - dredge(lm1, subset = X1)

 want - with(dd, is.na(X)  is.na(X2)  is.na(X3)  is.na(X4))
 want
 ## how many models selected?
 sum(want)
 ## OK selected just 1, show it
 dd[want, , drop = FALSE]

 Oh, actually, I suppose you could automate this, so it will return all
 models with single variable:

 dd - dredge(lm1)
 parms - !is.na(dd[, -c(1, (ncol(dd) - c(0:7)))])
 want - which(rowSums(parms) == 1)
 dd[want, ]

 Having said all this, I don't think this is a good way to do model
 selection.
 =


Thanks for the time-efficient formula.
Actually it is over my current ability to automate it, but it did save a lot
of time already.
Moreover, please kindly explain the command beginning with parms.
data (Cement) was checked but could not ensure why the row and the column
were set to be 1 and (ncol(dd) - c(0:7)).

Elaine

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] AIC in MuMIn

2010-08-19 Thread Gavin Simpson
On Fri, 2010-08-20 at 06:31 +0800, elaine kuo wrote:
snip /
 
 
 Oh, actually, I suppose you could automate this, so it will
 return all
 models with single variable:
 
 dd - dredge(lm1)
 parms - !is.na(dd[, -c(1, (ncol(dd) - c(0:7)))])
 want - which(rowSums(parms) == 1)
 dd[want, ]
 
 Having said all this, I don't think this is a good way to do
 model
 selection.
 
 =
 
 
 Thanks for the time-efficient formula.
 Actually it is over my current ability to automate it, but it did save
 a lot of time already.

You don't need to automate it, I did it for you. The four lines above
show you only the lines from dd (your dredge result) that contain only
one covariate. That was what you asked for, wasn't it?

 Moreover, please kindly explain the command beginning with parms.
 data (Cement) was checked but could not ensure why the row and the
 column were set to be 1 and (ncol(dd) - c(0:7)).

OK; for this, you need to first get that dredge returns a data frame
with some additional attributes. (It actually inherits from class
data.frame so whilst it isn't exactly a data frame it works like one.)
So we can subset it as we would any other R data frame.

If you look at the object returned from dredge, you see the first column
refers to the intercept, the next N columns are for your covariates in
the full model you fitted, with the remaining columns representing the
'rank' statistic (AIC, BIC etc) plus some other measures.

 head(dd)
Global model: lm(formula = y ~ ., data = Cement)
---
Model selection table 
(Int)  XX1 X2X3  X4 k   R.sq Adj.R.sq   RSS   AIC  AICc
11  52.581.468 0.6623   4 0.9787   0.9744 57.90 64.31 69.31
24  71.651.452 0.4161   -0.2365 5 0.9823   0.9764 47.97 63.87 72.44
23  48.191.696 0.6569  0.25 5 0.9823   0.9764 48.11 63.90 72.48
13 103.101.440  -0.6140 4 0.9725   0.9670 74.76 67.63 72.63
25 111.701.052-0.41 -0.6428 5 0.9813   0.9750 50.84 64.62 73.19
17  52.53 0.1551 1.467 0.6410   5 0.9798   0.9731 54.86 65.61 74.18
   delta weight
11 0.000  0.544
24 3.125  0.114
23 3.163  0.112
13 3.322  0.103
25 3.879  0.078
17 4.869  0.048

So we select out only the bits we want:

dd[, -c(1, (ncol(dd) - c(0:7)))]

This bit is selecting columns by dropping the first and the last eight
columns. ncol(dd) gives the number of columns, ncol(dd) - 0 is the last
column, ncol(dd) - 1 is the next to the last column and so on. There are
eight of these columns we don't want. If you just run

 c(1, (ncol(dd) - c(0:7)))
[1]  1 14 13 12 11 10  9  8  7

you'll see that we have the column integers we want to drop, and then we
negate it to get the subsetting to drop rather than retain these
columns:

-c(1, (ncol(dd) - c(0:7)))

Having dropped the superfluous columns we aren't interested in, we are
left with an 'array' with one column per covariate (predictor variable).
If a given covariate is not in the model, it gets an NA in its column
for that model. The is.na() function returns an object of the same
dimension as dd[, -c(1, (ncol(dd) - c(0:7)))] but which indicates if
variable is NA or not:

 is.na(dd[, -c(1, (ncol(dd) - c(0:7)))])
   XX1X2X3X4
11  TRUE FALSE FALSE  TRUE  TRUE
24  TRUE FALSE FALSE  TRUE FALSE
23  TRUE FALSE FALSE FALSE  TRUE
13  TRUE FALSE  TRUE  TRUE FALSE
25  TRUE FALSE  TRUE FALSE FALSE
17 FALSE FALSE FALSE  TRUE  TRUE
19 FALSE FALSE  TRUE  TRUE FALSE


The ! negates this turning TRUE into FALSE and vice versa. In computing
and R, TRUE == 1, FALSE == 0, so when we do rowSums on this, we get
information regarding how many covariates are in each model:

 rowSums(parms)
11 24 23 13 25 17 19 26 31 27 28 29 30 16 22 32 14 20  6  4  7 15 10  9  8  3 
 2  3  3  2  3  3  3  3  4  4  4  4  4  2  3  5  2  3  1  1  2  2  2  2  2  1 
12 18 21  5  2  1 
 2  3  3  1  1  0

You asked for only those models containing a single covariate, so they
are the models with rowSum == 1. The which() function tells us which
models (rows of dd) these are and we save this indicator in 'want'

 which(rowSums(parms) == 1)
 6  4  3  5  2 
19 20 26 30 31

The final step is to select out rows 19, 20, 26, 30 and 31 (want) from
dd as these are the models you said you were interested in.

 dd[want, ]
Global model: lm(formula = y ~ ., data = Cement)
---
Model selection table 
   (Int) XX1 X2 X3  X4 k   R.sq Adj.R.sqRSSAIC
6 117.60   -0.7382 3 0.6745   0.6450  883.9  97.74
4  57.42 0.78913 0.6663   0.6359  906.3  98.07
3  81.48   1.869   3 0.5339   0.4916 1266.0 102.40
5 110.20-1.256 3 0.2859   0.2210 1939.0 108.00
2  82.31 1.874 3 0.2353   0.1657 2077.0 108.80
   AICc delta weight
6 100.4 31.10  0.511
4 100.7 31.42  

Re: [R] AIC in MuMIn

2010-08-18 Thread Gavin Simpson
On Wed, 2010-08-18 at 05:42 +0800, elaine kuo wrote:
 Thank you.
 Most of the answers solved the puzzles.
 
 
 Q2 
 
 
  I tried to display sub-model with only temp_ran using the
 code below but
  failed.
  Please kindly suggest the potential failure cause.
 
  code
 
  library(MuMIn)
  datam
 -read.csv(c:/migration/Mig_ratio_20100817.csv,header=T,
  row.names=1)
 
 
 
 mig.stds -lm(SummerM_ratio ~ temp_max + evi_mean + topo_var +
  topo_mean + coast + Iso_index_0808,
  ## now tell R were to find the variables in
 formula
  data = datum.std)
 ## If you are fitting a Gaussian GLM it is better fitted with
 lm()
 
 
  = Please explain why fitted lm is better for GLM.

Seriously? A GLM specified as glm(, family = gaussian) is the linear
model that you'd get with lm(). lm() fits the model far more efficiently
than glm(). The code you showed specifically used 'family = gaussian',
hence my comment.

 But temp_ran is not in your model...
 
  error in eval(expr, envir, enclos), 'temp_ran' not found
 
 
 When used properly (none of this datam.std$ business), subset
 will do
 what you want:
 
  dd2 - dredge(lm1, subset = X1)
  dd2
 Global model: lm(formula = y ~ ., data = Cement)
 ---
 Model selection table
(Int)X X1 X2  X3  X4 k   R.sq
 Adj.R.sq RSS
 3   52.58  1.4680 0.6623 4 0.9787
 0.9744   57.90
 
 
 = Please suggest how to define subset in my case

How would I know? I still haven't seen your data. You seem to be
mistaken on what is and is not included in your model and you fitted it.
What hope do we have...? However, given the model 'mig.stds' from above
in this email:

 mig.stds -lm(SummerM_ratio ~ temp_max + evi_mean + topo_var +
  topo_mean + coast + Iso_index_0808,
  ## now tell R were to find the variables in formula
  data = datum.std)
 ## If you are fitting a Gaussian GLM it is better fitted with lm()

If you want to consider dredged models containing temp_max, then you
would do

dredge(mig.stds, subset = temp_max)

If you want models that contain temp_max and coast, then you'd do

dredge(mig.stds, subset = temp_max  coast)

or

dredge(mig.stds, fixed = ~ temp_max + coast)

The bits you include in subset or fixed are the names of your variables
that you want in or out of the models. In your case, the names of the
variables as input into the model formula. With 'subset' you need to use
logical operators (and [], or [|]) whilst with 'fixed' you can specify
a formula of variables that should be included or excluded in the same
way you'd write any R formula.

But, now having been told this, please note that this is *all* discussed
on the ?dredge help page if you bother to read it. I've never used this
package, and, OK, I have used R for going on for 11 or 12 years now so
am used to reading help pages and understand the language a bit more you
perhaps do, but you do seem to be asking questions or running into
problems that are all covered by the help pages.

 Finally, it would be highly appreciated to recommend any references of
 R for a beginner like me.

Read the An Introduction to R manual that comes with R or that can be
downloaded from the R website/CRAN. Also look at the contributed
documentation section on the R website which contains numerous free
introductory guides.

 Elaine

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] AIC in MuMIn

2010-08-18 Thread Gavin Simpson
On Wed, 2010-08-18 at 08:51 +0100, Gavin Simpson wrote:
 On Wed, 2010-08-18 at 05:42 +0800, elaine kuo wrote:
  Thank you.
  Most of the answers solved the puzzles.
snip /
   = Please explain why fitted lm is better for GLM.
 
 Seriously? A GLM specified as glm(, family = gaussian) is the linear
 model that you'd get with lm(). lm() fits the model far more efficiently
 than glm(). The code you showed specifically used 'family = gaussian',
 hence my comment.

Hmmm. Thinking some more, I might not have answered your (unstated)
question. What do your mean by GLM?

I mean the Generalized Linear Model, not the General Linear Model. The
Generalized one allows for non-normal responses and different
mean-variance relationships and is the GLM of Nelder and Wedderburn
(1972, J. Royal Statistical Society, Series A, 135(3),370-384) and the
monograph by McCullagh and Wedderburn (1989, Generalized Linear Models,
Chapman  Hall/CRC). The R function glm() fits these kinds of model.

The General Linear Model (IIRC) was the linking of linear regression and
anova into a single entity. The R function lm() fits these kinds of
models.

The linear model is a special case of the Generalized Linear Model when
the Gaussian error is used with the identity link function. Hence a
Gaussian GLM (my GLM) with the identity link fitted by glm() will give
the same results as lm(), but it will do so in a very inefficient
manner. As this was what your code was doing I suggested using lm()
instead.

HTH

G

 
  But temp_ran is not in your model...
  
   error in eval(expr, envir, enclos), 'temp_ran' not found
  
  
  When used properly (none of this datam.std$ business), subset
  will do
  what you want:
  
   dd2 - dredge(lm1, subset = X1)
   dd2
  Global model: lm(formula = y ~ ., data = Cement)
  ---
  Model selection table
 (Int)X X1 X2  X3  X4 k   R.sq
  Adj.R.sq RSS
  3   52.58  1.4680 0.6623 4 0.9787
  0.9744   57.90
  
  
  = Please suggest how to define subset in my case
 
 How would I know? I still haven't seen your data. You seem to be
 mistaken on what is and is not included in your model and you fitted it.
 What hope do we have...? However, given the model 'mig.stds' from above
 in this email:
 
  mig.stds -lm(SummerM_ratio ~ temp_max + evi_mean + topo_var +
   topo_mean + coast + Iso_index_0808,
   ## now tell R were to find the variables in formula
   data = datum.std)
  ## If you are fitting a Gaussian GLM it is better fitted with lm()
 
 If you want to consider dredged models containing temp_max, then you
 would do
 
 dredge(mig.stds, subset = temp_max)
 
 If you want models that contain temp_max and coast, then you'd do
 
 dredge(mig.stds, subset = temp_max  coast)
 
 or
 
 dredge(mig.stds, fixed = ~ temp_max + coast)
 
 The bits you include in subset or fixed are the names of your variables
 that you want in or out of the models. In your case, the names of the
 variables as input into the model formula. With 'subset' you need to use
 logical operators (and [], or [|]) whilst with 'fixed' you can specify
 a formula of variables that should be included or excluded in the same
 way you'd write any R formula.
 
 But, now having been told this, please note that this is *all* discussed
 on the ?dredge help page if you bother to read it. I've never used this
 package, and, OK, I have used R for going on for 11 or 12 years now so
 am used to reading help pages and understand the language a bit more you
 perhaps do, but you do seem to be asking questions or running into
 problems that are all covered by the help pages.
 
  Finally, it would be highly appreciated to recommend any references of
  R for a beginner like me.
 
 Read the An Introduction to R manual that comes with R or that can be
 downloaded from the R website/CRAN. Also look at the contributed
 documentation section on the R website which contains numerous free
 introductory guides.
 
  Elaine
 
 HTH
 
 G
 -- 
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Dr. Gavin Simpson [t] +44 (0)20 7679 0522
  ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
  Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
  Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
  UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson  

Re: [R] AIC in MuMIn

2010-08-18 Thread elaine kuo
 Hmmm. Thinking some more, I might not have answered your (unstated)
 question. What do your mean by GLM?

 = I meant generalized linear model as well. Thanks for the references.
The first one was mentioned first in my life time after keeping asking
the same question.


 I mean the Generalized Linear Model, not the General Linear Model. The
 Generalized one allows for non-normal responses and different
 mean-variance relationships and is the GLM of Nelder and Wedderburn
 (1972, J. Royal Statistical Society, Series A, 135(3),370-384) and the
 monograph by McCullagh and Wedderburn (1989, Generalized Linear Models,
 Chapman  Hall/CRC). The R function glm() fits these kinds of model.

 The General Linear Model (IIRC) was the linking of linear regression and
 anova into a single entity. The R function lm() fits these kinds of
 models.

 The linear model is a special case of the Generalized Linear Model when
 the Gaussian error is used with the identity link function. Hence a
 Gaussian GLM (my GLM) with the identity link fitted by glm() will give
 the same results as lm(), but it will do so in a very inefficient
 manner. As this was what your code was doing I suggested using lm()
 instead.


= Thanks for the important suggestion.
I tried both lm and glm until now but no one commented on the method.


 Elaine


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] AIC in MuMIn

2010-08-18 Thread elaine kuo


 
 Please suggest how to define subset in my case

 How would I know? I still haven't seen your data. You seem to be
 mistaken on what is and is not included in your model and you fitted it.
 What hope do we have...? However, given the model 'mig.stds' from above
 in this email:

  mig.stds -lm(SummerM_ratio ~ temp_max + evi_mean + topo_var +
   topo_mean + coast + Iso_index_0808,
   ## now tell R were to find the variables in formula
   data = datum.std)
  ## If you are fitting a Gaussian GLM it is better fitted with lm()

 If you want to consider dredged models containing temp_max, then you
 would do

 dredge(mig.stds, subset = temp_max)

 If you want models that contain temp_max and coast, then you'd do

 dredge(mig.stds, subset = temp_max  coast)

 or

 dredge(mig.stds, fixed = ~ temp_max + coast)

 The bits you include in subset or fixed are the names of your variables
 that you want in or out of the models. In your case, the names of the
 variables as input into the model formula. With 'subset' you need to use
 logical operators (and [], or [|]) whilst with 'fixed' you can specify
 a formula of variables that should be included or excluded in the same
 way you'd write any R formula.

 But, now having been told this, please note that this is *all* discussed
 on the ?dredge help page if you bother to read it. I've never used this
 package, and, OK, I have used R for going on for 11 or 12 years now so
 am used to reading help pages and understand the language a bit more you
 perhaps do, but you do seem to be asking questions or running into
 problems that are all covered by the help pages.

 = I posted it for help, after following the manual with the command dredge
 but receiving an error message two days ago.



  command   target-dredge(mig.stds,  subset  =  temp_max)

   error  in eval(expr, envir, enclos) : 'temp_max' not found

 One possible cause could be data = datam.std.
 datam.std was produced as the code below, which seemed to make it hard to
find explanatory variables.
 Please kindly share your experience in R, because I am unsure if my
assumption is logical or not.

Also, please kindly advise how to modify the command for dredging subset if
possible.
( command   target-dredge(mig.stds,  subset  =  temp_max))
Thank you in advance.

Elaine

code
library(MuMIn)
datam -read.csv(c:/migration/Mig_ratio_20100817.csv,header=T,
row.names=1)

 # std regression model (16 indep. variables)
  datam.sd-scale(datam)
  datam.std-as.data.frame(datam.sd)
  summary (datam.std)
  mean(datam.std)

# obtain standard deviation
sd(datam.std)

mig.stds
-lm(SummerM_ratio~temp_ran+temp_mean+temp_max+temp_min+evi_ran+evi_mean+evi_max+evi_min+prec_ran+prec_mean+prec_max+prec_min+topo_var+topo_mean+coast+Iso_index_0808,data=datam.std)

summary(mig.stds)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] AIC in MuMIn

2010-08-18 Thread Gavin Simpson
On Wed, 2010-08-18 at 17:37 +0800, elaine kuo wrote:
snip /
 = I posted it for help, after following the manual with the
 command dredge but receiving an error message two days ago. 
 
  command   target-dredge(mig.stds,  subset  =  temp_max)
error  in eval(expr, envir, enclos) : 'temp_max' not found

You really don't get it, do you? I can't see what the problem is with
your data + script, because I don't have your data! So I can't run your
script, reproduce your error and work round it, then send you the
solution. How many times do I have to ask for your data or, if not
possible, some dummy data that reproduces the same problem?

If your data weren't available to send, this is what I would have
expected, as requested by the posting guide:

## dummy data - this could be anything really
dat - data.frame(FOO = rnorm(100), BAR = rnorm(100, mean = 10, sd = 4),
  OTHER = seq(1, 10, length = 100),
  Y = runif(100))

## standardise data
dat.std - data.frame(scale(dat, center = TRUE, scale = TRUE))

## load MuMIn
require(MuMIn)

## fit full model
mod - lm(Y ~ FOO + BAR + OTHER, data = dat.std)

## data dredge
dredged.mod - dredge(mod, subset = OTHER)
dredged.mod

The above works:

 ## dummy data
 dat - data.frame(FOO = rnorm(100), BAR = rnorm(100, mean = 10, sd = 4),
+   OTHER = seq(1, 10, length = 100),
+   Y = runif(100))
 
 ## standardise data
 dat.std - data.frame(scale(dat, center = TRUE, scale = TRUE))
 
 ## load MuMIn
 require(MuMIn)
Loading required package: MuMIn
 
 ## fit full model
 mod - lm(Y ~ FOO + BAR + OTHER, data = dat.std)
 mod

Call:
lm(formula = Y ~ FOO + BAR + OTHER, data = dat.std)

Coefficients:
(Intercept)  FOO  BAROTHER  
  1.908e-16   -3.463e-021.635e-02   -1.299e-01  

 summary(mod)

Call:
lm(formula = Y ~ FOO + BAR + OTHER, data = dat.std)

Residuals:
 Min   1Q   Median   3Q  Max 
-1.83375 -0.80706  0.01736  0.81847  1.72803 

Coefficients:
  Estimate Std. Error  t value Pr(|t|)
(Intercept)  1.908e-16  1.006e-01 1.90e-151.000
FOO -3.463e-02  1.013e-01   -0.3420.733
BAR  1.635e-02  1.011e-010.1620.872
OTHER   -1.299e-01  1.013e-01   -1.2820.203

Residual standard error: 1.006 on 96 degrees of freedom
Multiple R-squared: 0.01901,Adjusted R-squared: -0.01165 
F-statistic:  0.62 on 3 and 96 DF,  p-value: 0.6037 

 dredged.mod - dredge(mod, subset = OTHER)
 dredged.mod
Global model: lm(formula = Y ~ FOO + BAR + OTHER, data = dat.std)
---
Model selection table 
  (Int) BAR  FOO OTH kR.sq  Adj.R.sq   RSS   AIC  AICc
1 1.886e-16  -0.1324 3 0.01754  0.007514 97.26 287.0 287.3
3 1.886e-16 -0.03471 -0.1302 4 0.01874 -0.001493 97.14 288.9 289.3
2 1.908e-16 0.01652  -0.1322 4 0.01781 -0.002439 97.24 289.0 289.4
4 1.908e-16 0.01635 -0.03463 -0.1299 5 0.01901 -0.011650 97.12 290.9 291.5
  delta weight
1 0.000  0.549
3 2.049  0.197
2 2.143  0.188
4 4.239  0.066

So is this due to the way your names are formed?

dat2 - data.frame(dat, SOME_THING = rpois(100, 4))
dat2.std - data.frame(scale(dat2, center = TRUE, scale = TRUE))
mod2 - lm(Y ~ FOO + BAR + OTHER + SOME_THING, data = dat2.std)

## data dredge
dredged.mod2 - dredge(mod2, subset = SOME_THING)
dredged.mod2

No!

So I can't reproduce your problem and until you send the *exact* code
you enter into R plus your data I can no longer help you. Apologies if I
am beginning to sound harsh, but you aren't doing what I have asked you
to do (repeatedly) and what the posting guide for this list asks you to.
If you want free help here, the least you can do is respect the rules.

Feel free to send the above data/code off list to me if you wish. But I
won't answer anything further on this thread until they are provided. I
am wasting my time in a futile attempt to help you with your problem
because you won't do what I ask.

G

  One possible cause could be data = datam.std. 
  datam.std was produced as the code below, which seemed to make it
 hard to find explanatory variables. 
  Please kindly share your experience in R, because I am unsure if my
 assumption is logical or not. 
 
 
 Also, please kindly advise how to modify the command for dredging
 subset if possible.
 ( command   target-dredge(mig.stds,  subset  =  temp_max))
 Thank you in advance. 
 
 
 Elaine
 
 
 code
 library(MuMIn)
 datam -read.csv(c:/migration/Mig_ratio_20100817.csv,header=T,
 row.names=1)
  
  # std regression model (16 indep. variables)
   datam.sd-scale(datam)
   datam.std-as.data.frame(datam.sd)
   summary (datam.std)
   mean(datam.std)
 
 
 # obtain standard deviation
 sd(datam.std)
 
 
 mig.stds -lm(SummerM_ratio~temp_ran+temp_mean+temp_max+temp_min
 +evi_ran+evi_mean+evi_max+evi_min+prec_ran+prec_mean+prec_max+prec_min
 +topo_var+topo_mean+coast+Iso_index_0808,data=datam.std)
 
 summary(mig.stds)
 
 
 
 

-- 

Re: [R] AIC in MuMIn

2010-08-18 Thread elaine kuo
A cause other than data based on standardized regression was identified.
It is that the manual command added with target - at the left hand side.


C1 did not work but C2 did.
C1 target-dredge(mig.stds,  subset  =  temp_max)

C2 dredge(mig.stds,  subset  =  temp_max)

Elaine

On Wed, Aug 18, 2010 at 5:37 PM, elaine kuo elaine.kuo...@gmail.com wrote:


 
 Please suggest how to define subset in my case

 How would I know? I still haven't seen your data. You seem to be
 mistaken on what is and is not included in your model and you fitted it.
 What hope do we have...? However, given the model 'mig.stds' from above
 in this email:

  mig.stds -lm(SummerM_ratio ~ temp_max + evi_mean + topo_var +
   topo_mean + coast + Iso_index_0808,
   ## now tell R were to find the variables in formula
   data = datum.std)
  ## If you are fitting a Gaussian GLM it is better fitted with lm()

 If you want to consider dredged models containing temp_max, then you
 would do

 dredge(mig.stds, subset = temp_max)

 If you want models that contain temp_max and coast, then you'd do

 dredge(mig.stds, subset = temp_max  coast)

 or

 dredge(mig.stds, fixed = ~ temp_max + coast)

 The bits you include in subset or fixed are the names of your variables
 that you want in or out of the models. In your case, the names of the
 variables as input into the model formula. With 'subset' you need to use
 logical operators (and [], or [|]) whilst with 'fixed' you can specify
 a formula of variables that should be included or excluded in the same
 way you'd write any R formula.

 But, now having been told this, please note that this is *all* discussed
 on the ?dredge help page if you bother to read it. I've never used this
 package, and, OK, I have used R for going on for 11 or 12 years now so
 am used to reading help pages and understand the language a bit more you
 perhaps do, but you do seem to be asking questions or running into
 problems that are all covered by the help pages.

 = I posted it for help, after following the manual with the command
 dredge but receiving an error message two days ago.



  command   target-dredge(mig.stds,  subset  =  temp_max)

error  in eval(expr, envir, enclos) : 'temp_max' not found

  One possible cause could be data = datam.std.
  datam.std was produced as the code below, which seemed to make it hard to
 find explanatory variables.
  Please kindly share your experience in R, because I am unsure if my
 assumption is logical or not.

 Also, please kindly advise how to modify the command for dredging subset if
 possible.
 ( command   target-dredge(mig.stds,  subset  =  temp_max))
 Thank you in advance.

 Elaine

 code
 library(MuMIn)
 datam -read.csv(c:/migration/Mig_ratio_20100817.csv,header=T,
 row.names=1)

  # std regression model (16 indep. variables)
   datam.sd-scale(datam)
   datam.std-as.data.frame(datam.sd)
   summary (datam.std)
   mean(datam.std)

 # obtain standard deviation
 sd(datam.std)

 mig.stds
 -lm(SummerM_ratio~temp_ran+temp_mean+temp_max+temp_min+evi_ran+evi_mean+evi_max+evi_min+prec_ran+prec_mean+prec_max+prec_min+topo_var+topo_mean+coast+Iso_index_0808,data=datam.std)

 summary(mig.stds)





[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] AIC in MuMIn

2010-08-18 Thread Gavin Simpson
On Wed, 2010-08-18 at 21:11 +0800, elaine kuo wrote:
 A cause other than data based on standardized regression
 was identified.
 It is that the manual command added with target - at the left hand
 side.

 C1 did not work but C2 did.
 C1 target-dredge(mig.stds,  subset  =  temp_max)
 
 
 C2 dredge(mig.stds,  subset  =  temp_max)

Glad you have this working, but that can't possibly be the reason for
the error. There must be something else going on. The only difference
between the two calls is that you are storing the result somewhere, yet
the error was coming from within the running of the dredge function.

If everything works from within a new, clean R session then great.

G

 
 
 Elaine
 
 On Wed, Aug 18, 2010 at 5:37 PM, elaine kuo elaine.kuo...@gmail.com
 wrote:
 
 
 
 Please suggest how to define subset in my case
 
 
 How would I know? I still haven't seen your data. You
 seem to be
 mistaken on what is and is not included in your model
 and you fitted it.
 What hope do we have...? However, given the model
 'mig.stds' from above
 in this email:
 
  mig.stds -lm(SummerM_ratio ~ temp_max + evi_mean +
 topo_var +
   topo_mean + coast + Iso_index_0808,
   ## now tell R were to find the
 variables in formula
   data = datum.std)
  ## If you are fitting a Gaussian GLM it is better
 fitted with lm()
 
 
 If you want to consider dredged models containing
 temp_max, then you
 would do
 
 dredge(mig.stds, subset = temp_max)
 
 If you want models that contain temp_max and coast,
 then you'd do
 
 dredge(mig.stds, subset = temp_max  coast)
 
 or
 
 dredge(mig.stds, fixed = ~ temp_max + coast)
 
 The bits you include in subset or fixed are the names
 of your variables
 that you want in or out of the models. In your case,
 the names of the
 variables as input into the model formula. With
 'subset' you need to use
 logical operators (and [], or [|]) whilst with
 'fixed' you can specify
 a formula of variables that should be included or
 excluded in the same
 way you'd write any R formula.
 
 But, now having been told this, please note that this
 is *all* discussed
 on the ?dredge help page if you bother to read it.
 I've never used this
 package, and, OK, I have used R for going on for 11 or
 12 years now so
 am used to reading help pages and understand the
 language a bit more you
 perhaps do, but you do seem to be asking questions or
 running into
 problems that are all covered by the help pages.
 
 
 = I posted it for help, after following the manual
 with the command dredge but receiving an error message
 two days ago. 
 
  command   target-dredge(mig.stds,  subset  =
  temp_max)
error  in eval(expr, envir, enclos) : 'temp_max' not found
 
 
  One possible cause could be data = datam.std. 
  datam.std was produced as the code below, which seemed to
 make it hard to find explanatory variables. 
  Please kindly share your experience in R, because I am unsure
 if my assumption is logical or not. 
 
 
 Also, please kindly advise how to modify the command for
 dredging subset if possible.
 ( command   target-dredge(mig.stds,  subset  =  temp_max))
 Thank you in advance. 
 
 
 Elaine
 
 
 code
 library(MuMIn)
 datam
 -read.csv(c:/migration/Mig_ratio_20100817.csv,header=T,
 row.names=1)
  
  # std regression model (16 indep. variables)
   datam.sd-scale(datam)
   datam.std-as.data.frame(datam.sd)
   summary (datam.std)
   mean(datam.std)
 
 
 # obtain standard deviation
 sd(datam.std)
 
 
 mig.stds -lm(SummerM_ratio~temp_ran+temp_mean+temp_max
 

Re: [R] AIC in MuMIn

2010-08-18 Thread elaine kuo
Yes, I tried the example in the ?dredge and agreed that something else
caused the mistake.

Aside from the cause which takes time to clarify (16 explanatory variables
in the model),
I would like to ask another question.

Please kindly advise if it is possible to show the singular model with only
one certain variable using the command subset. (or maybe others)
I tried the command subset=X3 but it returned multiple models including
X3.

The above demand might look unnecessary when visual inspection offers the
solution with few exp. variables.
However, it is of great importance for recognizing AIC for each variable in
the model selection with more than 10 variables.

Please kindly help and thank you in advance.

Elaine

code
library(MuMIn)

data(Cement)
lm1  -  lm(y  ~  .,  data  =  Cement)
dd  -  dredge(lm1,  beta  = FALSE,  eval  =  TRUE,  rank  =  AIC)
X3-dredge(lm1,  rank  =  AIC, subset=X3)

On Wed, Aug 18, 2010 at 9:18 PM, Gavin Simpson gavin.simp...@ucl.ac.ukwrote:

 On Wed, 2010-08-18 at 21:11 +0800, elaine kuo wrote:
  A cause other than data based on standardized regression
  was identified.
  It is that the manual command added with target - at the left hand
  side.

  C1 did not work but C2 did.
  C1 target-dredge(mig.stds,  subset  =  temp_max)
 
 
  C2 dredge(mig.stds,  subset  =  temp_max)

 Glad you have this working, but that can't possibly be the reason for
 the error. There must be something else going on. The only difference
 between the two calls is that you are storing the result somewhere, yet
 the error was coming from within the running of the dredge function.

 If everything works from within a new, clean R session then great.

 G

 
 
  Elaine
 
  On Wed, Aug 18, 2010 at 5:37 PM, elaine kuo elaine.kuo...@gmail.com
  wrote:
 
 
  
  Please suggest how to define subset in my case
 
 
  How would I know? I still haven't seen your data. You
  seem to be
  mistaken on what is and is not included in your model
  and you fitted it.
  What hope do we have...? However, given the model
  'mig.stds' from above
  in this email:
 
   mig.stds -lm(SummerM_ratio ~ temp_max + evi_mean +
  topo_var +
topo_mean + coast + Iso_index_0808,
## now tell R were to find the
  variables in formula
data = datum.std)
   ## If you are fitting a Gaussian GLM it is better
  fitted with lm()
 
 
  If you want to consider dredged models containing
  temp_max, then you
  would do
 
  dredge(mig.stds, subset = temp_max)
 
  If you want models that contain temp_max and coast,
  then you'd do
 
  dredge(mig.stds, subset = temp_max  coast)
 
  or
 
  dredge(mig.stds, fixed = ~ temp_max + coast)
 
  The bits you include in subset or fixed are the names
  of your variables
  that you want in or out of the models. In your case,
  the names of the
  variables as input into the model formula. With
  'subset' you need to use
  logical operators (and [], or [|]) whilst with
  'fixed' you can specify
  a formula of variables that should be included or
  excluded in the same
  way you'd write any R formula.
 
  But, now having been told this, please note that this
  is *all* discussed
  on the ?dredge help page if you bother to read it.
  I've never used this
  package, and, OK, I have used R for going on for 11 or
  12 years now so
  am used to reading help pages and understand the
  language a bit more you
  perhaps do, but you do seem to be asking questions or
  running into
  problems that are all covered by the help pages.
 
 
  = I posted it for help, after following the manual
  with the command dredge but receiving an error message
  two days ago.
 
   command   target-dredge(mig.stds,  subset  =
   temp_max)
 error  in eval(expr, envir, enclos) : 'temp_max' not found
 
 
   One possible cause could be data = datam.std.
   datam.std was produced as the code below, which seemed to
  make it hard to find explanatory variables.
   Please kindly share your experience in R, because I am unsure
  if my assumption is logical or not.
 
 
  Also, please kindly advise how to modify the 

Re: [R] AIC in MuMIn

2010-08-17 Thread Gavin Simpson
On Tue, 2010-08-17 at 16:05 +0800, elaine kuo wrote:
 Hello,

Why did you decide to post the exact same message from two different
email addresses??

 I am using package MuMIn to calculate AIC for a full model with 10
 explanatory variables.
 Thanks in advance in sharing your experience.
 
 Q1
 In the AIC list of all models, each model is differentiated by model number.
 Please kindly advise if it is possible to
 find the corresponding explanatory variable(s) for the model number.

Did you read ?dredge and look at the example to see how it worked? I've
never used this package and 30 seconds installing it, reading ?dredge
and running the very first example gave me the answer!

 # Example from Burnham and Anderson (2002), page 100:
 data(Cement)
 lm1 - lm(y ~ ., data = Cement)
 dd - dredge(lm1)
 subset(dd, delta  4)
Global model: lm(formula = y ~ ., data = Cement)
---
Model selection table 
(Int) XX1 X2X3  X4 k   R.sq Adj.R.sq   RSS   AIC  AICc
11  52.58   1.468 0.6623   4 0.9787   0.9744 57.90 64.31 69.31
24  71.65   1.452 0.4161   -0.2365 5 0.9823   0.9764 47.97 63.87 72.44
23  48.19   1.696 0.6569  0.25 5 0.9823   0.9764 48.11 63.90 72.48
13 103.10   1.440  -0.6140 4 0.9725   0.9670 74.76 67.63 72.63
25 111.70   1.052-0.41 -0.6428 5 0.9813   0.9750 50.84 64.62 73.19
   delta weight
11 0.000  0.572
24 3.125  0.120
23 3.163  0.118
13 3.322  0.109
25 3.879  0.082

(or make the last line just read dd to print the full table)

So this clearly shows which variables (X, X1, etc) are in each model.
So, without anything further to go with regarding your problem, does
this answer this question?

 Q2 error message
 I tried to display sub-model with only temp_ran using the code below but
 failed.
 Please kindly suggest the potential failure cause.
 
 code
 
 rm(list=ls())

Please don't do the above; people might run your code and delete their
workspace if they aren't looking closely!

 library(MuMIn)
 datam -read.csv(c:/migration/Mig_ratio_20100817.csv,header=T,
 row.names=1)

You do realise this isn't reproducible as I don't have your data file?
This is not what the posting guide asks you for. If you can't make the
problem appear with available data, then you might need to make *your*
data available for us to help.

 mig.stds -glm(datam.std$SummerM_ratio~
 datam.std$temp_max++datam.std$evi_mean+datam.std$topo_var+datam.std$topo_mean+

 datam.std$coast+datam.std$Iso_index_0808,data=datam.std,family=gaussian)

Uggg! Ok, first lesson before we go any further; STOP using the data
frame name in your model formula. This is WRONG, WRONG, WRONG (Is
that clear enough ;-) I don't know where you learnt this, but unlearn
it, FAST! If someone taught you this, have a word with them.

Your model should be fitted like this:

mig.stds -lm(SummerM_ratio ~ temp_max + evi_mean + topo_var +
  topo_mean + coast + Iso_index_0808,
  ## now tell R were to find the variables in formula
  data = datum.std)
## If you are fitting a Gaussian GLM it is better fitted with lm()

 target.model -  dredge(mig.stds, subset=datam.std$temp_ran)

But temp_ran is not in your model...

 error in eval(expr, envir, enclos), 'temp_ran' not found

Which would explain the error:

 dd2 - dredge(lm1, subset = FOO)
Error in eval(expr, envir, enclos) : object 'FOO' not found

When used properly (none of this datam.std$ business), subset will do
what you want:

 dd2 - dredge(lm1, subset = X1)
 dd2
Global model: lm(formula = y ~ ., data = Cement)
---
Model selection table 
(Int)X X1 X2  X3  X4 k   R.sq Adj.R.sq RSS
3   52.58  1.4680 0.6623 4 0.9787   0.9744   57.90
10  71.65  1.4520 0.4161 -0.2365 5 0.9823   0.9764   47.97
9   48.19  1.6960 0.6569  0.2500 5 0.9823   0.9764   48.11
5  103.10  1.4400-0.6140 4 0.9725   0.9670   74.76
11 111.70  1.0520-0.4100 -0.6428 5 0.9813   0.9750   50.84
6   52.53  0.15510 1.4670 0.6410 5 0.9798   0.9731   54.86
8  105.00 -0.16160 1.4380-0.6379 5 0.9735   0.9647   71.91
15  62.41  1.5510 0.5102  0.1019 -0.1441 6 0.9824   0.9736   47.86
12  47.68 -0.05068 1.7240 0.6632  0.2801 6 0.9824   0.9735   47.93
13  70.97  0.01981 1.4520 0.4221 -0.2282 6 0.9823   0.9735   47.94
14 111.50  0.13500 0.9923-0.4747 -0.6273 6 0.9818   0.9727   49.44
16  59.10 -0.01831 1.5930 0.5447  0.1453 -0.1124 7 0.9824   0.9698   47.85
2   71.93  1.51200 1.73004 0.6843   0.6212  857.40
1   81.48  1.86903 0.5339   0.4916 1266.00
4   72.35  2.3120 0.4945 4 0.5482   0.4578 1227.00
7   82.99  1.93300 1.0250-0.7430 5 0.7048   0.6064  801.80

 Q3 showing sub-models for two assigned explanatory variables
 The manual only explains how to exclude two variables.
 Please advise how to 

Re: [R] AIC in MuMIn

2010-08-17 Thread elaine kuo
Thank you.
Most of the answers solved the puzzles.

Q2


  I tried to display sub-model with only temp_ran using the code below but
  failed.
  Please kindly suggest the potential failure cause.
 
  code
 
  library(MuMIn)
  datam -read.csv(c:/migration/Mig_ratio_20100817.csv,header=T,
  row.names=1)


 mig.stds -lm(SummerM_ratio ~ temp_max + evi_mean + topo_var +
  topo_mean + coast + Iso_index_0808,
  ## now tell R were to find the variables in formula
  data = datum.std)
 ## If you are fitting a Gaussian GLM it is better fitted with lm()


 = Please explain why fitted lm is better for GLM.




 But temp_ran is not in your model...

  error in eval(expr, envir, enclos), 'temp_ran' not found

 When used properly (none of this datam.std$ business), subset will do
 what you want:

  dd2 - dredge(lm1, subset = X1)
  dd2
 Global model: lm(formula = y ~ ., data = Cement)
 ---
 Model selection table
(Int)X X1 X2  X3  X4 k   R.sq Adj.R.sq RSS
 3   52.58  1.4680 0.6623 4 0.9787   0.9744   57.90

 = Please suggest how to define subset in my case

Finally, it would be highly appreciated to recommend any references of R for
a beginner like me.


Elaine

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.