I am trying to fit the following linear model to logged per capita fecundity data (ie number of babies per female) for a mouse:

RsNRlS <- glm(formula = ln.fecundity ~ summer.rainfall + N + lagged.rainfall + season, …)

I am using this relationship in a simulation model, and the current statistical model I have fit is unsatisfactory. The problem is I get a global estimate of variance (MSE), but I think it varies across subsets of the data. Specifically, seasons when there is lots of reproduction (e.g. fall) tend to have high variance, while seasons with little reproduction (e.g. summer) have small amounts of variance. I am looking for a method for estimating the coefficients in my linear model, and estimating a separate error for subsets of the data (ie for each of the 4 seasons). The end goal is to take this linear model back into my simulation model to make predictions about fecundity, but with separate variance terms for subsets of the data.

I thought of several solutions, but some seem pretty ad hoc, while I think others would suffer from loss of a lot of power due to small sample size. The solutions I thought of were:
1) Fit the linear model to all data, but then estimate variances by season directly from the residuals. Seems ad hoc given the assumption of common variance in the initial fitting.
2) Use regression weights by season to reweight the data points/fitted values. Then in the simulation model inflate/deflate the estimate of MSE by these weights – seems ad hoc too, as translating weights (which would presumably basically downgrade the importance of outliers) into something meaningful for prediction using the regression equation seems arbitrary.
3) Fit a separate model for each season, either assume the global model (but maybe get poor fits due to #parameter v. #data points) or try reducing models and choosing the best for each season using AIC. The problem with this is that I think that some parameters have effects common to all seasons – eg lagged rainfall represents the effect of rainfall drowning babies in their burrows prior to the age at which they can be captured. This is pretty clearly a strong effect, just based on graphing the data, so I am uncertain about loosing power by splitting it by season.
4) Fit a mixed model to all the data, treating each season as a random factor with levels 1 or 0, and estimating the variance for each. Again this seems somewhat ad hoc, although based on my reading in elementary texts it would allow me to estimate the variance component for each of the seasons.


Based on talking to a colleague in my department some of the modern methods for fitting mixed models, in which the variance for each of the levels of a factor in a random effect can be estimated. That seems the best option, as it explicitly does what I want. However, it is well over my head in terms of either the R code or the statistical background. Can anyone suggest a good approach, or a published example, or some useful reading on how one might approach this problem?



Chris Wilcox
The Ecology Centre
Department of Zoology and Entomology
Brisbane, 4072  QLD
Australia
tel  61-7-3365-1686
fax 61-7-3365-1655
website www.uq.edu.au/~uqcwilco

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Reply via email to