Dear all, I am attempting to explain patterns of arthropod family richness (count data) using a regression model. It seems to be able to do a pretty good job as an explanatory model (i.e. demonstrating relationships between dependent and independent variables), but it has systematic problems as a predictive model: It is biased high at low observed values of family richness and biased low at high observed values of family richness (see attached pdf). I have tried diverse kinds of reasonable regression models mostly as in Zeileis, et al. (2007), as well as transforming my variables, both with only small improvements.
Do you have suggestions for making a model that would perform better as a predictive model? Thank you for your time. Sincerely, Matthew Bowser STEP student USFWS Kenai National Wildlife Refuge Soldotna, Alaska, USA M.Sc. student University of Alaska Fairbanks Fairbankse, Alaska, USA Reference Zeileis, A., C. Kleiber, and S. Jackman, 2007. Regression models for count data in R. Technical Report 53, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Wien, Austria. URL http://cran.r-project.org/doc/vignettes/pscl/countreg.pdf. Code `data` <- structure(list(D = c(4, 5, 12, 4, 9, 15, 4, 8, 3, 9, 6, 17, 4, 9, 6, 9, 3, 9, 7, 11, 17, 3, 10, 8, 9, 6, 7, 9, 7, 5, 15, 15, 12, 9, 10, 4, 4, 15, 7, 7, 12, 7, 12, 7, 7, 7, 5, 14, 7, 13, 1, 9, 2, 13, 6, 8, 2, 10, 5, 14, 4, 13, 5, 17, 12, 13, 7, 12, 5, 6, 10, 6, 6, 10, 4, 4, 12, 10, 3, 4, 4, 6, 7, 15, 1, 8, 8, 5, 12, 0, 5, 7, 4, 9, 6, 10, 5, 7, 7, 14, 3, 8, 15, 14, 7, 8, 7, 8, 8, 10, 9, 2, 7, 8, 2, 6, 7, 9, 3, 20, 10, 10, 4, 2, 8, 10, 10, 8, 8, 12, 8, 6, 16, 10, 5, 1, 1, 5, 3, 11, 4, 9, 16, 3, 1, 6, 5, 5, 7, 11, 11, 5, 7, 5, 3, 2, 3, 0, 3, 0, 4, 1, 12, 16, 9, 0, 7, 0, 11, 7, 9, 4, 16, 9, 10, 0, 1, 9, 15, 6, 8, 6, 4, 6, 7, 5, 7, 14, 16, 5, 8, 1, 8, 2, 10, 9, 6, 11, 3, 16, 3, 6, 8, 12, 5, 1, 1, 3, 3, 1, 5, 15, 4, 2, 2, 6, 5, 0, 0, 0, 3, 0, 16, 0, 9, 0, 0, 8, 1, 2, 2, 3, 4, 17, 4, 1, 4, 6, 4, 3, 15, 2, 2, 13, 1, 9, 7, 7, 13, 10, 11, 2, 15, 7), Day = c(159, 159, 159, 159, 166, 175, 161, 168, 161, 166, 161, 166, 161, 161, 161, 175, 161, 175, 161, 165, 176, 161, 163, 161, 168, 161, 161, 161, 161, 161, 165, 176, 175, 176, 163, 175, 163, 168, 163, 176, 176, 165, 176, 175, 161, 163, 163, 168, 163, 175, 167, 176, 167, 165, 165, 169, 165, 169, 165, 161, 165, 175, 165, 176, 175, 167, 167, 175, 167, 164, 167, 164, 181, 164, 167, 164, 176, 164, 167, 164, 167, 164, 167, 175, 167, 173, 176, 173, 178, 167, 173, 172, 173, 178, 178, 172, 181, 182, 173, 162, 162, 173, 178, 173, 172, 162, 173, 162, 173, 162, 173, 170, 178, 166, 166, 162, 166, 177, 166, 170, 166, 172, 172, 166, 172, 166, 174, 162, 164, 162, 170, 164, 170, 164, 170, 164, 177, 164, 164, 174, 174, 162, 170, 162, 172, 162, 165, 162, 165, 177, 172, 162, 170, 162, 170, 174, 165, 174, 166, 172, 174, 172, 174, 170, 170, 165, 170, 174, 174, 172, 174, 172, 174, 165, 170, 165, 170, 174, 172, 174, 172, 175, 175, 170, 171, 174, 174, 174, 172, 175, 171, 175, 174, 174, 174, 175, 172, 171, 171, 174, 160, 175, 160, 171, 170, 175, 170, 170, 160, 160, 160, 171, 171, 171, 171, 160, 160, 160, 171, 171, 176, 171, 176, 176, 171, 176, 171, 176, 176, 176, 176, 159, 166, 159, 159, 166, 168, 169, 159, 168, 169, 166, 163, 180, 163, 165, 164, 180, 166, 166, 164, 164, 177, 166), NDVI = c(0.187, 0.2, 0.379, 0.253, 0.356, 0.341, 0.268, 0.431, 0.282, 0.181, 0.243, 0.327, 0.26, 0.232, 0.438, 0.275, 0.169, 0.288, 0.138, 0.404, 0.386, 0.194, 0.266, 0.23, 0.333, 0.234, 0.258, 0.333, 0.234, 0.096, 0.354, 0.394, 0.304, 0.162, 0.565, 0.348, 0.345, 0.226, 0.316, 0.312, 0.333, 0.28, 0.325, 0.243, 0.194, 0.29, 0.221, 0.217, 0.122, 0.289, 0.475, 0.048, 0.416, 0.481, 0.159, 0.238, 0.183, 0.28, 0.32, 0.288, 0.24, 0.287, 0.363, 0.367, 0.24, 0.55, 0.441, 0.34, 0.295, 0.23, 0.32, 0.184, 0.306, 0.232, 0.289, 0.341, 0.221, 0.333, 0.17, 0.139, 0.2, 0.204, 0.301, 0.253, -0.08, 0.309, 0.232, 0.23, 0.239, -0.12, 0.26, 0.285, 0.45, 0.348, 0.396, 0.311, 0.318, 0.31, 0.261, 0.441, 0.147, 0.283, 0.339, 0.224, 0.5, 0.265, 0.2, 0.287, 0.398, 0.116, 0.292, 0.045, 0.137, 0.542, 0.171, 0.38, 0.469, 0.325, 0.139, 0.166, 0.247, 0.253, 0.466, 0.26, 0.288, 0.34, 0.288, 0.26, 0.178, 0.274, 0.358, 0.285, 0.225, 0.162, 0.223, 0.301, -0.398, -0.2, 0.239, 0.228, 0.255, 0.166, 0.306, 0.28, 0.279, 0.208, 0.377, 0.413, 0.489, 0.417, 0.333, 0.208, 0.232, 0.431, 0.283, 0.241, 0.105, 0.18, -0.172, -0.374, 0.25, 0.043, 0.215, 0.204, 0.19, 0.177, -0.106, -0.143, 0.062, 0.462, 0.256, 0.229, 0.314, 0.415, 0.307, 0.238, -0.35, 0.34, 0.275, 0.097, 0.353, 0.214, 0.435, 0.055, -0.289, 0.239, 0.186, 0.135, 0.259, 0.268, 0.258, 0.032, 0.489, 0.389, 0.298, 0.164, 0.325, 0.254, -0.059, 0.524, 0.539, 0.25, 0.175, 0.326, 0.302, -0.047, -0.301, -0.149, 0.358, 0.495, 0.311, 0.235, 0.558, -0.156, 0, 0.146, 0.329, -0.069, -0.352, -0.356, -0.206, -0.179, 0.467, -0.325, 0.39, -0.399, -0.165, 0.267, -0.334, -0.17, 0.58, 0.228, 0.234, 0.351, 0.3, -0.018, 0.125, 0.176, 0.322, 0.246, 0.376, -0.185, 0.342, 0.142, -0.075, 0.186, 0.333, 0.112, 0.272, 0.277, 0.203, 0.37, 0.465, 0.425), VegS = c(14, 11, 18, 21, 31, 20, 11, 17, 10, 15, 27, 8, 17, 13, 16, 9, 10, 16, 10, 15, 11, 9, 14, 11, 11, 10, 24, 18, 12, 6, 25, 21, 25, 8, 14, 18, 11, 16, 20, 16, 10, 16, 18, 14, 13, 11, 15, 23, 11, 28, 12, 17, 12, 18, 10, 15, 7, 15, 9, 16, 18, 16, 20, 18, 12, 19, 16, 18, 20, 15, 24, 9, 15, 9, 16, 14, 17, 14, 7, 9, 9, 12, 13, 15, 14, 11, 17, 8, 14, 15, 12, 8, 10, 12, 8, 16, 15, 22, 16, 21, 10, 15, 20, 14, 27, 21, 19, 22, 21, 11, 13, 10, 13, 14, 9, 22, 22, 20, 12, 16, 20, 19, 26, 14, 13, 23, 14, 22, 19, 15, 28, 16, 20, 25, 10, 19, 0, 10, 8, 11, 17, 13, 17, 23, 37, 32, 19, 26, 12, 11, 24, 11, 21, 25, 8, 15, 21, 31, 17, 0, 12, 6, 23, 19, 29, 14, 9, 0, 18, 23, 20, 15, 15, 17, 27, 17, 2, 24, 17, 16, 26, 11, 23, 24, 10, 26, 21, 12, 20, 12, 29, 22, 20, 16, 41, 19, 27, 28, 10, 35, 28, 23, 14, 5, 23, 15, 17, 12, 11, 24, 11, 14, 7, 7, 27, 17, 15, 10, 16, 2, 11, 21, 18, 15, 8, 17, 10, 18, 15, 18, 31, 9, 14, 11, 19, 22, 8, 19, 17, 18, 25, 11, 17, 32, 25, 18, 21, 19, 35, 14, 29, 9, 28, 14), T = c(13, 15.4, 15.6, 12.3, 12.7, 13.3, 6, 13, 9.2, 17.8, 9.4, 14.2, 8.7, 14, 8.6, 13.9, 9.2, 15.1, 9.4, 16.5, 14, 11.5, 15.5, 13.3, 12.7, 14, 10.5, 14, 10.1, 16.7, 15.2, 11.2, 11.7, 17.9, 13.3, 13.9, 8.7, 16.7, 7.8, 7.9, 10.9, 15.5, 14, 14.1, 14.3, 13.3, 11.6, 16.5, 12.7, 12.6, 8.3, 9, 12.4, 15, 11.8, 14.1, 10.5, 12.4, 10.5, 17.5, 9.2, 16.3, 5.3, 5.9, 11.9, 8.9, 7.7, 15.2, 8.6, 13.2, 9.5, 15.8, 12.5, 13.8, 10.7, 10.5, 7.7, 11, 9.3, 14.6, 12, 15.4, 12.3, 14, 8.3, 19.8, 15.5, 14.3, 9, 7.6, 15.3, 12.8, 14.4, 14, 10.5, 8.9, 13.4, 12.8, 12.9, 11.2, 13.1, 10, 12.4, 15.4, 7.6, 14.9, 13.1, 11.1, 8.6, 13.6, 8.4, 11.5, 12.5, 15.6, 8.3, 9.6, 8.7, 9.7, 10.5, 12.8, 8.6, 12.7, 6.7, 8.5, 9.9, 8.3, 12.8, 9.8, 10.5, 10.7, 9.6, 11.1, 13.7, 9.5, 8.8, 3.4, 10.2, 3.5, 8.4, 11.9, 12.3, 10.2, 13.1, 8.4, 3.1, 7.2, 13.2, 7.6, 11.3, 13.5, 7.8, 5.5, 10.7, 4.8, 7.9, 13.5, 13.5, 2.1, 7.2, 9.7, 12, 9.2, 13.2, 12, 17.1, 7.9, 12.7, 11.8, 16, 6.4, 12.9, 8.1, 12.6, 10.2, 13.3, 8.5, 9.7, 9.9, 18.2, 11, 8.4, 1.5, 7.3, 10.6, 13.6, 8.4, 7.2, 13, 15.5, 9.7, 13.2, 5.9, 9.5, 10.4, 12.9, 2.6, 17.2, 15.4, 10.5, 6.7, 6.6, 7.6, 10.5, 15.6, 10.4, 5.1, 11, 9.7, 4.2, 3.6, 8.5, 11.5, 8.4, 6.9, 11, 10.4, 3.4, 3.2, 5.5, 2.4, 11.2, 2.6, 15.1, 16, 13.7, 10.5, 3.5, 13.4, 11.5, 12.3, 13.9, 14.5, 12.8, 16.8, 16.9, 13.5, 17.2, 12.5, 12.4, 11.8, 12, 10.9, 6.7, 10.9, 2.3, 5.2, 13.1, 12.1, 13.9, 12.9, 7.2, 12.5, 16, 11.7), Hemlock = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0), Alpine = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0), Snow = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("D", "Day", "NDVI", "VegS", "T", "Hemlock", "Alpine", "Snow"), row.names = as.integer(c(NA, 254)), class = "data.frame") #Running a regression. library(MASS) fit <- glm.nb(D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, data = data) summary(fit, correlation = FALSE) Call: glm.nb(formula = D ~ Day + NDVI + VegS + T + Hemlock + Alpine + Snow, data = data, init.theta = 11.3494468596771, link = log) Deviance Residuals: Min 1Q Median 3Q Max -3.7451 -0.7196 -0.1958 0.5389 2.7096 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -2.882684 0.929598 -3.101 0.001929 ** Day 0.020325 0.005540 3.669 0.000244 *** NDVI 1.353361 0.221471 6.111 9.91e-10 *** VegS 0.016731 0.004931 3.393 0.000691 *** T 0.074189 0.009491 7.817 5.42e-15 *** Hemlock -0.588858 0.174980 -3.365 0.000765 *** Alpine -0.452199 0.099296 -4.554 5.26e-06 *** Snow -1.902610 0.735708 -2.586 0.009707 ** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(11.3494) family taken to be 1) Null deviance: 515.16 on 253 degrees of freedom Residual deviance: 278.89 on 246 degrees of freedom AIC: 1300.1 Number of Fisher Scoring iterations: 1 Theta: 11.35 Std. Err.: 2.71 2 x log-likelihood: -1282.075 #Plotting observed versus predicted values. pdf(file="ObsVPred.pdf", width=4, height=4, family="Times", pointsize=11) par(mar = c(5,5,1,1), pch=1) plot(data$D, fit$fitted.values, main="", ylab=expression(italic(D)[predicted]), xlab=expression(italic(D)[observed])) abline(a=0,b=1, lty=2) lines(lowess(data$D, fit$fitted.values)) dev.off() #This appears to be a decent explanatory model, but as a predictive model it is systematically biased. It is biased high at low observed values of D and biased low at high values observed values of D.
ObsVPred.pdf
Description: Adobe PDF document
______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.