Hi all,

I use regression trees (RT), and also random forests (RF), to fit models of species distributions based on characteristics of the environment. I know that both the abundance of the species at different locations and the environmental predictors are strongly spatially autocorrelated. When the regression tree (I use rpart in R) decides upon a split, it selects, among all possible splits in all independent variables (e.g., July temperature > 25 C), the one that leads to the lowest sum of the variances of the dependent variable in the two groups (i.e., the variance of the dependent variable is calculated within each resulting group, and the optimal split is the one that minimizes the sum of these two variances). However, the estimation of these variances depends on an estimate of the degrees of freedom, which is of course an overestimate if autocorrelation is present. My point is that the whole splitting process, and therefore also the variable selection process, should suffer severely from the spatial autocorrelation in the dependent variable.

Also, RTs are pruned according to an error estimate obtained by cross-validation, which should also suffer from autocorrelation. Given that the information in the 10% of locations omitted in each fold is not really removed from the training set (it is still present in the values of the correlated neighbors), the error estimates from CV should be overly optimistic.
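To make the split rule described above concrete, here is a minimal sketch in plain Python of an exhaustive split search. It is not rpart's code: the function names, the toy sites, and the choice of within-group sums of squared deviations as the quantity being minimized are all assumptions made for illustration.

```python
# Minimal sketch of an exhaustive regression-tree split search.
# NOT rpart's implementation; all names and data here are hypothetical.

def sum_of_squares(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, response):
    """Return (predictor, threshold, within_ss) for the split that minimizes
    the summed within-group sum of squares of the response.

    rows: list of dicts mapping predictor name -> numeric value
    response: list of numeric responses, same length as rows
    """
    best = None
    for name in rows[0]:
        values = sorted(set(row[name] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, response) if row[name] <= threshold]
            right = [y for row, y in zip(rows, response) if row[name] > threshold]
            within_ss = sum_of_squares(left) + sum_of_squares(right)
            # Strict "<" keeps the first split found when there is a tie.
            if best is None or within_ss < best[2]:
                best = (name, threshold, within_ss)
    return best


# Toy data (assumed): abundance jumps once July temperature exceeds 25 C.
sites = [
    {"july_temp": 20.0, "precip": 500.0},
    {"july_temp": 22.0, "precip": 900.0},
    {"july_temp": 24.0, "precip": 600.0},
    {"july_temp": 26.0, "precip": 800.0},
    {"july_temp": 28.0, "precip": 550.0},
    {"july_temp": 30.0, "precip": 950.0},
]
abundance = [1.0, 2.0, 1.5, 9.0, 10.0, 9.5]

split = best_split(sites, abundance)
print(split)
```

Note that every site counts as one independent observation in this search, regardless of how close it lies to its neighbors. That is exactly where the concern about spatial autocorrelation enters.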

Has anyone dealt with this problem in this or a different context? Is there any way to adjust the degrees of freedom and therefore the variance estimates? Would weighting the cases according to neighborhood relationships help (which would be easy to implement in rpart)?

Thanks,

Volker