Hi all,
I use regression trees (RT) (and also random forests, RF) to fit
models of species distributions based on characteristics of the
environment. I know that both the abundance of the species at different
locations and the environmental predictors are strongly spatially
autocorrelated. When the regression tree (I use rpart in R) decides upon
a split, it selects, among all possible splits in all independent
variables (e.g., July temperature > 25 C), the one that leads to the
lowest sum of the variances of the dependent variable in the two groups
(i.e., the variance of the dependent variable is calculated within each
resulting group, and the sum of these two variances is minimized by
selecting the optimal split). However, the estimation of these variances
depends on the degrees of freedom, which are of course overestimated if
autocorrelation is present.

My point is that the whole splitting process, and therefore also the
variable selection process, should suffer severely from the spatial
autocorrelation in the dependent variable. RTs are also pruned according
to an error estimate from cross-validation, which should suffer from
autocorrelation as well.
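
The split search described above can be sketched in Python as follows.
This is a minimal illustration, not rpart's code: it scores every
candidate threshold on one predictor by the summed within-group sum of
squares (each group's variance times its size), which as far as I know
is the criterion rpart documents for method = "anova". `min_node` is a
hypothetical minimum group size, loosely analogous to rpart's
`minbucket`, and the data in the usage lines are invented.

```python
"""Exhaustive best-split search on one predictor (sketch, not rpart's code)."""


def sum_sq(values):
    """Sum of squared deviations from the mean; 0.0 for an empty group."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_node=2):
    """Return (threshold, score) for the split 'x < threshold' vs 'x >= threshold'
    that minimizes SS_left + SS_right, or None if no admissible split exists.

    min_node is a hypothetical minimum group size (loosely like rpart's minbucket).
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(min_node, len(pairs) - min_node + 1):
        # Only cut between distinct predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        score = sum_sq(left) + sum_sq(right)
        if best is None or score < best[1]:
            # Place the threshold midway between the two neighboring x values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


if __name__ == "__main__":
    july_temp = [18, 20, 22, 24, 26, 28, 30, 32]  # invented predictor values (C)
    abundance = [1, 2, 1, 2, 9, 10, 9, 11]        # invented response values
    print(best_split(july_temp, abundance))       # -> (25.0, 3.75)
```

A real tree repeats this search for every predictor in every node and
keeps the overall best; pruning is not shown here.
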
Given that the information in the 10% of locations omitted in each fold
is not really removed from the training set (it is still present in the
values of their correlated neighbors), the error estimates from CV
should be overly optimistic.
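
The leakage argument can be made concrete with a small geometric check.
This sketch assumes equally spaced sites along a 1-D transect and
10-fold CV (rpart's default `xval` is 10); it does not fit any model.
It only measures how far each held-out site is from its nearest
training site under random folds and under contiguous spatial blocks
(so-called blocked CV, a technique swapped in here for comparison, not
something rpart does). With random folds almost every held-out site
keeps a training neighbor one spacing away.

```python
"""How close is each held-out site to the training set under two fold schemes?"""
import random


def random_folds(n, k, seed=0):
    """Assign n sites to k folds at random (fold sizes differ by at most one)."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [0] * n
    for rank, site in enumerate(order):
        folds[site] = rank % k
    return folds


def blocked_folds(n, k):
    """Assign n sites, ordered along a transect, to k contiguous blocks."""
    return [i * k // n for i in range(n)]


def mean_gap_to_training(positions, folds):
    """Mean distance from each held-out site to its nearest training site."""
    gaps = []
    for pos, fold in zip(positions, folds):
        train = [p for p, f in zip(positions, folds) if f != fold]
        gaps.append(min(abs(pos - p) for p in train))
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    sites = list(range(100))  # hypothetical equally spaced sites on a transect
    print("random :", mean_gap_to_training(sites, random_folds(100, 10)))
    print("blocked:", mean_gap_to_training(sites, blocked_folds(100, 10)))  # 3.5
```

If the autocorrelation range is longer than the typical gap, the held-out
values are still largely predictable from the training set, which is the
source of the optimism described above.
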
Has anyone dealt with this problem, in this or another context? Is there
any way to adjust the degrees of freedom, and therefore the variance
estimates? Would weighting the cases according to their neighborhood
relationships help (which would be easy to implement in rpart)?
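
To put rough numbers on these two questions, here is a hedged sketch of
two simple ideas; neither is an established fix for trees. The first is
the textbook approximation n_eff = n (1 - rho) / (1 + rho) for the
variance of the mean of a stationary AR(1) series with lag-1
autocorrelation rho; it does not transfer directly to split statistics.
The second is a hypothetical weighting scheme (`neighbor_weights` and its
`radius` are my own illustration) that gives each site the weight
1 / (number of sites within `radius`, itself included), so that an
isolated tight cluster of m sites carries roughly the weight of one site.

```python
"""Two rough ways to account for autocorrelation: effective n and neighbor weights."""
import math


def effective_n_ar1(n, rho):
    """Approximate effective sample size for the mean of an AR(1) series
    with lag-1 autocorrelation rho: n * (1 - rho) / (1 + rho)."""
    if not -1 < rho < 1:
        raise ValueError("rho must lie in (-1, 1)")
    return n * (1 - rho) / (1 + rho)


def neighbor_weights(coords, radius):
    """Hypothetical case weights: 1 / (number of sites within `radius`,
    counting the site itself). `coords` is a list of (x, y) pairs."""
    weights = []
    for x0, y0 in coords:
        count = sum(1 for x1, y1 in coords if math.hypot(x1 - x0, y1 - y0) <= radius)
        weights.append(1.0 / count)
    return weights


if __name__ == "__main__":
    print(effective_n_ar1(100, 0.5))  # about 33.3 "independent" observations
    sites = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]  # invented coordinates
    print(neighbor_weights(sites, radius=1.0))      # -> [0.5, 0.5, 1.0]
```

A weight vector like this could be supplied to rpart through its
`weights` argument; whether that actually corrects the split and
pruning statistics is exactly the open question here.
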
Thanks,
Volker
+
+ To post a message to the list, send it to [email protected]
+ To unsubscribe, send email to majordomo@jrc.it with no subject and "unsubscribe ai-geostats" in the message body. DO NOT SEND Subscribe/Unsubscribe requests to the list.
+ As a general service to list users, please remember to post a summary of any useful responses to your questions.
+ Support to the forum can be found at http://www.ai-geostats.org/