Volker, yes, I have tried RT and RF with spatial data.
You may want to try n-fold cross-validation with, e.g., 9 folds, and
choose the folds such that they come from non-overlapping regions. That
might remove the optimism stemming from spatial correlation, at least to
some extent.
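A minimal sketch of such spatially blocked folds, written in Python rather than R and assuming the point locations are given as (x, y) pairs. The bounding box is cut into a 3 x 3 grid, giving 9 non-overlapping regions, and each grid cell becomes one fold. The names `spatial_block_folds` and `block_cv_splits` are hypothetical helpers, not part of any package. Equal-sized rectangles are only one way to define the regions, and cells that contain no points simply produce fewer than 9 folds.

```python
from typing import Iterator, List, Sequence, Tuple


def spatial_block_folds(coords: Sequence[Tuple[float, float]],
                        n_x: int = 3, n_y: int = 3) -> List[int]:
    """Assign each (x, y) location to one cell of an n_x-by-n_y grid.

    The grid spans the bounding box of the data; each cell is one CV fold,
    so the folds come from non-overlapping regions.
    """
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero-width extent (all points on one line).
    dx = (x_max - x_min) or 1.0
    dy = (y_max - y_min) or 1.0
    folds = []
    for x, y in coords:
        # min(..., n - 1) puts points lying on the max edge into the last cell.
        i = min(int((x - x_min) / dx * n_x), n_x - 1)
        j = min(int((y - y_min) / dy * n_y), n_y - 1)
        folds.append(j * n_x + i)
    return folds


def block_cv_splits(folds: Sequence[int]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), holding out one spatial block at a time."""
    for k in sorted(set(folds)):
        test = [i for i, f in enumerate(folds) if f == k]
        train = [i for i, f in enumerate(folds) if f != k]
        yield train, test


sites = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
fold_ids = spatial_block_folds(sites)
for train_idx, test_idx in block_cv_splits(fold_ids):
    print(f"held-out block {fold_ids[test_idx[0]]}: test={test_idx} train={train_idx}")
```

Here the three sites near the origin share the lower-left block, so they are always held out together and none of them can "help" predict the others.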
--
Edzer
Volker Bahn wrote:
Hi all,
I use regression trees (RT) (and also random forests, RF) for fitting
models of species distributions based on characteristics of the
environment. I know that both the abundance of the species at
different locations and the environmental predictors are strongly
spatially autocorrelated. When the regression tree (I use rpart in R)
decides upon a split, it selects the one particular split among all
possible splits in all independent variables (e.g., July temperature >
25 °C) that leads to the lowest sum of the (size-weighted) within-group
variances of the dependent variable in the two groups (i.e., the sum of
squared deviations of the dependent variable is calculated within each
resultant group, and the sum of these two quantities is minimized by
selecting the optimal split). However, the estimation of these variances
depends on an estimate of the degrees of freedom, which is of course an
overestimate if autocorrelation is present. My point is that the whole
splitting process, and therefore also the variable selection process,
should suffer severely from the spatial autocorrelation in the dependent
variable.
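The exhaustive split search described above can be sketched as follows for a single numeric predictor. This is a simplified illustration, not rpart's code: it assumes rpart's "anova" criterion of minimizing SS_left + SS_right, tries every midpoint between consecutive distinct predictor values, and the names `sum_sq` and `best_split` are hypothetical. Every observation enters the sums with full weight, exactly as if the locations were independent.

```python
from typing import Optional, Sequence, Tuple


def sum_sq(values: Sequence[float]) -> float:
    """Sum of squared deviations from the group mean (group size times its variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x: Sequence[float], y: Sequence[float],
               min_node: int = 1) -> Optional[Tuple[float, float]]:
    """Return (threshold, SS_left + SS_right) of the best split x <= t vs x > t.

    min_node (>= 1) is the minimum number of cases on each side.
    Returns None if no admissible split exists.
    """
    pairs = sorted(zip(x, y))
    best: Optional[Tuple[float, float]] = None
    # Split between sorted positions k-1 and k; both sides keep >= min_node cases.
    for k in range(min_node, len(pairs) - min_node + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # tied x values cannot be separated by a threshold
        left = [yy for _, yy in pairs[:k]]
        right = [yy for _, yy in pairs[k:]]
        ss = sum_sq(left) + sum_sq(right)
        if best is None or ss < best[1]:
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2, ss)
    return best


july_temp = [20, 22, 24, 26, 28, 30]  # illustrative values, not real data
abundance = [1, 1, 2, 8, 9, 9]
print(best_split(july_temp, abundance))
```

With these made-up values the best threshold is 25.0, with a combined within-group sum of squares of 4/3.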
Also, RTs are pruned according to a cross-validated error estimate,
which should also suffer from autocorrelation. Given that the information
of the 10% of locations omitted in each fold is not completely removed
from the training set (it is still present in the values of their
correlated neighbors), the CV error estimates should be overly
optimistic.
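The optimism argument can be checked with a toy simulation. Assumptions, none of them from the original post: the abundance is a stationary AR(1) series along a 1-D transect (phi = 0.9), and a nearest-neighbor predictor stands in for any model that can exploit nearby training locations. Random 10-fold CV almost always leaves an immediate neighbor of each held-out site in the training set, whereas contiguous blocks do not, so the blocked estimate of the prediction error should come out clearly larger.

```python
import math
import random
from typing import List, Sequence


def ar1_transect(n: int, phi: float, rng: random.Random) -> List[float]:
    """Simulate a stationary AR(1) series: z[i] = phi * z[i-1] + N(0, 1) noise."""
    # First value from the stationary distribution, sd = 1 / sqrt(1 - phi^2).
    z = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))]
    for _ in range(n - 1):
        z.append(phi * z[-1] + rng.gauss(0.0, 1.0))
    return z


def nn_predict(train: Sequence[int], z: Sequence[float], i: int) -> float:
    """Predict site i by the value at the nearest training site along the transect."""
    nearest = min(train, key=lambda t: abs(t - i))
    return z[nearest]


def cv_mse(z: Sequence[float], folds: Sequence[int]) -> float:
    """Mean squared error over all sites, each predicted while its fold is held out."""
    sq_err = 0.0
    for k in set(folds):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        sq_err += sum((z[i] - nn_predict(train, z, i)) ** 2 for i in test)
    return sq_err / len(z)


rng = random.Random(1)
n, n_folds = 500, 10
z = ar1_transect(n, phi=0.9, rng=rng)

# Random folds: each fold is a random 10% of the sites.
random_folds = [i % n_folds for i in range(n)]
rng.shuffle(random_folds)
# Blocked folds: each fold is one contiguous tenth of the transect.
block_folds = [i * n_folds // n for i in range(n)]

random_mse = cv_mse(z, random_folds)
blocked_mse = cv_mse(z, block_folds)
print(f"random 10-fold CV MSE:  {random_mse:.2f}")
print(f"blocked 10-fold CV MSE: {blocked_mse:.2f}")
```

The exact numbers depend on the seed; the point is the gap between the two estimates, which shrinks as phi approaches 0.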
Has anyone dealt with this problem in this context or a different
context? Is there any way to adjust the degrees of freedom and
therefore the variance estimates? Would weighting the cases according to
neighborhood relationships help (which would be easy to implement in
rpart)?
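One possible neighborhood-based weighting, offered only as a hypothetical scheme: give each site the weight 1 / (number of sites within a radius r, including itself), so an isolated cluster of k mutually close sites carries about the total weight of one site. The radius is an assumption the user would have to choose (e.g. from a variogram range), `neighbor_weights` is a hypothetical name, and the resulting vector could be supplied through rpart's `weights` argument for case weights. Whether this actually corrects the split selection or the CV error is exactly the open question raised above.

```python
import math
from typing import List, Sequence, Tuple


def neighbor_weights(coords: Sequence[Tuple[float, float]],
                     radius: float) -> List[float]:
    """Down-weight sites in densely sampled neighborhoods.

    w_i = 1 / (number of sites within `radius` of site i, counting i itself).
    Brute-force O(n^2) distance computation; fine for a sketch.
    """
    weights = []
    for xi, yi in coords:
        count = sum(1 for xj, yj in coords
                    if math.hypot(xi - xj, yi - yj) <= radius)
        weights.append(1.0 / count)
    return weights


sites = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(neighbor_weights(sites, radius=0.5))
```

Here the three clustered sites each get 1/3 and the isolated site gets 1, so the cluster and the lone site contribute equally to the weighted sums of squares.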
Thanks,
Volker
+
+ To post a message to the list, send it to [email protected]
+ To unsubscribe, send email to majordomo@ jrc.it with no subject and
"unsubscribe ai-geostats" in the message body. DO NOT SEND
Subscribe/Unsubscribe requests to the list
+ As a general service to list users, please remember to post a
summary of any useful responses to your questions.
+ Support to the forum can be found at http://www.ai-geostats.org/
+