This is a summary of my questions and of the answers I collected. The issue is
the development of a geostatistical model to predict the density of a wildlife
population by interpolating density estimates obtained by mark-resight and
radiotelemetry.
My question to Ai-Geostat 28/06/01:
1) I do not have measurements of my primary variable; I have estimates. It was
suggested to me that I study spatial dependence with a variogram model that can
be different from zero at lag=0, since in such a situation an exact interpolator
may not be the best solution. But how could I use such a model in a linear model
of coregionalization? My covariates are measured, not estimated.
Is it better to treat my estimates as measurements (so using a classic variogram
equal to 0 at lag=0), or to discard the linear model of coregionalization and
estimate my error variance by cross-validating the results?
Brian Gray 28/06/01
My only question (I'm not a cokriging expert) is regarding the potential for
measurement error: if deer are continually on the move, then I wonder if you
might, by force, end up with a substantial positioning error contribution to the
nugget?
cheers, Brian Gray
Isobel Clark 28/06/01
You say that your primary variables are estimates. Is there any way in which you
can assess the reliability of the estimates? For example, do you have repeated
estimates at the same locations? Or do you have estimates very close together?
In practice, the nugget effect is a composite of all random-type errors,
including inherent variation in the variable being measured. One component of
this nugget effect should be the variance between repeated estimates at the same
location. If you can put a number to this, you can do your geostatistics as
follows:
a) Model the semi-variogram as usual.
b) Subtract the 'repeatability' variance from the nugget effect.
c) Carry out the kriging using the 'reduced' semi-variogram.
d) To your estimation variances, add twice the 'repeatability' variance.
In this way you will admit the original estimation error into the final
assessment of your confidence for predictions without having to compromise your
other variables.
My question 29/06/01:
My next question: these estimates do not share the same accuracy; some
confidence intervals are narrower or wider than others. So which variance should
I account for in the semi-variogram? An average of the variances of my
estimates? The largest?
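Steps b) and d) of Isobel Clark's procedure above are simple arithmetic once a
'repeatability' variance is available. The following is only a minimal sketch
of that bookkeeping, not anything posted on the list: the function name and the
numbers are hypothetical, and it assumes the kriging variances passed in were
already computed with the reduced semi-variogram (step c).

```python
def apply_repeatability(nugget, repeat_var, kriging_vars):
    """Return the reduced nugget (step b) and the inflated kriging variances (step d).

    nugget       -- nugget effect of the fitted semi-variogram (step a)
    repeat_var   -- variance between repeated estimates at the same location
    kriging_vars -- kriging variances obtained with the reduced semi-variogram (step c)
    """
    if repeat_var < 0:
        raise ValueError("repeatability variance must be non-negative")
    if repeat_var > nugget:
        # A larger repeatability variance would give a negative nugget.
        raise ValueError("repeatability variance exceeds the fitted nugget")
    reduced_nugget = nugget - repeat_var                        # step b
    inflated = [kv + 2.0 * repeat_var for kv in kriging_vars]   # step d
    return reduced_nugget, inflated


# Made-up numbers, for illustration only.
reduced, variances = apply_repeatability(4.0, 1.5, [2.0, 3.0])
print(reduced, variances)
```

The check on repeat_var is a design choice of this sketch: if the repeatability
variance came out larger than the fitted nugget, the variogram fit or the
variance estimate would need another look before going on.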
Isobel Clark 30/06/01
For the 'estimates variance', the classical way to approach this is as follows
(from traditional statistics, not geostatistics):
For each sample location, calculate your average estimated density;
For each repeat measurement within that sample location, calculate the
difference between this measurement and the average for that location, and
square it;
Repeat for all sample sites;
Add up all sums of squares and divide by the original number of samples (all
repeats) minus the number of sample locations.
This is the best estimate for the 'within sample location' variance. This should
be subtracted from the nugget effect and twice this value added back on to all
kriging variances.
If you are worried about whether the 'within sample location' variance is
actually stationary over all sample locations, you should also calculate the
variance of all the repeat measurements around the global average for all sample
locations. That is, the ordinary statistical variance but including all of the
original estimated densities before you averaged them by site. A standard F
ratio test between the two variances will tell you if they are really different.
See any basic statistics book or our Practical Geostatistics 2000, Chapter 5.
My question 28/06/01
My density estimate is obtained by 'averaging' the positions of deer located by
radiotelemetry, i.e., given a population in a place, I draw some boundaries on a
map and count the fraction of positions of the radio-tagged sample of the
population that fall inside these boundaries. My aim is to obtain in this way
the 'average density' in that place. Do you think this procedure could avoid the
problems that deer mobility can cause for the reliability of the confidence
interval?
Brian Gray 28/06/01
Your approach should work fine provided that you are interested in averages
rather than in individuals. And, since you are working with averages, your
confidence interval should be narrower. If you get simultaneous locations on all
deer, then you may have a different situation than if your locations arrive over
time. In the latter case and if you work with individuals, any increase in your
relative nugget may arise from location/positioning error.
My question 28/06/01
As I had never heard of a positioning error contribution to the nugget, where
could I find some references?
Brian Gray 28/06/01
The nugget may derive from measurement error, positioning error or from small
scale variation, or a combination of the three. For example, I work with oyster
infection rates, which are a function of oyster age. If I measure oysters that
are infinitely close to one another, I am still not guaranteed that they will
have the same infection level. Reason: different ages, different life histories,
etc. This latter issue is a small scale issue. If I say oyster 1 is at location
y and it's really at location y + 1, then I have a positional error. If I
measure the infection as i but it's really j, then I have a measurement error.
Chiles and Delfiner 1999, as I recall, discuss these issues more.
Donald E. Myers 28/06/01
For a reference on the nugget effect and positioning error, see "Geostatistics:
modeling spatial uncertainty", J.-P. Chiles and P. Delfiner, J. Wiley and Sons.
My question 28/06/01
2) The sills of my variograms are equal to or larger than the variance of the
primary variable (so, more or less twice the semivariance). This is probably
because of a trend in the density, which decreased with time. The primary
variable (deer density) is probably second-order stationary, at least over a
much larger area than my study area, the latter being surrounded by many
kilometers of suitable deer habitat. But it behaves as if non-stationary within
the few square kilometers of interest and the few years of sampling. May I
ignore this problem, or do I have to incorporate the trend?
Isobel Clark 28/06/01
If you are seeing a sill, the problem is not non-stationarity in the sense of a
trend. If it is important at all, it is more likely to be caused by a
discontinuity in the study area or a change in some characteristic of the
habitat in the area. Trend shows as a rising parabola, not a sill. If the cross
validation stage works, the height of the sill is not an important factor.
Remember when carrying out the cross validation, you should be using the
increased kriging variance as described above.
My question 28/06/01
3) When I cross-validate my predictions (obtained with a linear model of
coregionalization and ordinary cokriging) I obtain reasonably good results. But
I suspect that they may be even better than they seem, not only because of the
problems common to all cross-validations, but because I have to compare my
predictions not with actual measurements but with estimates. The average error
is probably influenced by both uncertainties. Given that I know the confidence
intervals of my primary variable estimates, how could I account for them to
estimate correctly the average error of my prediction model?
My doubt: imagine I collect some estimates of a variable, and I have an
uncertainty about them, say a 95% confidence interval of 20, and I know this
uncertainty. Now I develop a kriging model to predict my variable, and I
cross-validate it. Well, even if I had a perfect model, completely precise and
accurate, I should have a MAE of 10, more or less. Does anyone think this is
correct? And, given that I know the uncertainty about my estimates, is there a
way to 'correct' the MAE of my cross-validation to account for it?
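As a rough check on the arithmetic in the doubt above, here is a minimal sketch
under assumptions that are not stated in the thread: the '95% confidence
interval of 20' is read as a total width of 20 (plus or minus 10), the errors of
the estimates are normal and unbiased, and the model itself is perfect, so the
only cross-validation error is the error of the reference estimates. The
function name is hypothetical. Under these assumptions the expected absolute
error is the standard deviation times sqrt(2/pi), which comes to about 4 rather
than 10; if the interval were instead meant as plus or minus 20, the figure
would double.

```python
import math
from statistics import NormalDist


def expected_mae_from_ci(ci_width, level=0.95):
    """Expected absolute error of a normal, unbiased estimate.

    ci_width -- total width of the confidence interval (assumed symmetric)
    level    -- confidence level of that interval
    """
    # Two-sided normal quantile, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    sigma = (ci_width / 2.0) / z
    # Mean absolute deviation of a normal variable: sigma * sqrt(2 / pi).
    return sigma * math.sqrt(2.0 / math.pi)


mae_floor = expected_mae_from_ci(20.0)
print(round(mae_floor, 2))  # about 4.07 under the assumptions above
```

This gives only the part of the MAE that the uncertainty of the reference
estimates would produce on its own; it says nothing about how that part
combines with the kriging error.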
Isobel Clark 30/06/01
You need to keep your mind clear between your original uncertainty in the
estimates and the kriging error. There really is no reason why the kriging error
should be less than your 'estimates' error. In fact, I would be surprised if
this were so. You are trying to estimate the value at a location from other
samples. This prediction error will be in addition to your sample value
uncertainty and could be orders of magnitude higher.
Thank you very much; I hope to hear from you soon. All new ideas, remarks or
criticism about what is listed above will be welcome.
Daniele
