This is a summary of my questions and of the answers I collected. The issue is the development of a geostatistical model to predict the density of a wildlife population by interpolating density estimates collected by mark-resight and radiotelemetry:
 
My question to Ai-Geostat 28/06/01:
 
1) I don't have measurements of my primary variable, I have estimates. It was suggested that I study spatial dependence with a variogram model whose value can differ from zero at lag=0, since in such a situation an exact interpolator may not be the best solution. But how could I use such a model in a linear model of coregionalization? My covariates are measured, not estimated.
Is it better to treat my estimates as measurements (so using a classic variogram = 0 for lag=0) or to discard the linear model of coregionalization, estimating my error variance by cross-validating results?
 
Brian Gray  28/06/01
 
My only question (I'm not a cokriging expert) is regarding the potential
for measurement error:  if deer are continually on the move, then I
wonder if you might, by force, end up with a substantial positioning
error contribution to the nugget?  cheers, Brian Gray

Isobel Clark 28/06/01
 
You say that your primary variables are estimates. Is
there any way in which you can assess the reliability
of the estimates? For example, do you have repeated
estimates at the same locations? Or do you have
estimates very close together? In practice, the nugget
effect is a composite of all random-type errors
including inherent variation in the variable being
measured. One component of this nugget effect should
be the variance between repeated estimates at the same
location. If you can put a number to this, you can do
your geostatistics as follows:

a) model the semi-variogram as usual.
b) Subtract the 'repeatability' variance from this
nugget effect
c) Carry out the kriging using the 'reduced'
semi-variogram
d) To your estimation variances, add twice the
'repeatability variance'

In this way you will admit the original estimation
error into the final assessment of your confidence for
predictions without having to compromise your other
variables.
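
A minimal sketch of steps a) to d) in Python, for illustration only. The numbers are made up, the function names are hypothetical, and the kriging itself (step c) is assumed to happen in whatever software is being used.

```python
def reduced_nugget(fitted_nugget, repeatability_variance):
    """Step b: subtract the repeatability variance from the fitted nugget."""
    reduced = fitted_nugget - repeatability_variance
    if reduced < 0:
        raise ValueError("repeatability variance exceeds the fitted nugget")
    return reduced


def adjusted_kriging_variance(kriging_variance, repeatability_variance):
    """Step d: add twice the repeatability variance to a kriging variance."""
    return kriging_variance + 2.0 * repeatability_variance


# Illustrative numbers only (not taken from the deer data).
fitted_nugget = 4.0            # step a: nugget of the fitted semi-variogram
repeatability_variance = 1.5   # variance between repeated estimates
nugget_for_kriging = reduced_nugget(fitted_nugget, repeatability_variance)

# Step c is done by the kriging software, using nugget_for_kriging.
kriging_variance_from_software = 6.0   # hypothetical output of step c
final_variance = adjusted_kriging_variance(
    kriging_variance_from_software, repeatability_variance)
print(nugget_for_kriging, final_variance)
```

The check for a negative reduced nugget is an added safeguard, not part of the advice above: if the repeatability variance is larger than the fitted nugget, one of the two figures probably needs revisiting.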

My question    29/06/01:
 
My next question: these estimates don't share the same accuracy; some confidence intervals are narrower or wider than others. So, what is the variance to account for in the semi-variogram? An average of the variances of my estimates? The largest?
 
Isobel Clark  30/06/01
 
For the 'estimates variance', the classical way to
approach this is as follows (from traditional
statistics, not geostatistics):

For each sample location, calculate your average
estimated density;
For each repeat measurement within that sample
location, calculate the difference between this
measurement and the location average, and square it;
Repeat for all sample sites;
Add up all sums of squares and divide by the original
number of samples (all repeats) minus the number of
sample locations.

This is the best estimate for the 'within sample
location'  variance. This should be subtracted from
the nugget effect and twice this value added back on
to all kriging variances.
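
The recipe above can be sketched in Python as follows, taking each squared difference from the average of its own sample location. The site labels and densities are made up for illustration, and the function name is hypothetical.

```python
def within_location_variance(repeats_by_site):
    """Pooled variance of repeat estimates around their own site averages."""
    sum_of_squares = 0.0
    n_repeats = 0
    for repeats in repeats_by_site.values():
        site_mean = sum(repeats) / len(repeats)
        sum_of_squares += sum((x - site_mean) ** 2 for x in repeats)
        n_repeats += len(repeats)
    n_sites = len(repeats_by_site)
    # Divide by (all repeats) minus (number of sample locations).
    return sum_of_squares / (n_repeats - n_sites)


# Made-up repeat density estimates at three sample locations.
repeats_by_site = {
    "A": [10.0, 12.0, 14.0],
    "B": [20.0, 22.0],
    "C": [5.0, 7.0, 9.0],
}
pooled_variance = within_location_variance(repeats_by_site)
print(pooled_variance)
```

Locations with a single estimate add nothing to the sum of squares, so at least some locations need two or more repeats for the divisor to be positive.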

If you are worried about whether the 'within
sample location' variance is actually stationary over
all sample locations, you should also calculate the
variance of all the repeat measurements around the
global average for all sample locations. That is, the
ordinary statistical variance but including all of the
original estimated densities before you averaged them
by site.

A standard F ratio test between the two variances will
tell you if they are really different. See any basic
statistics book or our Practical Geostatistics 2000,
Chapter 5.
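
A rough sketch of the two variances and their ratio, in Python with made-up numbers. Putting the overall variance in the numerator and using (all repeats - 1) and (all repeats - number of locations) as degrees of freedom are assumptions made for this illustration; the resulting ratio would then be compared with a tabulated F value.

```python
def sum_of_squares(values):
    """Sum of squared deviations of values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Made-up repeat estimates, one inner list per sample location.
repeats_by_site = [
    [10.0, 12.0, 14.0],
    [20.0, 22.0],
    [5.0, 7.0, 9.0],
]
all_repeats = [v for site in repeats_by_site for v in site]

# 'Within sample location' variance, pooled over all locations.
df_within = len(all_repeats) - len(repeats_by_site)
within_variance = sum(sum_of_squares(s) for s in repeats_by_site) / df_within

# Ordinary variance of all repeats around the global average.
df_total = len(all_repeats) - 1
total_variance = sum_of_squares(all_repeats) / df_total

f_ratio = total_variance / within_variance
print(f"F = {f_ratio:.2f} on ({df_total}, {df_within}) degrees of freedom")
```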



My question 28/06/01
 
My density estimate is obtained by 'averaging' the position of deer by
radiotelemetry, i.e., given a population in a place, I put some boundaries
on a map, and I count the fraction of positions of the radiotagged sample of
the population that are inside these boundaries. My aim is to obtain in this
way the 'average density' in that place.
Do you think this procedure could avoid the problems that deer mobility can
give to the reliability of the confidence interval?
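
To make the procedure concrete, here is a minimal Python sketch. It assumes a rectangular boundary, a known population size and made-up telemetry fixes, none of which come from the actual study; converting the fraction to a density as population size times fraction divided by area is also an assumption of this sketch.

```python
def fraction_inside(fixes, xmin, ymin, xmax, ymax):
    """Fraction of telemetry fixes that fall inside a rectangular boundary."""
    inside = sum(1 for x, y in fixes
                 if xmin <= x <= xmax and ymin <= y <= ymax)
    return inside / len(fixes)


# Made-up fixes (km coordinates) pooled over the radiotagged sample.
fixes = [(0.5, 0.5), (1.5, 0.2), (0.2, 0.9), (2.5, 2.5), (0.8, 0.1)]

# Hypothetical 1 km x 1 km area of interest.
fraction = fraction_inside(fixes, 0.0, 0.0, 1.0, 1.0)

population_size = 50   # hypothetical, e.g. from mark-resight
area_km2 = 1.0
average_density = population_size * fraction / area_km2
print(fraction, average_density)
```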
 
 
Brian Gray 28/06/01
 
Your approach should work fine provided that you are interested in
averages rather than in individuals.  And, since you are working with
averages, your confidence interval should be narrower.  If you get
simultaneous locations on all deer, then you may have a different
situation than if your locations arrive over time.  In the latter
case and if you work with individuals, any increase in your relative
nugget may arise from location/positioning error.
 
My question 28/06/01
 
As I have never heard of a positioning error contribution to the nugget, where could I
find some references?

Brian Gray 28/06/01
 
The nugget may derive from measurement error, positioning error or
from small scale variation--or a combination of the three.  For
example, I work with oyster infection rates--which are a function of
oyster age.  If I measure oysters that are infinitely close to one
another, I am still not guaranteed that they will have the same
infection level.  Reason:  different ages, different life histories,
etc.  This latter issue is a small scale issue.  If I say oyster 1 is
at location y and it's really at location y + 1, then I have a
positional error.  If I measure the infection as i but it's really j
then I have a measurement error.  Chiles and Delfiner 1999, as I
recall, discuss these issues more.
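
As a small illustration of the measurement-error component only (positioning error is not simulated here), the Python sketch below adds independent noise to a smooth, made-up 1-D transect and compares the semivariance at the shortest lag with and without the noise. The sine-shaped field, the error standard deviation and the random seed are arbitrary assumptions.

```python
import math
import random


def semivariance_at_lag(values, lag):
    """Empirical semivariance of a regularly spaced transect at one lag."""
    pairs = len(values) - lag
    total = sum((values[i + lag] - values[i]) ** 2 for i in range(pairs))
    return total / (2.0 * pairs)


random.seed(1)
n = 20000
error_sd = 0.5   # assumed measurement-error standard deviation

# Smooth 'true' field: almost no variation between neighbouring points.
true_field = [math.sin(i / 10.0) for i in range(n)]
measured = [z + random.gauss(0.0, error_sd) for z in true_field]

gamma_true = semivariance_at_lag(true_field, 1)
gamma_measured = semivariance_at_lag(measured, 1)

# The increase at the shortest lag should be close to error_sd ** 2.
print(gamma_true, gamma_measured, gamma_measured - gamma_true)
```

In this setting the measurement error appears in the variogram as an apparent nugget of roughly the error variance, even though the underlying field is continuous.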

Donald E. Myers 28/06/01
 
Reference on nugget effect and positioning error, see

"Geostatistics: modeling spatial uncertainty",  J.-P. Chiles and P. Delfiner,  J. Wiley and sons
 
 
My question 28/06/01
 
2) The sills of my variograms are equal to or larger than the variance of the primary variable (so, more or less twice the semivariance). This is probably because of a trend in the density, which decreased with time. The primary variable (deer density) is probably second-order stationary, at least over an area much larger than my study area, the latter being surrounded by many kilometers of habitat suitable for deer. But it behaves as non-stationary within the few square kilometers of interest and the few years of sampling. May I ignore this problem, or do I have to incorporate the trend?
 
Isobel Clark 28/06/01
 
If you are seeing a sill, the problem is not
non-stationarity in the sense of a trend. If it is
important at all, it is more likely to be caused by a
discontinuity in the study area or a change in some
characteristic of the habitat in the area. Trend shows
as a rising  parabola, not a sill. If the cross
validation stage works, the height of the sill is not
an important factor. Remember when carrying out the
cross validation, you should be using the increased
kriging variance as described above.
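
The 'rising parabola' can be checked on a toy case. The Python sketch below assumes a pure linear trend z(x) = a + b*x on a regularly spaced transect, with no noise; for such data the empirical semivariance at lag h equals 0.5 * b^2 * h^2, which grows without ever reaching a sill. The intercept, slope and transect length are arbitrary.

```python
def semivariance_at_lag(values, lag):
    """Empirical semivariance of a regularly spaced transect at one lag."""
    pairs = len(values) - lag
    total = sum((values[i + lag] - values[i]) ** 2 for i in range(pairs))
    return total / (2.0 * pairs)


intercept = 5.0   # arbitrary
slope = 0.3       # arbitrary trend per unit distance
values = [intercept + slope * x for x in range(200)]

lags = [1, 2, 5, 10, 20]
empirical = [semivariance_at_lag(values, h) for h in lags]
parabola = [0.5 * slope ** 2 * h ** 2 for h in lags]

for h, e, p in zip(lags, empirical, parabola):
    print(h, round(e, 4), round(p, 4))
```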
 
My question 28/06/01
 
3) When I cross-validate my predictions (obtained with a linear model of coregionalization and ordinary cokriging) I obtain fairly good results. But I suspect that they may be even better than they seem. Not only because of the problems common to all cross-validations, but because I have to compare my predictions not with actual measurements but with estimates. The average error is probably influenced by both uncertainties. Given that I know the confidence intervals of my primary variable estimates, how could I account for them to correctly estimate the average error of my prediction model?
My doubt: imagine I collect some estimates of a variable, and I have an uncertainty about them, say a 95% confidence interval of 20, and I know this uncertainty. Now I develop a kriging model to predict my variable, and I cross-validate it. Well, even if I had a perfect model, completely precise and accurate, I should have an MAE of 10, more or less. Does anyone think this is correct? And, given that I know the uncertainty about my estimates, is there a way to 'correct' the MAE of my cross-validation to account for it?
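
As a rough check on the figure of 10, the Python sketch below assumes the estimate errors are normally distributed, so that a 95% interval spans about 1.96 standard deviations on each side and the mean absolute error equals the standard deviation times sqrt(2/pi). Both readings of "a confidence interval of 20" (full width or half-width) are shown; the normality assumption is not from the original data.

```python
import math


def expected_mae_from_ci(half_width, z=1.96):
    """Expected absolute error of a normal estimate with a given CI half-width."""
    sigma = half_width / z
    return sigma * math.sqrt(2.0 / math.pi)


# Reading 1: 20 is the full width of the 95% interval (half-width 10).
mae_full_width = expected_mae_from_ci(10.0)

# Reading 2: 20 is the half-width of the 95% interval.
mae_half_width = expected_mae_from_ci(20.0)

print(round(mae_full_width, 2), round(mae_half_width, 2))
```

Under these assumptions a perfect model would show an MAE of roughly 4 in the first reading and roughly 8 in the second, so somewhat below 10 in both cases.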
 
Isobel Clark 30/06/01
 
You need to keep your mind clear between your original
uncertainty in the estimates and the kriging error.
There really is no reason why the kriging error should
be less than your 'estimates' error. In fact, I would
be surprised if this were so. You are trying to
estimate the value at a location from other samples.
This prediction error will be in addition to your
sample value uncertainty and could be orders of
magnitude higher.
 
 
Thank you very much; I hope to hear from you soon. Any new ideas, remarks or criticism about what is listed above will be welcome.
 
Daniele



