Hello Mike,
> The distances in each group vary but are "grouped".
> I wish to identify any records that are likely to be errors, the
> criteria for an error is that they form outliers in this group of
> values.
The complexity of your solution will depend on how critical the "cut" between
good and bad data needs to be. Your "toolbox" for working out a suitable model
for your application could include the minimum, maximum, mean, median and
standard deviation. If the division between good and bad data is slight or
ambiguous, you may need more sophisticated statistical methods.
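As a rough sketch of that toolbox (my own illustration, not part of the original
advice), Python's standard statistics module gives you all of these measures
directly, shown here on the sample data quoted below:

```python
# Basic summary statistics for the sample quoted later in this message.
import statistics

sample = [25, 34, 56, 57, 59, 60, 62, 1189, 1190]

print(min(sample), max(sample))     # range of the data: 25 1190
print(statistics.mean(sample))      # pulled upward by the two large values
print(statistics.median(sample))    # 59 -- robust against the outliers
print(statistics.stdev(sample))     # sample standard deviation
```

Note how the median stays near the "good" cluster while the mean is dragged
toward the outliers; that contrast is itself a useful diagnostic.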
For the sample you gave, {25, 34, 56, 57, 59, 60, 62, 1189, 1190}, a very
simple solution would be to use the mean of the maximum and minimum. In this
case that gives (25 + 1190) / 2 = 607.5. This cut figure isolates the top two
samples, which are presumably the duff ones. That's a really simple method;
you may find you need much more to handle the errors in other data sets. Also
note that no matter how good the analysis model is, it may get things wrong in
some circumstances!
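That midrange cut can be sketched in a few lines (the function name is my own,
just for illustration):

```python
def midrange_cut(values):
    """Return the midpoint between the minimum and maximum values."""
    return (min(values) + max(values)) / 2.0

sample = [25, 34, 56, 57, 59, 60, 62, 1189, 1190]
cut = midrange_cut(sample)                      # (25 + 1190) / 2 = 607.5
outliers = [v for v in sample if v > cut]       # values above the cut
print(cut)       # 607.5
print(outliers)  # [1189, 1190]
```

Bear in mind this only works when the bad values sit well above (or below) the
good cluster; a single extreme error in otherwise clean data would drag the
midrange toward it and could hide smaller errors.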
Regards,
Warren Vick
Europa Technologies Ltd, U.K.
www.europa-tech.com
----------------------------------------------------------------------
To unsubscribe from this list, send e-mail to [EMAIL PROTECTED] and put
"unsubscribe MAPINFO-L" in the message body, or contact [EMAIL PROTECTED]