Hello Mike,
> The distances in each group vary but are "grouped".
> I wish to identify any records that are likely to be errors, the
> criteria for an error is that they form outliers in this group of
> values.
The complexity of your solution will depend on how critical the "cut" between
good and bad data needs to be. Your "toolbox" for working out a suitable model
for your application could include the minimum, maximum, mean, median and
standard deviation. If the division between good and bad data is slight or
ambiguous, you may need more sophisticated statistical methods.
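As a rough sketch of that toolbox (my own illustration, not part of the original
advice), Python's standard statistics module gives you all of these measures
directly, shown here on the sample data quoted below:

```python
# Basic summary statistics for the sample quoted later in this message.
import statistics

sample = [25, 34, 56, 57, 59, 60, 62, 1189, 1190]

print(min(sample), max(sample))     # range of the data: 25 1190
print(statistics.mean(sample))      # pulled upward by the two large values
print(statistics.median(sample))    # 59 -- robust against the outliers
print(statistics.stdev(sample))     # sample standard deviation
```

Note how the median stays near the "good" cluster while the mean is dragged
toward the outliers; that contrast is itself a useful diagnostic.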
For the sample you gave, {25, 34, 56, 57, 59, 60, 62, 1189, 1190}, a very
simple solution would be to use the mean of the maximum and minimum. In this
case that gives (25 + 1190) / 2 = 607.5. This cut figure isolates the top two
samples, which are presumably the duff ones. That's a really simple method;
you may find you need much more to handle the errors in other data sets. Also
note that no matter how good the analysis model is, it may get things wrong in
some circumstances!
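That midrange cut can be sketched in a few lines (the function name is my own,
just for illustration):

```python
def midrange_cut(values):
    """Return the midpoint between the minimum and maximum values."""
    return (min(values) + max(values)) / 2.0

sample = [25, 34, 56, 57, 59, 60, 62, 1189, 1190]
cut = midrange_cut(sample)                      # (25 + 1190) / 2 = 607.5
outliers = [v for v in sample if v > cut]       # values above the cut
print(cut)       # 607.5
print(outliers)  # [1189, 1190]
```

Bear in mind this only works when the bad values sit well above (or below) the
good cluster; a single extreme error in otherwise clean data would drag the
midrange toward it and could hide smaller errors.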
Regards,
Warren Vick
Europa Technologies Ltd, U.K.
www.europa-tech.com
----------------------------------------------------------------------
To unsubscribe from this list, send e-mail to [EMAIL PROTECTED] and put
"unsubscribe MAPINFO-L" in the message body, or contact [EMAIL PROTECTED]