You might also look at "winsorizing" if you know what reasonable minimums and
maximums are:

   winsorize
4 : 0
NB.* winsorize: cap high and low outliers of vec y at x (low, high) level(s).
  'low high'=. (1 _1*0 1+<.0.5+x*#y){/:~y
  low>.high<.y
NB.EG 0.01 winsorize ?10000$1000001  NB. Top and bottom 1%
)
   'wins*' names 3
winsorize      winsorizeAt    winsorizeAtPct winsorizeAtVal
   winsorizeAt
4 : 0
   grd=. /:y
   y=. (y<:y{~x{grd) }y,:y{~x{grd
   y=. (y>:y{~(-x){grd) }y,:y{~(-x){grd
)
   winsorizeAtPct
4 : 0
   pcts=. <.0.5+x*#grd=. /:y
NB.   (y<.y{~grd{~-1>._1{pcts)>.y{~grd{~0>.<:0{pcts
   (y>.y{~grd{~0>.<:0{pcts)<.y{~grd{~-1>._1{pcts
NB.EG (0 0 2 3 5 6 6 8 17 17)-:0.2 0.2 winsorizeAtPct _9 0 2 3 5 6 6 8 17 93
NB.EG (0 0 2 3 5 6 6 8 8 8)-:0.2 0.3 winsorizeAtPct _9 0 2 3 5 6 6 8 17 93
NB.EG (2 2 2 3 4 5 6 7 8 9 9 9)-: 0.25 0.25 winsorizeAtPct i.12
)
   winsorizeAtVal
(] <. [: >./ [) >. [: <./ [
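A minimal sketch applying both styles of capping to Roger's t from the quoted message below. It is self-contained (t and winsorizeAtPct are restated). clip is a hypothetical name for a value-based cap, and the bounds 9 15 and the 5% and 3% levels are illustrative assumptions, not recommendations from this thread.

```j
NB. Roger's 100 samples (t in the quoted message below)
t=: 11 10 10 10 10 11 10 10 10 10 9 11 10 11 10 10 11 10 11 10 11 10 10
t=: t, 11 10 11 10 10 10 11 10 74 11 11 14 11 11 10 12 11 15 14 12 11
t=: t, 11 11 11 11 10 12 11 11 11 10 11 11 11 10 11 11 10 11 161241 49
t=: t, 32 12 11 11 12 10 11 10 12 11 12 11 11 12 11 11 12 11 11 11 12
t=: t, 11 11 12 11 11 11 11 11 11 11 10 11 11 12 12

NB. clip (hypothetical name): x is a pair of bounds; each item of y
NB. is lowered to at most >./x, then raised to at least <./x.
clip=: (] <. [: >./ [) >. [: <./ [

NB. winsorizeAtPct as shown in the session above: x is the
NB. (low, high) fraction of items to cap at each end.
winsorizeAtPct=: 4 : 0
pcts=. <.0.5+x*#grd=. /:y
(y>.y{~grd{~0>.<:0{pcts)<.y{~grd{~-1>._1{pcts
)

c=: 9 15 clip t                    NB. 9 and 15 are assumed bounds
w5=: 0.05 0.05 winsorizeAtPct t    NB. cap bottom and top 5%
w3=: 0.03 0.03 winsorizeAtPct t    NB. cap bottom and top 3%
(<./ , >./) &> c ; w5 ; w3         NB. min and max of each result
```

With 0.05 0.05 the upper cap is the fifth-largest value, 15, so 32, 49, 74 and 161241 all become 15 (as with 9 15 clip), and the lone 9 is raised to 10. With 0.03 0.03 the cap is the third-largest value, 49: 74 and 161241 drop to 49, while 32 and 49 are left alone. As Romilly and Donna say below, which level is right depends on where the outliers come from.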

On Tue, Jan 10, 2012 at 10:14 AM, Romilly Cocking <[email protected]> wrote:

> Agreed, you must understand the likely origins of outliers. One example:
> published share price time series may not take account of share
> splits/merges, which will cause the reported price to increase or decrease
> from the time of the event; and certain classes of investment may be
> re-denominated from pence to pounds, or from one currency to another. I
> know of a number of occasions when trading systems have been thrown off
> kilter by data scrubbing algorithms that discarded legitimate data as
> outliers.
>
> On 10 January 2012 14:38, Donna Y <[email protected]> wrote:
>
> > There are lots of possible techniques for discarding outliers.  The
> > important thing is to know the reason for their occurrence.  Is it
> > caused by some type of error in the generation or collection of the
> > data, or is it actually important information?  You might rather
> > concentrate on the outliers exactly because they deviate from the norm
> > and have the potential to bring about behavior quite different from
> > normal.
> >
> > Donna
> > [email protected]
> >
> >
> > On 2012-01-09, at 7:49 PM, Roger Hui <[email protected]> wrote:
> >
> > > I wonder if there are well-known techniques in statistics for dealing
> > > with the following problem.
> > >
> > >      t
> > > 11 10 10 10 10 11 10 10 10 10 9 11 10 11 10 10 11 10 11 10 11 10 10
> > >      11 10 11 10 10 10 11 10 74 11 11 14 11 11 10 12 11 15 14 12 11
> > >      11 11 11 11 10 12 11 11 11 10 11 11 11 10 11 11 10 11 161241 49
> > >      32 12 11 11 12 10 11 10 12 11 12 11 11 12 11 11 12 11 11 11 12
> > >      11 11 12 11 11 11 11 11 11 11 10 11 11 12 12
> > >
> > > t is a set of samples from a noisy source which is supposed to give the
> > > same integer answer.  Obviously, 161241 is an "outlier", and it is likely
> > > that 74, 49, or even 32 are outliers too.  Are there standard techniques
> > > for discarding outliers to clean up the data, before the application of
> > > statistical tests such as the means test or large sample test?
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
> > >
> >
>
>
>
> --
> http://tinyurl.com/rareblog
>



-- 
Devon McCormick, CFA
^me^ at acm.org is my preferred e-mail