[ https://issues.apache.org/jira/browse/METRON-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539183#comment-16539183 ]

ASF GitHub Bot commented on METRON-1364:
----------------------------------------

Github user JonZeolla commented on the issue:

    https://github.com/apache/metron/pull/870
  
    Happy to put some effort into running through and testing this if it's
    ready to go, @cestella.


> Add an implementation of Robust PCA outlier detection
> -----------------------------------------------------
>
>                 Key: METRON-1364
>                 URL: https://issues.apache.org/jira/browse/METRON-1364
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>            Priority: Major
>
> With short circuiting in Stellar, we have the opportunity to delve into more 
> computationally intensive outlier detection techniques.  Generally these 
> would be executed only if simpler outlier detection techniques indicated an 
> outlier (e.g. statistical outlier tests).
> As the first one of these supported, I'd suggest a Robust PCA based technique 
> similar to Netflix's Surus.  See 
> https://medium.com/netflix-techblog/rad-outlier-detection-on-big-data-d6b0494371cc
>  and 
> https://metamarkets.com/2012/algorithmic-trendspotting-the-meaning-of-interesting/
>  for more detail.
> It should be noted that there are some caveats with this approach around 
> sparsity and orderedness.  
> Regarding sparsity, this outlier detection algorithm presumes dense output, 
> which is not the case for data spanning profiles (e.g. the profiler does not 
> write out data every period if no data was seen). To deal with this, I am 
> suggesting a modification to the profiler to allow PROFILE_GET to return a 
> default value.  That will be done in a separate JIRA.
> Regarding well-orderedness, this is an outlier detector for time series data, 
> so it is sensitive to order to a certain extent.  Given its computational 
> intensity, it is likely to be used with a sample of the data to shrink the 
> size of the data.  To that end, uniform sampling is not sensible here; the 
> sample should instead be biased toward recent data.  Without this, you may 
> get poor results 
> from this outlier detector.  This sampler should be done in a separate JIRA, 
> but I will ensure the infrastructure to add it is contributed in METRON-1350.
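
To make the two caveats above concrete, here is a minimal sketch in Python. It is not Metron or Stellar code: the names densify, recency_biased_sample and half_life are hypothetical and chosen only for illustration, and the exponential weighting is one assumed way to bias a sample toward recency, not necessarily what the eventual sampler will do. The first function fills periods with no written data with a default value; the second draws a weighted sample without replacement and keeps the chosen points in time order.

```python
import math
import random


def densify(observed, periods, default=0.0):
    """Return one value per period, using `default` where nothing was written.

    `observed` maps a period index to the value recorded for it (hypothetical
    stand-in for sparse profile data).
    """
    return [observed.get(p, default) for p in periods]


def recency_biased_sample(series, k, half_life, seed=None):
    """Pick k points without replacement, favoring recent ones, in time order.

    A point that is `age` periods old gets weight 0.5 ** (age / half_life).
    Each point draws an exponential "arrival time" scaled by 1 / weight and
    the k earliest arrivals are kept, which is a standard way to do weighted
    sampling without replacement. Scores are compared in log space so very
    old points do not overflow.
    """
    rng = random.Random(seed)
    n = len(series)
    if k >= n:
        return list(series)
    scored = []
    for i in range(n):
        age = n - 1 - i  # 0 for the most recent point
        e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
        score = math.log(e) + age * math.log(2.0) / half_life
        scored.append((score, i))
    # Keep the k smallest scores, then restore chronological order.
    chosen = sorted(i for _, i in sorted(scored)[:k])
    return [series[i] for i in chosen]


if __name__ == "__main__":
    sparse = {0: 4.0, 1: 5.0, 3: 4.5, 7: 30.0}
    dense = densify(sparse, range(8), default=0.0)
    print(dense)
    print(recency_biased_sample(dense, k=4, half_life=2, seed=1))
```

A smaller half_life concentrates the sample on the most recent periods; a very large one approaches uniform sampling, which is the case the description warns against.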



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
