[
https://issues.apache.org/jira/browse/METRON-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539183#comment-16539183
]
ASF GitHub Bot commented on METRON-1364:
----------------------------------------
Github user JonZeolla commented on the issue:
https://github.com/apache/metron/pull/870
Happy to put some effort into running through this/testing if it's ready to
go @cestella
> Add an implementation of Robust PCA outlier detection
> -----------------------------------------------------
>
> Key: METRON-1364
> URL: https://issues.apache.org/jira/browse/METRON-1364
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Assignee: Casey Stella
> Priority: Major
>
> With short-circuiting in Stellar, we have the opportunity to delve into more
> computationally intensive outlier detection techniques. Generally these
> would be executed only if simpler outlier detection techniques indicated an
> outlier (e.g. statistical outlier tests).
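A minimal sketch of that gating pattern, written in Python rather than Stellar purely for illustration. The cheap check here is assumed to be a median/MAD modified z-score with the conventional 3.5 cutoff. That choice, and the names `mad_outlier`, `is_outlier` and `expensive_check`, are hypothetical and not part of Metron. `expensive_check` stands in for a costlier detector such as the Robust PCA one proposed here.

```python
import statistics
from typing import Callable, Sequence

# Hypothetical signature for the expensive detector (e.g. an RPCA-based one).
ExpensiveCheck = Callable[[float, Sequence[float]], bool]


def mad_outlier(value: float, history: Sequence[float], threshold: float = 3.5) -> bool:
    """Cheap test: modified z-score based on the median absolute deviation (MAD)."""
    median = statistics.median(history)
    mad = statistics.median(abs(x - median) for x in history)
    if mad == 0:
        # Degenerate spread: fall back to an exact-match test.
        return value != median
    # 0.6745 scales the MAD to be comparable to a standard deviation for normal data.
    return abs(0.6745 * (value - median) / mad) > threshold


def is_outlier(value: float, history: Sequence[float], expensive_check: ExpensiveCheck) -> bool:
    # `and` short-circuits: expensive_check runs only if the cheap test flags the value.
    return mad_outlier(value, history) and expensive_check(value, history)
```

The point of the design is cost: on typical, non-anomalous values the expensive detector is never invoked, mirroring how a short-circuited boolean expression would skip it.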
> As the first of these to be supported, I'd suggest a Robust PCA-based technique
> similar to Netflix's Surus. See
> https://medium.com/netflix-techblog/rad-outlier-detection-on-big-data-d6b0494371cc
> and
> https://metamarkets.com/2012/algorithmic-trendspotting-the-meaning-of-interesting/
> for more detail.
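As background, the textbook Robust PCA formulation (Principal Component Pursuit) decomposes an $m \times n$ data matrix $M$ into a low-rank part $L$ and a sparse part $S$; entries of $S$ with large magnitude are the candidate outliers. This is the generic formulation, not necessarily what Surus implements (its solver may add noise or penalty terms of its own), and how a time series is arranged into $M$ is left open here.

$$
\min_{L,\,S}\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad L + S = M,
\qquad \lambda = \frac{1}{\sqrt{\max(m, n)}}
$$

Here $\|L\|_{*}$ is the nuclear norm (the sum of the singular values of $L$) and $\|S\|_{1}$ is the sum of the absolute values of the entries of $S$. The $\lambda$ shown is the commonly used default, not a value fixed by this proposal.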
> It should be noted that there are some caveats with this approach around
> sparsity and orderedness.
> Regarding sparsity, this outlier detection algorithm presumes dense data (a
> value for every period), which is not the case for data spanning profiles
> (e.g. the profiler does not write out data for a period in which no data was
> seen). To deal with this, I am
> suggesting a modification to the profiler to allow PROFILE_GET to return a
> default value. That will be done in a separate JIRA.
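The effect of such a default can be sketched as filling every missing period with a fixed value before the series reaches the detector. The helper `densify` and its integer period indices are hypothetical illustrations, not the profiler's API or the change planned in that separate JIRA.

```python
from typing import Dict, List, Optional


def densify(
    sparse: Dict[int, float], start: int, end: int, default: Optional[float] = 0.0
) -> List[Optional[float]]:
    """Return one value per period in [start, end), using `default` for periods
    in which the profiler wrote nothing (hypothetical helper)."""
    return [sparse.get(period, default) for period in range(start, end)]


# Periods 1 and 3 had no data, so they receive the default.
series = densify({0: 5.0, 2: 7.0}, start=0, end=4)
```

A default of 0 is natural for counts; for measurements such as averages a zero may itself look anomalous, so the appropriate default likely depends on the profile.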
> Regarding well-orderedness, this is an outlier detector for time series data,
> so it is sensitive to order to a certain extent. Given its computational
> intensity, it is likely to be run on a sample of the data rather than on the
> full series. Uniform sampling is not sensible here; the sample should instead
> be biased toward recency. Without this, you may get poor results from this
> outlier detector. This sampler should be implemented in a separate JIRA, but I
> will ensure the infrastructure to add it is contributed in METRON-1350.
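One way to bias a sample toward recency, sketched here only as an illustration and not as the sampler planned for METRON-1350, is an exponentially biased reservoir. Every new point is inserted; with probability equal to the fraction of the reservoir already filled, it overwrites a uniformly random resident instead of being appended. Once the reservoir is full, a point that arrived r insertions ago survives with probability (1 - 1/capacity)^r, roughly exp(-r/capacity), so older points fade out. The class name `BiasedReservoir` is hypothetical.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class BiasedReservoir(Generic[T]):
    """Fixed-capacity sample that favours recently added items (hypothetical)."""

    def __init__(self, capacity: int, rng: Optional[random.Random] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.rng = rng if rng is not None else random.Random()
        self.items: List[T] = []

    def add(self, item: T) -> None:
        fill_fraction = len(self.items) / self.capacity
        if self.rng.random() < fill_fraction:
            # Overwrite a random resident; once full this always happens,
            # so each resident is evicted with probability 1/capacity per step.
            self.items[self.rng.randrange(len(self.items))] = item
        else:
            # Only reachable while the reservoir is not yet full.
            self.items.append(item)


# Feed timestamps 0..9999; the reservoir ends up holding only recent ones.
sampler = BiasedReservoir(capacity=20, rng=random.Random(42))
for timestamp in range(10_000):
    sampler.add(timestamp)
recent_sample = sorted(sampler.items)  # restore time order for the detector
```

Reservoir slots are not kept in arrival order, so for an order-sensitive detector the sample should carry timestamps and be sorted before use, as in the last line.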
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)