Yes, the Griffin team will write a doc about the contribution points (interfaces) for measures.
Will let you know when it is ready.

Thanks,
William

On Mon, May 14, 2018 at 5:13 PM, Enrico D'Urso <[email protected]> wrote:

> Hi,
>
> Yes, it sounds like a very good idea. I am pretty interested in the topic.
> Is there an ongoing discussion that I can start to look at?
>
> Thanks,
>
> Enrico
>
> On 5/13/18, 2:58 AM, "William Guo" <[email protected]> wrote:
>
>     Hi Enrico,
>
>     Yes, since we have released 0.2.0 recently, our next plan will
>     include enhancing measures, including support for anomaly detection.
>
>     Would you like to contribute this feature together?
>
>     Thanks,
>     William
>
>     On Sat, May 12, 2018 at 12:22 AM, Enrico D'Urso (JIRA) <[email protected]> wrote:
>
>     > [ https://issues.apache.org/jira/browse/GRIFFIN-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472199#comment-16472199 ]
>     >
>     > Enrico D'Urso commented on GRIFFIN-160:
>     > ---------------------------------------
>     >
>     > Hi,
>     >
>     > There are several ways to implement anomaly detection.
>     >
>     > The point is to have numerical data. If you want to apply AD against
>     > non-numerical data, you have to map strings to numbers somehow.
>     >
>     > However, as Griffin uses Spark as the engine, I think K-Means can be
>     > an option.
>     >
>     > Basically, you have your data: you normalise it, decide the number of
>     > clusters, apply K-means, and finally check the distance from the final
>     > centroids to search for anomalies. MLlib fully supports it.
>     >
>     > Otherwise, just get the mean and std and search for samples that are
>     > more than 3 standard deviations from the mean.
>     >
>     > More complicated stuff can be done using the covariance matrix and a
>     > Gaussian distribution, more info here
>     > [https://www.coursera.org/learn/machine-learning/lecture/C8IJp/helpUrl]
>     > but I am not sure whether that is doable in a distributed environment.
>     >
>     > Thanks,
>     >
>     > Enrico
>     >
>     > > Anomaly detection for thousands of tables
>     > > -----------------------------------------
>     > >
>     > >                 Key: GRIFFIN-160
>     > >                 URL: https://issues.apache.org/jira/browse/GRIFFIN-160
>     > >             Project: Griffin (Incubating)
>     > >          Issue Type: New Feature
>     > >            Reporter: William Guo
>     > >            Assignee: William Guo
>     > >            Priority: Major
>     > >
>     > > Hi team,
>     > >
>     > > I am trying to find the Griffin road map, and here it is
>     > > [https://cwiki.apache.org/confluence/display/GRIFFIN/0.+Roadmap].
>     > > Is this the latest version?
>     > >
>     > > We have thousands of tables that need data quality validation. Is
>     > > there any simple machine learning algorithm that can be applied to
>     > > detect data quality issues instead of building a lot of measures?
>     > > Will this be added to the Griffin road map if possible?
>     > >
>     > > Thanks, Randy
>     >
>     > --
>     > This message was sent by Atlassian JIRA
>     > (v7.6.3#76005)
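
For reference, the simpler approach mentioned in the quoted JIRA comment (take the mean and standard deviation, then flag samples more than 3 standard deviations away) could look roughly like the sketch below. It is a minimal single-machine illustration in plain Python, not Griffin or Spark code; the function name `three_sigma_outliers`, the parameter `k`, and the sample data are all hypothetical.

```python
from statistics import mean, pstdev


def three_sigma_outliers(values, k=3.0):
    """Return (index, value) pairs lying more than k standard deviations from the mean.

    Hypothetical helper for illustration only; it is not part of Griffin.
    """
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All samples are identical, so nothing can be flagged.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) > k * sigma]


# Made-up sample: twenty ordinary readings and one obvious spike.
readings = [10.0] * 20 + [100.0]
print(three_sigma_outliers(readings))
```

Note that a single extreme value also inflates the standard deviation, so with very small samples nothing can exceed the 3-sigma threshold; the rule only becomes useful once there are enough samples per column or metric.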
