Hi, Michael. Happy to help if time permits.
I looked at your cybersquatting use case because it is similar to work I've done before. Our approach was different, though. We calculate the distribution of character n-grams in each of our domain names and then find domain names with similar distributions using the Jensen-Shannon divergence. It's more computationally expensive than the Bloom filters you use, but it doesn't depend on a prescribed list of similar names (what if somebody thinks up a new way of bastardizing our domain names?). We got good results with it, although the model did require some tuning.

Regards,

Phillip

On Fri, Jun 7, 2019 at 6:56 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:

> Welcome Phillip!
>
> +1 to all of what Otto said.
>
> One area where we would like to expand, and your DS skills could be useful,
> is in some of the analytics use cases. For example, I'm currently working
> on finishing up a PR for a data sketch function that can estimate top-k
> results in an infinite streaming data set. We've got functions around
> things like median absolute deviation, and I wrote a wrapper for a hyper
> log log plus implementation. It would be awesome to get some contributions
> to our use cases that leverage some of our existing analytics functions as
> well as new implementations. Here's a forensic clustering use case, for
> example -
> https://github.com/apache/metron/tree/master/use-cases/forensic_clustering
> .
> This is a bit more advanced for someone new to the project, but I'm just
> sharing opportunities and a roadmap of things that would be extremely
> beneficial.
>
> Best,
> Mike Miklavcic
>
> On Fri, Jun 7, 2019 at 11:14 AM Otto Fowler <ottobackwa...@gmail.com> wrote:
>
> > Hi Phillip and welcome!
> >
> > You certainly can help.
> >
> > As far as the status, there are several ways to look at it, aren’t there?
> >
> > - we have active committers and reviewers, but not as many as we would like
> > - we have an active users list, but not insanely so
> > - metron _is_ part of a large distro package (HCP / cloudera ?) and is
> >   deployed, supported and used there
> >
> > A lot of work is going on right now in the following areas
> >
> > - refactoring the UI technologies and testability
> > - making the build better, speed, shading etc
> > - de-coupling storm from our core technologies so we can introduce or
> >   support alternative processing ( spark for example )
> > - work on our deployment / dev experience ( vagrant/docker )
> >
> > and other things I’ve not thought of, and have PR’s etc.
> >
> > we can always use help with documentation, reviews, testing, how-to’s or
> > development :)
> >
> > On June 7, 2019 at 12:40:38, Phillip Henry (londonjava...@gmail.com) wrote:
> >
> > Hello, all.
> >
> > The Metron docs mentioned something about introducing oneself upon joining
> > the mailing list so here goes. I'm a Scala/Java/Data Engineer/Scientist
> > working in a SOC at a leading telco.
> >
> > I've only recently come across Metron and was wondering what the community
> > was like, how healthy the project is (I see commits in the last few days so
> > that's good) and wondering if I can help.
> >
> > Regards,
> >
> > Phillip
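
A minimal Python sketch of the n-gram / Jensen-Shannon approach described at the top of this message. This is an illustration only, not the implementation referred to above: the function names, the choice of bigrams (n=2), the base-2 logarithm and the 0.5 threshold are all assumptions made for the example, and a real model would need the tuning mentioned earlier.

```python
from collections import Counter
from math import log2
from typing import Dict, Iterable, List, Tuple


def ngram_distribution(name: str, n: int = 2) -> Dict[str, float]:
    """Relative frequency of each character n-gram in a domain name."""
    name = name.lower()
    grams = [name[i:i + n] for i in range(len(name) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    # Names shorter than n have no n-grams; return an empty distribution.
    return {g: c / total for g, c in counts.items()} if total else {}


def jensen_shannon(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    # m is the pointwise average of the two distributions.
    m = {g: 0.5 * (p.get(g, 0.0) + q.get(g, 0.0)) for g in set(p) | set(q)}

    def kl(a: Dict[str, float]) -> float:
        # Kullback-Leibler divergence of a from m; m[g] > 0 wherever a[g] > 0.
        return sum(pa * log2(pa / m[g]) for g, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def similar_domains(ours: str, candidates: Iterable[str], n: int = 2,
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Candidates whose n-gram distribution is within `threshold` of ours,
    closest first. The threshold is a hypothetical tuning parameter."""
    reference = ngram_distribution(ours, n)
    scored = []
    for candidate in candidates:
        dist = ngram_distribution(candidate, n)
        if not dist:  # too short to compare
            continue
        score = jensen_shannon(reference, dist)
        if score <= threshold:
            scored.append((candidate, score))
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, score in similar_domains("example.com",
                                       ["examp1e.com", "unrelated.org"]):
        print(name, round(score, 3))
```

Every candidate is compared against every protected name, which is where the extra cost relative to a Bloom filter lookup comes from. In exchange, no list of known look-alike names has to be maintained.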