Re: Question about the customization of Metron with my machine learning algo.

smlabs Wed, 07 Jun 2017 03:29:28 -0700

Hello Casey,

Your explanations (and Matt's in the other email) help me. Still, if I may, I 
need more details.


My questions are both conceptual (Metron is a completely new tool for me) and 
practical (e.g., I didn't find any guideline that explains where an ML model 
should be stored to run with Metron; in which folder, I mean).

* VM vs cluster:
o You pointed out the need to use a cluster. What is the main reason? Is it 
linked to performance, i.e. the processing resources needed to run Metron?
o At this stage, while I'm learning Metron and running some experiments, I 
would install it in a VM.
+ Can I install Metron on Ubuntu 16.04 using Vagrant? (implementation 
question)
* VM:
o Which Ubuntu version should I use? 16.04?
o Which version of Metron should I use?
o Which version of NiFi should I install?
o Are there any additional tools that I have to install?
* Model deployment:
o I would use NiFi as the tool to collect data from my network.
+ Is there any recommendation on this? (implementation question)
o Data collected with NiFi should be sent to Metron. Reading the Metron 
architecture 
(https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture), it 
seems possible.
+ I'm a little confused about the data flow at this point. You pointed out two 
caveats about parsing the data and then feeding it to the ML model. Could you 
please explain a bit more? (conceptual/implementation question)
o As my first test I would try a packet classifier ML model, but after your 
two caveats, in which you said that only the second method is supported, I 
don't understand whether I could classify packets that come from NiFi's probes.
+ Can you help me on this point?
* REST API (linked also to Matt's email):
o OK, I'm not very familiar with this interface, so my questions will be really 
basic.
+ Reading your example, 
https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616, I understand 
that I should develop my ML model in Python.
+ Can I reuse the file rest.py, pointing it to my new model?
* About the steps to follow. I copy your instructions below, with my questions 
added after each "-->":
o Anyway, so for you to use your own ML model, you'd do the following:

1. Ingest the sensor data source that you want to ingest into a kafka topic --> 
Can I use NiFi? Is it a transparent process for me, or is there some code to be 
written?
2. Create or reuse one of the existing parsers that we support to convert the 
data from your data source --> I do not understand. Are you referring to 
Stellar? I don't understand what Stellar is.
3. Create your model (see 
https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616 as an example)
4. refer to your model from stellar
1. In the example I mentioned, we're doing that at 
https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#adjust-configurations-for-squid-to-call-model
2. You might consider doing it in the enrichment topology, but to get you 
started, doing it as a field transformation as in the example should suffice
* Dataworks summit:
o You said that your talk will be public, didn't you?
+ Do you know if there is a link where I could follow it offline?
* Blog:
o Which blog are you referring to?
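
For the REST question above, a minimal sketch may help show the general shape of a model served over HTTP. It is not the rest.py from the linked gist: it uses only the Python standard library, and the `/apply` path, the `host` parameter, the `label` field and the `classify` rule are all assumptions made for illustration. A real deployment would load a trained model instead and follow the MaaS README linked in the steps.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse


def classify(features):
    """Hypothetical stand-in for a trained model (e.g. a scikit-learn SVM)."""
    # Placeholder rule for illustration only: flag unusually long host names.
    host = features.get("host", "")
    return "suspicious" if len(host) > 20 else "legit"


class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/apply":  # assumed endpoint name
            self.send_error(404)
            return
        # Flatten query parameters: {"host": ["a.com"]} -> {"host": "a.com"}
        params = {k: v[0] for k, v in parse_qs(url.query).items()}
        body = json.dumps({"label": classify(params)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Start the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ModelHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(server, host):
    """Send one request to the running service and decode the JSON answer."""
    conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    conn.request("GET", "/apply?" + urlencode({"host": host}))
    data = json.loads(conn.getresponse().read().decode("utf-8"))
    conn.close()
    return data
```

Calling `start_server()` and then `query(server, "example.com")` returns the JSON answer as a dict. In this sketch, replacing the body of `classify` with a call to a trained model would be the only change needed.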

So, in summary, I would like to test an ML model for network packet 
classification. Most of my questions aim at understanding what I need to set 
up to get one VM running Metron.

At this stage, as a newcomer to Metron, I would use Metron as a tool and focus 
on the ML model in Python.
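
Since model calls are only supported on parsed, lower-velocity telemetry (the second ingestion path in Casey's reply quoted below), a packet classifier would in practice work on per-flow records rather than on raw packets. A minimal sketch of turning one parsed record into numeric features follows; the field names (`duration`, `bytes`, `packets`, `protocol`) and the chosen features are hypothetical and do not come from any Metron parser.

```python
import math


def flow_features(msg):
    """Turn one parsed flow record (a flat dict) into a numeric feature vector.

    All field names are hypothetical placeholders, not a real parser schema.
    """
    n_bytes = float(msg.get("bytes", 0))
    # Guard against division by zero for empty or instantaneous flows.
    n_packets = max(float(msg.get("packets", 0)), 1.0)
    duration = max(float(msg.get("duration", 0.0)), 1e-6)
    return [
        math.log1p(n_bytes),                           # compress the byte count range
        n_bytes / n_packets,                           # mean bytes per packet
        n_packets / duration,                          # packets per second
        1.0 if msg.get("protocol") == "tcp" else 0.0,  # simple protocol flag
    ]
```

The resulting list has the shape that scikit-learn style classifiers expect for one sample, so it could be passed as `model.predict([flow_features(msg)])` once a trained model is available.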

Thank you in advance for your useful answers.

Best Regards,
Simone

> On 6 June 2017 at 19:43, Casey Stella <[email protected]> wrote:
> 
>     So, first off, it's not a basic question at all and thanks for asking it. 
>  I'm sure if it's not clear to you, then it's not clear to many and bears 
> some reinforcement and clarification.
> 
>         * Metron does indeed enable the deployment and use of machine 
> learning models on data ingested into Metron
>         * Metron runs atop Hadoop (storm + kafka + hdfs + hbase), so you 
> likely wouldn't run this successfully on a VM, but rather a cluster.  We do 
> support running Metron for demonstration purposes and development purposes 
> inside a VM, but that's not a production configuration, I'd like to make 
> clear.
>     Models deployed via MaaS can be interacted with via Stellar on data 
> ingested into Metron under a couple caveats.  There are two ways to ingest 
> data into Metron:
>         * Via a packet capture sensor (fastcapa) to Kafka to the pcap storm 
> topology, which writes directly to HDFS with no preamble or enrichment
>         * Via another, lower velocity sensor (e.g. bro for deep packet 
> inspection or yaf for flow data) which is routed to a parser topology, then 
> to enrichment and finally to indexing
>     We do not, at present, support interacting with models (or, indeed, any 
> enrichment) on raw packet data (the first case above).  We do, however, 
> support it on the second usecase.  The example at 
> https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#example
>     demonstrates ingesting web proxy data and using a dummy machine learning 
> model to pick out domains which are synthetic and likely to represent 
> communication to a botnet (the DGA model in that example is crude and could 
> easily be replaced with the example I posed earlier, btw).
> 
>     Anyway, so for you to use your own ML model, you'd do the following:
>        1. Ingest the sensor data source that you want to ingest into a kafka 
> topic
>        2. Create or reuse one of the existing parsers that we support to 
> convert the data from your data source
>        3. Create your model (see 
> https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616 as an 
> example)
>        4. refer to your model from stellar
>              1. In the example I mentioned, we're doing that at 
> https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#adjust-configurations-for-squid-to-call-model
>              2. You might consider doing it in the enrichment topology, but 
> to get you started, doing it as a field transformation as in the example 
> should suffice
>     Hopefully that'll clear some things up.  I'm about to give a talk about 
> this next week at Dataworks summit, so I'll be sure to follow-up here with 
> the deck.  There's also a blog post that will eventually be going out with 
> this walked through more directly.
> 
>     If I missed something or if something isn't clear yet, I'll be sure to 
> keep at it. :)
> 
>     Best,
> 
>     Casey
> 
>     On Mon, Jun 5, 2017 at 1:21 PM, <[email protected]> wrote:
> 
> >         Hello Casey,
> > 
> >         your answer makes some things clearer, but not everything.
> > 
> >         My question about ML models was because somewhere on the web I read 
> > that Metron comes with ML.
> >         But maybe it's better to say that it supports ML models.
> > 
> >         If I understood well, I can run Metron in a virtual machine 
> > connected to my network. With NiFi I can select the protocols/packets that 
> > I would store (similar to what Wireshark does).
> > 
> >         Then, I do not understand how to feed the data into the ML 
> > algorithm.
> > 
> >         Could you explain a bit more, or point me to a tutorial that 
> > explains the implementation process?
> > 
> >         For example, suppose I have an SVM algorithm, developed in Python 
> > using scikit-learn, that I would like to test in Metron.
> > 
> >         How can I do that?
> > 
> >         Thank you and I'm sorry for the very basic question.
> > 
> >         Best Regards,
> > 
> >         Simone
> > 
> > >             On 5 June 2017 at 18:45, Casey Stella <[email protected]> wrote:
> > > 
> > >             We do not ship any ML models currently with metron, just the 
> > > infrastructure
> > >             to deploy your own models and interact with those models from 
> > > within
> > >             Metron. That being said, you might be interested in
> > > https://gist.github.com/cestella/8dd83031b8898a732b6a5a60fce1b616
> > >             That's the code to take a DGA model written in scikit-learn from
> > > https://github.com/ClickSecurity/data_hacking/tree/master/dga_detection
> > >             and make it suitable for deployment via MaaS.
> > > 
> > >             If you want more information about MaaS, I'll be giving a 
> > > talk on it next
> > >             week at DataWorks Summit and that deck will be public.
> > > 
> > >             On Mon, Jun 5, 2017 at 12:09 PM, <[email protected]> wrote:
> > > 
> > > >                 Hello Simon,
> > > > 
> > > >                 thank you for your prompt reply and for the link as 
> > > > well.
> > > > 
> > > >                 I'm more comfortable with Python.
> > > > 
> > > >                 May I ask whether there is any example in Python that I 
> > > > could use as a template to
> > > >                 receive network packets and then implement the machine 
> > > > learning algorithm?
> > > > 
> > > >                 Moreover, where can I find documentation about the ML 
> > > > algorithms already
> > > >                 implemented in Metron?
> > > > 
> > > >                 Best Regards,
> > > > 
> > > >                 Simone
> > > > 
> > > > >                     On 5 June 2017 at 18:00, Simon Elliston Ball <[email protected]> wrote:
> > > > > 
> > > > >                     Hi Simone, and welcome to the community.
> > > > > 
> > > > >                     There are a number of extension points in Metron, 
> > > > > the key ones being
> > > > >                     around machine learning. I suggest taking a look 
> > > > > at
> > > > > https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service
> > > > >                     for more information about the model as a
> > > > >                     service. This is the bit that helps you add 
> > > > > models in pretty much any
> > > > >                     language that will run in a yarn container 
> > > > > (python, R and spark models are
> > > > >                     probably the most popular).
> > > > > 
> > > > >                     Hope that helps, and looking forward to hearing 
> > > > > more about your
> > > > >                     research, and any contributions you feel like 
> > > > > adding to the community.
> > > > > 
> > > > >                     Simon
> > > > > 
> > > > > >                         On 5 Jun 2017, at 16:54, [email protected] wrote:
> > > > > > 
> > > > > >                         Dear community,
> > > > > > 
> > > > > >                         my name is Simone and I'm a researcher in the 
> > > > > > field of
> > > > > >                         cybersecurity.
> > > > > > 
> > > > > >                         I've just read about Apache Metron and I 
> > > > > > would like to ask:
> > > > > > 
> > > > > >                             * does it use machine learning or 
> > > > > > artificial intelligence?
> > > > > > 
> > > > > >                             * can I extend the machine learning 
> > > > > > algorithms already present in
> > > > > >                               Metron with my own?
> > > > > > 
> > > > > >                             * which language do I have to 
> > > > > > use to extend Metron
> > > > > >                               with my algorithms?
> > > > > > 
> > > > > >                               Thank you.
> > > > > > 
> > > > > >                               Best Regards,
> > > > > > 
> > > > > >                               Simone
> > > > > > 
