To explore the idea of merging metron-api into metron-rest and running pcap
queries inside our REST application, I created a simple test here:
https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test.  A
summary of what's included:

   - Added pcap as a dependency in the metron-rest pom.xml
   - Added a pcap query controller endpoint at
   http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
   - Added a pcap query service that runs a simple, hardcoded query

Generate some pcap data using pycapa (
https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and the
pcap topology (
https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology).
After this initial setup there should be data in HDFS at
"/apps/metron/pcap".  I believe this should be enough to exercise the
issue.  Just hit the endpoint referenced above.  I tested this in an
already running full dev by building and deploying the metron-rest jar.  I
did not rebuild full dev with this change but I would still expect it to
work.  Let me know if it doesn't.

The first error I see when I hit this endpoint is:

java.lang.NoClassDefFoundError:
org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.

Here are the things I've tried so far:

   - Run the REST application with the `yarn jar` command, since this is how
   all our other YARN/MR-related applications are started (metron-api, MAAS,
   pcap query, etc).  I wouldn't expect this to work since we have runtime
   dependencies on our shaded elasticsearch and parser jars and I'm not aware
   of a way to add additional jars to the classpath with `yarn jar`
   (is there a way?).  Either way I get this error:

18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using
jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
java.lang.NullPointerException


   - I tried adding `yarn classpath` and `hadoop classpath` to the
   classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script).  I
   get this error:

java.lang.ClassNotFoundException:
org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider


   - I searched for the class in the previous attempt but could not find it
   in full dev:

find / -name "*.jar" 2>/dev/null | xargs grep \
  org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider \
  2>/dev/null


   - Further up in the stack trace I see the error happens when initializing
   the org.apache.hadoop.yarn.util.timeline.TimelineUtils class.  I tried
   setting "yarn.timeline-service.enabled" to false in Ambari and then I get
   this error:

Unable to parse
'/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a
URI, check the setting for mapreduce.application.framework.path


   - I've tried adding different hadoop, hbase, yarn and mapreduce Maven
   dependencies without any success
      - hadoop-yarn-client
      - hadoop-yarn-common
      - hadoop-mapreduce-client-core
      - hadoop-yarn-server-common
      - hadoop-yarn-api
      - hbase-server
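
For the class search in particular, listing each jar's entries is less
fragile than grepping the raw jar bytes.  Below is a minimal Python sketch of
that idea.  The function name and the root directory argument are
hypothetical, and it does not look inside jars nested within other jars (such
as a Spring Boot fat jar), so treat it as a starting point only.

```python
import os
import zipfile


def jars_containing(class_name, root="/"):
    """Yield paths of .jar files under root that contain class_name.

    class_name is a fully qualified name such as "org.example.Foo".
    Jars nested inside other jars are NOT inspected.
    """
    # A class org.example.Foo is stored in a jar as org/example/Foo.class
    entry = class_name.replace(".", "/") + ".class"
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        yield path
            except (zipfile.BadZipFile, OSError):
                # Unreadable or corrupt archive; skip it and keep going
                continue
```

Calling jars_containing() with the shaded JacksonJaxbJsonProvider class name
above and a root of "/usr" would print nothing if the class really is absent
from the jars under that directory.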

I will keep exploring other possible solutions.  Let me know if anyone has
any ideas.
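
On the tracking question discussed further down the thread (return the YARN
job id right away, poll the YARN REST API for status, and keep the
user-to-job mapping in our own service since every job is submitted as the
metron user), here is a rough Python sketch of what that could look like.
The ResourceManager address, the JobTracker class and its in-memory
dictionary are assumptions for illustration only; a real version would live
in metron-rest and need persistent storage.  The only external piece it
relies on is the ResourceManager's /ws/v1/cluster/apps/{appid} endpoint.

```python
import json
from urllib.request import urlopen

# Assumption: ResourceManager web address (8088 is the usual default port).
RM_URL = "http://node1:8088"

# YARN application states after which no further change is expected.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def job_id_to_app_id(job_id):
    """Map a MapReduce job id (job_<ts>_<seq>) to its YARN application id."""
    if not job_id.startswith("job_"):
        raise ValueError("not a MapReduce job id: %r" % job_id)
    return "application_" + job_id[len("job_"):]


def fetch_app(app_id, rm_url=RM_URL):
    """GET /ws/v1/cluster/apps/{appid} from the ResourceManager REST API."""
    with urlopen("%s/ws/v1/cluster/apps/%s" % (rm_url, app_id)) as resp:
        return json.load(resp)["app"]


class JobTracker:
    """Hypothetical in-memory mapping of Metron users to submitted jobs."""

    def __init__(self, fetch=fetch_app):
        self._fetch = fetch          # injectable so it can be stubbed out
        self._apps_by_user = {}

    def register(self, user, job_id):
        """Remember that `user` submitted the job with this MR job id."""
        app_id = job_id_to_app_id(job_id)
        self._apps_by_user.setdefault(user, []).append(app_id)

    def status(self, user):
        """Return the current YARN status of every job the user submitted."""
        result = []
        for app_id in self._apps_by_user.get(user, []):
            app = self._fetch(app_id)
            result.append({
                "id": app_id,
                "state": app["state"],
                "finalStatus": app["finalStatus"],
                "progress": app.get("progress"),
                "done": app["state"] in TERMINAL_STATES,
            })
        return result
```

The fetch function is passed in so the mapping logic can be exercised without
a running cluster; the UI would call status() for the logged-in user and never
block on the job itself.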

On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:

> I can imagine a new generic service(s) capability whose job ( pun intended
> ) is to
> abstract the submittal, tracking, and storage of results to yarn.
>
> It would be extended with storage providers, queue provider, possibly some
> set of policies or rather strategies.
>
> The pcap ‘report’ would be a client to that service, that specializes the
> service operation for the way we want pcap to work.
>
> We can then re-use the generic service for other long running yarn
> things…..
>
>
> On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwa...@gmail.com) wrote:
>
> RE: Tracking v. users
>
> The submittal and tracking can associate the submitter with the yarn job
> and track that,
> regardless of the yarn credentials.
>
> I.e., if all submittals and monitoring are by the same yarn user ( Metron )
> from a single or co-operative set of services, that service can maintain
> the mapping.
>
>
>
> On May 7, 2018 at 09:39:52, Ryan Merriman (merrim...@gmail.com) wrote:
>
> Otto, your use case makes sense to me. We'll have to think about how to
> manage the user to job relationships. I'm assuming YARN jobs will be
> submitted as the metron service user so YARN won't keep track of this for
> us. Is that assumption correct? Do you have any ideas for doing that?
>
> Mike, I can start a feature branch and experiment with merging metron-api
> into metron-rest. That should allow us to collaborate on any issues or
> challenges. Also, can you expand on your idea to manage external
> dependencies as a special module? That seems like a very attractive option
> to me.
>
> On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > From my response on the other thread, but applicable to the backend
> > stuff:
> >
> > "The PCAP Query seems more like PCAP Report to me. You are generating a
> > report based on parameters.
> > That report is something that takes some time and external process to
> > generate… ie you have to wait for it.
> >
> > I can almost imagine a flow where you:
> >
> > * Are in the AlertUI
> > * Ask to generate a PCAP report based on some selected alerts/meta-alert,
> > possibly picking from one or more report ‘templates’
> > that have query options etc
> > * The report request is ‘queued’, that is, dispatched to be
> > executed/generated
> > * You as a user have a ‘queue’ of your report results, and when the
> > report is done it is queued there
> > * We ‘monitor’ the report/queue progress through the yarn rest ( report
> > info/meta has the yarn details )
> > * You can select the report from your queue and view it either in a new
> > UI or custom component
> > * You can then apply a different ‘view’ to the report or work with the
> > report data
> > * You can print / save etc
> > * You can associate the report with the alerts ( again in the report info
> > ) with…. a ‘case’ or ‘ticket’ or investigation something or other
> >
> >
> > We can introduce extensibility into the report templates, report views (
> > things that work with the json data of the report )
> >
> > Something like that.”
> >
> > Maybe we can do :
> >
> > template -> query parameters -> script => yarn info
> > yarn info + query info + alert context + yarn status => report info ->
> > stored in a user’s ‘report queue’
> > report persistence added to report info
> > metron-rest -> api to monitor the queue, read results ( page ), etc etc
> >
> >
> > On May 4, 2018 at 09:23:39, Ryan Merriman (merrim...@gmail.com) wrote:
> >
> > I started a separate thread on Pcap UI considerations and user
> > requirements at Otto's request. This should help us keep these two
> > related but separate discussions focused.
> >
> > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <michelsum...@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > >
> > >
> > > (Youhouuu my first reply on this kind of mail chain^^)
> > >
> > >
> > >
> > > If I may, I would like to share my view on the following 3 points.
> > >
> > > - Backend:
> > >
> > > The current metron-api is totally separate; it would be logical to me
> > > to have it in the same place as the other REST APIs. Especially when
> > > more security is added, the job will not need to be done twice.
> > > The current implementation sends back a pcap object which still needs
> > > to be decoded. In OpenSOC, the decoding was done with tshark on the
> > > frontend. It would be good to have this decoding happen directly on the
> > > backend so as not to create load on the frontend. An option would be to
> > > install tshark on the REST server and use it to convert the pcap to XML
> > > and then to JSON that is sent to the frontend.
> > >
> > > I tried to start the map/reduce job directly from the REST server to
> > > search over all the pcap data and, as Ryan mentioned, we had trouble.
> > > I will try to find the error again.
> > >
> > > Then in the POC, what we tried is to use the pcap_query script, and
> > > this works fine. I just modified it so that it sends back the YARN
> > > job_id directly instead of waiting for the job to finish. This allows
> > > the UI and the REST server to learn the status of the search by
> > > querying the YARN REST API, so the UI and the REST server can be async
> > > without any blocking phase. What do you think about that?
> > >
> > >
> > >
> > > Having the job submitted directly from the code of the REST server
> > > would be perfect, but it will need a lot of investigation I think (but
> > > I'm not the expert so I might be completely wrong ^^).
> > >
> > > We know that the pcap_query script works fine, so why not call it? Is
> > > it that bad? (maybe a stupid question, but I really don’t see many
> > > drawbacks)
> > >
> > >
> > >
> > > - Front end:
> > >
> > > Adding the pcap search to the alert UI is, I think, the easiest way to
> > > move forward. But indeed, it will then be the “Alert UI and pcapquery”.
> > > Maybe the name of the UI should just change to something like
> > > “Monitoring & Investigation UI”?
> > >
> > >
> > >
> > > Is there any roadmap or plan for the different UIs? I mean, have you
> > > already had discussions on how you see the UI evolving with the new
> > > features that will come in the future?
> > >
> > >
> > >
> > > - Microservices:
> > >
> > >
> > >
> > > What do you mean exactly by microservices? Is it to separate all the
> > > features into different projects? Or something like having the
> > > different components in containers, as with Kubernetes? (again maybe a
> > > stupid question, but I don’t clearly understand what you mean :) )
> > >
> > >
> > >
> > > Michel
> > >
> >
> >
>
>
