To explore the idea of merging metron-api into metron-rest and running pcap queries inside our REST application, I created a simple test here: https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A summary of what's included:
- Added pcap as a dependency in the metron-rest pom.xml
- Added a pcap query controller endpoint at http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
- Added a pcap query service that runs a simple, hardcoded query (a rough sketch of this wiring is at the end of this message)

To reproduce, generate some pcap data using pycapa (https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and the pcap topology (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology). After this initial setup there should be data in HDFS at "/apps/metron/pcap". I believe this should be enough to exercise the issue; just hit the endpoint referenced above. I tested this in an already running full dev by building and deploying the metron-rest jar. I did not rebuild full dev with this change, but I would still expect it to work. Let me know if it doesn't.

The first error I see when I hit this endpoint is:

  java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider

Here are the things I've tried so far:

- Ran the REST application with the `yarn jar` command, since this is how all our other YARN/MR-related applications are started (metron-api, MaaS, pcap query, etc.). I wouldn't expect this to work since we have runtime dependencies on our shaded elasticsearch and parser jars, and I'm not aware of a way to add additional jars to the classpath with the `yarn jar` command (is there a way?). Either way I get this error:

  18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
  java.lang.NullPointerException

- Tried adding `yarn classpath` and `hadoop classpath` to the classpath in /usr/metron/0.4.3/bin/metron-rest.sh (the REST start script). I get this error:

  java.lang.ClassNotFoundException: org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider

- Searched for the class from the previous attempt but could not find it anywhere in full dev:

  find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null

- Further up in the stack trace I see the error happens when initializing the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I tried setting "yarn.timeline-service.enabled" to false in Ambari and then I get this error:

  Unable to parse '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path

- Tried adding different hadoop, hbase, yarn and mapreduce Maven dependencies without any success:
  - hadoop-yarn-client
  - hadoop-yarn-common
  - hadoop-mapreduce-client-core
  - hadoop-yarn-server-common
  - hadoop-yarn-api
  - hbase-server

I will keep exploring other possible solutions. Let me know if anyone has any ideas.
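For anyone who doesn't want to pull the branch, below is roughly the shape of the controller/service wiring in the test. The class names, interface, and path are illustrative only (not the actual branch code); the point is just that the MR-backed pcap query runs inside the Spring Boot process, which is where the classpath problem surfaces.

```java
import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

// Assumed thin wrapper around the metron-pcap map/reduce query job.
interface PcapQueryService {
  List<byte[]> query() throws Exception;
}

// Hypothetical sketch only -- names do not match the branch exactly.
@RestController
@RequestMapping("/api/v1/pcap")
public class PcapQueryController {

  @Autowired
  private PcapQueryService pcapQueryService;

  // Exposed in swagger as pcap-query-controller/queryUsingGET.
  @RequestMapping(value = "/query", method = RequestMethod.GET)
  public ResponseEntity<List<byte[]>> query() throws Exception {
    // The hardcoded query reads the pcap data under /apps/metron/pcap; this
    // call is where the YarnJacksonJaxbJsonProvider NoClassDefFoundError
    // currently shows up.
    return new ResponseEntity<>(pcapQueryService.query(), HttpStatus.OK);
  }
}
```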
On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:

> I can imagine a new generic service(s) capability whose job (pun intended) is to abstract the submittal, tracking, and storage of results to YARN.
>
> It would be extended with storage providers, queue providers, possibly some set of policies or rather strategies.
>
> The pcap ‘report’ would be a client to that service, which specializes the service operation for the way we want pcap to work.
>
> We can then re-use the generic service for other long running YARN things…
>
>
> On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwa...@gmail.com) wrote:
>
> RE: Tracking v. users
>
> The submittal and tracking can associate the submitter with the YARN job and track that, regardless of the YARN credentials.
>
> I.e., if all submittals and monitoring are by the same YARN user (Metron) from a single or co-operative set of services, that service can maintain the mapping.
>
>
> On May 7, 2018 at 09:39:52, Ryan Merriman (merrim...@gmail.com) wrote:
>
> Otto, your use case makes sense to me. We'll have to think about how to manage the user-to-job relationships. I'm assuming YARN jobs will be submitted as the metron service user, so YARN won't keep track of this for us. Is that assumption correct? Do you have any ideas for doing that?
>
> Mike, I can start a feature branch and experiment with merging metron-api into metron-rest. That should allow us to collaborate on any issues or challenges. Also, can you expand on your idea to manage external dependencies as a special module? That seems like a very attractive option to me.
>
> On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:
>
> > From my response on the other thread, but applicable to the backend stuff:
> >
> > "The PCAP Query seems more like PCAP Report to me. You are generating a report based on parameters. That report is something that takes some time and an external process to generate… i.e. you have to wait for it.
> >
> > I can almost imagine a flow where you:
> >
> > * Are in the Alert UI
> > * Ask to generate a PCAP report based on some selected alerts/meta-alert, possibly picking from one or more report ‘templates’ that have query options etc.
> > * The report request is ‘queued’, that is, dispatched to be executed/generated
> > * You as a user have a ‘queue’ of your report results, and when the report is done it is queued there
> > * We ‘monitor’ the report/queue progress through the YARN REST API (report info/meta has the YARN details)
> > * You can select the report from your queue and view it either in a new UI or custom component
> > * You can then apply a different ‘view’ to the report or work with the report data
> > * You can print / save etc.
> > * You can associate the report with the alerts (again in the report info) with… a ‘case’ or ‘ticket’ or investigation something or other
> >
> > We can introduce extensibility into the report templates and report views (things that work with the json data of the report).
> >
> > Something like that."
> >
> > Maybe we can do:
> >
> > template -> query parameters -> script => yarn info
> > yarn info + query info + alert context + yarn status => report info -> stored in a user's ‘report queue’
> > report persistence added to report info
> > metron-rest -> api to monitor the queue, read results (page), etc etc
> >
> >
> > On May 4, 2018 at 09:23:39, Ryan Merriman (merrim...@gmail.com) wrote:
> >
> > I started a separate thread on Pcap UI considerations and user requirements at Otto's request. This should help us keep these two related but separate discussions focused.
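To make the generic service and the "report info" pipeline Otto sketches above a little more concrete, here is one possible shape for that abstraction. Everything below is hypothetical (none of these types exist in Metron today); it is only meant to illustrate the idea under discussion.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Tracks a single long-running YARN-backed job, e.g. a pcap report. */
class ReportInfo {
  String reportId;        // id in the user's report queue
  String submitter;       // Metron user who requested the report, not the YARN user
  String yarnApplicationId;
  String state;           // mirrors the YARN application state (SUBMITTED, RUNNING, FINISHED, ...)
  String resultLocation;  // where the storage provider persisted the results (e.g. an HDFS path)
  Map<String, Object> queryParameters;
}

/** Generic submit/track/store service; the pcap report would be one client of it. */
interface JobService {

  // template + query parameters -> submitted YARN job -> report info
  ReportInfo submit(String template, Map<String, Object> queryParameters, String submitter);

  // Refresh state from the YARN REST API and return the current report info.
  Optional<ReportInfo> status(String reportId);

  // The user's "report queue": everything they have submitted.
  List<ReportInfo> queue(String submitter);

  // Page through persisted results once the job has finished.
  List<byte[]> results(String reportId, int page, int pageSize);
}
```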
> > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <michelsum...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > (Youhouuu, my first reply on this kind of mail chain^^)
> > >
> > > If I may, I would like to share my view on the following 3 points.
> > >
> > > - Backend:
> > >
> > > The current metron-api is totally separate; it seems logical to me to have it in the same place as the other REST APIs. Especially when more security is added, the work will not need to be done twice.
> > > The current implementation sends back a pcap object which still needs to be decoded. In OpenSOC, the decoding was done with tshark on the frontend. It would be good to have this decoding happen directly on the backend so it does not create load on the frontend. An option would be to install tshark on the REST server and use it to convert the pcap to XML and then to JSON that is sent to the frontend.
> > >
> > > I tried to start the map/reduce job that searches over all the pcap data directly from the REST server and, as Ryan mentioned, we had trouble. I will try to find the error again.
> > >
> > > Then in the POC, what we tried is to use the pcap_query script, and this works fine. I just modified it so that it sends back the YARN job_id directly instead of waiting for the job to finish. That allows the UI and the REST server to know the status of the search by querying the YARN REST API, so the UI and the REST server can be async without any blocking phase. What do you think about that?
> > >
> > > Having the job submitted directly from the code of the REST server would be perfect, but I think it will need a lot of investigation (but I'm not the expert so I might be completely wrong ^^).
> > >
> > > We know that the pcap_query script works fine, so why not call it? Is it that bad? (Maybe a stupid question, but I really don't see a lot of drawbacks.)
> > >
> > > - Front end:
> > >
> > > Adding the pcap search to the Alert UI is, I think, the easiest way to move forward. But indeed, it will then be the "Alert UI and pcap query". Maybe the name of the UI should just change to something like "Monitoring & Investigation UI"?
> > >
> > > Is there any roadmap or plan for the different UIs? I mean, have you already had discussions on how you see the UI evolving with the new features that will come in the future?
> > >
> > > - Microservices:
> > >
> > > What do you mean exactly by microservices? Is it to separate all the features into different projects? Or something like having the different components in containers, like Kubernetes? (Again, maybe a stupid question, but I don't clearly understand what you mean :) )
> > >
> > > Michel
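A minimal sketch of the tshark-on-the-backend decoding Michel describes above, assuming tshark 2.2+ (which supports `-T json`) is installed on the REST server and on the PATH; the class name and the ProcessBuilder approach are illustrative only, not an agreed design.

```java
// Illustrative sketch only: decode a pcap result file to JSON on the backend
// by shelling out to tshark, instead of decoding in the browser.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class TsharkDecoder {

  /** Converts the given pcap file to tshark's JSON representation. */
  public static String toJson(String pcapPath) throws IOException, InterruptedException {
    // "-r <file>" reads a capture file, "-T json" selects JSON output
    // ("-T pdml" would give the XML form mentioned in the thread).
    Process process = new ProcessBuilder("tshark", "-r", pcapPath, "-T", "json")
        .redirectErrorStream(true)
        .start();

    StringBuilder output = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        output.append(line).append('\n');
      }
    }

    int exitCode = process.waitFor();
    if (exitCode != 0) {
      throw new IOException("tshark exited with code " + exitCode);
    }
    return output.toString();
  }
}
```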
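And a sketch of the async status check Michel describes (return the YARN job_id at submit time, then poll the ResourceManager for state). The `/ws/v1/cluster/apps/{appid}` endpoint is the standard YARN ResourceManager REST API; the class name, the hardcoded RM address, and the example application id are assumptions for illustration.

```java
// Illustrative sketch only: given the YARN application id returned at submit
// time, ask the ResourceManager REST API for the job's current state so the
// UI/REST layer can stay asynchronous instead of blocking on the job.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class YarnStatusClient {

  // ResourceManager web address; 8088 is the default port, adjust per cluster.
  private final String resourceManagerUrl;

  public YarnStatusClient(String resourceManagerUrl) {
    this.resourceManagerUrl = resourceManagerUrl;
  }

  /** Returns the raw application JSON, e.g. {"app":{"state":"RUNNING",...}}. */
  public String getApplication(String applicationId) throws IOException {
    URL url = new URL(resourceManagerUrl + "/ws/v1/cluster/apps/" + applicationId);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestProperty("Accept", "application/json");

    StringBuilder body = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        body.append(line);
      }
    }
    // In the REST application this would be parsed with Jackson (already on
    // the classpath) to pull out app.state / app.finalStatus for the UI.
    return body.toString();
  }

  public static void main(String[] args) throws IOException {
    // Usage example with a made-up application id and the full-dev host.
    YarnStatusClient client = new YarnStatusClient("http://node1:8088");
    System.out.println(client.getApplication("application_1525400000000_0001"));
  }
}
```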