Sweet! That's great news. The pom changes are a lot simpler than I expected. Very nice.
On Tue, May 8, 2018 at 4:35 PM, Ryan Merriman <merrim...@gmail.com> wrote:

> Finally figured it out. Commit is here:
> https://github.com/merrimanr/incubator-metron/commit/22fe5e9ff3c167b42ebeb7a9f1000753a409aff1
>
> It came down to figuring out the right combination of Maven dependencies and passing in the HDP version to REST as a Java system property. I also included some HDFS setup tasks. I tested this in full dev and can now successfully run a pcap query and get results. All you should have to do is generate some pcap data first.
>
> On Tue, May 8, 2018 at 4:17 PM, Michael Miklavcic <michael.miklav...@gmail.com> wrote:
>
> > @Ryan - pulled your branch and experimented with a few things. In doing so, it dawned on me that by adding the yarn and hadoop classpath, you probably didn't introduce a new classpath issue, rather you probably just moved onto the next classpath issue, i.e. hbase per your exception about hbase jaxb. Anyhow, I put up a branch with some pom changes worth trying in conjunction with invoking the rest app startup via "/usr/bin/yarn jar"
> >
> > https://github.com/mmiklavc/metron/tree/ryan-rest-test
> >
> > https://github.com/mmiklavc/metron/commit/5ca23580fc6e043fafae2327c80b65b20ca1c0c9
> >
> > Mike
> >
> > On Tue, May 8, 2018 at 7:44 AM, Simon Elliston Ball <si...@simonellistonball.com> wrote:
> >
> > > That would be a step closer to something more like a micro-service architecture. However, I would want to make sure we think about the operational complexity, and mpack implications of having another server installed and running somewhere on the cluster (also, ssl, kerberos, etc etc requirements for that service).
> > >
> > > On 8 May 2018 at 14:27, Ryan Merriman <merrim...@gmail.com> wrote:
> > >
> > > > +1 to having metron-api as its own service and using a gateway type pattern.
> > > >
> > > > On Tue, May 8, 2018 at 8:13 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:
> > > >
> > > > > Why not have metron-api as its own service and use a ‘gateway’ type pattern in rest?
> > > > >
> > > > > On May 8, 2018 at 08:45:33, Ryan Merriman (merrim...@gmail.com) wrote:
> > > > >
> > > > > Moving the yarn classpath command earlier in the classpath now gives this error:
> > > > >
> > > > > Caused by: java.lang.NoSuchMethodError: javax.servlet.ServletContext.getVirtualServerName()Ljava/lang/String;
> > > > >
> > > > > I will experiment with other combinations, I suspect we will need finer-grain control over the order.
> > > > >
> > > > > The grep matches class names inside jar files. I use this all the time and it's really useful.
> > > > >
> > > > > The metron-rest jar is already shaded.
> > > > >
> > > > > Reverse engineering the yarn jar command was the next thing I was going to try. Will let you know how it goes.
> > > > >
> > > > > On Tue, May 8, 2018 at 12:36 AM, Michael Miklavcic <michael.miklav...@gmail.com> wrote:
> > > > >
> > > > > > What order did you add the hadoop or yarn classpath? The "shaded" package stands out to me in this name "org.apache.hadoop.hbase.*shaded*.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider." Maybe try adding those packages earlier on the classpath.
> > > > > >
> > > > > > I think that find command needs a "jar tvf", otherwise you're looking for a class name in jar file names.
> > > > > >
> > > > > > Have you tried shading the rest jar?
> > > > > >
> > > > > > I'd also look at the classpath you get when running "yarn jar" to start the existing pcap service, per the instructions in metron-api/README.md.
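The exchange above contrasts grepping the raw bytes of jar files with listing their entries (what "jar tvf" does). As a hedged illustration only, here is a minimal Python sketch of the entry-listing approach. The function name `find_class_in_jars` is hypothetical and not something used in this thread; it relies on the fact that a jar is a ZIP archive whose entry names can be read with the standard `zipfile` module.

```python
import os
import zipfile


def find_class_in_jars(root, class_path):
    """Return the sorted paths of jars under `root` that contain `class_path`.

    `class_path` uses slashes, for example
    "org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider".
    (Hypothetical helper written for illustration.)
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(jar_path) as jar:
                    # namelist() returns the archive's entry names,
                    # the same names that `jar tvf` would print.
                    if any(class_path in entry for entry in jar.namelist()):
                        matches.append(jar_path)
            except (zipfile.BadZipFile, OSError):
                # Skip unreadable or corrupt archives instead of aborting the scan.
                continue
    return sorted(matches)
```

Calling `find_class_in_jars("/usr/hdp", "org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider")` would return an empty list if no jar under that directory ships the class, which is the situation described in the thread. The `/usr/hdp` root is an assumption based on the path that appears in the log output quoted below.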
> > > > > >
> > > > > > On Mon, May 7, 2018 at 3:28 PM, Ryan Merriman <merrim...@gmail.com> wrote:
> > > > > >
> > > > > > > To explore the idea of merging metron-api into metron-rest and running pcap queries inside our REST application, I created a simple test here: https://github.com/merrimanr/incubator-metron/tree/pcap-rest-test. A summary of what's included:
> > > > > > >
> > > > > > > - Added pcap as a dependency in the metron-rest pom.xml
> > > > > > > - Added a pcap query controller endpoint at http://node1:8082/swagger-ui.html#!/pcap-query-controller/queryUsingGET
> > > > > > > - Added a pcap query service that runs a simple, hardcoded query
> > > > > > >
> > > > > > > Generate some pcap data using pycapa (https://github.com/apache/metron/tree/master/metron-sensors/pycapa) and the pcap topology (https://github.com/apache/metron/tree/master/metron-platform/metron-pcap-backend#starting-the-topology). After this initial setup there should be data in HDFS at "/apps/metron/pcap". I believe this should be enough to exercise the issue. Just hit the endpoint referenced above. I tested this in an already running full dev by building and deploying the metron-rest jar. I did not rebuild full dev with this change but I would still expect it to work. Let me know if it doesn't.
> > > > > > >
> > > > > > > The first error I see when I hit this endpoint is:
> > > > > > >
> > > > > > > java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/webapp/YarnJacksonJaxbJsonProvider.
> > > > > > >
> > > > > > > Here are the things I've tried so far:
> > > > > > >
> > > > > > > - Run the REST application with the YARN jar command since this is how all our other YARN/MR-related applications are started (metron-api, MAAS, pcap query, etc). I wouldn't expect this to work since we have runtime dependencies on our shaded elasticsearch and parser jars and I'm not aware of a way to add additional jars to the classpath with the YARN jar command (is there a way?). Either way I get this error:
> > > > > > >
> > > > > > > 18/05/04 19:49:56 WARN reflections.Reflections: could not create Dir using jarFile from url file:/usr/hdp/2.6.4.0-91/hadoop/lib/ojdbc6.jar. skipping.
> > > > > > > java.lang.NullPointerException
> > > > > > >
> > > > > > > - I tried adding `yarn classpath` and `hadoop classpath` to the classpath in /usr/metron/0.4.3/bin/metron-rest.sh (REST start script). I get this error:
> > > > > > >
> > > > > > > java.lang.ClassNotFoundException: org.apache.hadoop.hbase.shaded.org.codehaus.jackson.jaxrs.JacksonJaxbJsonProvider
> > > > > > >
> > > > > > > - I searched for the class in the previous attempt but could not find it in full dev:
> > > > > > >
> > > > > > > find / -name "*.jar" 2>/dev/null | xargs grep org/apache/hadoop/hbase/shaded/org/codehaus/jackson/jaxrs/JacksonJaxbJsonProvider 2>/dev/null
> > > > > > >
> > > > > > > - Further up in the stack trace I see the error happens when initiating the org.apache.hadoop.yarn.util.timeline.TimelineUtils class. I tried setting "yarn.timeline-service.enabled" in Ambari to false and then I get this error:
> > > > > > >
> > > > > > > Unable to parse '/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework' as a URI, check the setting for mapreduce.application.framework.path
> > > > > > >
> > > > > > > - I've tried adding different hadoop, hbase, yarn and mapreduce Maven dependencies without any success
> > > > > > >   - hadoop-yarn-client
> > > > > > >   - hadoop-yarn-common
> > > > > > >   - hadoop-mapreduce-client-core
> > > > > > >   - hadoop-yarn-server-common
> > > > > > >   - hadoop-yarn-api
> > > > > > >   - hbase-server
> > > > > > >
> > > > > > > I will keep exploring other possible solutions. Let me know if anyone has any ideas.
> > > > > > >
> > > > > > > On Mon, May 7, 2018 at 9:02 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:
> > > > > > >
> > > > > > > > I can imagine a new generic service(s) capability whose job (pun intended) is to abstract the submittal, tracking, and storage of results to yarn.
> > > > > > > >
> > > > > > > > It would be extended with storage providers, queue provider, possibly some set of policies or rather strategies.
> > > > > > > >
> > > > > > > > The pcap ‘report’ would be a client to that service, that specializes the service operation for the way we want pcap to work.
> > > > > > > >
> > > > > > > > We can then re-use the generic service for other long running yarn things…
> > > > > > > >
> > > > > > > > On May 7, 2018 at 09:56:51, Otto Fowler (ottobackwa...@gmail.com) wrote:
> > > > > > > >
> > > > > > > > RE: Tracking v. users
> > > > > > > >
> > > > > > > > The submittal and tracking can associate the submitter with the yarn job and track that, regardless of the yarn credentials.
> > > > > > > >
> > > > > > > > I.e., if all submittals and monitoring are by the same yarn user (Metron) from a single or co-operative set of services, that service can maintain the mapping.
> > > > > > > >
> > > > > > > > On May 7, 2018 at 09:39:52, Ryan Merriman (merrim...@gmail.com) wrote:
> > > > > > > >
> > > > > > > > Otto, your use case makes sense to me. We'll have to think about how to manage the user to job relationships. I'm assuming YARN jobs will be submitted as the metron service user so YARN won't keep track of this for us. Is that assumption correct? Do you have any ideas for doing that?
> > > > > > > >
> > > > > > > > Mike, I can start a feature branch and experiment with merging metron-api into metron-rest. That should allow us to collaborate on any issues or challenges. Also, can you expand on your idea to manage external dependencies as a special module? That seems like a very attractive option to me.
> > > > > > > >
> > > > > > > > On Fri, May 4, 2018 at 8:39 AM, Otto Fowler <ottobackwa...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > From my response on the other thread, but applicable to the backend stuff:
> > > > > > > > >
> > > > > > > > > "The PCAP Query seems more like PCAP Report to me. You are generating a report based on parameters. That report is something that takes some time and external process to generate… i.e. you have to wait for it.
> > > > > > > > >
> > > > > > > > > I can almost imagine a flow where you:
> > > > > > > > >
> > > > > > > > > * Are in the AlertUI
> > > > > > > > > * Ask to generate a PCAP report based on some selected alerts/meta-alert, possibly picking from one or more report ‘templates’ that have query options etc
> > > > > > > > > * The report request is ‘queued’, that is dispatched to be executed/generated
> > > > > > > > > * You as a user have a ‘queue’ of your report results, and when the report is done it is queued there
> > > > > > > > > * We ‘monitor’ the report/queue progress through the yarn rest (report info/meta has the yarn details)
> > > > > > > > > * You can select the report from your queue and view it either in a new UI or custom component
> > > > > > > > > * You can then apply a different ‘view’ to the report or work with the report data
> > > > > > > > > * You can print / save etc
> > > > > > > > > * You can associate the report with the alerts (again in the report info) with… a ‘case’ or ‘ticket’ or investigation something or other
> > > > > > > > >
> > > > > > > > > We can introduce extensibility into the report templates, report views (things that work with the json data of the report)
> > > > > > > > >
> > > > > > > > > Something like that."
> > > > > > > > >
> > > > > > > > > Maybe we can do:
> > > > > > > > >
> > > > > > > > > template -> query parameters -> script => yarn info
> > > > > > > > > yarn info + query info + alert context + yarn status => report info -> stored in a user’s ‘report queue’
> > > > > > > > > report persistence added to report info
> > > > > > > > > metron-rest -> api to monitor the queue, read results (page), etc etc
> > > > > > > > >
> > > > > > > > > On May 4, 2018 at 09:23:39, Ryan Merriman (merrim...@gmail.com) wrote:
> > > > > > > > >
> > > > > > > > > I started a separate thread on Pcap UI considerations and user requirements at Otto's request. This should help us keep these two related but separate discussions focused.
> > > > > > > > >
> > > > > > > > > On Fri, May 4, 2018 at 7:19 AM, Michel Sumbul <michelsum...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > (Youhouuu my first reply on this kind of mail chain^^)
> > > > > > > > > >
> > > > > > > > > > If I may, I would like to share my view on the following 3 points.
> > > > > > > > > >
> > > > > > > > > > - Backend:
> > > > > > > > > >
> > > > > > > > > > The current metron-api is totally separate; it would be logical to me to have it in the same place as the other REST APIs. Especially when more security is added, the work will not need to be done twice. The current implementation sends back a pcap object which still needs to be decoded. In OpenSOC, the decoding was done with tshark on the frontend. It would be good to have this decoding happen directly on the backend so as not to create load on the frontend. An option would be to install tshark on the REST server and use it to convert the pcap to XML and then to JSON that will be sent to the frontend.
> > > > > > > > > >
> > > > > > > > > > I tried to start the map/reduce job that searches over all the pcap data directly from the REST server and, as Ryan mentioned, we had trouble. I will try to find the error again.
> > > > > > > > > >
> > > > > > > > > > Then in the POC, what we tried is to use the pcap_query script, and this works fine. I just modified it so that it sends back the YARN job_id directly instead of waiting until the job is finished. This allows the UI and the REST server to know the status of the search by querying the YARN REST API. This will allow the UI and the REST server to be async without any blocking phase. What do you think about that?
> > > > > > > > > >
> > > > > > > > > > Having the job submitted directly from the code of the REST server would be perfect, but it will need a lot of investigation I think (but I'm not the expert so I might be completely wrong ^^).
> > > > > > > > > >
> > > > > > > > > > We know that the pcap_query script works fine, so why not call it? Is it that bad? (maybe stupid question, but I really don’t see a lot of drawbacks)
> > > > > > > > > >
> > > > > > > > > > - Front end:
> > > > > > > > > >
> > > > > > > > > > Adding the pcap search to the alert UI is, I think, the easiest way to move forward. But indeed, it will then be the “Alert UI and pcapquery”. Maybe the name of the UI should just change to something like “Monitoring & Investigation UI”?
> > > > > > > > > >
> > > > > > > > > > Is there any roadmap or plan for the different UIs? I mean, have you already had discussions on how you see the UI evolving with the new features that will come in the future?
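The asynchronous flow proposed in the Backend section above (return the job id immediately, then ask the YARN REST API for status) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the POC code: it assumes the script returns a MapReduce-style id of the form `job_<timestamp>_<sequence>`, that the ResourceManager web address (for example `http://node1:8088`) is known to the caller, and that the ResourceManager's Cluster Application endpoint `/ws/v1/cluster/apps/{appid}` is reachable. All function names are hypothetical.

```python
import json
import urllib.request

# States in which a YARN application will not change any further.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def job_id_to_app_id(job_id):
    """Map a MapReduce job id (job_<ts>_<seq>) to its YARN application id."""
    prefix = "job_"
    if not job_id.startswith(prefix):
        raise ValueError("not a MapReduce job id: %r" % job_id)
    return "application_" + job_id[len(prefix):]


def status_url(rm_base_url, job_id):
    """Build the ResourceManager REST URL for one application."""
    return "%s/ws/v1/cluster/apps/%s" % (
        rm_base_url.rstrip("/"), job_id_to_app_id(job_id))


def summarize(response_body):
    """Reduce the ResourceManager JSON response to the fields a UI would poll."""
    app = json.loads(response_body)["app"]
    state = app["state"]
    return {
        "state": state,
        "finalStatus": app.get("finalStatus"),
        "progress": app.get("progress"),
        "done": state in TERMINAL_STATES,
    }


def fetch_status(rm_base_url, job_id, timeout=10):
    """Perform one non-blocking status check; callers poll this as needed."""
    request = urllib.request.Request(
        status_url(rm_base_url, job_id),
        headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize(response.read().decode("utf-8"))
```

With this shape, the REST server only stores the job id and calls `fetch_status` when the UI asks, so no request has to block until the MapReduce job completes.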
> > > > > > > > > >
> > > > > > > > > > - Microservices:
> > > > > > > > > >
> > > > > > > > > > What do you mean exactly by microservices? Is it to separate all the features into different projects? Or something like having the different components in containers, like Kubernetes? (again maybe stupid question, but I don’t clearly understand what you mean :) )
> > > > > > > > > >
> > > > > > > > > > Michel
> > >
> > > --
> > > --
> > > simon elliston ball
> > > @sireb
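One idea raised in this thread is that, if every YARN job is submitted as the same service user, the submitting service itself has to remember which end user asked for which job. A minimal in-memory sketch of that mapping is shown below. The class name `JobRegistry` and its methods are hypothetical, and a real service would presumably persist this data rather than keep it in memory.

```python
from collections import defaultdict


class JobRegistry:
    """Remembers which end user requested which YARN job (illustrative only)."""

    def __init__(self):
        self._jobs_by_user = defaultdict(list)
        self._owner_by_job = {}

    def record(self, user, job_id):
        """Associate a newly submitted job with the user who requested it."""
        if job_id in self._owner_by_job:
            raise ValueError("job already recorded: %s" % job_id)
        self._owner_by_job[job_id] = user
        self._jobs_by_user[user].append(job_id)

    def jobs_for(self, user):
        """Return the user's job ids in submission order (a copy)."""
        return list(self._jobs_by_user.get(user, []))

    def owner_of(self, job_id):
        """Return the requesting user, or None for an unknown job."""
        return self._owner_by_job.get(job_id)
```

A per-user "report queue" could then be built by combining `jobs_for(user)` with a status lookup for each returned job id.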