I had a feeling it might be that way. Unless anyone else knows of a better approach, it's probably most reasonable to push that into a follow-on JIRA and not over-complicate the current activities.
Jon

On Wed, May 9, 2018 at 2:33 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:

> We are limited by YARN and MapReduce in the case of pause/resume - I could
> be wrong, but I don't think that's something that's supported unless
> you're talking about multiple MR jobs strung together.
>
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application
>
> I don't see anything suggesting "SUSPENDED" or "PAUSED" as we have
> available in workflow engines like Oozie:
>
> "The valid application state can be one of the following: ALL, NEW,
> NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED"
>
> The same goes for the MR job commands:
>
> https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#job
>
> Mike
>
> On Mon, May 7, 2018 at 2:04 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:
>
> > From my perspective, PCAP is primarily used as a follow-on to an alert
> > or meta-alert - people very rarely use PCAP for initial hunting. I know
> > this has been brought up by Otto, Mike, and Ryan across the two related
> > threads, and I think it's all spot on. Going from an alert or meta-alert
> > to pulling PCAP would be by far the primary use case for this in every
> > SOC I've ever worked in (not necessarily a representative sample).
> >
> > I also have some additional thoughts on the feature side after doing
> > some brainstorming and talking to two of the SOCs I work with most:
> >
> > - Limit the size of the PCAP, not just the date range, and maybe even
> >   have a configurable cluster-wide admin max for PCAP retrieval, set to
> >   0/infinite by default.
> > - Set the priority of PCAP queries. Perhaps there's an automated PCAP
> >   retrieval 'just in case', which should have a lower priority than an
> >   interactive request via the UI.
> > - Ability to pause/resume (not just cancel) jobs.
> > - Configurable cluster-wide admin max # of concurrent PCAP queries, set
> >   to 0/infinite by default.
> > - Ability to pull PCAP live off the wire and stream it into a file.
> > - Ability to filter PCAP by providing a BPF filter to apply in
> >   server-side post-processing (less efficient, but very versatile).
> > - Request what PCAP data exists in the cluster (answering "how far back
> >   can I go?").
> > - This is obvious and is probably assumed, but queries based on any
> >   subset of the network 5-tuple (IPs, ports, protocol), with at least
> >   one field required.
> >
> > Jon
> >
> > On Fri, May 4, 2018 at 9:44 AM Otto Fowler <ottobackwa...@gmail.com> wrote:
> >
> > > That is the 'views' part.
> > >
> > > We can have options on the data output; if you have output the full
> > > data, then we can have different views and interactions for inspection
> > > and level of detail.
> > >
> > > On May 4, 2018 at 09:37:13, Michel Sumbul (michelsum...@gmail.com) wrote:
> > >
> > > It can be like a report, but also to investigate some case where the
> > > user wants to see the whole packet (all the bits and bytes). Like in
> > > Wireshark - something interactive, no?
> > >
> > > 2018-05-04 14:33 GMT+01:00 Otto Fowler <ottobackwa...@gmail.com>:
> > >
> > > > The PCAP Query seems more like a PCAP Report to me. You are
> > > > generating a report based on parameters. That report is something
> > > > that takes some time and an external process to generate... i.e.,
> > > > you have to wait for it.
> > > > I can almost imagine a flow where you:
> > > >
> > > > * Are in the Alerts UI
> > > > * Ask to generate a PCAP report based on some selected
> > > >   alerts/meta-alerts, possibly picking from one or more report
> > > >   'templates' that have query options, etc.
> > > > * The report request is 'queued', that is, dispatched to be
> > > >   executed/generated
> > > > * You as a user have a 'queue' of your report results, and when the
> > > >   report is done it is queued there
> > > > * We 'monitor' the report/queue progress through the YARN REST API
> > > >   (the report info/meta has the YARN details)
> > > > * You can select the report from your queue and view it, either in a
> > > >   new UI or a custom component
> > > > * You can then apply a different 'view' to the report or work with
> > > >   the report data
> > > > * You can print / save, etc.
> > > > * You can associate the report with the alerts (again in the report
> > > >   info), with... a 'case' or 'ticket' or investigation something or
> > > >   other
> > > >
> > > > We can introduce extensibility into the report templates and report
> > > > views (things that work with the JSON data of the report).
> > > >
> > > > Something like that.
> > > >
> > > > On May 4, 2018 at 09:19:15, Ryan Merriman (merrim...@gmail.com) wrote:
> > > >
> > > > Continuing a discussion that started in a discuss thread about
> > > > exposing PCAP query capabilities in the back end: how should we
> > > > expose this feature to users? Should it be integrated into the
> > > > Alerts UI or be a separate standalone UI?
> > > >
> > > > To summarize the general points made in the other thread:
> > > >
> > > > - Adding this capability to the Alerts UI will make it more of a
> > > >   composite app. Is that really what we want, since we have separate
> > > >   UIs for Alerts and management?
> > > > - Would it be better to bring it in on its own so it can be
> > > >   released with qualifiers and tested with the right expectations,
> > > >   without affecting the Alerts UI?
> > > > - There are some use cases that begin with an infosec analyst doing
> > > >   a search on alerts, followed by them going to query PCAP data
> > > >   corresponding to the threats they're investigating. Would having
> > > >   these features in the same UI streamline this process?
> > > >
> > > > There was also mention of some features we should consider:
> > > >
> > > > - PCAP queries should be made asynchronous via the UI
> > > > - Take care that a user doesn't hit refresh or POST multiple times
> > > >   and kick off 50 MapReduce jobs
> > > > - Options for managing the YARN queue that is used
> > > > - Provide a "cancel" option that kills the MR job, or tell the user
> > > >   to go to the CLI to kill their job
> > > > - Managing data if multiple users run queries
> > > > - A strategy for cleaning up jobs and implementing a TTL (I think
> > > >   this one will be tricky and definitely needs discussion)
> > > > - Date range or other query limits
> > > >
> > > > A couple of other features I would add:
> > > >
> > > > - Ability to paginate through results
> > > > - Ability to download results through the UI
> > > > - Realtime status of a running job in the UI
> > > >
> > > > Let me know if I missed any points or did not correctly capture them
> > > > here. What other points do we need to consider? What other features
> > > > should be required? Nice to have?
> >
> > --
> > Jon

--
Jon
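The "cancel" and "realtime status" items above map directly onto the YARN ResourceManager REST API (the Cluster Application and Application State APIs behind the Hadoop docs linked earlier in the thread). A minimal sketch of what a UI backend might call - the ResourceManager host/port and application id are placeholders, and this is an illustration, not a proposed implementation:

```python
import json
import urllib.request

# Per the YARN docs cited above, there is no SUSPENDED/PAUSED state;
# these are the only states a job cannot return from.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def app_url(rm_host, app_id):
    """Cluster Application API endpoint for one application."""
    return "http://%s/ws/v1/cluster/apps/%s" % (rm_host, app_id)


def is_terminal(state):
    """True once the job can make no further progress."""
    return state in TERMINAL_STATES


def get_state(rm_host, app_id):
    """Poll the current application state (for 'realtime status' in the UI)."""
    with urllib.request.urlopen(app_url(rm_host, app_id)) as resp:
        return json.load(resp)["app"]["state"]


def cancel(rm_host, app_id):
    """Kill the job via the Application State API (the 'cancel' button)."""
    req = urllib.request.Request(
        app_url(rm_host, app_id) + "/state",
        data=json.dumps({"state": "KILLED"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)
```

Polling `get_state` and checking `is_terminal` covers status display; `cancel` issues the documented PUT of `{"state": "KILLED"}`, which is also why pause/resume is out of scope - KILLED is the only writable target state.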