We are limited by what YARN and MapReduce applications support in the case of
pause/resume - I could be wrong, but I don't think that's something that's
supported, unless you're talking about stringing multiple MR jobs together.
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application

I don't see anything suggesting "SUSPENDED" or "PAUSED" as we have available
in workflow engines like Oozie: "The valid application state can be one of
the following: ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED,
FAILED, KILLED"

Same goes for the MR job commands:
https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#job
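For what it's worth, the closest thing to lifecycle control we get from the
stock YarnClient API is polling the application state and killing the
application. A rough sketch of what a backend "cancel" could look like (the
class and method names below are purely illustrative, not anything that
exists in the codebase today):

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.YarnApplicationState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.exceptions.YarnException;

    import java.io.IOException;
    import java.util.EnumSet;

    public class PcapJobControl {

      private final YarnClient yarnClient;

      public PcapJobControl() {
        // Picks up yarn-site.xml from the classpath; the RM address comes
        // from cluster config, nothing is hardcoded in this sketch.
        yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
      }

      /** Poll the current state of a submitted pcap query job. */
      public YarnApplicationState getState(ApplicationId appId)
          throws IOException, YarnException {
        ApplicationReport report = yarnClient.getApplicationReport(appId);
        // Possible values mirror the doc excerpt above: NEW, NEW_SAVING,
        // SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED.
        // There is no PAUSED/SUSPENDED.
        return report.getYarnApplicationState();
      }

      /** "Cancel" is the only lifecycle action YARN offers: kill the app. */
      public void cancel(ApplicationId appId) throws IOException, YarnException {
        YarnApplicationState state = getState(appId);
        if (!EnumSet.of(YarnApplicationState.FINISHED,
                        YarnApplicationState.FAILED,
                        YarnApplicationState.KILLED).contains(state)) {
          yarnClient.killApplication(appId);
        }
      }
    }

Pause/resume would have to be approximated at a higher level, e.g. by
splitting a query into several smaller jobs and simply not submitting the
next one - which is really just the "multiple MR jobs strung together"
caveat above.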
Mike

On Mon, May 7, 2018 at 2:04 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> From my perspective PCAP is primarily used as a follow-on to an alert or
> meta-alert - people very rarely use PCAP for initial hunting. I know this
> has been brought up by Otto, Mike, and Ryan across the two related threads
> and I think it's all spot on. Going from an alert or meta-alert to pulling
> PCAP would by far be the primary use case for this in every SOC I've ever
> worked in (not necessarily a representative sample).
>
> I also have some additional thoughts on the feature side after doing some
> brainstorming and talking to two of the SOCs I work most with:
> - Limit the size of the PCAP, not just the date range, and maybe even have
> a configurable cluster-wide admin max for PCAP retrieval, set to
> 0/infinite by default.
> - Set priority of PCAP queries. Perhaps there's an automated pcap
> retrieval 'just in case', which should have a lower priority than an
> interactive request via the UI.
> - Ability to pause/resume (not just cancel) jobs.
> - Configurable cluster-wide admin max # of concurrent PCAP queries, set to
> 0/infinite by default.
> - Ability to pull PCAP live off the wire and stream it into a file.
> - Ability to filter PCAP by providing a BPF filter to apply in server-side
> post-processing (less efficient, but very versatile).
> - Request what PCAP data exists in the cluster (answering "how far back
> can I go?").
> - This is obvious and is probably assumed, but queries based on any subset
> of the network 5-tuple (IPs, ports, protocol) with at least 1 required.
>
> Jon
>
> On Fri, May 4, 2018 at 9:44 AM Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > That is the 'views' part.
> >
> > We can have options on the data output; if you have output the full
> > data, then we can have different views and interactions for inspection
> > and level of detail.
> >
> > On May 4, 2018 at 09:37:13, Michel Sumbul (michelsum...@gmail.com)
> > wrote:
> >
> > It can be like a report, but also to investigate cases where the user
> > wants to see the whole packet (all the bits and bytes). Like in
> > Wireshark, something interactive, no?
> >
> > 2018-05-04 14:33 GMT+01:00 Otto Fowler <ottobackwa...@gmail.com>:
> >
> > > The PCAP Query seems more like a PCAP Report to me. You are generating
> > > a report based on parameters. That report is something that takes some
> > > time and an external process to generate… i.e. you have to wait for it.
> > >
> > > I can almost imagine a flow where you:
> > >
> > > * Are in the Alerts UI
> > > * Ask to generate a PCAP report based on some selected
> > > alerts/meta-alerts, possibly picking from one or more report
> > > 'templates' that have query options etc.
> > > * The report request is 'queued', that is, dispatched to be
> > > executed/generated
> > > * You as a user have a 'queue' of your report results, and when the
> > > report is done it is queued there
> > > * We 'monitor' the report/queue progress through the YARN REST API
> > > (the report info/meta has the YARN details)
> > > * You can select the report from your queue and view it either in a
> > > new UI or custom component
> > > * You can then apply a different 'view' to the report or work with the
> > > report data
> > > * You can print / save etc.
> > > * You can associate the report with the alerts (again in the report
> > > info) with…. a 'case' or 'ticket' or investigation something or other
> > >
> > > We can introduce extensibility into the report templates and report
> > > views (things that work with the JSON data of the report).
> > >
> > > Something like that.
> > >
> > > On May 4, 2018 at 09:19:15, Ryan Merriman (merrim...@gmail.com) wrote:
> > >
> > > Continuing a discussion that started in a discuss thread about
> > > exposing Pcap query capabilities in the back end. How should we expose
> > > this feature to users? Should it be integrated into the Alerts UI or
> > > be a separate standalone UI?
> > >
> > > To summarize the general points made in the other thread:
> > >
> > > - Adding this capability to the Alerts UI will make it more of a
> > > composite app. Is that really what we want since we have separate UIs
> > > for Alerts and management?
> > > - Would it be better to bring it in on its own so it can be released
> > > with qualifiers and tested with the right expectations without
> > > affecting the Alerts UI?
> > > - There are some use cases that begin with an infosec analyst doing a
> > > search on alerts followed by them going to query pcap data
> > > corresponding to the threats they're investigating. Would having these
> > > features in the same UI streamline this process?
> > >
> > > There was also mention of some features we should consider:
> > >
> > > - Pcap queries should be made asynchronous via the UI
> > > - Take care that a user doesn't hit refresh or POST multiple times and
> > > kick off 50 mapreduce jobs
> > > - Options for managing the YARN queue that is used
> > > - Provide a "cancel" option that kills the MR job, or tell the user to
> > > go to the CLI to kill their job
> > > - Managing data if multiple users run queries
> > > - Strategy for cleaning up jobs and implementing a TTL (I think this
> > > one will be tricky and definitely needs discussion)
> > > - Date range or other query limits
> > >
> > > A couple other features I would add:
> > >
> > > - Ability to paginate through results
> > > - Ability to download results through the UI
> > > - Realtime status of a running job in the UI
> > >
> > > Let me know if I missed any points or did not correctly capture them
> > > here. What other points do we need to consider? What other features
> > > should be required? Nice to have?
>
> --
> Jon
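One more thought on the TTL/cleanup item, since Ryan flags it as the tricky
one: assuming each query writes its results into its own subdirectory in
HDFS, a periodic cleanup pass could be as simple as the sketch below. The
output root, retention window, and class name are all made up for
illustration - none of this reflects an agreed design.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    import java.io.IOException;
    import java.util.concurrent.TimeUnit;

    public class PcapResultJanitor {

      // Both the output root and the retention window are illustrative
      // defaults, not anything agreed on in this thread.
      private static final Path RESULT_ROOT = new Path("/apps/metron/pcap/output");
      private static final long TTL_MS = TimeUnit.DAYS.toMillis(3);

      /** Delete result directories last modified before the TTL cutoff. */
      public static void cleanExpiredResults(Configuration conf) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        if (!fs.exists(RESULT_ROOT)) {
          return;
        }
        long cutoff = System.currentTimeMillis() - TTL_MS;
        for (FileStatus status : fs.listStatus(RESULT_ROOT)) {
          // Each query is assumed to write into its own subdirectory.
          if (status.isDirectory() && status.getModificationTime() < cutoff) {
            fs.delete(status.getPath(), true); // recursive delete of expired results
          }
        }
      }
    }

Something like this could run on a schedule from the REST layer, with the
retention window exposed as one of the cluster-wide admin settings Jon
mentions.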