For pause/resume we are limited by what YARN and MapReduce applications
support. I could be wrong, but I don't think pausing a job is supported
unless you're talking about multiple MR jobs strung together.

https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application

I don't see anything suggesting "SUSPENDED" or "PAUSED" as we have
available in workflow engines like Oozie.

"The valid application state can be one of the following:  ALL, NEW,
NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED"

Same goes for MR job commands:
https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#job
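
To make the point concrete, here is a minimal Python sketch that checks a
requested state against the list quoted above. The state names are copied
from the YARN docs quoted in this message; the helper name
is_supported_state is hypothetical and is not part of any Hadoop API.

```python
# Application states listed in the YARN command docs quoted above.
# "ALL" is a filter value for listing apps rather than a state an
# application can be in, so it is left out here.
YARN_APP_STATES = frozenset({
    "NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED",
    "RUNNING", "FINISHED", "FAILED", "KILLED",
})


def is_supported_state(state: str) -> bool:
    """Return True if `state` is one of the documented YARN application states."""
    return state.strip().upper() in YARN_APP_STATES


# Hypothetical check: which of the states we care about could YARN report?
for candidate in ("RUNNING", "KILLED", "PAUSED", "SUSPENDED"):
    print(candidate, is_supported_state(candidate))
```

Neither PAUSED nor SUSPENDED is in that set, so a pause/resume feature
would have to be built on top of kill and resubmit, or on chained jobs.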

Mike

On Mon, May 7, 2018 at 2:04 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:

> From my perspective PCAP is primarily used as a follow-on to an alert or
> meta-alert - people very rarely use PCAP for initial hunting.  I know this
> has been brought up by Otto, Mike, and Ryan across the two related threads
> and I think it's all spot on.  Going from an alert or meta-alert to pulling
> PCAP would be by far the primary use case for this in every SOC I've ever
> worked in (not necessarily a representative sample).
>
> I also have some additional thoughts on the feature side after doing some
> brainstorming and talking to two of the SOCs I work most with:
>  - Limit the size of the PCAP, not just the date range, and maybe even have
> a configurable cluster-wide admin max for PCAP retrieval, set to 0/infinite
> by default.
>  - Set priority of PCAP queries.  Perhaps there's an automated
> pcap retrieval 'just in case', which should have a lower priority than an
> interactive request via the UI.
>  - Ability to pause/resume (not just cancel) jobs.
>  - Configurable cluster-wide admin max # of concurrent PCAP queries, set to
> 0/infinite by default.
>  - Ability to pull PCAP live off the wire and stream it into a file.
>  - Ability to filter PCAP by providing a BPF filter to apply in server-side
> post-processing (less efficient, but very versatile).
>  - Request what PCAP data exists in the cluster (answering "how far back
> can I go?")
>  - This is obvious and is probably assumed, but queries based on any set of
> the network 5 tuple (IPs, Ports, Protocol) with at least 1 required.
>
> Jon
>
> On Fri, May 4, 2018 at 9:44 AM Otto Fowler <ottobackwa...@gmail.com> wrote:
>
> > That is the ‘views’ part.
> >
> > We can have options on the data output; if you output the full data, then
> > we can have different views and interactions for inspection and level of
> > detail.
> >
> >
> >
> > On May 4, 2018 at 09:37:13, Michel Sumbul (michelsum...@gmail.com) wrote:
> >
> > It can be like a report, but it should also cover investigating cases where
> > the user wants to see the whole packet (all the bits and bytes). Something
> > interactive, like in Wireshark, no?
> >
> > 2018-05-04 14:33 GMT+01:00 Otto Fowler <ottobackwa...@gmail.com>:
> >
> > > The PCAP Query seems more like a PCAP Report to me. You are generating a
> > > report based on parameters.
> > > That report is something that takes some time and an external process to
> > > generate, i.e. you have to wait for it.
> > >
> > > I can almost imagine a flow where you:
> > >
> > > * Are in the AlertUI
> > > * Ask to generate a PCAP report based on some selected alerts/meta-alerts,
> > > possibly picking from one or more report ‘templates’
> > > that have query options etc
> > > * The report request is ‘queued’, that is, dispatched to be
> > > executed/generated
> > > * You as a user have a ‘queue’ of your report results, and when the
> > > report is done it is queued there
> > > * We ‘monitor’ the report/queue progress through the YARN REST API ( report
> > > info/meta has the yarn details )
> > > * You can select the report from your queue and view it either in a new
> > > UI or custom component
> > > * You can then apply a different ‘view’ to the report or work with the
> > > report data
> > > * You can print / save etc
> > > * You can associate the report with the alerts ( again in the report
> > > info ) with... a ‘case’ or ‘ticket’ or investigation something or other
> > >
> > >
> > > We can introduce extensibility into the report templates, report views
> > > ( things that work with the json data of the report )
> > >
> > > Something like that.
> > >
> > >
> > > On May 4, 2018 at 09:19:15, Ryan Merriman (merrim...@gmail.com) wrote:
> > >
> > > Continuing a discussion that started in a discuss thread about exposing
> > > Pcap query capabilities in the back end. How should we expose this
> > > feature to users? Should it be integrated into the Alerts UI or be a
> > > separate standalone UI?
> > >
> > > To summarize the general points made in the other thread:
> > >
> > > - Adding this capability to the Alerts UI will make it more of a
> > > composite app. Is that really what we want since we have separate UIs
> > > for Alerts and management?
> > > - Would it be better to bring it in on its own so it can be released
> > > with qualifiers and tested with the right expectations without
> > > affecting the Alerts UI?
> > > - There are some use cases that begin with an infosec analyst doing a
> > > search on alerts, followed by them going to query pcap data
> > > corresponding to the threats they're investigating. Would having these
> > > features in the same UI streamline this process?
> > >
> > > There was also mention of some features we should consider:
> > >
> > > - Pcap queries should be made asynchronous via the UI
> > > - Take care that a user doesn't hit refresh or POST multiple times and
> > > kick off 50 MapReduce jobs
> > > - Options for managing the YARN queue that is used
> > > - Provide a "cancel" option that kills the MR job, or tell the user to
> > > go to the CLI to kill their job
> > > - Managing data if multiple users run queries
> > > - Strategy for cleaning up jobs and implementing a TTL (I think this
> > > one will be tricky and definitely needs discussion)
> > > - Date range or other query limits
> > >
> > > A couple other features I would add:
> > >
> > > - Ability to paginate through results
> > > - Ability to download results through the UI
> > > - Realtime status of a running job in the UI
> > >
> > > Let me know if I missed any points or did not correctly capture them
> > > here. What other points do we need to consider? What other features
> > > should be required? Nice to have?
> > >
> >
> --
>
> Jon
>
