I had a feeling it may be that way.  Unless anyone else knows of a better
approach, it's probably most reasonable to push that into a follow-on JIRA
and not over-complicate the current activities.
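
To make the limitation concrete for that follow-on JIRA, here is a minimal
sketch in Python. It assumes only the application states quoted from the YARN
docs below; the names `YARN_APP_STATES` and `supports_state` are hypothetical
and not part of any Hadoop API.

```python
# Application states listed in the YARN "application" command docs quoted below.
# "ALL" is a filter value rather than a real state, so it is left out here.
YARN_APP_STATES = frozenset({
    "NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED",
    "RUNNING", "FINISHED", "FAILED", "KILLED",
})


def supports_state(state: str) -> bool:
    """Return True if the given state name appears in the documented list."""
    return state.upper() in YARN_APP_STATES


# Hypothetical states that a pause/resume feature would need.
missing = [s for s in ("PAUSED", "SUSPENDED") if not supports_state(s)]
```

Both names end up in `missing`, so cancelling (KILLED) is the only way these
states let us stop a running query.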

Jon

On Wed, May 9, 2018 at 2:33 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> We are limited by Yarn and MapReduce applications in the case of
> pause/resume - I could be wrong, but I don't think that's something that's
> supported unless you're talking about multiple MR jobs strung together.
>
>
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#application
>
> I don't see anything suggesting "SUSPENDED" or "PAUSED" as we have
> available in workflow engines like Oozie.
>
> "The valid application state can be one of the following:  ALL, NEW,
> NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED"
>
> Same goes for MR job commands:
>
> https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html#job
>
> Mike
>
> On Mon, May 7, 2018 at 2:04 PM, zeo...@gmail.com <zeo...@gmail.com> wrote:
>
> > From my perspective PCAP is primarily used as a follow-on to an alert or
> > meta-alert - people very rarely use PCAP for initial hunting.  I know this
> > has been brought up by Otto, Mike, and Ryan across the two related threads
> > and I think it's all spot on.  Going from an alert or meta-alert to pulling
> > PCAP would be by far the primary use case for this in every SOC I've ever
> > worked in (not necessarily a representative sample).
> >
> > I also have some additional thoughts on the feature side after doing some
> > brainstorming and talking to two of the SOCs I work most with:
> >  - Limit the size of the PCAP, not just the date range, and maybe even have
> > a configurable cluster-wide admin max for PCAP retrieval, set to 0/infinite
> > by default.
> >  - Set priority of PCAP queries.  Perhaps there's an automated
> > pcap retrieval 'just in case', which should have a lower priority than an
> > interactive request via the UI.
> >  - Ability to pause/resume (not just cancel) jobs.
> >  - Configurable cluster-wide admin max # of concurrent PCAP queries, set to
> > 0/infinite by default.
> >  - Ability to pull PCAP live off the wire and stream it into a file.
> >  - Ability to filter PCAP by providing a BPF filter to apply in server-side
> > post-processing (less efficient, but very versatile).
> >  - Request what PCAP data exists in the cluster (answering "how far back
> > can I go?")
> >  - This is obvious and is probably assumed, but queries based on any set of
> > the network 5 tuple (IPs, Ports, Protocol) with at least 1 required.
> >
> > Jon
> >
> > On Fri, May 4, 2018 at 9:44 AM Otto Fowler <ottobackwa...@gmail.com>
> > wrote:
> >
> > > That is the ‘views’ part.
> > >
> > > We can have options on the data output; if you output the full data,
> > > then we can have different views and interactions for inspection and
> > > level of detail.
> > >
> > >
> > >
> > > On May 4, 2018 at 09:37:13, Michel Sumbul (michelsum...@gmail.com)
> > > wrote:
> > >
> > > It can be like a report, but also used to investigate cases where the user
> > > wants to see the whole packet (all the bits and bytes). Like in Wireshark,
> > > something interactive, no?
> > >
> > > 2018-05-04 14:33 GMT+01:00 Otto Fowler <ottobackwa...@gmail.com>:
> > >
> > > > The PCAP Query seems more like a PCAP Report to me. You are generating a
> > > > report based on parameters.
> > > > That report is something that takes some time and an external process to
> > > > generate… i.e. you have to wait for it.
> > > >
> > > > I can almost imagine a flow where you:
> > > >
> > > > * Are in the AlertUI
> > > > * Ask to generate a PCAP report based on some selected alerts/meta-alert,
> > > > possibly picking from one or more report ‘templates’
> > > > that have query options etc
> > > > * The report request is ‘queued’, that is dispatched to be
> > > > executed/generated
> > > > * You as a user have a ‘queue’ of your report results, and when the report
> > > > is done it is queued there
> > > > * We ‘monitor’ the report/queue progress through the YARN REST API ( report
> > > > info/meta has the yarn details )
> > > > * You can select the report from your queue and view it either in a new UI
> > > > or custom component
> > > > * You can then apply a different ‘view’ to the report or work with the
> > > > report data
> > > > * You can print / save etc
> > > > * You can associate the report with the alerts ( again in the report info )
> > > > with… a ‘case’ or ‘ticket’ or investigation something or other
> > > >
> > > >
> > > > We can introduce extensibility into the report templates, report views
> > > > ( things that work with the json data of the report )
> > > >
> > > > Something like that.
> > > >
> > > >
> > > > On May 4, 2018 at 09:19:15, Ryan Merriman (merrim...@gmail.com) wrote:
> > > >
> > > > Continuing a discussion that started in a discuss thread about exposing
> > > > Pcap query capabilities in the back end. How should we expose this feature
> > > > to users? Should it be integrated into the Alerts UI or be a separate
> > > > standalone UI?
> > > >
> > > > To summarize the general points made in the other thread:
> > > >
> > > > - Adding this capability to the Alerts UI will make it more of a
> > > > composite app. Is that really what we want since we have separate UIs for
> > > > Alerts and management?
> > > > - Would it be better to bring it in on its own so it can be released
> > > > with qualifiers and tested with the right expectations without affecting
> > > > the Alerts UI?
> > > > - There are some use cases that begin with an infosec analyst doing a
> > > > search on alerts
> > > > followed by them going to query pcap data corresponding to the
> > > > threats they're investigating. Would having these features in the same
> > > > UI streamline this process?
> > > >
> > > > There was also mention of some features we should consider:
> > > >
> > > > - Pcap queries should be made asynchronous via the UI
> > > > - Take care that a user doesn't hit refresh or POST multiple times and kick
> > > > off 50 mapreduce jobs
> > > > - Options for managing the YARN queue that is used
> > > > - Provide a "cancel" option that kills the MR job, or tell the user to
> > > > go to the CLI to kill their job
> > > > - Managing data if multiple users run queries
> > > > - Strategy for cleaning up jobs and implementing a TTL (I think this one
> > > > will be tricky and definitely needs discussion)
> > > > - Date range or other query limits
> > > >
> > > > A couple other features I would add:
> > > >
> > > > - Ability to paginate through results
> > > > - Ability to download results through the UI
> > > > - Realtime status of a running job in the UI
> > > >
> > > > Let me know if I missed any points or did not correctly capture them
> > > > here. What other points do we need to consider? What other features
> > > > should be required? Nice to have?
> > > >
> > >
> > --
> >
> > Jon
> >
>
-- 

Jon
