Re: [DISCUSS] Profiler Enhancement

zeo...@gmail.com Wed, 07 Feb 2018 08:10:14 -0800

Scenario 2 is the one I'm specifically interested in; I have that exact
use case right now.  I can see Scenario 1 being useful in the future as
well.


I'm also interested in a conversation along the lines of what Otto brought
up (i.e. I would like to re-ingest data to redo parsing, enrichments, etc.)
but happy to keep that conversation separate or for the future.

Really just wanted to comment that this work effort has a huge +1 from me
and is something I've been following.

This work should interact nicely with METRON-1397
<https://github.com/apache/metron/pull/914>, because users may need to
ingest bulk data from time to time that is a result of a system export.

Jon

On Mon, Feb 5, 2018 at 9:38 AM Otto Fowler <ottobackwa...@gmail.com> wrote:

> I think that is fine,  we can use that and work out the UX to manage new or
> replace.  Maybe we can do Profile Compare down the line?
>
> On February 5, 2018 at 09:28:16, Nick Allen (n...@nickallen.org) wrote:
>
> > If we replay a set of data with a new version of a profile I think it
> > will always have to be a new profile and not ‘replace’ the old one.
> > Series1, Series2, etc.?
>
> As part of this effort (unless there is a compelling reason) I wouldn't
> change that behavior.  The profile data is stored based on profile name +
> entity + timestamp (I'm glossing over some of the details, but that's
> effectively what happens).  If you change the definition of a profile, but
> the name does not change, then you would replace the existing profile
> data.  If you do not want to replace, then you should change the name of
> the profile.
>
> Now is this the best way to store the data?  I am not sure.  It is a
> complex discussion all by itself, but is something that I would rather
> handle as a separate effort.
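
The keying behavior described above can be sketched roughly as follows. This
is only an illustration of "profile name + entity + timestamp" as a row key,
using a hypothetical in-memory ProfileStore class and made-up profile names;
it is not the Profiler's actual storage code and glosses over the same
details the description does.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical key layout: (profile name, entity, period timestamp).
ProfileKey = Tuple[str, str, int]


class ProfileStore:
    """In-memory stand-in for the profile store (an assumption, for illustration)."""

    def __init__(self) -> None:
        self._rows: Dict[ProfileKey, Any] = {}

    def write(self, profile: str, entity: str, timestamp: int, value: Any) -> bool:
        """Store a value and return True if an existing row was replaced."""
        key = (profile, entity, timestamp)
        replaced = key in self._rows
        self._rows[key] = value
        return replaced

    def read(self, profile: str, entity: str, timestamp: int) -> Optional[Any]:
        """Return the stored value for the key, or None if there is none."""
        return self._rows.get((profile, entity, timestamp))


store = ProfileStore()
store.write("login-count", "10.0.0.1", 1517500800, 12)

# Same profile name, entity and timestamp: the earlier value is replaced.
replaced = store.write("login-count", "10.0.0.1", 1517500800, 7)

# Renamed profile: a separate row is written and the old data is untouched.
kept = store.write("login-count-v2", "10.0.0.1", 1517500800, 9)
```

Under this sketch, replaying old telemetry through a changed definition only
overwrites rows when the profile name is kept; renaming the profile leaves the
original series in place next to the new one.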
>
>
>
> On Fri, Feb 2, 2018 at 5:42 PM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > You know, I am going to back this up.
> > I usually think of replay as replay, profiler or not, but that is not
> > true.
> > Replay of data through the full pipeline (parsers/enrichment) has more
> > consequences or concerns, so we can drop this.
> > I don’t want to expand the scope of your idea.  We can reuse/refactor to
> > the other case (parser + enrichment) later.
> > Sorry.
> >
> >
> > ——
> >
> > So, about re-writing.
> > If we replay a set of data with a new version of a profile I think it
> > will always have to be a new profile and not ‘replace’
> > the old one.   Series1, Series2, etc.?
> >
> >
> >
> >
> > On February 2, 2018 at 17:24:46, Nick Allen (n...@nickallen.org) wrote:
> >
> > I think that is definitely a reasonable extension.
> >
> > In this case would we need any additional actions to indicate that data
> > will be overwritten?
> >
> > I am trying to think of other additional needs that this use case has
> > over the others.
> >
> > On Feb 2, 2018 12:38 PM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
> >
> >> Scenario 3:
> >> As a Security ?  I have modified a profile or parser configuration (
> >> replay is replay ), and I want to run the new version
> >> against my old data.
> >>
> >>
> >>
> >> On February 2, 2018 at 12:19:54, Nick Allen (n...@nickallen.org) wrote:
> >>
> >> I have been thinking about an enhancement to the Profiler for quite some
> >> time. Actually, my first pass at defining this was called "Replay
> >> Telemetry through Profiler" back in METRON-594 [1].
> >>
> >> I'd like to first discuss the use case to make sure we start out on the
> >> right foot. Here is how I would define the use cases for this
> >> functionality.
> >>
> >> *> Scenario 1: Model Development*
> >>
> >> As a Security Data Scientist, I want to understand the historical
> >> behaviors and trends of a profile that I have created so that I can
> >> understand if it is valuable for model building.
> >>
> >> There are two possible negative outcomes that the Security Data
> >> Scientist must be aware of when creating profiles.
> >>
> >>
> >> - The profile might have been defined incorrectly resulting in a feature
> >> set that does not match reality (a bug in the profile definition).
> >>
> >>
> >> - The profile might have been defined correctly, but the feature set
> >> itself has no predictive value.
> >>
> >> Analyzing the profile over archived, historical telemetry allows the
> >> Security Data Scientist to better mitigate both of these negative
> >> outcomes.
> >>
> >>
> >> *> Scenario 2: Model Deployment*
> >>
> >> As a Security Platform Engineer, I want to generate a profile using
> >> archived telemetry when I deploy a new model to production so that
> >> models depending on that profile can begin to function on day 1.
> >>
> >>
> >>
> >> (Q) Do these make sense? Am I missing anything? Too broad or too narrow?
> >>
> >> Once we nail down the use case(s), I'll delete the old JIRA and create a
> >> new JIRA with the use cases. That would give us a place to start on the
> >> technical details of the implementation.
> >>
> >> [1] https://issues.apache.org/jira/browse/METRON-594
> >>
> >>
>
-- 

Jon
