Scenario 2 is the one I'm specifically interested in; I have that exact use case right now. I can see Scenario 1 being useful in the future as well.
I'm also interested in a conversation along the lines of what Otto brought up (i.e. I would like to re-ingest data to redo parsing, enrichments, etc.), but I'm happy to keep that conversation separate or save it for the future. Really just wanted to comment that this work effort has a huge +1 from me and is something I've been following. This work should interact nicely with METRON-1397 <https://github.com/apache/metron/pull/914>, because users may occasionally need to ingest bulk data produced by a system export.

Jon

On Mon, Feb 5, 2018 at 9:38 AM Otto Fowler <ottobackwa...@gmail.com> wrote:

> I think that is fine, we can use that and work out the UX to manage new or
> replace. Maybe we can do Profile Compare down the line?
>
> On February 5, 2018 at 09:28:16, Nick Allen (n...@nickallen.org) wrote:
>
> > If we replay a set of data with a new version of a profile I think it
> > will always have to be a new profile and not ‘replace’ the old one.
> > Series1, Series2 etc?
>
> As part of this effort (unless there is a compelling reason) I wouldn't
> change that behavior. The profile data is stored based on profile name +
> entity + timestamp (I'm glossing over some of the details, but that's
> effectively what happens). If you change the definition of a profile, but
> the name does not change, then you would replace the existing profile
> data. If you do not want to replace, then you should change the name of
> the profile.
>
> Now is this the best way to store the data? I am not sure. It is a
> complex discussion all by itself, but is something that I would rather
> handle as a separate effort.
>
> On Fri, Feb 2, 2018 at 5:42 PM, Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > You know, I am going to back this up.
> > I usually think of replay as replay, profiler or not, but that is not
> > true.
> > Replay of data through the full pipeline (parsers/enrichment) has more
> > consequences or concerns, so we can drop this.
> > I don’t want to expand the scope of your idea. We can reuse/refactor to
> > the other case (parser + enrichment) later.
> > Sorry.
> >
> > ----
> >
> > So, about re-writing.
> > If we replay a set of data with a new version of a profile I think it
> > will always have to be a new profile and not ‘replace’
> > the old one. Series1, Series2 etc?
> >
> > On February 2, 2018 at 17:24:46, Nick Allen (n...@nickallen.org) wrote:
> >
> > I think that is definitely a reasonable extension.
> >
> > In this case would we need any additional actions to indicate that data
> > will be overwritten?
> >
> > I am trying to think of other additional needs that this use case has
> > over the others.
> >
> > On Feb 2, 2018 12:38 PM, "Otto Fowler" <ottobackwa...@gmail.com> wrote:
> >
> >> Scenario 3:
> >> As a Security ? I have modified a profile or parser configuration (
> >> replay is replay ), and I want to run the new version
> >> against my old data.
> >>
> >> On February 2, 2018 at 12:19:54, Nick Allen (n...@nickallen.org) wrote:
> >>
> >> I have been thinking about an enhancement to the Profiler for quite some
> >> time. Actually, my first pass at defining this was called "Replay
> >> Telemetry through Profiler" back in METRON-594 [1].
> >>
> >> I'd like to first discuss the use case to make sure we start out on the
> >> right foot. Here is how I would define the use cases for this
> >> functionality.
> >>
> >> *> Scenario 1: Model Development*
> >>
> >> As a Security Data Scientist, I want to understand the historical
> >> behaviors and trends of a profile that I have created so that I can
> >> understand if it is valuable for model building.
> >>
> >> There are two possible negative outcomes that the Security Data
> >> Scientist must be aware of when creating profiles.
> >>
> >> - The profile might have been defined incorrectly resulting in a feature
> >> set that does not match reality (a bug in the profile definition).
> >>
> >> - The profile might have been defined correctly, but the feature set
> >> itself has no predictive value.
> >>
> >> Analyzing the profile over archived, historical telemetry allows the
> >> Security Data Scientist to better mitigate both of these negative
> >> outcomes.
> >>
> >> *> Scenario 2: Model Deployment*
> >>
> >> As a Security Platform Engineer, I want to generate a profile using
> >> archived telemetry when I deploy a new model to production so that
> >> models depending on that profile can begin to function on day 1.
> >>
> >> (Q) Do these make sense? Am I missing anything? Too broad or too narrow?
> >>
> >> Once we nail down the use case(s), I'll delete the old JIRA and create a
> >> new JIRA with the use cases. That would give us a place to start on the
> >> technical details of the implementation.
> >>
> >> [1] https://issues.apache.org/jira/browse/METRON-594
> >>

--
Jon