[
https://issues.apache.org/jira/browse/METRON-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nick Allen updated METRON-1699:
-------------------------------
Fix Version/s: Next + 1
> Create Batch Profiler
> ---------------------
>
> Key: METRON-1699
> URL: https://issues.apache.org/jira/browse/METRON-1699
> Project: Metron
> Issue Type: Improvement
> Reporter: Nick Allen
> Assignee: Nick Allen
> Priority: Major
> Fix For: Next + 1
>
> Attachments: Screen Shot 2018-07-27 at 10.55.27 AM.png, Screen Shot
> 2018-07-27 at 11.07.33 AM.png, Screen Shot 2018-07-27 at 11.10.16 AM.png
>
>
> Create a Batch Profiler that satisfies the following use cases.
> h3. Use Cases
> * As a Security Data Scientist, I want to understand the historical
> behaviors and trends of a profile that I have created so that I can determine
> if I have created a feature set that has predictive value for model building.
> * As a Security Data Scientist, I want to understand the historical
> behaviors and trends of a profile that I have created so that I can determine
> if I have defined the profile correctly and created a feature set that
> matches reality.
> * As a Security Platform Engineer, I want to generate a profile using
> archived telemetry when I deploy a new model to production so that models
> depending on that profile can function on day 1.
> h3. Goal
> * Currently, a profile can only be generated from the telemetry consumed
> *after* the profile was created.
> * The goal is to enable “profile seeding”, which allows profiles to be
> populated from a time *before* the profile was created.
> * A profile would be seeded using the telemetry that has been archived by
> Metron in HDFS.
> * A profile consumer should not be able to distinguish the “seeded” portion
> of a profile.
> !Screen Shot 2018-07-27 at 10.55.27 AM.png!
> h3. Current State
> * There are currently two ports of the Profiler: the Streaming Profiler,
> which handles streaming data in Storm, and a second port that runs in the
> REPL and allows a user to manually build, test, and debug profiles.
> * These ports largely share a common code base in
> metron-analytics/metron-profiler-common.
> * Each port also requires its own smaller set of “orchestration” logic:
> one for Storm, another for the REPL.
> * Both Profiler ports support system time and event time processing.
> !Screen Shot 2018-07-27 at 11.07.33 AM.png!
> h3. Approach
> * Create a third port of the Profiler: the Batch Profiler.
> * The Batch Profiler will be built to run in Spark so that the telemetry can
> be consumed in batch.
> * The Batch Profiler allows a user to seed profiles using the JSON telemetry
> that is archived in HDFS by Metron Indexing.
> * It generates only the profile data stored in HBase, not the messages that
> are produced for Threat Triage and Kafka.
> * Any number of profiles can be generated at once, but dependencies between
> profiles are not supported. A dependency exists when one profile consumes
> the profile generated by another.
> * The Batch Profiler must use the timestamps contained within the telemetry;
> it runs on event time. Luckily the Profiler already supports event time.
> * Enable a pluggable mechanism so that telemetry stored in different formats
> can be consumed by the Batch Profiler. For example, the Profiler should be
> able to consume telemetry stored as raw JSON or in other formats like ORC or
> Parquet.
> !Screen Shot 2018-07-27 at 11.10.16 AM.png!
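
The last two points of the approach (event time and pluggable input formats) can be illustrated with a small sketch. This is not the Batch Profiler's design or Metron's API. It is plain Java standing in for a Spark job, and every name in it (TelemetryReader, KeyValueReader, readerFor, periodOf, countPerPeriod), the toy "kv" record format, the `timestamp` field name, and the 15-minute period are assumptions made only for illustration. It shows a reader being selected by format name and each message being assigned to a profile period using the timestamp carried in the telemetry rather than the wall clock.

```java
import java.util.*;

public class BatchProfilerSketch {

    /** Hypothetical plug-in point: turns one archived record into a flat message. */
    interface TelemetryReader {
        Map<String, Object> read(String record);
    }

    /** Toy reader for "key=value;key=value" records; stands in for JSON, ORC, or Parquet readers. */
    static class KeyValueReader implements TelemetryReader {
        @Override
        public Map<String, Object> read(String record) {
            Map<String, Object> message = new HashMap<>();
            for (String pair : record.split(";")) {
                String[] kv = pair.split("=", 2);
                message.put(kv[0].trim(), kv[1].trim());
            }
            return message;
        }
    }

    /** Registry of readers keyed by format name; new formats are added here. */
    static final Map<String, TelemetryReader> READERS = new HashMap<>();
    static {
        READERS.put("kv", new KeyValueReader());
    }

    static TelemetryReader readerFor(String format) {
        TelemetryReader reader = READERS.get(format);
        if (reader == null) {
            throw new IllegalArgumentException("No reader registered for format: " + format);
        }
        return reader;
    }

    /** Event time: the period index comes from the message's own timestamp (assumed field name). */
    static long periodOf(Map<String, Object> message, long periodMillis) {
        long timestamp = Long.parseLong(String.valueOf(message.get("timestamp")));
        return timestamp / periodMillis;
    }

    /** Counts messages per period, a stand-in for a real profile's aggregation. */
    static Map<Long, Integer> countPerPeriod(List<String> records, String format, long periodMillis) {
        TelemetryReader reader = readerFor(format);
        Map<Long, Integer> counts = new TreeMap<>();
        for (String record : records) {
            long period = periodOf(reader.read(record), periodMillis);
            counts.merge(period, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> archived = Arrays.asList(
            "ip_src_addr=10.0.0.1;timestamp=1000",
            "ip_src_addr=10.0.0.1;timestamp=899999",
            "ip_src_addr=10.0.0.2;timestamp=900000");
        // 900,000 ms = 15 minutes (illustrative period length)
        System.out.println(countPerPeriod(archived, "kv", 900_000L)); // prints {0=2, 1=1}
    }
}
```

Because the period is derived from the record itself, replaying the same archive always yields the same periods, which is what lets seeded data line up with data produced by the streaming port.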
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)