I think these are valid cases, but there is a more general ‘replay’
capability that covers other cases as well. I would think that Metron may
require a general replay story across all of those cases.

* replay to MaaS much the same as you have here
* replay of data for updated enrichment/triage/threat intel
* running some MaaS, Profiling, Triage/Threat completely and *always* on
demand




On November 29, 2016 at 12:08:36, Nick Allen ([email protected]) wrote:

I would love any feedback from the community. Is this useful? How should
this work? What use cases do you envision? What features do we need to
support this? Feel free to respond in this thread or on the JIRA itself.

METRON-594 <https://issues.apache.org/jira/browse/METRON-594>


The Profiler currently consumes live telemetry, in real-time, as it is
streamed through Metron. A useful extension of this functionality would
allow the Profiler to also consume archived, historical telemetry. Allowing
a user to selectively replay archived, historical raw telemetry through the
Profiler has a number of applications. The following use cases help
describe why this might be useful.

Use Case 1 - Model Development

When developing a new model, I often need a feature set of historical data
on which to train my model. I can either wait days, weeks, or months for
the Profiler to generate this from live data, or I can re-run the raw,
historical telemetry through the Profiler and get started immediately. It
is much simpler to use the same mechanism to create this historical data
set than to use a separate batch-driven tool that recreates something
approximating the historical feature set.

Use Case 2 - Model Deployment

When deploying an analytical model to a new environment, like production,
on day 1 there is often no historical data for the model to work with. This
often leaves a gap between when the model is deployed and when that model
is actually useful. If I could replay raw telemetry through the Profiler, a
historical feature set could be created as part of the deployment process.
This would allow my model to start functioning on day 1.

Use Case 3 - Profile Validation

When creating a profile, it is difficult to understand how the configured
profile might behave against the entire data set. By creating the profile
and watching it consume real-time streaming data, I only gain an
understanding of how it behaves on that small segment of data. If I am able
to replay historical telemetry, I can instantly understand how it behaves
on a much larger data set, including all the anomalies and exceptions that
exist in all large data sets.





-- 
Nick Allen <[email protected]>
