[
https://issues.apache.org/jira/browse/METRON-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15751635#comment-15751635
]
ASF GitHub Bot commented on METRON-590:
---------------------------------------
Github user nickwallen commented on the issue:
https://github.com/apache/incubator-metron/pull/395
I want to provide some feedback on @cestella's comments on changes to the
Profiler Client API. Before I do that, I want to make sure that we're all on
the same page about usage scenarios for this functionality.
##### "Live" Data
The most common use case is creating profiles on live, streaming data. In
this case the processing time and event time will normally remain close, but
could differ under abnormal conditions.
Note that it is still very valuable to use event time processing in this
scenario. Using event time here has the following advantages.
* Profiles are not skewed by high demand that might delay processing
* Allows the Profiler to tolerate planned or unplanned outages and pick up
where it left off
* Produces more accurate behavioral profiles when there is a time
difference between when a behavior occurs and when the telemetry describing
that behavior is received. For example, think of a sensor that collects data
in batches or mini-batches, where we get data at regular intervals: every 10
minutes, hourly, etc.
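To make the first two points concrete, here is a rough sketch of how the same messages land in different profile periods depending on which clock is used. This is illustrative Python, not Profiler code; the `event_time` and `processing_time` field names, the 15-minute period, and both helper functions are assumptions made up for the example.

```python
from collections import Counter

PERIOD_MS = 15 * 60 * 1000  # hypothetical 15-minute profile period


def period_start(timestamp_ms, period_ms=PERIOD_MS):
    """Round an epoch-millisecond timestamp down to the start of its period."""
    return timestamp_ms - (timestamp_ms % period_ms)


def count_per_period(messages, time_field):
    """Count messages per profile period using the given timestamp field."""
    return Counter(period_start(m[time_field]) for m in messages)


# One event occurs in each of four consecutive periods, but an outage delays
# processing until period 4, when the whole backlog is processed at once.
messages = [
    {"event_time": i * PERIOD_MS + 1000, "processing_time": 4 * PERIOD_MS + i}
    for i in range(4)
]

by_event_time = count_per_period(messages, "event_time")
by_processing_time = count_per_period(messages, "processing_time")

print(sorted(by_event_time.items()))       # steady: one message per period
print(sorted(by_processing_time.items()))  # a single spike of four messages
```

With event time the profile shows steady activity across the four periods. With processing time the backlog shows up as one spike, which is exactly the skew described above.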
##### Replayed Data
The other use case that this positions us for is creating profiles from
replayed or reprocessed archival data. For example, suppose I am creating a
model based on a new feature that the Profiler is generating for me. When I
move that model into Production, I need a historical view of that feature to
train my model. I can replay archived telemetry through the Profiler to
generate that history of my new feature. I think I put more examples of this
in the original JIRA too.
This PR doesn't actually deliver all we need to handle replaying data.
This just provides one critical component. I don't want to give anyone the
impression that this PR allows us to replay data at this point in time.
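Purely to illustrate the replay idea (this is not what the PR implements), here is a minimal sketch of rebuilding a feature's history from an archive by keying on event time. The `replay` function, the one-hour period, and the `bytes` field are hypothetical. I am also assuming each archived message carries its event time in a `timestamp` field and its source in `ip_src_addr`.

```python
from collections import defaultdict

PERIOD_MS = 60 * 60 * 1000  # hypothetical one-hour profile period


def replay(archive, period_ms=PERIOD_MS):
    """Rebuild a feature's history from archived messages.

    Each value is attributed to the period in which the event occurred
    (its "timestamp" field), not to the time at which it is replayed.
    """
    samples = defaultdict(list)
    for message in archive:
        period = message["timestamp"] // period_ms
        samples[(message["ip_src_addr"], period)].append(message["bytes"])
    # The feature here is simply the mean bytes per entity per period.
    return {key: sum(v) / len(v) for key, v in samples.items()}


archive = [
    {"ip_src_addr": "10.0.0.1", "timestamp": 0 * PERIOD_MS + 5, "bytes": 100},
    {"ip_src_addr": "10.0.0.1", "timestamp": 0 * PERIOD_MS + 9, "bytes": 300},
    {"ip_src_addr": "10.0.0.1", "timestamp": 2 * PERIOD_MS + 1, "bytes": 50},
]

history = replay(archive)
print(history)
```

Because the period is derived from the message itself, the resulting history is the same no matter when the replay runs.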
> Enable Use of Event Time in Profiler
> ------------------------------------
>
> Key: METRON-590
> URL: https://issues.apache.org/jira/browse/METRON-590
> Project: Metron
> Issue Type: Improvement
> Reporter: Nick Allen
> Assignee: Nick Allen
>
> There are at least two different times that are important to consider when
> handling the telemetry messages received by Metron.
> (1) Processing time is the time at which Metron processed the message.
> (2) Event time is the time at which the event actually occurred.
> If Metron is consuming live data and all is well, the processing and event
> times may remain close and consistent. When processing time differs from
> event time, the data produced by the Profiler may be inaccurate. There are a
> few scenarios under which these times might differ greatly, which would
> negatively impact the feature set produced by the Profiler.
> (1) When the system has experienced an outage, for example, a scheduled
> maintenance window. When restarted, a high volume of messages will need to be
> processed by the Profiler. The output of the Profiler will indicate an
> increase in activity, although no change in activity actually occurred on the
> target network. This could happen whether the outage was in Metron itself or
> in an upstream system that feeds data to Metron.
> (2) If the user attempts to replay historical telemetry through the Profiler,
> the Profiler will attribute the activity to the time period in which it was
> processed. Obviously the activity should be attributed to the time period in
> which the raw telemetry events originated.
> There are some scenarios when processing time might be preferred and other
> use cases where event time is preferred. The Profiler should be enhanced to
> allow it to produce profiles based on either processing time or event time.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)