[
https://issues.apache.org/jira/browse/METRON-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749357#comment-15749357
]
ASF GitHub Bot commented on METRON-590:
---------------------------------------
GitHub user nickwallen opened a pull request:
https://github.com/apache/incubator-metron/pull/395
METRON-590 Enable Use of Event Time in Profiler
## [METRON-590](https://issues.apache.org/jira/browse/METRON-590)
### Changes
* Added event time processing support to the Profiler. Previously, the
Profiler supported only processing time (also known as wall clock time). Event
time processing is advantageous because it is not susceptible to skew caused by
heavy processing load, allows archived telemetry data to be reprocessed or
replayed, and under certain circumstances can produce a more accurate profile
of entity behavior.
* By default, the Profiler will use event time processing. The Flux
topology definition file must be edited to switch the Profiler to processing
time (wall clock time).
* The Profiler now leverages the windowing functionality introduced in
Storm 1.x. This provides the core engine for event time processing. It also
makes it possible to use different window types, like sliding windows, in the
Profiler. This is not yet exposed to users of the Profiler in a convenient
way, as the Flux topology definition file must be edited to use different
window types.
* Enhanced the Profiler integration tests, which was made possible by the
use of event time processing. The integration tests now generate 24 hours of
telemetry data at roughly 3 messages per minute, and then flush profile values
every 15 minutes. The entire stream of values generated by the Profiler is
then validated for correctness.
* Created a `ConfigurationManager` that can be used to read the latest
configuration changes from a remote data store like Zookeeper. The default
implementation, `ZkConfigurationManager`, replicates the functionality that is
embedded in the `ConfiguredBolt` base class. The Profiler bolts can no longer
subclass `ConfiguredBolt` because it extends Storm's `BaseRichBolt`, which
will not work for the Profiler bolts.
* The usability of the Profiler was enhanced to better support active
profiles that are subsequently edited by the user. Changes should be handled
seamlessly by the Profiler. This is especially helpful when a profile is
created with a mistake that then needs to be fixed and updated. The
Profiler was also made more resilient to failures specific to a single Profile
or Tuple. Individual failures should not impact other Profiles or Tuples.
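
The figures quoted for the integration tests can be checked with a small
sketch of event-time tumbling windows. This is only an illustration, not the
Profiler's or Storm's implementation. It assumes exactly one message every 20
seconds (the description says only "roughly 3 messages per minute") and a
stream that starts at timestamp 0. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not taken from the Profiler or from Storm.
class EventTimeWindowSketch {

    // Assumed window length, matching the 15 minute flush interval.
    static final long WINDOW_MILLIS = TimeUnit.MINUTES.toMillis(15);

    // Start of the tumbling window that contains the given event timestamp.
    static long windowStart(long eventTimeMillis) {
        return eventTimeMillis - Math.floorMod(eventTimeMillis, WINDOW_MILLIS);
    }

    // Generates one event timestamp per interval and counts the messages
    // that fall in each window, keyed by window start time.
    static Map<Long, Integer> countPerWindow(long startMillis,
                                             long durationMillis,
                                             long intervalMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        long end = startMillis + durationMillis;
        for (long t = startMillis; t < end; t += intervalMillis) {
            counts.merge(windowStart(t), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Long, Integer> counts = countPerWindow(
                0L,                             // assumed start of the stream
                TimeUnit.HOURS.toMillis(24),    // 24 hours of telemetry
                TimeUnit.SECONDS.toMillis(20)); // assumed 3 messages per minute
        System.out.println("windows = " + counts.size());                   // windows = 96
        System.out.println("messages in first window = " + counts.get(0L)); // messages in first window = 45
    }
}
```

Because the windows are keyed by the timestamp carried in each message, the
result does not depend on how quickly the data is generated, which is what
lets a test push 24 hours of telemetry through the Profiler in far less time.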
### Testing
Tested on a multi-node AWS cluster and the Quick Dev environment. Created,
edited, and deleted multiple profile definitions as the Profiler was running
and responding to the changes.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nickwallen/incubator-metron METRON-590
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-metron/pull/395.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #395
----
commit cca756b7781ee7058edadfa84777bce7286d7817
Author: Nick Allen <[email protected]>
Date: 2016-12-07T20:14:07Z
METRON-590 Enable Use of Event Time in Profiler
----
> Enable Use of Event Time in Profiler
> ------------------------------------
>
> Key: METRON-590
> URL: https://issues.apache.org/jira/browse/METRON-590
> Project: Metron
> Issue Type: Improvement
> Reporter: Nick Allen
> Assignee: Nick Allen
>
> There are at least two different times that are important to consider when
> handling the telemetry messages received by Metron.
> (1) Processing time is the time at which Metron processed the message.
> (2) Event time is the time at which the event actually occurred.
> If Metron is consuming live data and all is well, the processing and event
> times may remain close and consistent. When processing time differs from
> event time, the data produced by the Profiler may be inaccurate. There are a
> few scenarios in which these times might differ greatly, which would
> negatively impact the feature set produced by the Profiler.
> (1) The system has experienced an outage, for example, a scheduled
> maintenance window. When it is restarted, a high volume of messages will need
> to be processed by the Profiler. The output of the Profiler will indicate an
> increase in activity, although no change in activity actually occurred on the
> target network. This could happen whether the outage affected Metron itself
> or an upstream system that feeds data to Metron.
> (2) If the user attempts to replay historical telemetry through the Profiler,
> the Profiler will attribute the activity to the time period in which it was
> processed. The activity should instead be attributed to the time period in
> which the raw telemetry events originated.
> There are some scenarios when processing time might be preferred and other
> use cases where event time is preferred. The Profiler should be enhanced to
> allow it to produce profiles based on either processing time or event time.
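
Scenario (1) in the issue description can be illustrated with a short sketch
that counts the same backlog of messages once by processing time and once by
event time. The numbers are assumptions chosen for illustration: one event per
minute for an hour, all processed within the first minute after a restart, and
15 minute periods. The `Message` class and the method names are hypothetical
and are not Metron APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names are illustrative, not Metron APIs.
class TimeAttributionSketch {

    // A telemetry message with the time it occurred and the time it was processed.
    static final class Message {
        final long eventTime;
        final long processingTime;

        Message(long eventTime, long processingTime) {
            this.eventTime = eventTime;
            this.processingTime = processingTime;
        }
    }

    // Assumed 15 minute profile period.
    static final long PERIOD_MILLIS = 15 * 60 * 1000L;

    // Counts messages per period index, using either the event time or the
    // processing time of each message to choose the period.
    static Map<Long, Integer> countByPeriod(List<Message> messages, boolean useEventTime) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Message m : messages) {
            long t = useEventTime ? m.eventTime : m.processingTime;
            counts.merge(t / PERIOD_MILLIS, 1, Integer::sum);
        }
        return counts;
    }

    // One event per minute for an hour, all processed as a backlog within
    // the first minute after the system comes back at the 60 minute mark.
    static List<Message> backlogAfterOutage() {
        List<Message> messages = new ArrayList<>();
        long restart = 60 * 60_000L;
        for (int i = 0; i < 60; i++) {
            messages.add(new Message(i * 60_000L, restart + i * 1_000L));
        }
        return messages;
    }

    public static void main(String[] args) {
        List<Message> backlog = backlogAfterOutage();
        System.out.println("by processing time: " + countByPeriod(backlog, false));
        // by processing time: {4=60}
        System.out.println("by event time: " + countByPeriod(backlog, true));
        // by event time: {0=15, 1=15, 2=15, 3=15}
    }
}
```

Counted by processing time, the whole hour of activity lands in a single
period and looks like a spike. Counted by event time, the same messages are
spread evenly across the periods in which they occurred.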