Github user nickwallen commented on the issue:
https://github.com/apache/incubator-metron/pull/395
I want to provide some feedback on @cestella's comments on changes to the
Profiler Client API. Before I do that, I want to make sure that we're all on
the same page about the usage scenarios for this functionality.
##### "Live" Data
The most common use case is creating profiles on live, streaming data. In
this case the processing time and event time will normally remain close, but
could differ under abnormal conditions.
Note that it is still very valuable to use event time processing in this
scenario. Using event time here has the following advantages.
* Profiles are not skewed by high demand that might delay processing
* Allows the Profiler to ride out planned or unplanned outages and pick up
where it left off
* Produces more accurate behavioral profiles when there is a time
difference between when a behavior occurs and when the telemetry that tells
us about that behavior is received. For example, think of a sensor that
collects data in batches or mini-batches, where we get data at regular
intervals: every 10 minutes, hourly, etc.
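
To make the difference concrete, here is a minimal sketch of the idea, not the Profiler's actual implementation. The field names (`event_time`, `processing_time`), the 15-minute period, and the helper functions are all assumptions made up for illustration. It counts messages per profile period twice: once keyed on when the event happened and once keyed on when it was processed.

```python
from collections import defaultdict

# Hypothetical 15-minute profile period, in milliseconds.
PERIOD_MS = 15 * 60 * 1000


def period_of(timestamp_ms, period_ms=PERIOD_MS):
    """Return the index of the profile period that contains the timestamp."""
    return timestamp_ms // period_ms


def count_by_period(messages, time_field):
    """Count messages per period, keyed on the chosen time field."""
    counts = defaultdict(int)
    for message in messages:
        counts[period_of(message[time_field])] += 1
    return dict(counts)


# Hypothetical telemetry: all three events occur in period 0, but the last
# one arrives in a delayed batch and is only processed during period 1.
messages = [
    {"event_time": 60_000, "processing_time": 61_000},
    {"event_time": 120_000, "processing_time": 121_000},
    {"event_time": 840_000, "processing_time": 905_000},
]

by_event_time = count_by_period(messages, "event_time")
by_processing_time = count_by_period(messages, "processing_time")
```

With event time, all three messages land in the period where the behavior occurred. With processing time, the delayed message is attributed to the following period, which skews both profiles.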
##### Replayed Data
The other use case that this positions us for is creating profiles from
replayed or reprocessed archival data. For example, say I am creating a model
based on a new feature that the Profiler is generating for me. When I move
that model into Production, I need a historical view of that feature to train
my model. I can replay archived telemetry through the Profiler to generate
that history of my new feature. I think I put more examples of this in the
original JIRA too.
This PR doesn't actually deliver all we need to handle replaying data.
This just provides one critical component. I don't want to give anyone the
impression that this PR allows us to replay data at this point in time.