GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/965

    METRON-590 Enable Use of Event Time in Profiler

    This enables the use of event time processing in the Profiler.
    
    By default, the Profiler still uses processing time.  If you configure the Profiler with a `timestampField`, it instead extracts the timestamp from that field of each incoming telemetry message.
    
    ## Manual Testing
    
    
    
    1. Launch a development environment.  Shut down Indexing, Elasticsearch, Kibana, YARN, and MapReduce2 to avoid any resource issues.
    
    1. Using Ambari, change the following settings and restart the Profiler.
    
        Set the "Period Duration" to 1 minute.
        Set the "Window Duration" to 15 seconds.
        Set the "Window Lag" to 30 seconds.
    
    1. Replace `/opt/sensor-stubs/bin/start-bro-stub` with the following.
    
        Instead of adding the current time into each Bro message, this will add 
a timestamp from 1 day ago.
        ```
        #
        # how long to delay between each 'batch' in seconds.
        #
        DELAY=${1:-2}
    
        #
        # how many messages to send in each 'batch'.  the messages are drawn
        # randomly from the entire set of canned data.
        #
        COUNT=${2:-10}
    
        INPUT="/opt/sensor-stubs/data/bro.out"
        PRODUCER="/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh"
        TOPIC="bro"
    
        while true; do
    
          # transform the bro timestamp and push to kafka
          SEARCH="\"ts\"\:[0-9]\+\."
          REPLACE="\"ts\"\:`date -d '1 day ago' +'%s'`\."
          shuf -n $COUNT $INPUT | sed -e "s/$SEARCH/$REPLACE/g" | $PRODUCER --broker-list node1:6667 --topic $TOPIC
    
          sleep $DELAY
        done
        ```
    
    1. Restart the Bro Sensor Stub.
    
        ```
        service sensor-stubs stop
        service sensor-stubs start bro
        ```
    
    1. Open up the REPL and configure the Profiler like so.
    
        Notice that we are setting the 'timestampField' within the Profiler 
configuration.  This will tell the Profiler to extract the timestamp from this 
field rather than using system time.
        ```
        [Stellar]>>> conf := SHELL_EDIT(conf)
        {
          "profiles": [
            {
              "profile": "hello-world",
              "onlyif": "source.type == 'bro'",
              "foreach": "'global'",
              "init":    { "count": "0" },
              "update":  { "count": "count + 1" },
              "result":  "count"
            }
          ],
          "timestampField": "timestamp"
        }
        [Stellar]>>>
        [Stellar]>>>
        [Stellar]>>> CONFIG_PUT("PROFILER",conf)
        ```
    
    1. Query the Profiler data store.  It may take a minute or so before a value is written.
    
        ```
        [Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "DAYS"))
        []
        [Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "DAYS"))
        [200]
        ```
    
    1. Now query back only a couple of hours.  You should get no results.  This indicates that the Profiler used the timestamps from the Bro data, which are a day old.
    
        ```
        [Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "HOURS"))
        []
        ```
    
    1. Now change the Profiler configuration to remove the "timestampField" setting.  This switches the Profiler back to system time, also known as processing time.
    
        ```
        [Stellar]>>> conf := SHELL_EDIT(conf)
        {
          "profiles": [
            {
              "profile": "hello-world",
              "onlyif": "source.type == 'bro'",
              "foreach": "'global'",
              "init":    { "count": "0" },
              "update":  { "count": "count + 1" },
              "result":  "count"
            }
          ]
        }
        [Stellar]>>>
        [Stellar]>>> CONFIG_PUT("PROFILER",conf)
        ```
    
    1. The Profiler will pick up the change after the next flush event.  Query for profile data from the past few minutes.  Getting values back shows that the Profiler has switched back to system time, also known as processing time.
    
        ```
        [Stellar]>>> PROFILE_GET("hello-world", "global", PROFILE_FIXED(2, "MINUTES"))
        [180, 190]
        ```
    
    1. In Storm you can also set logging to DEBUG for "org.apache.metron.profiler".  This produces detailed worker logs that let you verify that the Profiler is using the correct timestamps.
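    
    The test steps above rest on two things: the stub rewrites the integer part of each Bro `ts` value to an epoch from one day ago, and a timestamp that old falls inside a 2 day lookback but outside a 2 hour one.  The following is a minimal sketch of that reasoning, not part of this PR.  The helper names `rewrite_ts` and `in_lookback`, the sample message, and the fixed value of `NOW` are all assumptions made for illustration, and the lookback check is a simplification of what `PROFILE_FIXED` does.

    ```shell
    # Hypothetical helper: replace the integer part of "ts" with the epoch in $1.
    # Uses [0-9][0-9]* instead of the GNU-only \+ so it also works with BSD sed.
    rewrite_ts() {
      sed -e "s/\"ts\":[0-9][0-9]*\./\"ts\":$1./g"
    }

    # Hypothetical helper: succeeds when event time $1 lies in [now - lookback, now].
    # $2 = now, $3 = lookback; all values are epoch seconds.
    in_lookback() {
      [ "$1" -ge $(( $2 - $3 )) ] && [ "$1" -le "$2" ]
    }

    NOW=1518533574               # fixed "now" so the example is repeatable
    EVENT=$(( NOW - 86400 ))     # stands in for `date -d '1 day ago' +%s`

    # Made-up message in the shape the stub's sed expression expects.
    SAMPLE='{"http": {"ts":1467657279.39,"uid":"abc"}}'
    REWRITTEN=$(printf '%s\n' "$SAMPLE" | rewrite_ts "$EVENT")
    echo "$REWRITTEN"

    # A day-old event is visible to a 2 day lookback but not to a 2 hour one.
    in_lookback "$EVENT" "$NOW" $(( 2 * 86400 )) && echo "inside 2 days"
    in_lookback "$EVENT" "$NOW" $(( 2 * 3600 ))  || echo "outside 2 hours"
    ```

    Only the digits before the decimal point are replaced, so the fractional part of the original `ts` is kept, just as in the stub script.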
    
    
    
    ## Pull Request Checklist
    
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
    - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root metron 
    - [ ] Have you written or updated unit tests and or integration tests to 
verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-590-2018

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/965.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #965
    
----
commit 64343bc0d99880ac8bb17137a9226c3f44417da7
Author: Nick Allen <nick@...>
Date:   2018-02-13T14:52:54Z

    METRON-590 Enable Use of Event Time in Profiler

----

