[ 
https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422827#comment-16422827
 ] 

ASF GitHub Bot commented on METRON-1505:
----------------------------------------

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/977

    METRON-1505 Intermittent Profiler Integration Test Failure

    ### Problem
    
    The integration tests were failing intermittently when Storm unexpectedly 
expired messages generated by the integration tests.  When Storm expired these 
messages they were never received by the Profiler bolts, which caused the 
integration tests to fail.
    
    ### Root Cause
    
    Storm's event window mechanism was not configured correctly to use the 
timestamp extracted from the telemetry message.  Storm was instead defaulting 
to system time.  
    
    If the time when the downstream `ProfileBuilderBolt` processed a message 
differed significantly enough from when the upstream `ProfileSplitterBolt` 
processed the message, the message would be errantly expired by Storm.  
    
    This is why the problem could only be replicated when run in Travis, a 
resource constrained environment.  When run on any other environment, the 
system time when these two events occur will not differ enough for Storm to 
mistakenly expire the test messages.
    
    This did not necessarily matter for the core functioning of the Profiler, 
as the Profiler itself continued to use the correct event timestamps.  This bug 
only affected significantly out-of-order messages and the flushing of expired 
profiles for the integration tests.
    
    ### The Fix
    
    The simple fix was to ensure that Storm uses the correct event timestamp 
field.  Doing this highlighted another problem.  Storm does not work correctly 
when using tick tuples along with an event timestamp field.  Storm will attempt 
to extract an event timestamp from the tick tuple, which will not exist and 
cause the entire topology to fail.
    
    This meant that I could not use tick tuples.  To work around this, I 
created a separate thread that flushes the expired profiles regularly.  The 
separate thread introduces thread safety concerns, so I also needed to perform 
some locking.
    
    ### Changes
    
    Most of these changes were done in separate commits to making review easier.
    
    1. Added a separate thread to the `ProfileBuilderBolt` to flush expired 
profiles regularly.  This is the core fix to the integration test bug.
    
    2. Corrected the key generated to cache `ProfileBuilder` objects.  This 
previously relied on the underlying `ProfileConfig.toString` method which was 
error prone and slow.  It now uses the hash key.
    
    3. Reduced the number of Profiler integration tests.  There is now one 
integration test that tests event time processing and another that tests the 
same profile using processing time.
    
        Previously there were a number of different profiles that were tested.  
This was necessary before as the integration tests were the only effective way 
to test different profile logic.  Since then, significant refactoring has 
occurred which allowed the same logic to be tested in unit tests rather than in 
integration tests.  
    
        This allowed me to clean-up these tests which reduces run time and 
complexity in the integration tests.
    
    4. Added some simple debug logging to `HBaseBolt`.
    
    ## Pull Request Checklist
    
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
    - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder
    - [ ] Have you written or updated unit tests and or integration tests to 
verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1505

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/977.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #977
    
----
commit e9185e2249bfcca1c47d3809e05bf742a73ff80e
Author: Nick Allen <nick@...>
Date:   2018-03-26T12:13:26Z

    Added debug logging to HbaseBolt

commit ab3e6ad2cefb236a3de8a7c3dec3c100761ffc57
Author: Nick Allen <nick@...>
Date:   2018-03-26T15:34:13Z

    Corrected cache key usage and logging

commit 1cda9523fe65ce8ef9ec40305b873e6ed620cc78
Author: Nick Allen <nick@...>
Date:   2018-03-23T22:48:38Z

    A separate timer thread is used to flush expired profiles.

----


> Intermittent Profiler Integration Test Failure
> ----------------------------------------------
>
>                 Key: METRON-1505
>                 URL: https://issues.apache.org/jira/browse/METRON-1505
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The Profiler integration tests which use processing time fail intermittently 
> when run in Travis CI.
> {code:java}
> 2018-03-22 22:00:35 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for 
> message with timestamp=1521756035759
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for 
> message with timestamp=1521756035802
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for 
> message with timestamp=1521756035806
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; 
> profile=example2, entity=10.0.0.2, timestamp=1521756035807
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for 
> message with timestamp=1521756035807
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; 
> profile=example2, entity=10.0.0.2, timestamp=1521756035808
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for 
> message with timestamp=1521756035808
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; 
> profile=example2, entity=10.0.0.2, timestamp=1521756035813
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for 
> message with timestamp=1521756035813
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; 
> profile=example2, entity=10.0.0.3, timestamp=1521756035814
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for 
> message with timestamp=1521756035814
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; 
> profile=example2, entity=10.0.0.3, timestamp=1521756035816
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for 
> message with timestamp=1521756035816
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; 
> profile=example2, entity=10.0.0.3, timestamp=1521756035817
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for 
> message with timestamp=1521756035817
> 2018-03-22 22:00:41 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756041122}
> 2018-03-22 22:00:41 DEBUG WindowManager:212 - [6] events expired from window.
> 2018-03-22 22:00:41 DEBUG WindowManager:214 - invoking 
> windowLifecycleListener.onExpiry
> 2018-03-22 22:00:41 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:00:46 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756046122}
> 2018-03-22 22:00:46 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:00:46 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:00:51 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756051122}
> 2018-03-22 22:00:51 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:00:51 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:00:56 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756056122}
> 2018-03-22 22:00:56 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:00:56 DEBUG WindowManager:144 - invoking 
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles 
> and found 0 measurement(s).
> 2018-03-22 22:00:56 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms 
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:01:01 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756061122}
> 2018-03-22 22:01:01 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:01:01 DEBUG WindowManager:214 - invoking 
> windowLifecycleListener.onExpiry
> 2018-03-22 22:01:01 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:01:06 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756066122}
> 2018-03-22 22:01:06 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:06 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:01:11 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756071122}
> 2018-03-22 22:01:11 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:11 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:01:16 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756076122}
> 2018-03-22 22:01:16 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:16 DEBUG WindowManager:144 - invoking 
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles 
> and found 0 measurement(s).
> 2018-03-22 22:01:16 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms 
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:01:21 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756081122}
> 2018-03-22 22:01:21 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:01:21 DEBUG WindowManager:214 - invoking 
> windowLifecycleListener.onExpiry
> 2018-03-22 22:01:21 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:01:26 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756086122}
> 2018-03-22 22:01:26 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:26 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:01:31 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756091122}
> 2018-03-22 22:01:31 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:31 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:01:36 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756096122}
> 2018-03-22 22:01:36 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:36 DEBUG WindowManager:144 - invoking 
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles 
> and found 0 measurement(s).
> 2018-03-22 22:01:36 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms 
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:01:41 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756101122}
> 2018-03-22 22:01:41 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:01:41 DEBUG WindowManager:214 - invoking 
> windowLifecycleListener.onExpiry
> 2018-03-22 22:01:41 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:01:46 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756106122}
> 2018-03-22 22:01:46 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:46 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:01:51 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756111122}
> 2018-03-22 22:01:51 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:51 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:01:56 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756116122}
> 2018-03-22 22:01:56 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:56 DEBUG WindowManager:144 - invoking 
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles 
> and found 0 measurement(s).
> 2018-03-22 22:01:56 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms 
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:02:01 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756121122}
> 2018-03-22 22:02:01 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:02:01 DEBUG WindowManager:214 - invoking 
> windowLifecycleListener.onExpiry
> 2018-03-22 22:02:01 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:02:06 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756126122}
> 2018-03-22 22:02:06 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:06 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:02:11 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756131122}
> 2018-03-22 22:02:11 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:11 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:02:16 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756136122}
> 2018-03-22 22:02:16 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:16 DEBUG WindowManager:144 - invoking 
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles 
> and found 0 measurement(s).
> 2018-03-22 22:02:16 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms 
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:02:21 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756141122}
> 2018-03-22 22:02:21 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:02:21 DEBUG WindowManager:214 - invoking 
> windowLifecycleListener.onExpiry
> 2018-03-22 22:02:21 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:02:26 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756146122}
> 2018-03-22 22:02:26 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:26 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:02:31 DEBUG WindowManager:189 - Scan events, eviction policy 
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756151122}
> 2018-03-22 22:02:31 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:31 DEBUG WindowManager:147 - No events in the window, 
> skipping onActivation
> 2018-03-22 22:02:34 DEBUG WindowManager:154 - Shutting down WindowManager
> 2018-03-22 22:02:46 DEBUG FixedFrequencyFlushSignal:67 - Flush counters 
> reset{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to