[ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422827#comment-16422827 ]
ASF GitHub Bot commented on METRON-1505: ---------------------------------------- GitHub user nickwallen opened a pull request: https://github.com/apache/metron/pull/977 METRON-1505 Intermittent Profiler Integration Test Failure ### Problem The integration tests were failing intermittently when Storm unexpectedly expired messages generated by the integration tests. When Storm expired these messages they were never received by the Profiler bolts, which caused the integration tests to fail. ### Root Cause Storm's event window mechanism was not configured correctly to use the timestamp extracted from the telemetry message. Storm was instead defaulting to system time. If the time when the downstream `ProfileBuilderBolt` processed a message differed significantly enough from when the upstream `ProfileSplitterBolt` processed the message, the message would be errantly expired by Storm. This is why the problem could only be replicated when run in Travis, a resource constrained environment. When run on any other environment, the system time when these two events occur will not differ enough for Storm to mistakenly expire the test messages. This did not necessarily matter for the core functioning of the Profiler, as the Profiler itself continued to use the correct event timestamps. This bug only affected significantly out-of-order messages and the flushing of expired profiles for the integration tests. ### The Fix The simple fix was to ensure that Storm uses the correct event timestamp field. Doing this highlighted another problem. Storm does not work correctly when using tick tuples along with an event timestamp field. Storm will attempt to extract an event timestamp from the tick tuple, which will not exist and cause the entire topology to fail. This meant that I could not use tick tuples. To work around this, I created a separate thread that flushes the expired profiles regularly. The separate thread introduces thread safety concerns, so I also needed to perform some locking. ### Changes Most of these changes were done in separate commits to making review easier. 1. Added a separate thread to the `ProfileBuilderBolt` to flush expired profiles regularly. This is the core fix to the integration test bug. 2. Corrected the key generated to cache `ProfileBuilder` objects. This previously relied on the underlying `ProfileConfig.toString` method which was error prone and slow. It now uses the hash key. 3. Reduced the number of Profiler integration tests. There is now one integration test that tests event time processing and another that tests the same profile using processing time. Previously there were a number of different profiles that were tested. This was necessary before as the integration tests were the only effective way to test different profile logic. Since then, significant refactoring has occurred which allowed the same logic to be tested in unit tests rather than in integration tests. This allowed me to clean-up these tests which reduces run time and complexity in the integration tests. 4. Added some simple debug logging to `HBaseBolt`. ## Pull Request Checklist - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel). - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)? - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed? - [ ] Have you included steps or a guide to how the change may be verified and tested manually? - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder - [ ] Have you written or updated unit tests and or integration tests to verify your changes? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent? You can merge this pull request into a Git repository by running: $ git pull https://github.com/nickwallen/metron METRON-1505 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/metron/pull/977.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #977 ---- commit e9185e2249bfcca1c47d3809e05bf742a73ff80e Author: Nick Allen <nick@...> Date: 2018-03-26T12:13:26Z Added debug logging to HbaseBolt commit ab3e6ad2cefb236a3de8a7c3dec3c100761ffc57 Author: Nick Allen <nick@...> Date: 2018-03-26T15:34:13Z Corrected cache key usage and logging commit 1cda9523fe65ce8ef9ec40305b873e6ed620cc78 Author: Nick Allen <nick@...> Date: 2018-03-23T22:48:38Z A separate timer thread is used to flush expired profiles. ---- > Intermittent Profiler Integration Test Failure > ---------------------------------------------- > > Key: METRON-1505 > URL: https://issues.apache.org/jira/browse/METRON-1505 > Project: Metron > Issue Type: Bug > Reporter: Nick Allen > Assignee: Nick Allen > Priority: Major > > The Profiler integration tests which use processing time fail intermittently > when run in Travis CI. > {code:java} > 2018-03-22 22:00:35 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035759 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035802 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for > message with timestamp=1521756035806 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035807 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035807 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035808 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035808 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.2, timestamp=1521756035813 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035813 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035814 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035814 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035816 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035816 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; > profile=example2, entity=10.0.0.3, timestamp=1521756035817 > 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for > message with timestamp=1521756035817 > 2018-03-22 22:00:41 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756041122} > 2018-03-22 22:00:41 DEBUG WindowManager:212 - [6] events expired from window. > 2018-03-22 22:00:41 DEBUG WindowManager:214 - invoking > windowLifecycleListener.onExpiry > 2018-03-22 22:00:41 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:00:46 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756046122} > 2018-03-22 22:00:46 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:00:46 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:00:51 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756051122} > 2018-03-22 22:00:51 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:00:51 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:00:56 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756056122} > 2018-03-22 22:00:56 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:00:56 DEBUG WindowManager:144 - invoking > windowLifecycleListener onActivation, [1] events in window. > 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 > tuple(s), 0 expired, 1 new > 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s). > 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles > and found 0 measurement(s). > 2018-03-22 22:00:56 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms > until flush; currentTime=0, flushTime=0 > 2018-03-22 22:01:01 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756061122} > 2018-03-22 22:01:01 DEBUG WindowManager:212 - [1] events expired from window. > 2018-03-22 22:01:01 DEBUG WindowManager:214 - invoking > windowLifecycleListener.onExpiry > 2018-03-22 22:01:01 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:01:06 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756066122} > 2018-03-22 22:01:06 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:01:06 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:01:11 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756071122} > 2018-03-22 22:01:11 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:01:11 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:01:16 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756076122} > 2018-03-22 22:01:16 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:01:16 DEBUG WindowManager:144 - invoking > windowLifecycleListener onActivation, [1] events in window. > 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 > tuple(s), 0 expired, 1 new > 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s). > 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles > and found 0 measurement(s). > 2018-03-22 22:01:16 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms > until flush; currentTime=0, flushTime=0 > 2018-03-22 22:01:21 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756081122} > 2018-03-22 22:01:21 DEBUG WindowManager:212 - [1] events expired from window. > 2018-03-22 22:01:21 DEBUG WindowManager:214 - invoking > windowLifecycleListener.onExpiry > 2018-03-22 22:01:21 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:01:26 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756086122} > 2018-03-22 22:01:26 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:01:26 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:01:31 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756091122} > 2018-03-22 22:01:31 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:01:31 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:01:36 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756096122} > 2018-03-22 22:01:36 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:01:36 DEBUG WindowManager:144 - invoking > windowLifecycleListener onActivation, [1] events in window. > 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 > tuple(s), 0 expired, 1 new > 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s). > 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles > and found 0 measurement(s). > 2018-03-22 22:01:36 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms > until flush; currentTime=0, flushTime=0 > 2018-03-22 22:01:41 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756101122} > 2018-03-22 22:01:41 DEBUG WindowManager:212 - [1] events expired from window. > 2018-03-22 22:01:41 DEBUG WindowManager:214 - invoking > windowLifecycleListener.onExpiry > 2018-03-22 22:01:41 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:01:46 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756106122} > 2018-03-22 22:01:46 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:01:46 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:01:51 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756111122} > 2018-03-22 22:01:51 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:01:51 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:01:56 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756116122} > 2018-03-22 22:01:56 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:01:56 DEBUG WindowManager:144 - invoking > windowLifecycleListener onActivation, [1] events in window. > 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 > tuple(s), 0 expired, 1 new > 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s). > 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles > and found 0 measurement(s). > 2018-03-22 22:01:56 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms > until flush; currentTime=0, flushTime=0 > 2018-03-22 22:02:01 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756121122} > 2018-03-22 22:02:01 DEBUG WindowManager:212 - [1] events expired from window. > 2018-03-22 22:02:01 DEBUG WindowManager:214 - invoking > windowLifecycleListener.onExpiry > 2018-03-22 22:02:01 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:02:06 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756126122} > 2018-03-22 22:02:06 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:02:06 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:02:11 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756131122} > 2018-03-22 22:02:11 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:02:11 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:02:16 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756136122} > 2018-03-22 22:02:16 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:02:16 DEBUG WindowManager:144 - invoking > windowLifecycleListener onActivation, [1] events in window. > 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 > tuple(s), 0 expired, 1 new > 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s). > 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles > and found 0 measurement(s). > 2018-03-22 22:02:16 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms > until flush; currentTime=0, flushTime=0 > 2018-03-22 22:02:21 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756141122} > 2018-03-22 22:02:21 DEBUG WindowManager:212 - [1] events expired from window. > 2018-03-22 22:02:21 DEBUG WindowManager:214 - invoking > windowLifecycleListener.onExpiry > 2018-03-22 22:02:21 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:02:26 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756146122} > 2018-03-22 22:02:26 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:02:26 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:02:31 DEBUG WindowManager:189 - Scan events, eviction policy > TimeEvictionPolicy{windowLength=5000, referenceTime=1521756151122} > 2018-03-22 22:02:31 DEBUG WindowManager:212 - [0] events expired from window. > 2018-03-22 22:02:31 DEBUG WindowManager:147 - No events in the window, > skipping onActivation > 2018-03-22 22:02:34 DEBUG WindowManager:154 - Shutting down WindowManager > 2018-03-22 22:02:46 DEBUG FixedFrequencyFlushSignal:67 - Flush counters > reset{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)