[
https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422827#comment-16422827
]
ASF GitHub Bot commented on METRON-1505:
----------------------------------------
GitHub user nickwallen opened a pull request:
https://github.com/apache/metron/pull/977
METRON-1505 Intermittent Profiler Integration Test Failure
### Problem
The integration tests were failing intermittently when Storm unexpectedly
expired messages generated by the integration tests. When Storm expired these
messages they were never received by the Profiler bolts, which caused the
integration tests to fail.
### Root Cause
Storm's event window mechanism was not configured correctly to use the
timestamp extracted from the telemetry message. Storm was instead defaulting
to system time.
If the time when the downstream `ProfileBuilderBolt` processed a message
differed significantly enough from when the upstream `ProfileSplitterBolt`
processed the message, the message would be errantly expired by Storm.
This is why the problem could only be replicated when run in Travis, a
resource constrained environment. When run on any other environment, the
system time when these two events occur will not differ enough for Storm to
mistakenly expire the test messages.
This did not necessarily matter for the core functioning of the Profiler,
as the Profiler itself continued to use the correct event timestamps. This bug
only affected significantly out-of-order messages and the flushing of expired
profiles for the integration tests.
### The Fix
The simple fix was to ensure that Storm uses the correct event timestamp
field. Doing this highlighted another problem. Storm does not work correctly
when using tick tuples along with an event timestamp field. Storm will attempt
to extract an event timestamp from the tick tuple, which will not exist and
cause the entire topology to fail.
This meant that I could not use tick tuples. To work around this, I
created a separate thread that flushes the expired profiles regularly. The
separate thread introduces thread safety concerns, so I also needed to perform
some locking.
### Changes
Most of these changes were done in separate commits to making review easier.
1. Added a separate thread to the `ProfileBuilderBolt` to flush expired
profiles regularly. This is the core fix to the integration test bug.
2. Corrected the key generated to cache `ProfileBuilder` objects. This
previously relied on the underlying `ProfileConfig.toString` method which was
error prone and slow. It now uses the hash key.
3. Reduced the number of Profiler integration tests. There is now one
integration test that tests event time processing and another that tests the
same profile using processing time.
Previously there were a number of different profiles that were tested.
This was necessary before as the integration tests were the only effective way
to test different profile logic. Since then, significant refactoring has
occurred which allowed the same logic to be tested in unit tests rather than in
integration tests.
This allowed me to clean-up these tests which reduces run time and
complexity in the integration tests.
4. Added some simple debug logging to `HBaseBolt`.
## Pull Request Checklist
- [ ] Is there a JIRA ticket associated with this PR? If not one needs to
be created at [Metron
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA
number you are trying to resolve? Pay particular attention to the hyphen "-"
character.
- [ ] Has your PR been rebased against the latest commit within the target
branch (typically master)?
- [ ] Have you included steps to reproduce the behavior or problem that is
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified
and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been
executed in the root metron folder
- [ ] Have you written or updated unit tests and or integration tests to
verify your changes?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] Have you verified the basic functionality of the build by building
and running locally with Vagrant full-dev environment or the equivalent?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nickwallen/metron METRON-1505
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metron/pull/977.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #977
----
commit e9185e2249bfcca1c47d3809e05bf742a73ff80e
Author: Nick Allen <nick@...>
Date: 2018-03-26T12:13:26Z
Added debug logging to HbaseBolt
commit ab3e6ad2cefb236a3de8a7c3dec3c100761ffc57
Author: Nick Allen <nick@...>
Date: 2018-03-26T15:34:13Z
Corrected cache key usage and logging
commit 1cda9523fe65ce8ef9ec40305b873e6ed620cc78
Author: Nick Allen <nick@...>
Date: 2018-03-23T22:48:38Z
A separate timer thread is used to flush expired profiles.
----
> Intermittent Profiler Integration Test Failure
> ----------------------------------------------
>
> Key: METRON-1505
> URL: https://issues.apache.org/jira/browse/METRON-1505
> Project: Metron
> Issue Type: Bug
> Reporter: Nick Allen
> Assignee: Nick Allen
> Priority: Major
>
> The Profiler integration tests which use processing time fail intermittently
> when run in Travis CI.
> {code:java}
> 2018-03-22 22:00:35 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for
> message with timestamp=1521756035759
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for
> message with timestamp=1521756035802
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for
> message with timestamp=1521756035806
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message;
> profile=example2, entity=10.0.0.2, timestamp=1521756035807
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for
> message with timestamp=1521756035807
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message;
> profile=example2, entity=10.0.0.2, timestamp=1521756035808
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for
> message with timestamp=1521756035808
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message;
> profile=example2, entity=10.0.0.2, timestamp=1521756035813
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for
> message with timestamp=1521756035813
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message;
> profile=example2, entity=10.0.0.3, timestamp=1521756035814
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for
> message with timestamp=1521756035814
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message;
> profile=example2, entity=10.0.0.3, timestamp=1521756035816
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for
> message with timestamp=1521756035816
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message;
> profile=example2, entity=10.0.0.3, timestamp=1521756035817
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for
> message with timestamp=1521756035817
> 2018-03-22 22:00:41 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756041122}
> 2018-03-22 22:00:41 DEBUG WindowManager:212 - [6] events expired from window.
> 2018-03-22 22:00:41 DEBUG WindowManager:214 - invoking
> windowLifecycleListener.onExpiry
> 2018-03-22 22:00:41 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:00:46 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756046122}
> 2018-03-22 22:00:46 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:00:46 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:00:51 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756051122}
> 2018-03-22 22:00:51 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:00:51 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:00:56 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756056122}
> 2018-03-22 22:00:56 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:00:56 DEBUG WindowManager:144 - invoking
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles
> and found 0 measurement(s).
> 2018-03-22 22:00:56 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:01:01 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756061122}
> 2018-03-22 22:01:01 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:01:01 DEBUG WindowManager:214 - invoking
> windowLifecycleListener.onExpiry
> 2018-03-22 22:01:01 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:01:06 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756066122}
> 2018-03-22 22:01:06 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:06 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:01:11 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756071122}
> 2018-03-22 22:01:11 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:11 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:01:16 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756076122}
> 2018-03-22 22:01:16 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:16 DEBUG WindowManager:144 - invoking
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles
> and found 0 measurement(s).
> 2018-03-22 22:01:16 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:01:21 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756081122}
> 2018-03-22 22:01:21 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:01:21 DEBUG WindowManager:214 - invoking
> windowLifecycleListener.onExpiry
> 2018-03-22 22:01:21 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:01:26 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756086122}
> 2018-03-22 22:01:26 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:26 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:01:31 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756091122}
> 2018-03-22 22:01:31 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:31 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:01:36 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756096122}
> 2018-03-22 22:01:36 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:36 DEBUG WindowManager:144 - invoking
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles
> and found 0 measurement(s).
> 2018-03-22 22:01:36 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:01:41 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756101122}
> 2018-03-22 22:01:41 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:01:41 DEBUG WindowManager:214 - invoking
> windowLifecycleListener.onExpiry
> 2018-03-22 22:01:41 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:01:46 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756106122}
> 2018-03-22 22:01:46 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:46 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:01:51 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756111122}
> 2018-03-22 22:01:51 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:51 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:01:56 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756116122}
> 2018-03-22 22:01:56 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:56 DEBUG WindowManager:144 - invoking
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles
> and found 0 measurement(s).
> 2018-03-22 22:01:56 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:02:01 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756121122}
> 2018-03-22 22:02:01 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:02:01 DEBUG WindowManager:214 - invoking
> windowLifecycleListener.onExpiry
> 2018-03-22 22:02:01 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:02:06 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756126122}
> 2018-03-22 22:02:06 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:06 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:02:11 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756131122}
> 2018-03-22 22:02:11 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:11 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:02:16 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756136122}
> 2018-03-22 22:02:16 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:16 DEBUG WindowManager:144 - invoking
> windowLifecycleListener onActivation, [1] events in window.
> 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1
> tuple(s), 0 expired, 1 new
> 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles
> and found 0 measurement(s).
> 2018-03-22 22:02:16 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms
> until flush; currentTime=0, flushTime=0
> 2018-03-22 22:02:21 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756141122}
> 2018-03-22 22:02:21 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:02:21 DEBUG WindowManager:214 - invoking
> windowLifecycleListener.onExpiry
> 2018-03-22 22:02:21 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:02:26 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756146122}
> 2018-03-22 22:02:26 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:26 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:02:31 DEBUG WindowManager:189 - Scan events, eviction policy
> TimeEvictionPolicy{windowLength=5000, referenceTime=1521756151122}
> 2018-03-22 22:02:31 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:31 DEBUG WindowManager:147 - No events in the window,
> skipping onActivation
> 2018-03-22 22:02:34 DEBUG WindowManager:154 - Shutting down WindowManager
> 2018-03-22 22:02:46 DEBUG FixedFrequencyFlushSignal:67 - Flush counters
> reset{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)