GitHub user merrimanr reopened a pull request:
https://github.com/apache/metron/pull/1213
METRON-1681: Decouple the ParserBolt from the Parse execution logic
## Contributor Comments
The primary purpose of this PR is to create a parser container abstraction
that is decoupled from Storm. This parser container (I call it ParserRunner,
anyone have a better name?) is responsible for:
- Instantiating the MessageParser and MessageFilter objects for each sensor.
- Initializing the Stellar environment.
- Accepting a raw binary message along with parser configurations and
calling the MessageParser.parseOptional method for the appropriate sensor type.
- Processing each message. Most of this logic was migrated from the
ParserBolt.execute method.
- Executing callbacks depending on the processing result of each message.
Configuration is external to this abstraction. A configuration supplier is
passed in when initializing and a configuration object is passed in when
processing each message. I believe this was originally done because we want
message processing to be atomic without the configuration unexpectedly
changing. We can easily change to a configuration supplier during execute if
necessary. A CuratorFramework object is also required for setting up Stellar
but we could easily make this optional.
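To make the configuration handling above concrete, here is a minimal sketch, not the actual ParserRunner code. It assumes a hypothetical immutable `SketchParserConfig` snapshot and a hypothetical `SketchConfigRunner`; neither name comes from this PR. The supplier is read once during init, while each call to process receives a snapshot chosen by the caller, so a configuration update in the middle of a message cannot change how that message is handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical immutable configuration snapshot; not the real Metron configuration class. */
final class SketchParserConfig {
  private final Map<String, String> parserClassBySensor;

  SketchParserConfig(Map<String, String> parserClassBySensor) {
    // Defensive copy so the snapshot cannot change after it is taken.
    this.parserClassBySensor = Collections.unmodifiableMap(new HashMap<>(parserClassBySensor));
  }

  String parserClassFor(String sensorType) {
    return parserClassBySensor.get(sensorType);
  }
}

/** Hypothetical runner showing where each form of configuration enters. */
final class SketchConfigRunner {
  private boolean initialized = false;

  /** Setup reads the current configuration once through the supplier. */
  void init(Supplier<SketchParserConfig> configSupplier) {
    if (configSupplier.get() == null) {
      throw new IllegalStateException("No configuration available");
    }
    initialized = true;
  }

  /** Every step for one message reads the same snapshot passed in by the caller. */
  String process(String sensorType, byte[] rawMessage, SketchParserConfig snapshot) {
    if (!initialized) {
      throw new IllegalStateException("init must be called before process");
    }
    String parserClass = snapshot.parserClassFor(sensorType);
    return parserClass + " <- " + new String(rawMessage, StandardCharsets.UTF_8);
  }
}
```

A caller would take one snapshot from its live configuration, pass it to process, and only pick up a newer snapshot for the next message.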
I decided to keep writing out of the abstraction in this PR. I can
envision different platforms having different requirements or needs for sending
along messages after they are parsed. Therefore the ParserBolt still handles
writing messages to Kafka. If we do decide we want to add writing to our
abstraction we could do it in a follow on PR to keep this from becoming even
bigger.
Since all of the post-parsing logic was in the ParserBolt, messages could
be written as they were processed rather than having to wait for all messages
to be processed. To maintain this behavior I added 2 callback functions in the
form of Java Consumers: onSuccess and onError. The other option would be to
make message processing synchronous and just return a list of results.
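The callback approach can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `CallbackRunnerSketch`, the `;` record separator, and the toy `parseRecord` method are all made up for the example. The point is only that each result is handed to a Consumer as soon as it is available.

```java
import java.util.function.Consumer;

/** Hypothetical runner that reports each result through a callback as soon as it exists. */
final class CallbackRunnerSketch {
  private final Consumer<String> onSuccess;
  private final Consumer<Exception> onError;

  CallbackRunnerSketch(Consumer<String> onSuccess, Consumer<Exception> onError) {
    this.onSuccess = onSuccess;
    this.onError = onError;
  }

  /** One raw message may hold several records; in this sketch they are separated by ';'. */
  void process(String rawMessage) {
    for (String record : rawMessage.split(";")) {
      String parsed;
      try {
        parsed = parseRecord(record);
      } catch (IllegalStateException e) {
        // Report the failure and keep going with the remaining records.
        onError.accept(e);
        continue;
      }
      // Called outside the try block so a failing writer is not reported as a parse error.
      onSuccess.accept(parsed);
    }
  }

  /** Toy parser: blank records are treated as unparseable. */
  private static String parseRecord(String record) {
    String trimmed = record.trim();
    if (trimmed.isEmpty()) {
      throw new IllegalStateException("Unable to parse Message: " + record);
    }
    return "parsed:" + trimmed;
  }
}
```

Passing `results::add` and `errors::add` on two lists as the callbacks gives the synchronous "return a list of results" variant without changing the runner.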
The lifecycle of this container looks like:
1. Container is created from a collection of sensor types
2. Container is initialized with the init method that accepts a Curator
client and a configuration Supplier. This in turn sets up Stellar and
instantiates MessageParser and MessageFilter classes.
3. Container is ready and accepts messages for processing and calls the
appropriate callbacks.
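The three lifecycle steps above can be sketched as a small state machine. Everything here is hypothetical: `LifecycleRunnerSketch` is not the ParserRunner class, the Curator client is represented by a plain `Object`, and a parser is reduced to a function from bytes to a string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical container used only to illustrate the three lifecycle steps. */
final class LifecycleRunnerSketch {
  private final Set<String> sensorTypes;
  // Filled in by init, one parser per sensor type.
  private final Map<String, Function<byte[], String>> parsers = new HashMap<>();
  private boolean ready = false;

  /** Step 1: created from a collection of sensor types; nothing is instantiated yet. */
  LifecycleRunnerSketch(Collection<String> sensorTypes) {
    this.sensorTypes = new LinkedHashSet<>(sensorTypes);
  }

  /**
   * Step 2: init receives a client handle and a configuration supplier.
   * zkClient is a plain Object here, standing in for the Curator client.
   */
  void init(Object zkClient, Supplier<Map<String, String>> configSupplier) {
    Map<String, String> config = configSupplier.get();
    for (String sensorType : sensorTypes) {
      String parserName = config.get(sensorType);
      if (parserName == null) {
        throw new IllegalStateException("No configuration for sensor " + sensorType);
      }
      parsers.put(sensorType, raw -> parserName + ":" + new String(raw, StandardCharsets.UTF_8));
    }
    ready = true;
  }

  /** Step 3: messages are accepted only after init has completed. */
  String process(String sensorType, byte[] rawMessage) {
    if (!ready) {
      throw new IllegalStateException("init must be called before process");
    }
    Function<byte[], String> parser = parsers.get(sensorType);
    if (parser == null) {
      throw new IllegalArgumentException("Unknown sensor type " + sensorType);
    }
    return parser.apply(rawMessage);
  }
}
```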
Because this splits the ParserBolt into 2 different classes, much of the
ParserBoltTest unit test didn't make sense anymore. I ended up essentially
rewriting it and also creating a unit test for ParserRunner. I tried to
represent all the original tests in ParserBoltTest and have 95%+ coverage. If
I missed any cases or made undesirable style changes, let me know.
### Changes Included
- ParserRunner abstraction that is decoupled from the ParserBolt. The
ParserBolt now initializes the ParserRunner and defers parsing to that class.
The only thing required is Metron configuration and a Curator client (which
could be optional)
- Refactored ParserBolt that sets up the ParserRunner, passes messages to it
for processing, and writes results to Kafka.
- MessageParser and MessageFilter objects are now created when Storm calls
prepare, avoiding serialization issues
(https://issues.apache.org/jira/browse/METRON-1793)
- Updated unit and integration tests
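As general background on why creating objects in prepare sidesteps serialization problems, here is a sketch using plain Java serialization. It does not reproduce the specifics of METRON-1793, and `PrepareStyleBoltSketch` and `NonSerializableParserSketch` are hypothetical names. A component that is serialized before it runs can hold a non-serializable helper in a transient field and build it after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical parser that is deliberately not Serializable. */
final class NonSerializableParserSketch {
  String parse(String raw) {
    return raw.trim();
  }
}

/** Hypothetical bolt-like component; only the sensor type travels with the serialized form. */
final class PrepareStyleBoltSketch implements Serializable {
  private static final long serialVersionUID = 1L;
  private final String sensorType;
  // transient: skipped during serialization, so it must be rebuilt in prepare().
  private transient NonSerializableParserSketch parser;

  PrepareStyleBoltSketch(String sensorType) {
    this.sensorType = sensorType;
  }

  /** Stand-in for the prepare hook, called after the object has been deserialized. */
  void prepare() {
    parser = new NonSerializableParserSketch();
  }

  String execute(String raw) {
    if (parser == null) {
      throw new IllegalStateException("prepare was not called");
    }
    return sensorType + ":" + parser.parse(raw);
  }

  /** Serializes and deserializes this object, imitating shipping it to a worker. */
  PrepareStyleBoltSketch roundTrip() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(this);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (PrepareStyleBoltSketch) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Serialization round trip failed", e);
    }
  }
}
```

If the parser field were not transient and were assigned in the constructor, `writeObject` would fail with a `NotSerializableException`.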
### Testing
I have done basic testing in full dev:
1. Spin up full dev and verify bro and snort alerts are indexed in ES and
the data looks correct.
2. Test for proper message parser error handling by producing an
unparseable message to the bro topic:
```
echo 'bad message' |
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list
node1:6667 --topic bro
```
You should see a corresponding error message in the ES error index:
```
{
"_index" : "error_index_2018.09.25.20",
"_type" : "error_doc",
"_id" : "df83aebc-506d-404b-aba6-c5a740d07c57",
"_score" : 1.0,
"_source" : {
"exception" : "java.lang.IllegalStateException: Unable to parse
Message: test",
"failed_sensor_type" : "bro",
"stack" : "java.lang.IllegalStateException: Unable to parse
Message: bad message
...
"hostname" : "node1",
"source:type" : "error",
"raw_message" : "test",
"error_hash" :
"9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
"error_type" : "parser_error",
"guid" : "df83aebc-506d-404b-aba6-c5a740d07c57",
"message" : "Unable to parse Message: bad message",
"timestamp" : 1537906408288
}
}
```
3. Test for proper invalid message handling by setting up global validation.
Add this to the global config:
```
"fieldValidations": [
{
"input": [
"is_alert"
],
"validation": "NOT_EMPTY"
}
]
```
This should cause validation to fail on bro messages and you should see a
corresponding error message in the ES error index:
```
{
"_index" : "error_index_2018.09.25.20",
"_type" : "error_doc",
"_id" : "97c356a3-0fd8-4076-9e9c-0213203565c8",
"_score" : 1.0,
"_source" : {
"failed_sensor_type" : "bro",
"hostname" : "node1",
"source:type" : "error",
...
"error_hash" :
"3e5bc436c661cd7df41d4624b5c7422368373f1e57d201500d731007e146c88a",
"error_type" : "parser_invalid",
"guid" : "97c356a3-0fd8-4076-9e9c-0213203565c8",
"error_fields" : "is_alert",
"timestamp" : 1537907615275
}
}
```
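A side note for anyone comparing their own output with the parser_error sample in step 2: the error_hash shown there equals the SHA-256 digest of the string `test`, which is the raw_message value in that sample. Whether error_hash is always derived from the raw message this way is an assumption here, not something this description states. The helper below (`ErrorHashCheckSketch` is a hypothetical name) is one way to check a hash against a raw message.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for comparing an error_hash value with a raw message. */
final class ErrorHashCheckSketch {
  /** Returns the lowercase hex SHA-256 digest of the UTF-8 bytes of text. */
  static String sha256Hex(String text) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(text.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        // Mask to 0..255 so negative bytes print as two hex digits.
        hex.append(String.format("%02x", b & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this is not expected.
      throw new IllegalStateException("SHA-256 not available", e);
    }
  }
}
```

`ErrorHashCheckSketch.sha256Hex("test")` returns `9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08`, the value in the sample.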
### Next Steps
This PR contains working code but still needs documentation. I am
planning on testing a couple different parsers in full dev (grok for example)
in addition to what I've already tested. I will be adding inline comments for
some of the less obvious changes or refactors to make it easier to review. My
plan is for any discussion around specific parts of the code to get added as
javadocs eventually. I also think we should add some developer documentation
to make it easier to maintain this and integrate it into other platforms. I
imagine a lot of info in this description would make it in there as well.
The intention for now is to get some feedback on the overall approach and
get people thinking about it. I'm still working on documentation and will add
that soon. Let me know what you think!
## Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our [Development
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
for the complete guide to follow for contributions.
Please refer also to our [Build Verification
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow
these guidelines and ask you to double check the following:
### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to
be created at [Metron
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA
number you are trying to resolve? Pay particular attention to the hyphen "-"
character.
- [x] Has your PR been rebased against the latest commit within the target
branch (typically master)?
### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been
executed in the root metron folder via:
```
mvn -q clean integration-test install &&
dev-utilities/build-utils/verify_licenses.sh
```
- [x] Have you written or updated unit tests and/or integration tests to
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building
and running locally with Vagrant full-dev environment or the equivalent?
### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in
which it is rendered by building and verifying the site-book? If not then run
the following commands and then verify changes via
`site-book/target/site/index.html`:
```
cd site-book
mvn site
```
#### Note:
Please ensure that once the PR is submitted, you check travis-ci for build
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up
for your personal repository such that your branches are built there before
submitting a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/merrimanr/incubator-metron METRON-1681
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metron/pull/1213.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1213
----
commit 922c76316e5fa2d22976c938f4d4db8c45ec6329
Author: merrimanr <merrimanr@...>
Date: 2018-09-26T16:04:57Z
initial commit
commit 773c34a30f29e7353ecd5fd8bf5fa0f545219eec
Author: merrimanr <merrimanr@...>
Date: 2018-09-26T18:00:11Z
removed commented code
commit 2bbd9962251855158a26c55c45b2b4fbf1d9f9e8
Author: merrimanr <merrimanr@...>
Date: 2018-10-03T21:15:32Z
feedback from nick
commit 2650b9090d41bc9563a2d8e9ed147eefbf3b7591
Author: merrimanr <merrimanr@...>
Date: 2018-10-04T20:35:56Z
removed callbacks
commit 599133dc71d742f928def4eef0dba2121d9a1666
Author: merrimanr <merrimanr@...>
Date: 2018-10-04T21:59:25Z
Merge remote-tracking branch 'mirror/master' into METRON-1681
commit 95b61008ec5e3c6a7628e06947d2d8168bd2765d
Author: merrimanr <merrimanr@...>
Date: 2018-10-09T22:38:10Z
added hashcode method
commit b75e1a39b25361b8c18f73f8326184ebef1d7885
Author: merrimanr <merrimanr@...>
Date: 2018-10-10T17:28:25Z
Merge remote-tracking branch 'mirror/master' into METRON-1681
# Conflicts:
#
metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/bolt/ParserBolt.java
#
metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/bolt/ParserBoltTest.java
commit 9f1ce3e479d52f2142efaa9f527e2256ef1a7b38
Author: merrimanr <merrimanr@...>
Date: 2018-10-11T23:07:51Z
resolved conflicts with METRON-1761
----