GitHub user merrimanr opened a pull request:
https://github.com/apache/metron/pull/1213
METRON-1681: Decouple the ParserBolt from the Parse execution logic
## Contributor Comments
The primary purpose of this PR is to create a parser container abstraction
that is decoupled from Storm. This parser container (I call it ParserRunner;
does anyone have a better name?) is responsible for:
- Instantiating the MessageParser and MessageFilter objects for each sensor.
- Initializing the Stellar environment.
- Accepting a raw binary message along with parser configurations and
calling the MessageParser.parseOptional method for the appropriate sensor type.
- Processing each message. Most of this logic was migrated from the
ParserBolt.execute method.
- Executing callbacks depending on the processing result of each message.
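To make the responsibilities above concrete, here is a minimal, Storm-free sketch of the core idea: build one parser and one filter per sensor type up front, then route each raw message to its sensor's pair. All names are hypothetical: `SensorParser` and `SensorFilter` are simplified stand-ins for `MessageParser` and `MessageFilter`, messages are plain `Map`s, and `SketchParserRunner` is not the `ParserRunner` class in this PR (Stellar setup and callbacks are left out).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for MessageParser: raw bytes in, zero or more messages out.
interface SensorParser {
  Optional<List<Map<String, Object>>> parseOptional(byte[] raw);
}

// Hypothetical stand-in for MessageFilter: returns true if a parsed message should be kept.
interface SensorFilter {
  boolean emit(Map<String, Object> message);
}

class SketchParserRunner {
  private final Map<String, SensorParser> parsers = new HashMap<>();
  private final Map<String, SensorFilter> filters = new HashMap<>();

  // Instantiate one parser and one filter per sensor type, once, up front.
  SketchParserRunner(Collection<String> sensorTypes,
                     Function<String, SensorParser> parserFactory,
                     Function<String, SensorFilter> filterFactory) {
    for (String sensor : sensorTypes) {
      parsers.put(sensor, parserFactory.apply(sensor));
      filters.put(sensor, filterFactory.apply(sensor));
    }
  }

  // Parse with the sensor's parser, then keep only the messages its filter accepts.
  List<Map<String, Object>> parse(String sensorType, byte[] raw) {
    SensorParser parser = parsers.get(sensorType);
    SensorFilter filter = filters.get(sensorType);
    if (parser == null || filter == null) {
      throw new IllegalArgumentException("Unknown sensor type: " + sensorType);
    }
    List<Map<String, Object>> kept = new ArrayList<>();
    for (Map<String, Object> message : parser.parseOptional(raw).orElse(Collections.emptyList())) {
      if (filter.emit(message)) {
        kept.add(message);
      }
    }
    return kept;
  }
}
```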
Configuration is external to this abstraction. A configuration supplier is
passed in at initialization, and a configuration object is passed in with
each message to be processed. I believe this was originally done because we want
message processing to be atomic, without the configuration unexpectedly
changing partway through a message. We could easily switch to using the
configuration supplier during execute if necessary. A CuratorFramework object
is also required for setting up Stellar, but we could easily make this optional.
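A sketch of that configuration split, assuming a hypothetical `ConfigHolder` that some watcher (for example a ZooKeeper listener) updates and that doubles as the `Supplier` handed over at init. The caller reads the supplier once per message and passes that snapshot into processing, so an update that lands mid-message only takes effect on the next one. Metron's real configuration classes are not used here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder updated by a config watcher; it is also the Supplier handed over at init.
class ConfigHolder implements Supplier<Map<String, String>> {
  private final AtomicReference<Map<String, String>> current =
      new AtomicReference<>(Collections.<String, String>emptyMap());

  // Publish a new, immutable configuration in a single atomic step.
  void update(Map<String, String> next) {
    current.set(Collections.unmodifiableMap(new HashMap<>(next)));
  }

  @Override
  public Map<String, String> get() {
    return current.get();
  }
}

class ConfigSnapshotExample {
  // Every step of processing this message reads the same snapshot, even if
  // ConfigHolder.update runs on another thread in the meantime.
  static String process(String message, Map<String, String> config) {
    String prefix = config.getOrDefault("prefix", "");
    String suffix = config.getOrDefault("suffix", "");
    return prefix + message + suffix;
  }
}
```

Reading the supplier repeatedly inside processing instead would pick up changes sooner, at the cost of a single message possibly seeing two different configurations.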
I decided to keep writing out of the abstraction in this PR. I can
envision different platforms having different requirements or needs for sending
along messages after they are parsed. Therefore the ParserBolt still handles
writing messages to Kafka. If we do decide to add writing to the
abstraction, we could do it in a follow-on PR to keep this one from becoming even
bigger.
Since all of the post-parsing logic was in the ParserBolt, messages could
be written as they were processed rather than having to wait for all messages
to be processed. To maintain this behavior, I added 2 callback functions in the
form of Java Consumers: onSuccess and onError. The other option would be to
make message processing synchronous and just return a list of results.
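The callback approach can be sketched as follows. `CallbackProcessor` and `ProcessingError` are hypothetical and work on strings rather than Metron messages; the point is that `onSuccess` and `onError` are plain `java.util.function.Consumer`s invoked as each message finishes, so the caller (the ParserBolt, in this PR) can write results immediately instead of waiting for the whole batch.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical error record pairing the raw input with what went wrong.
class ProcessingError {
  final String rawMessage;
  final Exception cause;

  ProcessingError(String rawMessage, Exception cause) {
    this.rawMessage = rawMessage;
    this.cause = cause;
  }
}

class CallbackProcessor {
  private final Function<String, String> parse;

  CallbackProcessor(Function<String, String> parse) {
    this.parse = parse;
  }

  // Each result is handed off as soon as that message is done, so the caller
  // can write it out without waiting for the rest of the batch.
  void execute(List<String> rawMessages,
               Consumer<String> onSuccess,
               Consumer<ProcessingError> onError) {
    for (String raw : rawMessages) {
      String parsed;
      try {
        parsed = parse.apply(raw);
      } catch (RuntimeException e) {
        onError.accept(new ProcessingError(raw, e));
        continue;
      }
      onSuccess.accept(parsed);
    }
  }
}
```

Only the parse call sits inside the `try`, so an exception thrown by the caller's own `onSuccess` (a failed write, say) is not misreported as a parse error. The synchronous alternative amounts to passing `successes::add` and `errors::add` as the callbacks and returning both lists.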
The lifecycle of this container looks like this:
1. Container is created from a collection of sensor types
2. Container is initialized with the init method that accepts a Curator
client and a configuration Supplier. This in turn sets up Stellar and
instantiates MessageParser and MessageFilter classes.
3. Container is ready and accepts messages for processing and calls the
appropriate callbacks.
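The three lifecycle steps can be sketched as below. `LifecycleSketch`, the `<sensor>.parserClassName` configuration key, and the string passed to the callback are hypothetical; the real init also takes a Curator client for Stellar, which is omitted so the example runs without ZooKeeper. The guard in `execute` shows the contract implied by step 3: messages are accepted only after init.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

class LifecycleSketch {
  private final Set<String> sensorTypes;
  private final Map<String, String> parserBySensor = new HashMap<>();
  private boolean initialized = false;

  // Step 1: created from a collection of sensor types; nothing is instantiated yet.
  LifecycleSketch(Collection<String> sensorTypes) {
    this.sensorTypes = new LinkedHashSet<>(sensorTypes);
  }

  // Step 2: read the configuration supplier and set up per-sensor state.
  // (The PR's init also takes a Curator client for Stellar; omitted here.)
  void init(Supplier<Map<String, String>> configSupplier) {
    Map<String, String> config = configSupplier.get();
    for (String sensor : sensorTypes) {
      // Stand-in for instantiating the sensor's MessageParser and MessageFilter.
      parserBySensor.put(sensor, config.getOrDefault(sensor + ".parserClassName", "unconfigured"));
    }
    initialized = true;
  }

  // Step 3: ready; accept a message and invoke the callback with the result.
  void execute(String sensorType, byte[] raw, Consumer<String> onSuccess) {
    if (!initialized) {
      throw new IllegalStateException("init must be called before execute");
    }
    String parser = parserBySensor.get(sensorType);
    if (parser == null) {
      throw new IllegalArgumentException("Unknown sensor type: " + sensorType);
    }
    onSuccess.accept(parser + " parsed " + raw.length + " bytes");
  }
}
```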
Because this splits the ParserBolt into 2 different classes, much of the
ParserBoltTest unit test no longer made sense. I ended up essentially
rewriting it and also creating a unit test for ParserRunner. I tried to
represent all of the original tests in ParserBoltTest, and coverage is 95%+. If
I missed any cases or made undesirable style changes, let me know.
### Changes Included
- ParserRunner abstraction that is decoupled from the ParserBolt. The
ParserBolt now initializes the ParserRunner and defers parsing to that class.
The only things required are the Metron configuration and a Curator client (which
could be made optional).
- Refactored ParserBolt that sets up the ParserRunner, passes messages to it
for processing, and writes results to Kafka.
- MessageParser and MessageFilter objects are now created when Storm calls
prepare, avoiding serialization issues
(https://issues.apache.org/jira/browse/METRON-1793)
- Updated unit and integration tests
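Why creating the parser in prepare avoids serialization problems: Storm serializes the bolt instance when the topology is submitted and calls prepare on the deserialized copy in each worker. The sketch below imitates that with plain Java serialization. `PrepareTimeBolt` and `NonSerializableParser` are hypothetical stand-ins rather than Storm or Metron classes; the `transient` field is skipped during serialization and rebuilt in `prepare`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical parser that does NOT implement Serializable.
class NonSerializableParser {
  String parse(String raw) {
    return raw.trim();
  }
}

// Hypothetical stand-in for a bolt; only its serializable state travels with it.
class PrepareTimeBolt implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String sensorType;                // serializable configuration
  private transient NonSerializableParser parser; // skipped by serialization

  PrepareTimeBolt(String sensorType) {
    this.sensorType = sensorType;
  }

  // Build the parser on the worker, after deserialization, instead of in the constructor.
  void prepare() {
    parser = new NonSerializableParser();
  }

  String execute(String raw) {
    return sensorType + ":" + parser.parse(raw);
  }

  // Serialize and deserialize, roughly what happens to a bolt on topology submission.
  static PrepareTimeBolt roundTrip(PrepareTimeBolt bolt) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(bolt);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (PrepareTimeBolt) in.readObject();
    }
  }
}
```

Declaring the parser field without `transient` would make `roundTrip` fail with `java.io.NotSerializableException`.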
### Testing
I have done basic testing in full dev and will add a testing plan soon.
You should be able to spin up full dev and see bro and snort data in ES.
### Next Steps
This PR contains working code but still needs documentation. I am
planning on testing a couple of different parsers in full dev (grok, for example)
in addition to what I've already tested. I will be adding inline comments for
some of the less obvious changes or refactors to make it easier to review. My
plan is for any discussion around specific parts of the code to eventually be
added as javadocs. I also think we should add some developer documentation
to make this easier to maintain and to integrate into other platforms. I
imagine a lot of the information in this description would make it in there as well.
The intention for now is to get some feedback on the overall approach and
get people thinking about it. I'm still working on documentation and will add
that soon. Let me know what you think!
## Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our [Development
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
for the complete guide to follow for contributions.
Please refer also to our [Build Verification
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
for complete smoke testing guides.
In order to streamline the review of the contribution, we ask that you follow
these guidelines and double check the following:
### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to
be created at [Metron
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA
number you are trying to resolve? Pay particular attention to the hyphen "-"
character.
- [x] Has your PR been rebased against the latest commit within the target
branch (typically master)?
### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is
being changed or addressed?
- [ ] Have you included steps or a guide to how the change may be verified
and tested manually?
- [x] Have you ensured that the full suite of tests and checks have been
executed in the root metron folder via:
```
mvn -q clean integration-test install &&
dev-utilities/build-utils/verify_licenses.sh
```
- [x] Have you written or updated unit tests and or integration tests to
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building
and running locally with Vagrant full-dev environment or the equivalent?
### For documentation related changes:
- [ ] Have you ensured that the format looks appropriate for the output in
which it is rendered by building and verifying the site-book? If not, run
the following commands and then verify changes via
`site-book/target/site/index.html`:
```
cd site-book
mvn site
```
#### Note:
Please ensure that once the PR is submitted, you check travis-ci for build
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up
for your personal repository such that your branches are built there before
submitting a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/merrimanr/incubator-metron METRON-1681
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metron/pull/1213.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1213
----
commit 922c76316e5fa2d22976c938f4d4db8c45ec6329
Author: merrimanr <merrimanr@...>
Date: 2018-09-26T16:04:57Z
initial commit
commit 773c34a30f29e7353ecd5fd8bf5fa0f545219eec
Author: merrimanr <merrimanr@...>
Date: 2018-09-26T18:00:11Z
removed commented code
----