GitHub user justinleet reopened a pull request:
https://github.com/apache/metron/pull/1099
METRON-1657: Parser aggregation in storm
## Contributor Comments
This PR allows for users to specify multiple parsers to be run in one
aggregated Storm topology.
Essentially, the ParserBolt (and the supporting infrastructure) has been
generalized to take multiple sensors. This gives us a structure where there's
a Storm spout per sensor. These all lead to a parser bolt that delegates based
on the topic metadata and output appropriately.
### Current assumptions/restrictions
- Topic metadata must be enabled in order to be able to delegate properly
in the aggregated case (still unneeded for standard case).
- Configs that are shared across parsers generally apply in a "Last-one
wins" manner. Theres's a couple minor exceptions (e.g. if a parser says Kafka
security is not PLAINTEXT, that'll win over anyone who says PLAINTEXT).
- All error topics for the aggregated parsers are the same. This
restriction could be lifted if we generalize that infrastructure a bit, but I
think it's reasonable to leave as a follow-on if there's enough demand.
- Order matters in how the sensors are specified. "bro,yaf" is not the
same as "yaf,bro". There are two places this matters, configs and the name of
the Storm topology. This could pretty easily be lifted by sorting for the Storm
topology (and I might just go ahead and do it anyway).
### Testing
To ensure that single sensor parsers work, just spin up full dev and ensure
everything is passing data as expected.
On fulldev, the REST API has been altered to accept a comma separated list
in the parser start and parser stop endpoints. Just kill bro, and start up
"bro,yaf". An aggregated parser should be be launched in Storm with the names
of both sensors used, and data should flow through both. Because no UI is
attached to aggregated parsers, I chose not to expand this REST API out for now
until we know what's actually needed to properly manage it. The upshot of this
is that you can't specify advanced configs directly here.
On the CLI, the start_parser_topology.sh can be used to submit
e.g.
```
${METRON_HOME}/bin/start_parser_topology.sh -z node1:2181 -s bro,yaf
```
If you want to override certain parameters via command line this is still
possible:
```
${METRON_HOME}/bin/start_parser_topology.sh -z node1:2181 -s bro,yaf -snt
2,3 -pp 2 -pnt 5
```
This will spin up a topology with a spout number of tasks of 2 for bro, 3
for yaf, along with a parser parallelism of 2 and a parser number of tasks of
5. This can be seen in the Storm UI by looking at the number of executors and
tasks for each Storm component after launch. As a heads up, because this is
full dev and it's only running with one Storm supervisor/worker/etc, it's
likely to be unhappy with the config, but it'll still submit the job.
## Pull Request Checklist
Thank you for submitting a contribution to Apache Metron.
Please refer to our [Development
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
for the complete guide to follow for contributions.
Please refer also to our [Build Verification
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
for complete smoke testing guides.
In order to streamline the review of the contribution we ask you follow
these guidelines and ask you to double check the following:
### For all changes:
- [x] Is there a JIRA ticket associated with this PR? If not one needs to
be created at [Metron
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
- [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA
number you are trying to resolve? Pay particular attention to the hyphen "-"
character.
- [x] Has your PR been rebased against the latest commit within the target
branch (typically master)?
### For code changes:
- [x] Have you included steps to reproduce the behavior or problem that is
being changed or addressed?
- [x] Have you included steps or a guide to how the change may be verified
and tested manually?
- [ ] Have you ensured that the full suite of tests and checks have been
executed in the root metron folder via:
```
mvn -q clean integration-test install &&
dev-utilities/build-utils/verify_licenses.sh
```
- [x] Have you written or updated unit tests and or integration tests to
verify your changes?
- [x] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] Have you verified the basic functionality of the build by building
and running locally with Vagrant full-dev environment or the equivalent?
### For documentation related changes:
- [ ] Have you ensured that format looks appropriate for the output in
which it is rendered by building and verifying the site-book? If not then run
the following commands and the verify changes via
`site-book/target/site/index.html`:
```
cd site-book
mvn site
```
#### Note:
Please ensure that once the PR is submitted, you check travis-ci for build
issues and submit an update to your PR as soon as possible.
It is also recommended that [travis-ci](https://travis-ci.org) is set up
for your personal repository such that your branches are built there before
submitting a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/justinleet/metron parserAgg
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/metron/pull/1099.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1099
----
commit edb1bc1fc6217230ad73442045bc5904fd852309
Author: justinjleet <justinjleet@...>
Date: 2018-06-28T14:28:35Z
wip
commit a7725d16209aabd2a3c8935c694aa4373a34468b
Author: justinjleet <justinjleet@...>
Date: 2018-07-02T12:36:17Z
More wip on getting everything carried through
commit d08993a74350314a2197fb86645268e4590b0edc
Author: justinjleet <justinjleet@...>
Date: 2018-07-03T20:09:56Z
more complete wip, although still work to do
commit 463c38911383a19064e5251dc7bf7327391704c8
Author: justinjleet <justinjleet@...>
Date: 2018-07-04T21:50:09Z
allowing for nullable output topic. should end up being enrichment by
default
commit a04b40e8c72d3e437f4e8363505ff77afabb1394
Author: justinjleet <justinjleet@...>
Date: 2018-07-05T13:19:02Z
Actually committing all the fix from before
commit c6949a074c61a86d57749d526418d1dcbf09a90a
Author: justinjleet <justinjleet@...>
Date: 2018-07-05T14:59:26Z
Stop and Start work in REST now
commit ee1ef16ec9e69ba5859dbd9fe8a63bdf64817142
Author: justinjleet <justinjleet@...>
Date: 2018-07-05T19:30:26Z
Fixing unit test
commit 61592b8c5767238a501dce355933f4df22d4a03b
Author: justinjleet <justinjleet@...>
Date: 2018-07-05T21:04:35Z
Some cleanup
commit 3b7d2423aa6105ac20e943c7a49590b7723a1f74
Author: justinjleet <justinjleet@...>
Date: 2018-07-06T01:48:46Z
More cleanup
commit 8e4263ac5452eaf9b8939f7b7acb8d2ef6404676
Author: justinjleet <justinjleet@...>
Date: 2018-07-08T19:31:48Z
Some cleanup, test additions, couple fixes
commit b106c56c549325935813933eb7f5a0d2f1a024a6
Author: justinjleet <justinjleet@...>
Date: 2018-07-10T12:58:36Z
docs and more cleanup
commit 717ff36fa6ff72a17d8891222ff27778d8405c35
Author: justinjleet <justinjleet@...>
Date: 2018-07-10T13:03:37Z
removing old TODO
commit fd0957ed8269467afa7c4c538cf61f64828faaf8
Author: justinjleet <justinjleet@...>
Date: 2018-07-10T13:05:45Z
Adding note to README
commit bceec862fc6275fb54348b7e18cc14d1903a6769
Author: justinjleet <justinjleet@...>
Date: 2018-07-10T13:44:40Z
Removing Splitter and just using a Java stream
----
---