GitHub user justinleet opened a pull request:

    https://github.com/apache/metron/pull/1099

    METRON-1657: Parser aggregation in storm

    ## Contributor Comments
    This PR allows users to specify multiple parsers to be run in one aggregated Storm topology.
    
    Essentially, the ParserBolt (and its supporting infrastructure) has been generalized to take multiple sensors. This gives us a structure with one Storm spout per sensor. All of these spouts feed a single parser bolt, which delegates to the appropriate parser based on the topic metadata and outputs accordingly.
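    As a rough illustration only (this is not the Java code in this PR), the delegation step can be sketched in bash: the Kafka topic carried in the message metadata selects which sensor's parser handles the message. The function name and the topic-to-parser mapping are hypothetical. The class names are meant to mirror the stock bro and yaf configs, but treat them as illustrative.

    ```bash
    # Hypothetical sketch: print the parser class that should handle a message,
    # chosen by the Kafka topic carried in the message metadata.
    # Assumption: each sensor reads from a topic named after the sensor.
    route_message() {
      local topic="$1"
      case "$topic" in
        "")
          # No topic metadata on the message: there is nothing to delegate on.
          echo "missing topic metadata" >&2
          return 1 ;;
        bro) echo "org.apache.metron.parsers.bro.BasicBroParser" ;;
        yaf) echo "org.apache.metron.parsers.GrokParser" ;;
        *)
          echo "no parser configured for topic '$topic'" >&2
          return 1 ;;
      esac
    }

    route_message yaf   # prints org.apache.metron.parsers.GrokParser
    ```

    A message without a topic cannot be routed at all. That is why topic metadata has to be enabled in the aggregated case.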
    
    ### Current assumptions/restrictions
    
    - Topic metadata must be enabled for the parser bolt to delegate properly in the aggregated case (it is still unneeded in the standard single-sensor case).
    - Configs that are shared across parsers generally apply in a "last-one wins" manner. There are a couple of minor exceptions (e.g. if any parser sets a Kafka security protocol other than PLAINTEXT, that setting wins over parsers that specify PLAINTEXT).
    - All error topics for the aggregated parsers are the same. This restriction could be lifted if we generalize that infrastructure a bit, but I think it's reasonable to leave as a follow-on if there's enough demand.
    - Order matters in how the sensors are specified: "bro,yaf" is not the same as "yaf,bro". This matters in two places: configs and the name of the Storm topology. The topology-name case could easily be lifted by sorting the sensors (and I might just go ahead and do it anyway).
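    To make the merging and ordering rules above concrete, here is a bash sketch (not the PR's Java code; all function names are hypothetical). It applies "last-one wins" to a generic shared setting, keeps a non-PLAINTEXT Kafka security protocol over PLAINTEXT, and shows the proposed sorting fix for topology names. The `__` separator in the topology name is an assumption for illustration.

    ```bash
    # Generic shared setting: the value from the last sensor in the list wins.
    merge_last_wins() {
      local result="" value
      for value in "$@"; do
        result="$value"
      done
      printf '%s\n' "$result"
    }

    # Kafka security protocol: PLAINTEXT never overrides a non-PLAINTEXT value;
    # among non-PLAINTEXT values, the last sensor still wins.
    merge_security_protocol() {
      local result="PLAINTEXT" value
      for value in "$@"; do
        if [ "$value" != "PLAINTEXT" ]; then
          result="$value"
        fi
      done
      printf '%s\n' "$result"
    }

    # Proposed topology naming: sort the sensors so "bro,yaf" and "yaf,bro"
    # give the same name. The "__" separator is a hypothetical choice.
    topology_name() {
      printf '%s\n' "$1" | tr ',' '\n' | LC_ALL=C sort | {
        name=""
        while IFS= read -r sensor; do
          if [ -z "$name" ]; then
            name="$sensor"
          else
            name="${name}__${sensor}"
          fi
        done
        printf '%s\n' "$name"
      }
    }

    topology_name yaf,bro   # prints bro__yaf
    ```

    Sorting only fixes the topology name. Config merging would still depend on the order the sensors are listed in.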
    
    ### Testing
    To ensure that single-sensor parsers still work, spin up full dev and check that everything is passing data as expected.
    
    On full dev, the REST API has been altered to accept a comma-separated list in the parser start and parser stop endpoints. Kill bro, then start "bro,yaf". An aggregated parser should be launched in Storm with the names of both sensors, and data should flow through both. Because no UI is attached to aggregated parsers, I chose not to expand this REST API further until we know what's actually needed to manage them properly. As a result, you can't specify advanced configs directly here.
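    A small bash helper can make the REST test above repeatable. This is a sketch under stated assumptions: the helper name, the `node1:8082` address, the `/api/v1/storm/parser/{start,stop}/{name}` path and the `user`/`password` credentials are all assumptions about the full-dev setup, not something this PR defines. The only behavior taken from the PR is that `{name}` may now be a comma-separated list of sensors.

    ```bash
    # Hypothetical base URL for the full-dev REST service (assumption).
    REST_BASE="${REST_BASE:-http://node1:8082/api/v1/storm}"

    # Build the URL for starting or stopping a (possibly aggregated) parser.
    parser_url() {
      local action="$1" sensors="$2"
      case "$action" in
        start|stop) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
      esac
      printf '%s/parser/%s/%s\n' "$REST_BASE" "$action" "$sensors"
    }

    # Usage against a running full-dev REST service (credentials assumed):
    #   curl -u user:password "$(parser_url stop bro)"
    #   curl -u user:password "$(parser_url start bro,yaf)"
    ```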
    
    On the CLI, start_parser_topology.sh can be used to submit an aggregated topology, e.g.
    ```
    ${METRON_HOME}/bin/start_parser_topology.sh -z node1:2181 -s bro,yaf
    ```
    
    If you want to override certain parameters via the command line, this is still possible:
    ```
    ${METRON_HOME}/bin/start_parser_topology.sh -z node1:2181 -s bro,yaf -snt 2,3 -pp 2 -pnt 5
    ```
    This will spin up a topology with 2 spout tasks for bro and 3 for yaf, along with a parser parallelism of 2 and 5 parser tasks. You can verify this in the Storm UI by checking the number of executors and tasks for each Storm component after launch. As a heads up, because full dev runs only a single Storm supervisor/worker, it's likely to be unhappy with this config, but it will still submit the job.
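    The `-snt` values are matched to the sensors in `-s` by position (bro gets 2, yaf gets 3). The bash sketch below illustrates that pairing. The function name is hypothetical, and rejecting lists of different lengths is an assumption about how a mismatch should be handled, not a description of the CLI's actual behavior.

    ```bash
    # Pair each sensor from -s with the spout task count at the same position in -snt.
    pair_spout_tasks() {
      local -a sensors tasks
      IFS=',' read -r -a sensors <<< "$1"
      IFS=',' read -r -a tasks <<< "$2"
      if [ "${#sensors[@]}" -ne "${#tasks[@]}" ]; then
        echo "expected one spout task count per sensor" >&2
        return 1
      fi
      local i
      for i in "${!sensors[@]}"; do
        printf '%s=%s\n' "${sensors[$i]}" "${tasks[$i]}"
      done
    }

    pair_spout_tasks bro,yaf 2,3   # prints bro=2 and yaf=3 on separate lines
    ```

    By contrast, `-pp` and `-pnt` apply to the single shared parser bolt. With `-pp 2 -pnt 5`, its 5 tasks run on 2 executors, so at least one executor hosts more than one task.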
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron.  
    Please refer to our [Development Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235) for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview) for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution, we ask you to follow these guidelines and double check the following:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at [Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    
    ### For code changes:
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
      ```
      mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh
      ```
    
    - [ ] Have you written or updated unit tests and/or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?
    
    ### For documentation related changes:
    - [ ] Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, run the following commands and then verify the changes via `site-book/target/site/index.html`:
    
      ```
      cd site-book
      mvn site
      ```
    
    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up for your personal repository such that your branches are built there before submitting a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/justinleet/metron parserAgg

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1099.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1099
    
----
commit edb1bc1fc6217230ad73442045bc5904fd852309
Author: justinjleet <justinjleet@...>
Date:   2018-06-28T14:28:35Z

    wip

commit a7725d16209aabd2a3c8935c694aa4373a34468b
Author: justinjleet <justinjleet@...>
Date:   2018-07-02T12:36:17Z

    More wip on getting everything carried through

commit d08993a74350314a2197fb86645268e4590b0edc
Author: justinjleet <justinjleet@...>
Date:   2018-07-03T20:09:56Z

    more complete wip, although still work to do

commit 463c38911383a19064e5251dc7bf7327391704c8
Author: justinjleet <justinjleet@...>
Date:   2018-07-04T21:50:09Z

    allowing for nullable output topic. should end up being enrichment by default

commit a04b40e8c72d3e437f4e8363505ff77afabb1394
Author: justinjleet <justinjleet@...>
Date:   2018-07-05T13:19:02Z

    Actually committing all the fix from before

commit c6949a074c61a86d57749d526418d1dcbf09a90a
Author: justinjleet <justinjleet@...>
Date:   2018-07-05T14:59:26Z

    Stop and Start work in REST now

commit ee1ef16ec9e69ba5859dbd9fe8a63bdf64817142
Author: justinjleet <justinjleet@...>
Date:   2018-07-05T19:30:26Z

    Fixing unit test

commit 61592b8c5767238a501dce355933f4df22d4a03b
Author: justinjleet <justinjleet@...>
Date:   2018-07-05T21:04:35Z

    Some cleanup

commit 3b7d2423aa6105ac20e943c7a49590b7723a1f74
Author: justinjleet <justinjleet@...>
Date:   2018-07-06T01:48:46Z

    More cleanup

commit 8e4263ac5452eaf9b8939f7b7acb8d2ef6404676
Author: justinjleet <justinjleet@...>
Date:   2018-07-08T19:31:48Z

    Some cleanup, test additions, couple fixes

commit b106c56c549325935813933eb7f5a0d2f1a024a6
Author: justinjleet <justinjleet@...>
Date:   2018-07-10T12:58:36Z

    docs and more cleanup

commit 717ff36fa6ff72a17d8891222ff27778d8405c35
Author: justinjleet <justinjleet@...>
Date:   2018-07-10T13:03:37Z

    removing old TODO

commit fd0957ed8269467afa7c4c538cf61f64828faaf8
Author: justinjleet <justinjleet@...>
Date:   2018-07-10T13:05:45Z

    Adding note to README

----

