GitHub user merrimanr reopened a pull request:

    https://github.com/apache/metron/pull/1213

    METRON-1681: Decouple the ParserBolt from the Parse execution logic

    ## Contributor Comments
    The primary purpose of this PR is to create a parser container abstraction 
that is decoupled from Storm.  This parser container (I call it ParserRunner, 
anyone have a better name?) is responsible for:
    
    - Instantiating the MessageParser and MessageFilter objects for each sensor.
    - Initializing the Stellar environment.
    - Accepting a raw binary message along with parser configurations and 
calling the MessageParser.parseOptional method for the appropriate sensor type.
    - Processing each message.  Most of this logic was migrated from the 
ParserBolt.execute method.  
    - Executing callbacks depending on the processing result of each message.
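
    The responsibilities above can be pictured with a minimal sketch of a 
container that has no Storm types in its API.  SketchParser and SketchRunner 
are hypothetical stand-ins for illustration only; the actual MessageParser, 
MessageFilter and ParserRunner signatures in this PR differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for a per-sensor parser: raw bytes in, parsed fields out.
interface SketchParser extends Function<byte[], Optional<Map<String, Object>>> {}

// Hypothetical container; nothing in its API depends on Storm.
class SketchRunner {
  private final Map<String, SketchParser> parsers = new HashMap<>();
  private final Map<String, Predicate<Map<String, Object>>> filters = new HashMap<>();

  // One parser and one filter are registered per sensor type.
  void register(String sensorType,
                SketchParser parser,
                Predicate<Map<String, Object>> filter) {
    parsers.put(sensorType, parser);
    filters.put(sensorType, filter);
  }

  // Route the raw message to the parser for its sensor type, then apply the filter.
  Optional<Map<String, Object>> process(String sensorType, byte[] rawMessage) {
    SketchParser parser = parsers.get(sensorType);
    if (parser == null) {
      throw new IllegalArgumentException("No parser registered for " + sensorType);
    }
    return parser.apply(rawMessage)
        .filter(message -> filters.get(sensorType).test(message));
  }
}
```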
    
    Configuration is external to this abstraction.  A configuration supplier is 
passed in when initializing and a configuration object is passed in when 
processing each message.  I believe this was originally done because we want 
message processing to be atomic without the configuration unexpectedly 
changing.  We can easily change to a configuration supplier during execute if 
necessary.  A CuratorFramework object is also required for setting up Stellar 
but we could easily make this optional.  
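
    One way to picture the atomicity argument: the caller reads the supplier 
once per message and hands that snapshot to the processing call, so an update 
arriving mid-message cannot change the settings in use.  The sketch below 
assumes a hypothetical immutable SketchConfig class; it is not the 
configuration class used in this PR.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical immutable configuration snapshot.
final class SketchConfig {
  final String sensorTopic;
  final boolean filterEnabled;

  SketchConfig(String sensorTopic, boolean filterEnabled) {
    this.sensorTopic = sensorTopic;
    this.filterEnabled = filterEnabled;
  }
}

class SnapshotDemo {
  // Every decision for one message is made against the same snapshot.
  static String process(String message, SketchConfig config) {
    String prefix = config.filterEnabled ? "filtered" : "unfiltered";
    return prefix + ":" + config.sensorTopic + ":" + message;
  }

  public static void main(String[] args) {
    AtomicReference<SketchConfig> current =
        new AtomicReference<>(new SketchConfig("bro", true));
    Supplier<SketchConfig> supplier = current::get;

    SketchConfig snapshot = supplier.get();             // read once for this message
    current.set(new SketchConfig("snort", false));      // simulated concurrent update
    System.out.println(process("m1", snapshot));        // still uses the old snapshot
    System.out.println(process("m2", supplier.get()));  // next message sees the update
  }
}
```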
    
    I decided to keep writing out of the abstraction in this PR.  I can 
envision different platforms having different requirements or needs for sending 
along messages after they are parsed.  Therefore the ParserBolt still handles 
writing messages to Kafka.  If we do decide we want to add writing to our 
abstraction we could do it in a follow on PR to keep this from becoming even 
bigger.
    
    Since all of the post-parsing logic was in the ParserBolt, messages could 
be written as they were processed rather than having to wait for all messages 
to be processed. To maintain this behavior I added 2 callback functions in the 
form of Java Consumers:  onSuccess and onError.  The other option would be to 
make message processing synchronous and just return a list of results.
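
    A minimal sketch of the callback style described above, using plain 
java.util.function.Consumer.  CallbackSketch and its integer "parser" are 
hypothetical and only illustrate the flow; the real callbacks receive Metron 
result and error types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class CallbackSketch {
  // Hypothetical parse step: anything that is not an integer counts as a failure.
  static int parse(String raw) {
    return Integer.parseInt(raw.trim());
  }

  // Each result is handed to a callback as soon as it is available, so the
  // caller can write it out without waiting for the rest of the batch.
  static void processAll(List<String> rawMessages,
                         Consumer<Integer> onSuccess,
                         Consumer<Exception> onError) {
    for (String raw : rawMessages) {
      int parsed;
      try {
        parsed = parse(raw);
      } catch (NumberFormatException e) {
        onError.accept(e);
        continue;
      }
      onSuccess.accept(parsed);
    }
  }

  public static void main(String[] args) {
    processAll(Arrays.asList("1", "oops", "3"),
        value -> System.out.println("write to output: " + value),
        error -> System.out.println("write to error queue: " + error.getMessage()));
  }
}
```

    The synchronous alternative would collect the same values into a list and 
return it after the loop, which is simpler to test but delays writing until the 
whole batch has been processed.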
    
    The lifecycle of this container looks like:
    
    1. Container is created from a collection of sensor types 
    2. Container is initialized with the init method that accepts a Curator 
client and a configuration Supplier.  This in turn sets up Stellar and 
instantiates MessageParser and MessageFilter classes.
    3. Container is ready and accepts messages for processing and calls the 
appropriate callbacks.
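
    The three lifecycle steps can be sketched as follows.  LifecycleSketch is 
a hypothetical class: it omits the Curator client and Stellar setup and uses 
simple string functions in place of MessageParser instances.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

class LifecycleSketch {
  private final Collection<String> sensorTypes;
  private Map<String, Function<String, String>> parsers;  // null until init() runs

  // Step 1: created from a collection of sensor types.
  LifecycleSketch(Collection<String> sensorTypes) {
    this.sensorTypes = sensorTypes;
  }

  // Step 2: init builds one parser per sensor type from the supplied configuration.
  // The Curator client used for Stellar setup is intentionally left out here.
  void init(Supplier<Map<String, String>> configSupplier) {
    Map<String, String> config = configSupplier.get();
    parsers = new HashMap<>();
    for (String sensorType : sensorTypes) {
      String prefix = config.getOrDefault(sensorType, sensorType);
      parsers.put(sensorType, raw -> prefix + ":" + raw);
    }
  }

  // Step 3: only an initialized container accepts messages.
  String process(String sensorType, String raw) {
    if (parsers == null) {
      throw new IllegalStateException("init must be called before process");
    }
    Function<String, String> parser = parsers.get(sensorType);
    if (parser == null) {
      throw new IllegalArgumentException("Unknown sensor type " + sensorType);
    }
    return parser.apply(raw);
  }
}
```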
    
    Because this splits the ParserBolt into 2 different classes, much of the 
ParserBoltTest unit test didn't make sense anymore.  I ended up essentially 
rewriting it and also creating a unit test for ParserRunner.  I tried to 
represent all the original tests in ParserBoltTest and have 95%+ coverage.  If 
I missed any cases or made undesirable style changes, let me know.
    
    ### Changes Included
    
    - ParserRunner abstraction that is decoupled from the ParserBolt.  The 
ParserBolt now initializes the ParserRunner and defers parsing to that class.  
The only thing required is Metron configuration and a Curator client (which 
could be optional)
    - Refactored ParserBolt that sets up the ParserRunner, passes messages to it 
for processing, and writes results to Kafka.
    - MessageParser and MessageFilter objects are now created when Storm calls 
prepare, avoiding serialization issues 
(https://issues.apache.org/jira/browse/METRON-1793)
    - Updated unit and integration tests
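
    On the serialization item in the list above: Storm serializes bolt 
instances with Java serialization when a topology is submitted, so any 
non-serializable field has to be created afterwards.  The sketch below shows 
the general pattern with a transient field filled in by a prepare-style 
method.  SketchBolt and SketchNonSerializableParser are hypothetical and do 
not use the Storm API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical parser that is deliberately NOT Serializable.
class SketchNonSerializableParser {
  String parse(String raw) {
    return raw.toUpperCase();
  }
}

// Hypothetical bolt-like class; only the sensor type travels with the object.
class SketchBolt implements Serializable {
  private static final long serialVersionUID = 1L;
  private final String sensorType;
  private transient SketchNonSerializableParser parser;  // skipped by serialization

  SketchBolt(String sensorType) {
    this.sensorType = sensorType;
  }

  // Stands in for the prepare call made on the worker after deserialization.
  void prepare() {
    parser = new SketchNonSerializableParser();
  }

  String execute(String raw) {
    return sensorType + ":" + parser.parse(raw);
  }

  // Round-trips the object through Java serialization.
  static SketchBolt roundTrip(SketchBolt bolt) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(bolt);
      }
      try (ObjectInputStream in =
               new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (SketchBolt) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

    If the parser field were not transient, writeObject would fail with a 
NotSerializableException as soon as the field held an instance.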
    
    ### Testing
    I have done basic testing in full dev:
    
    1. Spin up full dev and verify bro and snort alerts are indexed in ES and 
the data looks correct.
    
    2. Test for proper message parser error handling by producing an 
unparseable message to the bro topic:
    ```
    echo 'bad message' | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic bro
    ```
    You should see a corresponding error message in the ES error index:
    ```
    {
            "_index" : "error_index_2018.09.25.20",
            "_type" : "error_doc",
            "_id" : "df83aebc-506d-404b-aba6-c5a740d07c57",
            "_score" : 1.0,
            "_source" : {
              "exception" : "java.lang.IllegalStateException: Unable to parse 
Message: test",
              "failed_sensor_type" : "bro",
              "stack" : "java.lang.IllegalStateException: Unable to parse 
Message: bad message
              ...
              "hostname" : "node1",
              "source:type" : "error",
              "raw_message" : "test",
              "error_hash" : 
"9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
              "error_type" : "parser_error",
              "guid" : "df83aebc-506d-404b-aba6-c5a740d07c57",
              "message" : "Unable to parse Message: bad message",
              "timestamp" : 1537906408288
            }
    }
    ```
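
    As an observation on the sample output above, the error_hash value 
matches the SHA-256 digest of the string test, which is also the raw_message 
value shown there.  This is a check against the sample only, not a statement 
about how the hash is computed internally.  The snippet uses only the JDK; 
ErrorHashSketch is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ErrorHashSketch {
  // Hex-encoded SHA-256 of the UTF-8 bytes of the input.
  static String sha256Hex(String input) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Compare the printed value with the error_hash field in the error document.
    System.out.println(sha256Hex("test"));
  }
}
```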
    
    3. Test for proper invalid message handling by setting up global validation.  
Add this to the global config:
    ```
    "fieldValidations": [
        {
          "input": [
            "is_alert"
          ],
          "validation": "NOT_EMPTY"
        }
      ]
    ```
    This should cause validation to fail on bro messages and you should see a 
corresponding error message in the ES error index:
    ```
    {
            "_index" : "error_index_2018.09.25.20",
            "_type" : "error_doc",
            "_id" : "97c356a3-0fd8-4076-9e9c-0213203565c8",
            "_score" : 1.0,
            "_source" : {
              "failed_sensor_type" : "bro",
              "hostname" : "node1",
              "source:type" : "error",
              ...
              "error_hash" : 
"3e5bc436c661cd7df41d4624b5c7422368373f1e57d201500d731007e146c88a",
              "error_type" : "parser_invalid",
              "guid" : "97c356a3-0fd8-4076-9e9c-0213203565c8",
              "error_fields" : "is_alert",
              "timestamp" : 1537907615275
            }
    }
    ```
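
    To make the failure above concrete: bro messages do not carry an is_alert 
field at parse time, so a NOT_EMPTY check on that field rejects them.  The 
sketch below is a simplified reading of such a check.  NotEmptySketch is 
hypothetical, and the actual validator may treat edge cases (whitespace, 
non-string values) differently.

```java
import java.util.List;
import java.util.Map;

class NotEmptySketch {
  // A message is valid only if every listed field is present and non-blank.
  static boolean isValid(Map<String, Object> message, List<String> requiredFields) {
    for (String field : requiredFields) {
      Object value = message.get(field);
      if (value == null || value.toString().trim().isEmpty()) {
        return false;
      }
    }
    return true;
  }
}
```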
    
    ### Next Steps
    This PR contains working code but still needs documentation.  I am 
planning on testing a couple different parsers in full dev (grok for example) 
in addition to what I've already tested.  I will be adding inline comments for 
some of the less obvious changes or refactors to make it easier to review.  My 
plan is for any discussion around specific parts of the code to get added as 
javadocs eventually.  I also think we should add some developer documentation 
to make it easier to maintain and integrate this into other platforms.  I 
imagine a lot of info in this description would make it in there as well.
    
    The intention for now is to get some feedback on the overall approach and 
get people thinking about it.  I'm still working on documentation and will add 
that soon.  Let me know what you think!
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron.  
    Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:
    
    ### For all changes:
    - [x] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [x] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
    - [x] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    
    
    ### For code changes:
    - [x] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
    - [x] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
      ```
      mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
      ```
    
    - [x] Have you written or updated unit tests and/or integration tests to 
verify your changes?
    - [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [x] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and then verify changes via 
`site-book/target/site/index.html`:
    
      ```
      cd site-book
      mvn site
      ```
    
    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/merrimanr/incubator-metron METRON-1681

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1213
    
----
commit 922c76316e5fa2d22976c938f4d4db8c45ec6329
Author: merrimanr <merrimanr@...>
Date:   2018-09-26T16:04:57Z

    initial commit

commit 773c34a30f29e7353ecd5fd8bf5fa0f545219eec
Author: merrimanr <merrimanr@...>
Date:   2018-09-26T18:00:11Z

    removed commented code

commit 2bbd9962251855158a26c55c45b2b4fbf1d9f9e8
Author: merrimanr <merrimanr@...>
Date:   2018-10-03T21:15:32Z

    feedback from nick

commit 2650b9090d41bc9563a2d8e9ed147eefbf3b7591
Author: merrimanr <merrimanr@...>
Date:   2018-10-04T20:35:56Z

    removed callbacks

commit 599133dc71d742f928def4eef0dba2121d9a1666
Author: merrimanr <merrimanr@...>
Date:   2018-10-04T21:59:25Z

    Merge remote-tracking branch 'mirror/master' into METRON-1681

commit 95b61008ec5e3c6a7628e06947d2d8168bd2765d
Author: merrimanr <merrimanr@...>
Date:   2018-10-09T22:38:10Z

    added hashcode method

commit b75e1a39b25361b8c18f73f8326184ebef1d7885
Author: merrimanr <merrimanr@...>
Date:   2018-10-10T17:28:25Z

    Merge remote-tracking branch 'mirror/master' into METRON-1681
    
    # Conflicts:
    #   
metron-platform/metron-parsers/src/main/java/org/apache/metron/parsers/bolt/ParserBolt.java
    #   
metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/bolt/ParserBoltTest.java

commit 9f1ce3e479d52f2142efaa9f527e2256ef1a7b38
Author: merrimanr <merrimanr@...>
Date:   2018-10-11T23:07:51Z

    resolved conflicts with METRON-1761

----

