[ 
https://issues.apache.org/jira/browse/METRON-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16629884#comment-16629884
 ] 

ASF GitHub Bot commented on METRON-1795:
----------------------------------------

GitHub user jagdeepsingh2 opened a pull request:

    https://github.com/apache/metron/pull/1214

    METRON-1795 Initial commit for a general purpose regular expressions …

    …parser.
    
    ## Contributor Comments
    [Please place any comments here.  A description of the problem/enhancement, 
how to reproduce the issue, your testing methodology, etc.]
    
    
    ## Pull Request Checklist
    
    Thank you for submitting a contribution to Apache Metron.  
    Please refer to our [Development 
Guidelines](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235)
 for the complete guide to follow for contributions.  
    Please refer also to our [Build Verification 
Guidelines](https://cwiki.apache.org/confluence/display/METRON/Verifying+Builds?show-miniview)
 for complete smoke testing guides.  
    
    
    In order to streamline the review of the contribution we ask you follow 
these guidelines and ask you to double check the following:
    
    ### For all changes:
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to 
be created at [Metron 
Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
    - [ ] Has your PR been rebased against the latest commit within the target 
branch (typically master)?
    
    
    ### For code changes:
    - [ ] Have you included steps to reproduce the behavior or problem that is 
being changed or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified 
and tested manually?
    - [ ] Have you ensured that the full suite of tests and checks have been 
executed in the root metron folder via:
      ```
      mvn -q clean integration-test install && 
dev-utilities/build-utils/verify_licenses.sh 
      ```
    
    - [ ] Have you written or updated unit tests and or integration tests to 
verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building 
and running locally with Vagrant full-dev environment or the equivalent?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in 
which it is rendered by building and verifying the site-book? If not then run 
the following commands and the verify changes via 
`site-book/target/site/index.html`:
    
      ```
      cd site-book
      mvn site
      ```
    
    #### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build 
issues and submit an update to your PR as soon as possible.
    It is also recommended that [travis-ci](https://travis-ci.org) is set up 
for your personal repository such that your branches are built there before 
submitting a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jagdeepsingh2/metron master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1214
    
----
commit ed56a3afd135c84e2cab52dda969c8e112f43602
Author: jagdeep <jagdeep.singh.2@...>
Date:   2018-09-27T07:37:19Z

    METRON-1795 Initial commit for a general purpose regular expressions parser.

----


> General Purpose Regex Parser
> ----------------------------
>
>                 Key: METRON-1795
>                 URL: https://issues.apache.org/jira/browse/METRON-1795
>             Project: Metron
>          Issue Type: New Feature
>            Reporter: Jagdeep Singh
>            Priority: Minor
>
> We have implemented a general purpose regex parser for Metron that we are 
> interested in contributing back to the community.
>  
> While the Metron Grok parser provides some regex based capability today, the 
> intention of this general purpose regex parser is to:
>  # Allow for more advanced parsing scenarios (specifically, dealing with 
> multiple regex lines for devices that contain several log formats within them)
>  # Give users and developers of Metron additional options for parsing
>  # With the new parser chaining and regex routing feature available in 
> Metron, this gives some additional flexibility to logically separate a flow 
> by:
>  # Regex routing to segregate logs at a device level and handle envelope 
> unwrapping
>  # This general purpose regex parser to parse an entire device type that 
> contains multiple log formats within the single device (for example, RHEL 
> logs)
> At the high-level control flow is like this:
>  # Identify the record type if incoming raw message.
>  # Find and apply the regular expression of corresponding record type to 
> extract the fields (using named groups). 
>  # Apply the message header regex to extract the fields in the header part of 
> the message (using named groups).
>  
> The parser config uses the following structure:
>   
> {code:java}
> "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"  
>  "messageHeaderRegex": 
> "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
>    "fields": [
>       {
>         "recordType": "kernel",
>         "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
>       },
>       {
>         "recordType": "syslog",
>         "regex": 
> ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
>       }
> ]
> {code}
>  
> Where:
>  * *recordTypeRegex* is used to distinctly identify a record type. It inputs 
> a valid regular expression and may also have named groups, which would be 
> extracted into fields.
>  * *messageHeaderRegex* is used to specify a regular expression to extract 
> fields from a message part which is common across all the messages (i.e, 
> syslog fields, standard headers)
>  * *fields*: json list of objects containing recordType and regex. The 
> expression that is evaluated is based on the output of the recordTypeRegex
>  * Note: *recordTypeRegex* and *messageHeaderRegex* could be specified as 
> lists also (as a JSON array), where the list will be evaluated in order until 
> a matching regular expression is found.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to