[ 
https://issues.apache.org/jira/browse/METRON-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712022#comment-16712022
 ] 

ASF GitHub Bot commented on METRON-1795:
----------------------------------------

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1245#discussion_r239611104
  
    --- Diff: 
metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java
 ---
    @@ -0,0 +1,152 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one or more 
contributor license
    + * agreements. See the NOTICE file distributed with this work for 
additional information regarding
    + * copyright ownership. The ASF licenses this file to you under the Apache 
License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance with the 
License. You may obtain a
    + * copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software 
distributed under the License
    + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF 
ANY KIND, either express
    + * or implied. See the License for the specific language governing 
permissions and limitations under
    + * the License.
    + */
    +package org.apache.metron.parsers.regex;
    +
    +import org.json.simple.JSONObject;
    +import org.json.simple.parser.JSONParser;
    +import org.junit.Before;
    +import org.junit.Test;
    +
    +import java.nio.file.Files;
    +import java.nio.file.Paths;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import static org.junit.Assert.assertTrue;
    +
    +public class RegularExpressionsParserTest {
    +
    +  private RegularExpressionsParser regularExpressionsParser;
    +  private JSONObject parserConfig;
    +
    +  @Before
    +  public void setUp() throws Exception {
    +    regularExpressionsParser = new RegularExpressionsParser();
    +  }
    +
    +  @Test
    +  public void testSSHDParse() throws Exception {
    +    String message =
    +        "<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey 
for prod from 22.22.22.22 port 55555 ssh2";
    +
    +    parserConfig = getJsonConfig(
    +        
Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString());
    --- End diff --
    
    I [opened this JIRA ](https://issues.apache.org/jira/browse/METRON-1926)to 
fix the parsing infrastructure.  The error message produced should have made it 
clear that the message failed because it was missing a timestamp, but it does 
not.


> General Purpose Regex Parser
> ----------------------------
>
>                 Key: METRON-1795
>                 URL: https://issues.apache.org/jira/browse/METRON-1795
>             Project: Metron
>          Issue Type: New Feature
>            Reporter: Jagdeep Singh
>            Priority: Minor
>
> We have implemented a general purpose regex parser for Metron that we are 
> interested in contributing back to the community.
>  
> While the Metron Grok parser provides some regex based capability today, the 
> intention of this general purpose regex parser is to:
>  # Allow for more advanced parsing scenarios (specifically, dealing with 
> multiple regex lines for devices that contain several log formats within them)
>  # Give users and developers of Metron additional options for parsing
>  # With the new parser chaining and regex routing feature available in 
> Metron, this gives some additional flexibility to logically separate a flow 
> by:
>  # Regex routing to segregate logs at a device level and handle envelope 
> unwrapping
>  # This general purpose regex parser to parse an entire device type that 
> contains multiple log formats within the single device (for example, RHEL 
> logs)
> At the high-level control flow is like this:
>  # Identify the record type if incoming raw message.
>  # Find and apply the regular expression of corresponding record type to 
> extract the fields (using named groups). 
>  # Apply the message header regex to extract the fields in the header part of 
> the message (using named groups).
>  
> The parser config uses the following structure:
>   
> {code:java}
> "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"  
>  "messageHeaderRegex": 
> "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
>    "fields": [
>       {
>         "recordType": "kernel",
>         "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
>       },
>       {
>         "recordType": "syslog",
>         "regex": 
> ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
>       }
> ]
> {code}
>  
> Where:
>  * *recordTypeRegex* is used to distinctly identify a record type. It inputs 
> a valid regular expression and may also have named groups, which would be 
> extracted into fields.
>  * *messageHeaderRegex* is used to specify a regular expression to extract 
> fields from a message part which is common across all the messages (i.e, 
> syslog fields, standard headers)
>  * *fields*: json list of objects containing recordType and regex. The 
> expression that is evaluated is based on the output of the recordTypeRegex
>  * Note: *recordTypeRegex* and *messageHeaderRegex* could be specified as 
> lists also (as a JSON array), where the list will be evaluated in order until 
> a matching regular expression is found.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to