[ https://issues.apache.org/jira/browse/METRON-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713021#comment-16713021 ]
ASF GitHub Bot commented on METRON-1795:
----------------------------------------

Github user nickwallen commented on a diff in the pull request:

    https://github.com/apache/metron/pull/1245#discussion_r239855809

    --- Diff: metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java ---
    @@ -0,0 +1,152 @@
    +/**
    + * Licensed to the Apache Software Foundation (ASF) under one or more contributor license
    + * agreements. See the NOTICE file distributed with this work for additional information regarding
    + * copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance with the License. You may obtain a
    + * copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software distributed under the License
    + * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
    + * or implied. See the License for the specific language governing permissions and limitations under
    + * the License.
    + */
    +package org.apache.metron.parsers.regex;
    +
    +import org.json.simple.JSONObject;
    +import org.json.simple.parser.JSONParser;
    +import org.junit.Before;
    +import org.junit.Test;
    +
    +import java.nio.file.Files;
    +import java.nio.file.Paths;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +import static org.junit.Assert.assertTrue;
    +
    +public class RegularExpressionsParserTest {
    +
    +  private RegularExpressionsParser regularExpressionsParser;
    +  private JSONObject parserConfig;
    +
    +  @Before
    +  public void setUp() throws Exception {
    +    regularExpressionsParser = new RegularExpressionsParser();
    +  }
    +
    +  @Test
    +  public void testSSHDParse() throws Exception {
    +    String message =
    +        "<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey for prod from 22.22.22.22 port 55555 ssh2";
    +
    +    parserConfig = getJsonConfig(
    +        Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString());
    --- End diff --

    Yes, good point @ottobackwards.

    @jagdeepsingh2 - He is referring specifically to the class `Syslog3164ParserIntegrationTest` in that PR. Should be fairly simple to put together with what you already have.


> General Purpose Regex Parser
> ----------------------------
>
> Key: METRON-1795
> URL: https://issues.apache.org/jira/browse/METRON-1795
> Project: Metron
> Issue Type: New Feature
> Reporter: Jagdeep Singh
> Priority: Minor
>
> We have implemented a general purpose regex parser for Metron that we are
> interested in contributing back to the community.
>
> While the Metron Grok parser provides some regex-based capability today, the
> intention of this general purpose regex parser is to:
> # Allow for more advanced parsing scenarios (specifically, dealing with
> multiple regex lines for devices that contain several log formats within them)
> # Give users and developers of Metron additional options for parsing
> # With the new parser chaining and regex routing features available in
> Metron, provide additional flexibility to logically separate a flow by:
> ## Regex routing to segregate logs at a device level and handle envelope
> unwrapping
> ## This general purpose regex parser to parse an entire device type that
> contains multiple log formats within the single device (for example, RHEL
> logs)
> At a high level, the control flow is:
> # Identify the record type of the incoming raw message.
> # Find and apply the regular expression of the corresponding record type to
> extract the fields (using named groups).
> # Apply the message header regex to extract the fields in the header part of
> the message (using named groups).
>
> The parser config uses the following structure:
>
> {code:java}
> "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))",
> "messageHeaderRegex":
> "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
> "fields": [
>   {
>     "recordType": "kernel",
>     "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
>   },
>   {
>     "recordType": "syslog",
>     "regex":
> ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
>   }
> ]
> {code}
>
> Where:
> * *recordTypeRegex* is used to distinctly identify a record type. It takes
> a valid regular expression and may also have named groups, which would be
> extracted into fields.
> * *messageHeaderRegex* is used to specify a regular expression to extract
> fields from the part of a message that is common across all messages (e.g.,
> syslog fields, standard headers).
> * *fields*: a JSON list of objects, each containing a recordType and a
> regex. The expression that is evaluated is chosen based on the output of the
> recordTypeRegex.
> * Note: *recordTypeRegex* and *messageHeaderRegex* may also be specified as
> lists (JSON arrays), in which case the list is evaluated in order until
> a matching regular expression is found.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
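
The three-step control flow in the quoted issue description can be sketched in plain Java with `java.util.regex`. This is only an illustration under stated assumptions, not the parser from the pull request: the class name `RegexFlowSketch`, the helper `extract`, the sample "kernel" message, the use of `Matcher.find()`, and passing the group names explicitly are all hypothetical choices made here. The three patterns are copied from the config in the description, and only the "kernel" record type is wired up to keep the sketch short.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the described control flow; not the Metron implementation.
class RegexFlowSketch {

  // Patterns copied from the config in the issue description.
  private static final Pattern RECORD_TYPE =
      Pattern.compile("(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))");

  private static final Pattern HEADER = Pattern.compile(
      "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?"
          + "(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?"
          + "(?<syslogHost>(?<=\\s).*?(?=\\s))");

  // Record type -> field regex. Only "kernel" is included (assumption for brevity).
  private static final Map<String, Pattern> FIELD_REGEXES = new HashMap<>();

  static {
    FIELD_REGEXES.put("kernel",
        Pattern.compile(".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"));
  }

  // If the pattern is found in msg, copies the listed named groups into out.
  private static boolean extract(Pattern pattern, String msg,
                                 Map<String, String> out, String... groups) {
    Matcher m = pattern.matcher(msg);
    if (!m.find()) {
      return false;
    }
    for (String g : groups) {
      out.put(g, m.group(g));
    }
    return true;
  }

  static Map<String, String> parse(String msg) {
    Map<String, String> out = new LinkedHashMap<>();
    // Step 1: identify the record type of the raw message.
    if (!extract(RECORD_TYPE, msg, out, "process")) {
      return out; // unknown record type, nothing extracted
    }
    // Step 2: apply the regex registered for that record type.
    Pattern fieldRegex = FIELD_REGEXES.get(out.get("process"));
    if (fieldRegex != null) {
      extract(fieldRegex, msg, out, "eventInfo");
    }
    // Step 3: apply the message header regex.
    extract(HEADER, msg, out, "syslogpriority", "timestamp", "syslogHost");
    return out;
  }

  public static void main(String[] args) {
    // Hypothetical sample message, modeled on the syslog line in the test above.
    String msg = "<38>Jun 20 15:01:17 deviceName kernel: something happened";
    System.out.println(parse(msg));
  }
}
```

The group names are passed in by hand because older Java versions offer no public way to list a pattern's named groups; a real parser would need its own way to discover them from the configured expressions.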