[
https://issues.apache.org/jira/browse/METRON-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704088#comment-16704088
]
ASF GitHub Bot commented on METRON-1795:
----------------------------------------
Github user jagdeepsingh2 commented on a diff in the pull request:
https://github.com/apache/metron/pull/1245#discussion_r237714079
--- Diff:
metron-platform/metron-parsers/src/test/java/org/apache/metron/parsers/regex/RegularExpressionsParserTest.java
---
@@ -0,0 +1,152 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
contributor license
+ * agreements. See the NOTICE file distributed with this work for
additional information regarding
+ * copyright ownership. The ASF licenses this file to you under the Apache
License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the
License. You may obtain a
+ * copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
distributed under the License
+ * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
ANY KIND, either express
+ * or implied. See the License for the specific language governing
permissions and limitations under
+ * the License.
+ */
+package org.apache.metron.parsers.regex;
+
+import org.json.simple.JSONObject;
+import org.json.simple.parser.JSONParser;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.junit.Assert.assertTrue;
+
+public class RegularExpressionsParserTest {
+
+ private RegularExpressionsParser regularExpressionsParser;
+ private JSONObject parserConfig;
+
+ @Before
+ public void setUp() throws Exception {
+ regularExpressionsParser = new RegularExpressionsParser();
+ }
+
+ @Test
+ public void testSSHDParse() throws Exception {
+ String message =
+ "<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey
for prod from 22.22.22.22 port 55555 ssh2";
+
+ parserConfig = getJsonConfig(
+
Paths.get("src/test/resources/config/RegularExpressionsParserConfig.json").toString());
+ regularExpressionsParser.configure(parserConfig);
+ JSONObject parsed = parse(message);
+ // Expected
+ Map<String, Object> expectedJson = new HashMap<>();
+ expectedJson.put("device_name", "deviceName");
+ expectedJson.put("dst_process_name", "sshd");
+ expectedJson.put("dst_process_id", "11672");
+ expectedJson.put("dst_user_id", "prod");
+ expectedJson.put("ip_src_addr", "22.22.22.22");
+ expectedJson.put("ip_src_port", "55555");
+ expectedJson.put("app_protocol", "ssh2");
+ assertTrue(validate(expectedJson, parsed));
+
+ }
+
+ @Test
+ public void testNoMessageHeaderRegex() throws Exception {
+ String message =
+ "<38>Jun 20 15:01:17 deviceName sshd[11672]: Accepted publickey
for prod from 22.22.22.22 port 55555 ssh2";
+ parserConfig = getJsonConfig(
+
Paths.get("src/test/resources/config/RegularExpressionsNoMessageHeaderParserConfig.json")
+ .toString());
+ regularExpressionsParser.configure(parserConfig);
+ JSONObject parsed = parse(message);
+ // Expected
+ Map<String, Object> expectedJson = new HashMap<>();
+ expectedJson.put("dst_process_name", "sshd");
+ expectedJson.put("dst_process_id", "11672");
+ expectedJson.put("dst_user_id", "prod");
+ expectedJson.put("ip_src_addr", "22.22.22.22");
+ expectedJson.put("ip_src_port", "55555");
+ expectedJson.put("app_protocol", "ssh2");
+ assertTrue(validate(expectedJson, parsed));
--- End diff --
I personally found junit logging to be insufficient. I wanted more
information in the logs. Also expectedJson.put("ip_src_port", "55555"); was
more concise than its counterpart.
Other advantage of using this method was it would let you know all the
failed scenarios in one run. While a failed JUnit assertion will stop the test
case then and there itself.
Also, Junit best practices state that maximum one assertion per test case.
Now if we want to follow this best practice, we will have to write a unit test
per field which again does not feel right. Having the validate method let us
follow the Junit best practices.
Do you still want me to remove validate method ?
> General Purpose Regex Parser
> ----------------------------
>
> Key: METRON-1795
> URL: https://issues.apache.org/jira/browse/METRON-1795
> Project: Metron
> Issue Type: New Feature
> Reporter: Jagdeep Singh
> Priority: Minor
>
> We have implemented a general purpose regex parser for Metron that we are
> interested in contributing back to the community.
>
> While the Metron Grok parser provides some regex based capability today, the
> intention of this general purpose regex parser is to:
> # Allow for more advanced parsing scenarios (specifically, dealing with
> multiple regex lines for devices that contain several log formats within them)
> # Give users and developers of Metron additional options for parsing
> # With the new parser chaining and regex routing feature available in
> Metron, this gives some additional flexibility to logically separate a flow
> by:
> # Regex routing to segregate logs at a device level and handle envelope
> unwrapping
> # This general purpose regex parser to parse an entire device type that
> contains multiple log formats within the single device (for example, RHEL
> logs)
> At the high-level control flow is like this:
> # Identify the record type if incoming raw message.
> # Find and apply the regular expression of corresponding record type to
> extract the fields (using named groups).
> # Apply the message header regex to extract the fields in the header part of
> the message (using named groups).
>
> The parser config uses the following structure:
>
> {code:java}
> "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
> "messageHeaderRegex":
> "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
> "fields": [
> {
> "recordType": "kernel",
> "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
> },
> {
> "recordType": "syslog",
> "regex":
> ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
> }
> ]
> {code}
>
> Where:
> * *recordTypeRegex* is used to distinctly identify a record type. It inputs
> a valid regular expression and may also have named groups, which would be
> extracted into fields.
> * *messageHeaderRegex* is used to specify a regular expression to extract
> fields from a message part which is common across all the messages (i.e,
> syslog fields, standard headers)
> * *fields*: json list of objects containing recordType and regex. The
> expression that is evaluated is based on the output of the recordTypeRegex
> * Note: *recordTypeRegex* and *messageHeaderRegex* could be specified as
> lists also (as a JSON array), where the list will be evaluated in order until
> a matching regular expression is found.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)