This is an automated email from the ASF dual-hosted git repository.
mmiklavcic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metron.git
The following commit(s) were added to refs/heads/master by this push:
new 54aa46e METRON-2066 Documentation and logging corrections (mmiklavc)
closes apache/metron#1378
54aa46e is described below
commit 54aa46ee44a329504559f417790324c175f5af6a
Author: mmiklavc <[email protected]>
AuthorDate: Wed Apr 10 13:04:03 2019 -0600
METRON-2066 Documentation and logging corrections (mmiklavc) closes
apache/metron#1378
---
metron-platform/Performance-tuning-guide.md | 2 +-
metron-platform/README.md | 2 +-
metron-platform/metron-common/README.md | 18 +++++++++-
metron-platform/metron-parsing/README.md | 35 ++++++++++++++-----
.../java/org/apache/metron/parsers/GrokParser.java | 39 +++++++++++-----------
5 files changed, 64 insertions(+), 32 deletions(-)
diff --git a/metron-platform/Performance-tuning-guide.md b/metron-platform/Performance-tuning-guide.md
index bd5c126..fe1b01b 100644
--- a/metron-platform/Performance-tuning-guide.md
+++ b/metron-platform/Performance-tuning-guide.md
@@ -412,7 +412,7 @@ And we ran our bro parser topology with the following
options. We did not need t
though you could certainly do so if necessary. Notice that we only needed 1
worker.
```
-/usr/metron/0.7.1/bin/start_parser_topology.sh \
+$METRON_HOME/bin/start_parser_topology.sh \
-e ~metron/.storm/storm-bro.config \
-esc ~/.storm/spout-bro.config \
-k $BROKERLIST \
diff --git a/metron-platform/README.md b/metron-platform/README.md
index feb30e5..e5a7e6a 100644
--- a/metron-platform/README.md
+++ b/metron-platform/README.md
@@ -27,4 +27,4 @@ Extensible set of Storm topologies and topology attributes
for streaming, enrich
# Documentation
-Please see documentation within each individual module for description and
usage instructions. Sample topologies are provided under Metron_Topologies to
get you started with the framework. We pre-assume knowledge of Hadoop, Storm,
Kafka, and HBase.
+Please see documentation within each individual module for description and
usage instructions. Sample topologies are provided under Metron_Topologies to
get you started with the framework. We assume prior knowledge of Hadoop, Storm,
Kafka, Zookeeper, and HBase.
diff --git a/metron-platform/metron-common/README.md b/metron-platform/metron-common/README.md
index 20f0eef..cbea9dd 100644
--- a/metron-platform/metron-common/README.md
+++ b/metron-platform/metron-common/README.md
@@ -18,6 +18,7 @@ limitations under the License.
# Contents
* [Stellar Language](#stellar-language)
+* [High Level Architecture](#high-level-architecture)
* [Global Configuration](#global-configuration)
* [Validation Framework](#validation-framework)
* [Management Utility](#management-utility)
@@ -109,6 +110,20 @@ If a field is managed via ambari, you should change the
field via
ambari. Otherwise, upon service restarts, you may find your update
overwritten.
+# High Level Architecture
+
+As already pointed out in the main project README, Apache Metron follows a Kappa
architecture (see [Navigating the
Architecture](../../#navigating-the-architecture)), primarily backed by Storm
and Kafka. We additionally leverage:
+* Zookeeper for dynamic configuration updates to running Storm topologies.
This enables us to push updates to our Storm topologies without restarting them.
+* HBase primarily for enrichments, but we also use it to store user state for
our UIs.
+* HDFS for long term storage. Our parsed and enriched messages land here,
along with any reported exceptions or errors encountered along the way.
+* Solr and Elasticsearch (plus Kibana) for real-time access. We provide out of
the box compatibility with both Solr and Elasticsearch, and custom dashboards
for data exploration in Kibana.
+* Zeppelin for providing dashboards to do custom analytics.
+
+Getting data "into" Metron is accomplished by setting up a Kafka topic for
parsers to read from. There are a variety of options, including, but not
limited to:
+* [Bro Kafka plugin](https://github.com/apache/metron-bro-plugin-kafka)
+* [Fastcapa](../../metron-sensors/fastcapa)
+* [NiFi](https://nifi.apache.org)
+
# Validation Framework
Inside of the global configuration, there is a validation framework in
@@ -336,7 +351,8 @@ Errors generated in Metron topologies are transformed into
JSON format and follo
"error_hash":
"f7baf053f2d3c801a01d196f40f3468e87eea81788b2567423030100865c5061",
"error_type": "parser_error",
"message": "Unable to parse Message: {\"http\":
{\"ts\":1488809627.000000.31915,\"uid\":\"C9JpSd2vFAWo3mXKz1\", ...",
- "timestamp": 1488809630698
+ "timestamp": 1488809630698,
+ "guid": "bf9fb8d1-2507-4a41-a5b2-42f75f6ddc63"
}
```
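The new `guid` field added to the error JSON above is formatted like a standard hyphenated UUID. As a minimal sketch (an assumption on my part, using plain JDK code rather than Metron's actual error-handling path), such an identifier can be produced with `java.util.UUID`:

```java
import java.util.UUID;

public class ErrorGuid {
    public static void main(String[] args) {
        // Random (version 4) UUID in the hyphenated 8-4-4-4-12 form
        // shown in the sample error message above.
        String guid = UUID.randomUUID().toString();
        System.out.println(guid);           // e.g. "bf9fb8d1-2507-4a41-a5b2-42f75f6ddc63"
        System.out.println(guid.length());  // always 36 characters
    }
}
```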
diff --git a/metron-platform/metron-parsing/README.md b/metron-platform/metron-parsing/README.md
index b8f44cb..e5368fe 100644
--- a/metron-platform/metron-parsing/README.md
+++ b/metron-platform/metron-parsing/README.md
@@ -15,8 +15,22 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
+
# Parsers
+## Contents
+
+* [Introduction](#introduction)
+* [Parser Error Routing](#parser-error-routing)
+* [Filtering](#filtering)
+* [Parser Architecture](#parser-architecture)
+* [Message Format](#message-format)
+* [Global Configuration](#global-configuration)
+* [Parser Configuration](#parser-configuration)
+* [Parser Adapters](#parser-adapters)
+* [Kafka Queue](#kafka-queue)
+* [JSON Path](#json-path)
+
## Introduction
Parsers are pluggable components which are used to transform raw data
@@ -27,12 +41,12 @@ There are two general types of parsers:
* A parser written in Java which conforms to the `MessageParser` interface.
This kind of parser is optimized for speed and performance and is built for use
with higher velocity topologies. These parsers are not easily modifiable and
in order to make changes to them the entire topology needs to be recompiled.
* A general purpose parser. This type of parser is primarily designed for
lower-velocity topologies or for quickly standing up a parser for a new
telemetry before a permanent Java parser can be written for it. As of the time
of this writing, we have:
* Grok parser: `org.apache.metron.parsers.GrokParser` with possible
`parserConfig` entries of
- * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
+ * `grokPath` : The path in HDFS (or in the Jar) to the grok statement.
By default attempts to load from HDFS, then falls back to the classpath, and
finally throws an exception if unable to load a pattern.
* `patternLabel` : The pattern label to use from the grok statement
+ * `multiLine` : The raw data passed in should be handled as a log
with multiple lines, with each line to be parsed separately. This setting's
valid values are 'true' or 'false'. The default if unset is 'false'. When set,
the parser will handle multiple lines, with successfully processed lines emitted
normally and lines with errors sent to the error topic.
- * `timestampField` : The field to use for timestamp
- * `timeFields` : A list of fields to be treated as time
- * `dateFormat` : The date format to use to parse the time fields
+ * `timestampField` : The field to use for timestamp. If your data does
not have a field named exactly "timestamp", this field is required; otherwise
the record will not pass validation. If the timestampField is also included in
the list of timeFields, it will first be parsed using the provided dateFormat.
+ * `timeFields` : A list of fields to be treated as time.
+ * `dateFormat` : The date format to use to parse the time fields.
Default is "yyyy-MM-dd HH:mm:ss.S z".
* `timezone` : The timezone to use. `UTC` is default.
* The Grok parser supports either one line to parse per incoming
message, or incoming messages with multiple log lines, and will produce a JSON
message per line
* CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible
`parserConfig` entries of
@@ -154,10 +168,13 @@ messages or marking messages as invalid.
There are two reasons a message will be marked as invalid:
* Fail [global validation](../../metron-common#validation-framework)
-* Fail the parser's validate function (generally that means to not have a
`timestamp` field or a `original_string` field.
+* Fail the parser's validate function. Generally, that means not having a
`timestamp` field or an `original_string` field.
-Those messages which are marked as invalid are sent to the error queue
-with an indication that they are invalid in the error message.
+Those messages which are marked as invalid are sent to the error queue with an
indication that they
+are invalid in the error message. The messages will contain
"error_type":"parser_invalid". Note that
+you will not see additional exceptions in the logs for this type of failure;
rather, the error messages
+are written directly to the configured error topic. See [Topology
Errors](../../metron-common#topology-errors)
+for more.
### Parser Errors
@@ -166,7 +183,7 @@ parse, are sent along to the error queue with a message
indicating that
there was an error in parse along with a stacktrace. This is to
distinguish from the invalid messages.
-## Filtered
+## Filtering
One can also filter a message by specifying a `filterClassName` in the
parser config. Filtered messages are just dropped rather than passed
@@ -261,7 +278,7 @@ The document is structured in the following way
}
```
-* `sensorTopic` : The kafka topic to send the parsed messages to. If the
topic is prefixed and suffixed by `/`
+* `sensorTopic` : The kafka topic that the parser will read messages from.
If the topic is prefixed and suffixed by `/`
then it is assumed to be a regex and will match any topic matching the pattern
(e.g. `/bro.*/` would match `bro_cust0`, `bro_cust1` and `bro_cust2`)
* `readMetadata` : Boolean indicating whether to read metadata or not (The
default is raw message strategy dependent). See below for a discussion about
metadata.
* `mergeMetadata` : Boolean indicating whether to merge metadata with the
message or not (The default is raw message strategy dependent). See below for
a discussion about metadata.
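The `timeFields`/`dateFormat` entries documented in the README changes above turn time strings into epoch-millisecond timestamps. A self-contained sketch of that conversion with the stated default format `yyyy-MM-dd HH:mm:ss.S z` (plain JDK code, not Metron's actual parsing path; the sample log value is hypothetical):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class TimeFieldSketch {
    public static void main(String[] args) throws ParseException {
        // The README's documented default dateFormat for Grok time fields.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S z", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // mirrors the `timezone` default of UTC

        // A time field as it might appear in a raw log line (hypothetical sample).
        Date d = fmt.parse("2019-04-10 13:04:03.0 UTC");

        // Metron records timestamps as epoch milliseconds.
        System.out.println(d.getTime()); // prints 1554901443000
    }
}
```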
diff --git a/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java b/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java
index f64b4af..616639c 100644
--- a/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java
+++ b/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java
@@ -20,19 +20,6 @@ package org.apache.metron.parsers;
import com.google.common.base.Joiner;
import com.google.common.base.Splitter;
-import oi.thekraken.grok.api.Grok;
-import oi.thekraken.grok.api.Match;
-import org.apache.commons.lang3.StringUtils;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.metron.common.Constants;
-import org.apache.metron.parsers.interfaces.MessageParser;
-import org.apache.metron.parsers.interfaces.MessageParserResult;
-import org.json.simple.JSONObject;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
@@ -50,6 +37,18 @@ import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TimeZone;
+import oi.thekraken.grok.api.Grok;
+import oi.thekraken.grok.api.Match;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.metron.common.Constants;
+import org.apache.metron.parsers.interfaces.MessageParser;
+import org.apache.metron.parsers.interfaces.MessageParserResult;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
public class GrokParser implements MessageParser<JSONObject>, Serializable {
@@ -96,9 +95,11 @@ public class GrokParser implements
MessageParser<JSONObject>, Serializable {
public InputStream openInputStream(String streamName) throws IOException {
FileSystem fs = FileSystem.get(new Configuration());
Path path = new Path(streamName);
- if(fs.exists(path)) {
+ if (fs.exists(path)) {
+ LOG.info("Loading {} from HDFS.", streamName);
return fs.open(path);
} else {
+ LOG.info("File not found in HDFS, attempting to load {} from classpath
using classloader for {}.", streamName, getClass());
return getClass().getResourceAsStream(streamName);
}
}
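The hunk above makes `openInputStream`'s fallback order visible in the logs: HDFS first, then the classpath. Stripped of the Hadoop dependency, the shape of that fallback looks roughly like this (a sketch only; `java.nio.file` stands in for the HDFS `FileSystem` calls):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FallbackOpen {
    /**
     * Prefer the (local) file system, fall back to the classpath,
     * and return null when neither location has the resource --
     * the same ordering GrokParser.openInputStream uses with HDFS.
     */
    public static InputStream open(String streamName) throws IOException {
        Path path = Paths.get(streamName);
        if (Files.exists(path)) {
            System.out.println("Loading " + streamName + " from the file system.");
            return Files.newInputStream(path);
        }
        System.out.println("Not on the file system, trying classpath for " + streamName + ".");
        return FallbackOpen.class.getResourceAsStream(streamName);
    }

    public static void main(String[] args) throws IOException {
        // A file that exists resolves via the file-system branch.
        Path tmp = Files.createTempFile("patterns", ".grok");
        try (InputStream in = open(tmp.toString())) {
            System.out.println(in != null); // true
        } finally {
            Files.delete(tmp);
        }
        // A name found nowhere yields null, mirroring the parser's
        // "unable to load" error case.
        System.out.println(open("/no-such-pattern-file.grok") == null); // true
    }
}
```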
@@ -108,7 +109,7 @@ public class GrokParser implements
MessageParser<JSONObject>, Serializable {
grok = new Grok();
try {
InputStream commonInputStream = openInputStream(patternsCommonDir);
- LOG.debug("Grok parser loading common patterns from: {}",
patternsCommonDir);
+ LOG.info("Grok parser loading common patterns from: {}",
patternsCommonDir);
if (commonInputStream == null) {
throw new RuntimeException(
@@ -116,7 +117,7 @@ public class GrokParser implements
MessageParser<JSONObject>, Serializable {
}
grok.addPatternFromReader(new InputStreamReader(commonInputStream));
- LOG.debug("Loading parser-specific patterns from: {}", grokPath);
+ LOG.info("Loading parser-specific patterns from: {}", grokPath);
InputStream patterInputStream = openInputStream(grokPath);
if (patterInputStream == null) {
@@ -125,14 +126,12 @@ public class GrokParser implements
MessageParser<JSONObject>, Serializable {
}
grok.addPatternFromReader(new InputStreamReader(patterInputStream));
- if (LOG.isDebugEnabled()) {
- LOG.debug("Grok parser set the following grok expression: {}",
grok.getNamedRegexCollectionById(patternLabel));
- }
+ LOG.info("Grok parser set the following grok expression for '{}': {}",
patternLabel, grok.getPatterns().get(patternLabel));
String grokPattern = "%{" + patternLabel + "}";
grok.compile(grokPattern);
- LOG.debug("Compiled grok pattern {}", grokPattern);
+ LOG.info("Compiled grok pattern {}", grokPattern);
} catch (Throwable e) {
LOG.error(e.getMessage(), e);