This is an automated email from the ASF dual-hosted git repository.
mmiklavcic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metron.git
The following commit(s) were added to refs/heads/master by this push:
new 9e026e3 METRON-1950: Site-book generation broken in master (mmiklavc)
closes apache/metron#1309
9e026e3 is described below
commit 9e026e3e902769dae364b4c4acf64e00839d24f5
Author: mmiklavc <[email protected]>
AuthorDate: Thu Dec 20 12:22:19 2018 -0700
METRON-1950: Site-book generation broken in master (mmiklavc) closes
apache/metron#1309
---
metron-platform/metron-parsing/README.md | 536 +++++++++++----------
.../{metron-parsers-common => }/parser_arch.png | Bin
site-book/bin/generate-md.sh | 6 +-
3 files changed, 276 insertions(+), 266 deletions(-)
diff --git a/metron-platform/metron-parsing/README.md
b/metron-platform/metron-parsing/README.md
index 76b6168..9a46532 100644
--- a/metron-platform/metron-parsing/README.md
+++ b/metron-platform/metron-parsing/README.md
@@ -21,127 +21,129 @@ limitations under the License.
Parsers are pluggable components which are used to transform raw data
(textual or raw bytes) into JSON messages suitable for downstream
-enrichment and indexing.
+enrichment and indexing.
There are two general types of parsers:
* A parser written in Java which conforms to the `MessageParser` interface.
This kind of parser is optimized for speed and performance and is built for use
with higher velocity topologies. These parsers are not easily modifiable and
in order to make changes to them the entire topology needs to be recompiled.
* A general purpose parser. This type of parser is primarily designed for
lower-velocity topologies or for quickly standing up a parser for a new
telemetry before a permanent Java parser can be written for it. As of the time
of this writing, we have:
- * Grok parser: `org.apache.metron.parsers.GrokParser` with possible
`parserConfig` entries of
- * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
- * `patternLabel` : The pattern label to use from the grok statement
- * `multiLine` : The raw data passed in should be handled as a long with
multiple lines, with each line to be parsed separately. This setting's valid
values are 'true' or 'false'. The default if unset is 'false'. When set the
parser will handle multiple lines with successfully processed lines emitted
normally, and lines with errors sent to the error topic.
- * `timestampField` : The field to use for timestamp
- * `timeFields` : A list of fields to be treated as time
- * `dateFormat` : The date format to use to parse the time fields
- * `timezone` : The timezone to use. `UTC` is default.
- * The Grok parser supports either 1 line to parse per incoming message, or
incoming messages with multiple log lines, and will produce a json message per
line
- * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible
`parserConfig` entries of
- * `timestampFormat` : The date format of the timestamp to use. If
unspecified, the parser assumes the timestamp is ms since unix epoch.
- * `columns` : A map of column names you wish to extract from the CSV to
their offsets (e.g. `{ 'name' : 1, 'profession' : 3}` would be a column map
for extracting the 2nd and 4th columns from a CSV)
- * `separator` : The column separator, `,` by default.
- * JSON Map Parser: `org.apache.metron.parsers.json.JSONMapParser` with
possible `parserConfig` entries of
- * `mapStrategy` : A strategy to indicate how to handle multi-dimensional
Maps. This is one of
- * `DROP` : Drop fields which contain maps
- * `UNFOLD` : Unfold inner maps. So `{ "foo" : { "bar" : 1} }` would
turn into `{"foo.bar" : 1}`
- * `ALLOW` : Allow multidimensional maps
- * `ERROR` : Throw an error when a multidimensional map is encountered
- * `jsonpQuery` : A [JSON Path](#json_path) query string. If present, the
result of the JSON Path query should be a list of messages. This is useful if
you have a JSON document which contains a list or array of messages embedded in
it, and you do not have another means of splitting the message.
- * `wrapInEntityArray` : `"true" or "false"`. If `jsonQuery` is present and
this flag is present and set to `"true"`, the incoming message will be wrapped
in a JSON entity and array.
- for example:
- `{"name":"value"},{"name2","value2}` will be wrapped as `{"message" :
[{"name":"value"},{"name2","value2}]}`.
- This is using the default value for `wrapEntityName` if that property
is not set.
- * `wrapEntityName` : Sets the name to use when wrapping JSON using
`wrapInEntityArray`. The `jsonpQuery` should reference this name.
- * A field called `timestamp` is expected to exist and, if it does not,
then current time is inserted.
- * Regular Expressions Parser
- * `recordTypeRegex` : A regular expression to uniquely identify a record
type.
- * `messageHeaderRegex` : A regular expression used to extract fields
from a message part which is common across all the messages.
- * `convertCamelCaseToUnderScore` : If this property is set to true, this
parser will automatically convert all the camel case property names to
underscore seperated.
- For example, following convertions will automatically happen:
-
- ```
- ipSrcAddr -> ip_src_addr
- ipDstAddr -> ip_dst_addr
- ipSrcPort -> ip_src_port
- ```
- Note this property may be necessary, because java does not support
underscores in the named group names. So in case your property naming
conventions requires underscores in property names, use this property.
-
- * `fields` : A json list of maps contaning a record type to regular
expression mapping.
-
- A complete configuration example would look like:
-
- ```json
- "convertCamelCaseToUnderScore": true,
- "recordTypeRegex": "kernel|syslog",
- "messageHeaderRegex":
"(<syslogPriority>(<=^<)\\d{1,4}(?=>)).*?(<timestamp>(<=>)[A-Za-z]
{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(<syslogHost>(<=\\s).*?(?=\\s))",
- "fields": [
- {
- "recordType": "kernel",
- "regex": ".*(<eventInfo>(<=\\]|\\w\\:).*?(?=$))"
- },
- {
- "recordType": "syslog",
- "regex":
".*(<processid>(<=PID\\s=\\s).*?(?=\\sLine)).*(<filePath>(<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))
(<fileName>.*?(?=\")).*(<eventInfo>(<=\").*?(?=$))"
- }
- ]
- ```
- **Note**: messageHeaderRegex and regex (withing fields) could be
specified as lists also e.g.
- ```json
- "messageHeaderRegex": [
+ * Grok parser: `org.apache.metron.parsers.GrokParser` with possible
`parserConfig` entries of
+ * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
+ * `patternLabel` : The pattern label to use from the grok statement
+ * `multiLine` : Whether the raw data passed in should be handled as
multiple lines, with each line parsed separately. Valid values are 'true' or
'false'; the default if unset is 'false'. When set to 'true', the parser
handles multiple lines, emitting successfully processed lines normally and
sending lines with errors to the error topic.
+ * `timestampField` : The field to use for timestamp
+ * `timeFields` : A list of fields to be treated as time
+ * `dateFormat` : The date format to use to parse the time fields
+ * `timezone` : The timezone to use. `UTC` is default.
+ * The Grok parser supports either 1 line to parse per incoming
message, or incoming messages with multiple log lines, and will produce a json
message per line
+ * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible
`parserConfig` entries of
+ * `timestampFormat` : The date format of the timestamp to use. If
unspecified, the parser assumes the timestamp is ms since unix epoch.
+ * `columns` : A map of column names you wish to extract from the CSV
to their offsets (e.g. `{ 'name' : 1, 'profession' : 3}` would be a column map
for extracting the 2nd and 4th columns from a CSV)
+ * `separator` : The column separator, `,` by default.
+ * JSON Map Parser: `org.apache.metron.parsers.json.JSONMapParser` with
possible `parserConfig` entries of
+ * `mapStrategy` : A strategy to indicate how to handle
multi-dimensional Maps. This is one of
+ * `DROP` : Drop fields which contain maps
+ * `UNFOLD` : Unfold inner maps. So `{ "foo" : { "bar" : 1} }`
would turn into `{"foo.bar" : 1}`
+ * `ALLOW` : Allow multidimensional maps
+ * `ERROR` : Throw an error when a multidimensional map is
encountered
+ * `jsonpQuery` : A [JSON Path](#json_path) query string. If present,
the result of the JSON Path query should be a list of messages. This is useful
if you have a JSON document which contains a list or array of messages embedded
in it, and you do not have another means of splitting the message.
+ * `wrapInEntityArray` : `"true"` or `"false"`. If `jsonpQuery` is present
and this flag is present and set to `"true"`, the incoming message will be
wrapped in a JSON entity and array.
+ For example:
+ `{"name":"value"},{"name2":"value2"}` will be wrapped as `{"message"
: [{"name":"value"},{"name2":"value2"}]}`.
+ This is using the default value for `wrapEntityName` if that
property is not set.
+ * `wrapEntityName` : Sets the name to use when wrapping JSON using
`wrapInEntityArray`. The `jsonpQuery` should reference this name.
+ * A field called `timestamp` is expected to exist and, if it does not,
then current time is inserted.
+ * Regular Expressions Parser
+ * `recordTypeRegex` : A regular expression to uniquely identify a
record type.
+ * `messageHeaderRegex` : A regular expression used to extract fields
from a message part which is common across all the messages.
+ * `convertCamelCaseToUnderScore` : If this property is set to true,
this parser will automatically convert all camel case property names to
underscore-separated names. For example, the following conversions will
happen automatically:
+
+ ```
+ ipSrcAddr -> ip_src_addr
+ ipDstAddr -> ip_dst_addr
+ ipSrcPort -> ip_src_port
+ ```
+
+ Note this property may be necessary because Java does not support
underscores in named group names. If your property naming conventions
require underscores in property names, use this property.
+
+ * `fields` : A JSON list of maps containing a record type to regular
expression mapping.
+
+ A complete configuration example would look like:
+
+ ```json
+ "convertCamelCaseToUnderScore": true,
+ "recordTypeRegex": "kernel|syslog",
+ "messageHeaderRegex":
"(<syslogPriority>(<=^<)\\d{1,4}(?=>)).*?(<timestamp>(<=>)[A-Za-z]
{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(<syslogHost>(<=\\s).*?(?=\\s))",
+ "fields": [
+ {
+ "recordType": "kernel",
+ "regex": ".*(<eventInfo>(<=\\]|\\w\\:).*?(?=$))"
+ },
+ {
+ "recordType": "syslog",
+ "regex":
".*(<processid>(<=PID\\s=\\s).*?(?=\\sLine)).*(<filePath>(<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))
(<fileName>.*?(?=\")).*(<eventInfo>(<=\").*?(?=$))"
+ }
+ ]
+ ```
+
+ **Note**: `messageHeaderRegex` and `regex` (within `fields`) can also be
specified as lists, e.g.
+
+ ```json
+ "messageHeaderRegex": [
"regular expression 1",
"regular expression 2"
- ]
- ```
- Where **regular expression 1** are valid regular expressions and may
have named
- groups, which would be extracted into fields. This list will be
evaluated in order until a
- matching regular expression is found.
-
- **messageHeaderRegex** is run on all the messages.
- Yes, all the messages are expected to contain the fields which are being
extracted using the **messageHeaderRegex**.
- **messageHeaderRegex** is a sort of HCF (highest common factor) in all
messages.
-
- **recordTypeRegex** can be a more advanced regular expression containing
named goups. For example
-
- "recordTypeRegex":
"(<process>(<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
-
- Here all the named goups (process in above example) will be extracted as
fields.
-
- Though having named group in recordType is completely optional, still
one could want extract named groups in recordType for following reasons:
-
- 1. Since **recordType** regular expression is already getting matched
and we are paying the price for a regular expression match already,
- we can extract certain fields as a by product of this match.
- 2. Most likely the **recordType** field is common across all the
messages. Hence having it extracted in the recordType (or messageHeaderRegex)
would
- reduce the overall complexity of regular expressions in the regex field.
-
- **regex** within a field could be a list of regular expressions also. In
this case all regular expressions in the list will be attempted to match until
a match is found. Once a full match is found remaining regular expressions are
ignored.
-
- ```json
- "regex": [ "record type specific regular expression 1",
- "record type specific regular expression 2"]
-
- ```
-
- **timesamp**
-
- Since this parser is a general purpose parser, it will populate the
timestamp field with current UTC timestamp. Actual timestamp value can be
overridden later using stellar.
- For example in case of syslog timestamps, one could use following
stellar construct to override the timestamp value.
- Let us say you parsed actual timestamp from the raw log:
-
- <38>Jun 20 15:01:17 hostName sshd[11672]: Accepted publickey for prod
from 55.55.55.55 port 66666 ssh2
-
- syslogTimestamp="Jun 20 15:01:17"
-
- Then something like below could be used to override the timestamp.
-
- ```
- "timestamp_str": "FORMAT('%s%s%s', YEAR(),' ',syslogTimestamp)",
- "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy MMM dd HH:mm:ss' )"
- ```
-
- OR, if you want to factor in the timezone
-
- ```
- "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, timestamp_format,
timezone_name )"
- ```
+ ]
+ ```
+
+ Where each entry, such as **regular expression 1**, is a valid regular
expression and may have named
+ groups, which would be extracted into fields. This list will be
evaluated in order until a
+ matching regular expression is found.
+
+ **messageHeaderRegex** is run on all the messages.
+ Yes, all the messages are expected to contain the fields which are
being extracted using the **messageHeaderRegex**.
+ **messageHeaderRegex** is a sort of HCF (highest common factor) in all
messages.
+
+ **recordTypeRegex** can be a more advanced regular expression
containing named groups. For example
+
+ "recordTypeRegex":
"(<process>(<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
+
+ Here all the named groups (process in the above example) will be extracted
as fields.
+
+ Though having a named group in recordType is completely optional, one
may still want to extract named groups in recordType for the following reasons:
+
+ 1. Since **recordType** regular expression is already getting matched
and we are paying the price for a regular expression match already,
+ we can extract certain fields as a by product of this match.
+ 2. Most likely the **recordType** field is common across all the
messages. Hence having it extracted in the recordType (or messageHeaderRegex)
would
+ reduce the overall complexity of regular expressions in the regex
field.
+
+ **regex** within a field could be a list of regular expressions also.
In this case all regular expressions in the list will be attempted to match
until a match is found. Once a full match is found remaining regular
expressions are ignored.
+
+ ```json
+ "regex": [ "record type specific regular expression 1",
+ "record type specific regular expression 2"]
+ ```
+
+ **timestamp**
+
+ Since this parser is a general purpose parser, it will populate the
timestamp field with the current UTC timestamp. The actual timestamp value can
be overridden later using Stellar.
+ For example, in the case of syslog timestamps, one could use the following
Stellar construct to override the timestamp value.
+ Let us say you parsed the actual timestamp from the raw log:
+
+ `<38>Jun 20 15:01:17 hostName sshd[11672]: Accepted publickey for prod
from 55.55.55.55 port 66666 ssh2`
+
+ syslogTimestamp="Jun 20 15:01:17"
+
+ Then something like below could be used to override the timestamp.
+
+ ```
+ "timestamp_str": "FORMAT('%s%s%s', YEAR(),' ',syslogTimestamp)",
+ "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy MMM dd HH:mm:ss'
)"
+ ```
+
+ OR, if you want to factor in the timezone
+
+ ```
+ "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, timestamp_format,
timezone_name )"
+ ```
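The `convertCamelCaseToUnderScore` mappings listed above can be reproduced with a short Python sketch. This is not Metron's implementation: the function name `camel_to_underscore` is hypothetical, and the regular expression is only one way to obtain the three documented conversions.

```python
import re


def camel_to_underscore(name: str) -> str:
    """Insert an underscore before each interior capital letter, then lowercase.

    Hypothetical helper for illustration; the real parser may treat edge
    cases (acronyms, digits) differently.
    """
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


# The three examples from the README text.
for field in ("ipSrcAddr", "ipDstAddr", "ipSrcPort"):
    print(field, "->", camel_to_underscore(field))
```

Named groups in the regular expressions would then be written in camel case, and the converted names would appear in the emitted message.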
## Parser Error Routing
@@ -204,15 +206,15 @@ So putting it all together a typical Metron message with
all 5-tuple fields pres
```json
{
-"message":
-{"ip_src_addr": xxxx,
-"ip_dst_addr": xxxx,
-"ip_src_port": xxxx,
-"ip_dst_port": xxxx,
-"protocol": xxxx,
-"original_string": xxx,
-"additional-field 1": xxx,
-}
+ "message": {
+ "ip_src_addr": xxxx,
+ "ip_dst_addr": xxxx,
+ "ip_src_port": xxxx,
+ "ip_dst_port": xxxx,
+ "protocol": xxxx,
+ "original_string": xxx,
+ "additional-field 1": xxx
+ }
}
```
@@ -246,16 +248,19 @@ The document is structured in the following way
* `parserClassName` : The fully qualified classname for the parser to be used.
* `filterClassName` : The filter to use. This may be a fully qualified
classname of a Class that implements the
`org.apache.metron.parsers.interfaces.MessageFilter<JSONObject>` interface.
Message Filters are intended to allow the user to ignore a set of messages via
custom logic. The existing implementations are:
- * `STELLAR` : Allows you to apply a stellar statement which returns a
boolean, which will pass every message for which the statement returns `true`.
The Stellar statement that is to be applied is specified by the `filter.query`
property in the `parserConfig`.
-Example Stellar Filter which includes messages which contain a the `field1`
field:
-```
- {
- "filterClassName" : "STELLAR"
- ,"parserConfig" : {
- "filter.query" : "exists(field1)"
- }
- }
-```
+ * `STELLAR` : Allows you to apply a stellar statement which returns a
boolean, which will pass every message for which the statement returns `true`.
The Stellar statement that is to be applied is specified by the `filter.query`
property in the `parserConfig`.
+
+ Example Stellar Filter which includes messages which contain the
`field1` field:
+
+ ```
+ {
+ "filterClassName" : "STELLAR",
+ "parserConfig" : {
+ "filter.query" : "exists(field1)"
+ }
+ }
+ ```
+
* `sensorTopic` : The kafka topic to send the parsed messages to. If the
topic is prefixed and suffixed by `/`
then it is assumed to be a regex and will match any topic matching the pattern
(e.g. `/bro.*/` would match `bro_cust0`, `bro_cust1` and `bro_cust2`)
* `readMetadata` : Boolean indicating whether to read metadata or not (The
default is raw message strategy dependent). See below for a discussion about
metadata.
@@ -263,26 +268,27 @@ then it is assumed to be a regex and will match any topic
matching the pattern (
* `rawMessageStrategy` : The strategy to use when reading the raw data and
metadata. See below for a discussion about message reading strategies.
* `rawMessageStrategyConfig` : The raw message strategy configuration map.
See below for a discussion about message reading strategies.
* `parserConfig` : A JSON Map representing the parser implementation specific
configuration. Also include batch sizing and timeout for writer configuration
here.
- * `batchSize` : Integer indicating number of records to batch together
before sending to the writer. (default to `15`)
- * `batchTimeout` : The timeout after which a batch will be flushed even if
batchSize has not been met. Optional.
- If unspecified, or set to `0`, it defaults to a system-determined duration
which is a fraction of the Storm
- parameter `topology.message.timeout.secs`. Ignored if batchSize is `1`,
since this disables batching.
- * The kafka writer can be configured within the parser config as well.
(This is all configured a priori, but this is convenient for overriding the
settings). See [here](../../metron-writer/README.md#kafka-writer)
+ * `batchSize` : Integer indicating number of records to batch together
before sending to the writer. (default to `15`)
+ * `batchTimeout` : The timeout after which a batch will be flushed even if
batchSize has not been met. Optional.
+ If unspecified, or set to `0`, it defaults to a system-determined
duration which is a fraction of the Storm
+ parameter `topology.message.timeout.secs`. Ignored if batchSize is `1`,
since this disables batching.
+ * The kafka writer can be configured within the parser config as well.
(This is all configured a priori, but this is convenient for overriding the
settings). See [here](../../metron-writer/README.md#kafka-writer)
* `fieldTransformations` : An array of complex objects representing the
transformations to be done on the message generated from the parser before
writing out to the kafka topic.
* `securityProtocol` : The security protocol to use for reading from kafka
(this is a string). This can be overridden on the command line and also
specified in the spout config via the `security.protocol` key. If both are
specified, then they are merged and the CLI will take precedence. If multiple
sensors are used, any non "PLAINTEXT" value will be used.
* `cacheConfig` : Cache config for stellar field transformations. This
configures a least frequently used cache. This is a map with the following
keys. If not explicitly configured (the default), then no cache will be used.
- * `stellar.cache.maxSize` - The maximum number of elements in the cache.
Default is to not use a cache.
- * `stellar.cache.maxTimeRetain` - The maximum amount of time an element is
kept in the cache (in minutes). Default is to not use a cache.
+ * `stellar.cache.maxSize` - The maximum number of elements in the cache.
Default is to not use a cache.
+ * `stellar.cache.maxTimeRetain` - The maximum amount of time an element is
kept in the cache (in minutes). Default is to not use a cache.
- Example of a cache config to contain at max `20000` stellar expressions for
at most `20` minutes.:
-```
-{
- "cacheConfig" : {
- "stellar.cache.maxSize" : 20000,
- "stellar.cache.maxTimeRetain" : 20
- }
-}
-```
+ Example of a cache config holding at most `20000` Stellar
expressions for at most `20` minutes:
+
+ ```
+ {
+ "cacheConfig" : {
+ "stellar.cache.maxSize" : 20000,
+ "stellar.cache.maxTimeRetain" : 20
+ }
+ }
+ ```
The `fieldTransformations` is a complex object which defines a
transformation which can be done to a message. This transformation can
@@ -298,36 +304,34 @@ For platform specific configs, see the README of the
appropriate project. This w
Metadata is a useful thing to send to Metron and use during enrichment or
threat intelligence.
Consider the following scenarios:
* You have multiple telemetry sources of the same type that you want to
- * ensure downstream analysts can differentiate
- * ensure profiles consider independently as they have different seasonality
or some other fundamental characteristic
+ * ensure downstream analysts can differentiate
+ * ensure profiles consider them independently, as they have different
seasonality or some other fundamental characteristic
As such, there are two types of metadata that we seek to support in Metron:
* Environmental metadata : Metadata about the system at large
- * Consider the possibility that you have multiple kafka topics being
processed by one parser and you want to tag the messages with the kafka topic
- * At the moment, only the kafka topic is kept as the field name.
+ * Consider the possibility that you have multiple kafka topics being
processed by one parser and you want to tag the messages with the kafka topic
+ * At the moment, only the kafka topic is kept as the field name.
* Custom metadata: Custom metadata from an individual telemetry source that
one might want to use within Metron.
Metadata is controlled by the following parser configs:
-* `rawMessageStrategy` : This is a strategy which indicates how to read
- data and metadata. The strategies supported are:
- * `DEFAULT` : Data is read directly from the kafka record value and
metadata, if any, is read from the kafka record key. This strategy defaults to
not reading metadata and not merging metadata. This is the default strategy.
- * `ENVELOPE` : Data from kafka record value is presumed to be a JSON blob.
One of
- these fields must contain the raw data to pass to the parser. All other
fields should be considered metadata. The field containing the raw data is
specified in the `rawMessageStrategyConfig`. Data held in the kafka key as
well as the non-data fields in the JSON blob passed into the kafka value are
considered metadata. Note that the exception to this is that any
`original_string` field is inherited from the envelope data so that the
original string contains the envelope data. If y [...]
+* `rawMessageStrategy` : This is a strategy which indicates how to read data
and metadata. The strategies supported are:
+ * `DEFAULT` : Data is read directly from the kafka record value and
metadata, if any, is read from the kafka record key. This strategy defaults to
not reading metadata and not merging metadata. This is the default strategy.
+ * `ENVELOPE` : Data from kafka record value is presumed to be a JSON blob.
One of
+ these fields must contain the raw data to pass to the parser. All other
fields should be considered metadata. The field containing the raw data is
specified in the `rawMessageStrategyConfig`. Data held in the kafka key as
well as the non-data fields in the JSON blob passed into the kafka value are
considered metadata. Note that the exception to this is that any
`original_string` field is inherited from the envelope data so that the
original string contains the envelope data. If [...]
* `rawMessageStrategyConfig` : The configuration (a map) for the
`rawMessageStrategy`. Available configurations are strategy dependent:
- * `DEFAULT`
- * `metadataPrefix` defines the key prefix for metadata (default is
`metron.metadata`).
- * `ENVELOPE`
- * `metadataPrefix` defines the key prefix for metadata (default is
`metron.metadata`)
- * `messageField` defines the field from the envelope to use as the data.
All other fields are considered metadata.
+ * `DEFAULT`
+ * `metadataPrefix` defines the key prefix for metadata (default is
`metron.metadata`).
+ * `ENVELOPE`
+ * `metadataPrefix` defines the key prefix for metadata (default is
`metron.metadata`)
+ * `messageField` defines the field from the envelope to use as the
data. All other fields are considered metadata.
* `readMetadata` : This is a boolean indicating whether metadata will be read
and made available to Field
transformations (i.e. Stellar field transformations). The default is
dependent upon the `rawMessageStrategy`:
- * `DEFAULT` : default to `false`.
- * `ENVELOPE` : default to `true`.
+ * `DEFAULT` : default to `false`.
+ * `ENVELOPE` : default to `true`.
* `mergeMetadata` : This is a boolean indicating whether metadata fields will
be merged with the message automatically. That is to say, if this property is
set to `true` then every metadata field will become part of the messages and,
consequently, also available for use in field transformations. The default is
dependent upon the `rawMessageStrategy`:
- * `DEFAULT` : default to `false`.
- * `ENVELOPE` : default to `true`.
-
+ * `DEFAULT` : default to `false`.
+ * `ENVELOPE` : default to `true`.
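The `ENVELOPE` strategy described above can be sketched in Python as follows. This is a simplified illustration, not Metron's code: the function name `read_envelope` is hypothetical, joining the prefix and field name with a dot is an assumption, and the Kafka key and the special handling of `original_string` are left out.

```python
import json


def read_envelope(value: bytes, message_field: str,
                  metadata_prefix: str = "metron.metadata"):
    """Split a JSON envelope into raw data and prefixed metadata.

    Assumption: metadata keys are formed as "<prefix>.<field>".
    """
    envelope = json.loads(value)
    raw = envelope[message_field]  # the field named by `messageField`
    metadata = {
        f"{metadata_prefix}.{key}": val
        for key, val in envelope.items()
        if key != message_field  # every other field is metadata
    }
    return raw, metadata


raw, metadata = read_envelope(
    b'{"data": "raw log line", "customer": "cust0"}', message_field="data")
print(raw)
print(metadata)
```

With `mergeMetadata` set to `true`, the entries in `metadata` would additionally be merged into the parsed message.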
#### Field Naming
@@ -359,119 +363,125 @@ The format of a `fieldTransformation` is as follows:
The currently implemented fieldTransformations are:
* `REMOVE` : This transformation removes the specified input fields. If you
want a conditional removal, you can pass a Metron Query Language statement to
define the conditions under which you want to remove the fields.
-Consider the following simple configuration which will remove `field1`
-unconditionally:
-```
-{
-...
- "fieldTransformations" : [
- {
- "input" : "field1"
- , "transformation" : "REMOVE"
- }
- ]
-}
-```
+ Consider the following simple configuration which will remove `field1`
+ unconditionally:
-Consider the following simple sensor parser configuration which will remove
`field1`
-whenever `field2` exists and whose corresponding equal to 'foo':
-```
-{
-...
- "fieldTransformations" : [
- {
- "input" : "field1"
- , "transformation" : "REMOVE"
- , "config" : {
- "condition" : "exists(field2) and field2 == 'foo'"
- }
- }
- ]
-}
-```
+ ```
+ {
+ ...
+ "fieldTransformations" : [
+ {
+ "input" : "field1"
+ , "transformation" : "REMOVE"
+ }
+ ]
+ }
+ ```
+
+ Consider the following simple sensor parser configuration which will
remove `field1`
+ whenever `field2` exists and its value is equal to 'foo':
+
+ ```
+ {
+ ...
+ "fieldTransformations" : [
+ {
+ "input" : "field1"
+ , "transformation" : "REMOVE"
+ , "config" : {
+ "condition" : "exists(field2) and field2 == 'foo'"
+ }
+ }
+ ]
+ }
+ ```
* `SELECT`: This transformation filters the fields in the message to include
only the configured output fields, and drops any not explicitly included.
-For example:
-```
-{
-...
- "fieldTransformations" : [
- {
- "output" : ["field1", "field2" ]
- , "transformation" : "SELECT"
- }
- ]
-}
-```
+ For example:
+
+ ```
+ {
+ ...
+ "fieldTransformations" : [
+ {
+ "output" : ["field1", "field2" ]
+ , "transformation" : "SELECT"
+ }
+ ]
+ }
+ ```
-when applied to a message containing keys field1, field2 and field3, will only
output the first two. It is also worth noting that two standard fields -
timestamp and original_source - will always be passed along whether they are
listed in output or not, since they are considered core required fields.
+ when applied to a message containing keys field1, field2 and field3, will
only output the first two. It is also worth noting that two standard fields -
timestamp and original_source - will always be passed along whether they are
listed in output or not, since they are considered core required fields.
* `IP_PROTOCOL` : This transformation maps IANA protocol numbers to consistent
string representations.
-Consider the following sensor parser config to map the `protocol` field
-to a textual representation of the protocol:
-```
-{
-...
- "fieldTransformations" : [
- {
- "input" : "protocol"
- , "transformation" : "IP_PROTOCOL"
- }
- ]
-}
-```
+ Consider the following sensor parser config to map the `protocol` field
+ to a textual representation of the protocol:
+
+ ```
+ {
+ ...
+ "fieldTransformations" : [
+ {
+ "input" : "protocol"
+ , "transformation" : "IP_PROTOCOL"
+ }
+ ]
+ }
+ ```
-This transformation would transform `{ "protocol" : 6, "source.type" : "bro",
... }`
-into `{ "protocol" : "TCP", "source.type" : "bro", ...}`
+ This transformation would transform `{ "protocol" : 6, "source.type" :
"bro", ... }`
+ into `{ "protocol" : "TCP", "source.type" : "bro", ...}`
-* `STELLAR` : This transformation executes a set of transformations
- expressed as [Stellar Language](../../metron-common) statements.
+* `STELLAR` : This transformation executes a set of transformations expressed
as [Stellar Language](../../metron-common) statements.
* `RENAME` : This transformation allows users to rename a set of fields.
Specifically,
the config is presumed to be the mapping. The keys to the config are the
existing field names
and the values for the config map are the associated new field name.
-The following config will rename the fields `old_field` and
`different_old_field` to
-`new_field` and `different_new_field` respectively:
-```
-{
-...
- "fieldTransformations" : [
- {
- "transformation" : "RENAME",
- , "config" : {
- "old_field" : "new_field",
- "different_old_field" : "different_new_field"
- }
- }
- ]
-}
-```
+ The following config will rename the fields `old_field` and
`different_old_field` to
+ `new_field` and `different_new_field` respectively:
+
+ ```
+ {
+ ...
+ "fieldTransformations" : [
+ {
+ "transformation" : "RENAME"
+ , "config" : {
+ "old_field" : "new_field",
+ "different_old_field" : "different_new_field"
+ }
+ }
+ ]
+ }
+ ```
+
* `REGEX_SELECT` : This transformation lets users set an output field to one
of a set of possibilities based on matching regexes. This transformation is
useful when the number or conditions are large enough to make a stellar
language match statement unwieldy.
-The following config will set the field `logical_source_type` to one of the
-following, dependent upon the value of the `pix_type` field:
-* `cisco-6-302` if `pix_type` starts with either `6-302` or `06-302`
-* `cisco-5-304` if `pix_type` starts with `5-304`
-```
-{
-...
- "fieldTransformations" : [
+ The following config will set the field `logical_source_type` to one of the
+ following, dependent upon the value of the `pix_type` field:
+ * `cisco-6-302` if `pix_type` starts with either `6-302` or `06-302`
+ * `cisco-5-304` if `pix_type` starts with `5-304`
+
+ ```
{
- "transformation" : "REGEX_ROUTING"
- ,"input" : "pix_type"
- ,"output" : "logical_source_type"
- ,"config" : {
- "cisco-6-302" : [ "^6-302.*", "^06-302.*"]
- "cisco-5-304" : "^5-304.*"
- }
+ ...
+ "fieldTransformations" : [
+ {
+ "transformation" : "REGEX_SELECT"
+ ,"input" : "pix_type"
+ ,"output" : "logical_source_type"
+ ,"config" : {
+ "cisco-6-302" : [ "^6-302.*", "^06-302.*"],
+ "cisco-5-304" : "^5-304.*"
+ }
+ }
+ ]
+ ...
}
- ]
-...
-}
-```
+ ```
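The selection logic of the `REGEX_SELECT` example above can be sketched in Python. This is an illustration under assumptions, not Metron's implementation: the function name `regex_select` is hypothetical, and leaving the message unchanged when nothing matches is assumed rather than documented here.

```python
import re


def regex_select(message: dict, input_field: str, output_field: str,
                 config: dict) -> dict:
    """Set `output_field` to the first config key whose regex matches."""
    value = message.get(input_field)
    if value is None:
        return message
    for label, patterns in config.items():
        # A config value may be a single regex or a list of regexes.
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(re.search(pattern, value) for pattern in patterns):
            return {**message, output_field: label}
    return message  # assumption: no match leaves the message as-is


config = {
    "cisco-6-302": ["^6-302.*", "^06-302.*"],
    "cisco-5-304": "^5-304.*",
}
print(regex_select({"pix_type": "06-302013"}, "pix_type",
                   "logical_source_type", config))
```

Keeping the patterns in a map keyed by the output value is what makes this form easier to maintain than a long Stellar match statement when many conditions are involved.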
### Assignment to `null`
diff --git
a/metron-platform/metron-parsing/metron-parsers-common/parser_arch.png
b/metron-platform/metron-parsing/parser_arch.png
similarity index 100%
rename from metron-platform/metron-parsing/metron-parsers-common/parser_arch.png
rename to metron-platform/metron-parsing/parser_arch.png
diff --git a/site-book/bin/generate-md.sh b/site-book/bin/generate-md.sh
index 60549f8..7ebb5f6 100755
--- a/site-book/bin/generate-md.sh
+++ b/site-book/bin/generate-md.sh
@@ -64,7 +64,7 @@ RESOURCE_LIST=(
metron-deployment/readme-images/enable-kerberos-started.png
metron-deployment/readme-images/enable-kerberos.png
metron-platform/metron-job/metron-job_state_statechart_diagram.svg
- metron-platform/metron-parsing/metron-parsers-common/parser_arch.png
+ metron-platform/metron-parsing/parser_arch.png
metron-platform/metron-indexing/indexing_arch.png
metron-platform/metron-enrichment/enrichment_arch.png
metron-analytics/metron-maas-service/maas_arch.png
@@ -96,8 +96,8 @@ HREF_REWRITE_LIST=(
metron-platform/metron-enrichment/README.md
's#(enrichment_arch.png)#(../../images/enrichment_arch.png)#g'
metron-platform/metron-indexing/README.md
's#(indexing_arch.png)#(../../images/indexing_arch.png)#g'
metron-platform/metron-job/README.md
's#(metron-job_state_statechart_diagram.svg)#(../../images/metron-job_state_statechart_diagram.svg)#g'
- metron-platform/metron-parsing/metron-parsers-common/README.md
's#(parser_arch.png)#(../../images/parser_arch.png)#g'
- metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md
's#(../../use-cases/parser_chaining/message_routing_high_level.svg)#(../../images/message_routing_high_level.svg)#g'
+ metron-platform/metron-parsing/README.md
's#(parser_arch.png)#(../../images/parser_arch.png)#g'
+ metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md
's#(../../../use-cases/parser_chaining/message_routing_high_level.svg)#(../../../images/message_routing_high_level.svg)#g'
metron-analytics/metron-maas-service/README.md
's#(maas_arch.png)#(../../images/maas_arch.png)#g'
metron-contrib/metron-performance/README.md
's#(performance_measurement.png)#(../../images/performance_measurement.png)#g'
use-cases/forensic_clustering/README.md
's#(find_alerts.png)#(../../images/find_alerts.png)#g'
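Each `HREF_REWRITE_LIST` entry pairs a README path with a sed-style `s#old#new#g` expression. The effect of the updated `parser_arch.png` entry can be approximated in Python as below. This sketch assumes the parentheses are matched literally and treats the `.` in the pattern as a literal dot. The function name `rewrite_href` and the sample Markdown line are hypothetical.

```python
def rewrite_href(markdown: str, old: str, new: str) -> str:
    """Replace every "(old)" link target with "(new)"."""
    return markdown.replace(f"({old})", f"({new})")


# Hypothetical README line that links to the relocated image.
line = "![Architecture](parser_arch.png)"
print(rewrite_href(line, "parser_arch.png", "../../images/parser_arch.png"))
```

Because the image now sits next to `metron-platform/metron-parsing/README.md`, the rewrite rule is attached to that README rather than to the one under `metron-parsers-common`.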