[metron] branch master updated: METRON-1950: Site-book generation broken in master (mmiklavc) closes apache/metron#1309

mmiklavcic Thu, 20 Dec 2018 11:23:30 -0800

This is an automated email from the ASF dual-hosted git repository.

mmiklavcic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metron.git



The following commit(s) were added to refs/heads/master by this push:
     new 9e026e3  METRON-1950: Site-book generation broken in master (mmiklavc) closes apache/metron#1309
9e026e3 is described below

commit 9e026e3e902769dae364b4c4acf64e00839d24f5
Author: mmiklavc <[email protected]>
AuthorDate: Thu Dec 20 12:22:19 2018 -0700

    METRON-1950: Site-book generation broken in master (mmiklavc) closes apache/metron#1309
---
 metron-platform/metron-parsing/README.md           | 536 +++++++++++----------
 .../{metron-parsers-common => }/parser_arch.png    | Bin
 site-book/bin/generate-md.sh                       |   6 +-
 3 files changed, 276 insertions(+), 266 deletions(-)

diff --git a/metron-platform/metron-parsing/README.md b/metron-platform/metron-parsing/README.md
index 76b6168..9a46532 100644
--- a/metron-platform/metron-parsing/README.md
+++ b/metron-platform/metron-parsing/README.md
@@ -21,127 +21,129 @@ limitations under the License.
 
 Parsers are pluggable components which are used to transform raw data
 (textual or raw bytes) into JSON messages suitable for downstream
-enrichment and indexing.  
+enrichment and indexing.
 
 There are two general types types of parsers:
 * A parser written in Java which conforms to the `MessageParser` interface.  This kind of parser is optimized for speed and performance and is built for use with higher velocity topologies.  These parsers are not easily modifiable and in order to make changes to them the entire topology need to be recompiled.  
 * A general purpose parser.  This type of parser is primarily designed for lower-velocity topologies or for quickly standing up a parser for a new telemetry before a permanent Java parser can be written for it.  As of the time of this writing, we have:
-  * Grok parser: `org.apache.metron.parsers.GrokParser` with possible 
`parserConfig` entries of 
-    * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
-    * `patternLabel` : The pattern label to use from the grok statement
-    * `multiLine` : The raw data passed in should be handled as a long with 
multiple lines, with each line to be parsed separately. This setting's valid 
values are 'true' or 'false'.  The default if unset is 'false'. When set the 
parser will handle multiple lines with successfully processed lines emitted 
normally, and lines with errors sent to the error topic.
-    * `timestampField` : The field to use for timestamp
-    * `timeFields` : A list of fields to be treated as time
-    * `dateFormat` : The date format to use to parse the time fields
-    * `timezone` : The timezone to use. `UTC` is default.
-    * The Grok parser supports either 1 line to parse per incoming message, or 
incoming messages with multiple log lines, and will produce a json message per 
line
-  * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible 
`parserConfig` entries of
-    * `timestampFormat` : The date format of the timestamp to use.  If 
unspecified, the parser assumes the timestamp is ms since unix epoch.
-    * `columns` : A map of column names you wish to extract from the CSV to 
their offsets (e.g. `{ 'name' : 1, 'profession' : 3}`  would be a column map 
for extracting the 2nd and 4th columns from a CSV)
-    * `separator` : The column separator, `,` by default.
-  * JSON Map Parser: `org.apache.metron.parsers.json.JSONMapParser` with 
possible `parserConfig` entries of
-    * `mapStrategy` : A strategy to indicate how to handle multi-dimensional 
Maps.  This is one of
-      * `DROP` : Drop fields which contain maps
-      * `UNFOLD` : Unfold inner maps.  So `{ "foo" : { "bar" : 1} }` would 
turn into `{"foo.bar" : 1}`
-      * `ALLOW` : Allow multidimensional maps
-      * `ERROR` : Throw an error when a multidimensional map is encountered
-    * `jsonpQuery` : A [JSON Path](#json_path) query string. If present, the 
result of the JSON Path query should be a list of messages. This is useful if 
you have a JSON document which contains a list or array of messages embedded in 
it, and you do not have another means of splitting the message.
-    * `wrapInEntityArray` : `"true" or "false"`. If `jsonQuery` is present and 
this flag is present and set to `"true"`, the incoming message will be wrapped 
in a JSON  entity and array.
-       for example:
-       `{"name":"value"},{"name2","value2}` will be wrapped as `{"message" : 
[{"name":"value"},{"name2","value2}]}`.
-       This is using the default value for `wrapEntityName` if that property 
is not set.
-    * `wrapEntityName` : Sets the name to use when wrapping JSON using 
`wrapInEntityArray`.  The `jsonpQuery` should reference this name.
-    * A field called `timestamp` is expected to exist and, if it does not, 
then current time is inserted.  
-  * Regular Expressions Parser
-      * `recordTypeRegex` : A regular expression to uniquely identify a record 
type.
-      * `messageHeaderRegex` : A regular expression used to extract fields 
from a message part which is common across all the messages.
-      * `convertCamelCaseToUnderScore` : If this property is set to true, this 
parser will automatically convert all the camel case property names to 
underscore seperated. 
-          For example, following convertions will automatically happen:
-
-          ```
-          ipSrcAddr -> ip_src_addr
-          ipDstAddr -> ip_dst_addr
-          ipSrcPort -> ip_src_port
-          ```
-          Note this property may be necessary, because java does not support 
underscores in the named group names. So in case your property naming 
conventions requires underscores in property names, use this property.
-          
-      * `fields` : A json list of maps contaning a record type to regular 
expression mapping.
-      
-      A complete configuration example would look like:
-      
-      ```json
-      "convertCamelCaseToUnderScore": true, 
-      "recordTypeRegex": "kernel|syslog",
-      "messageHeaderRegex": 
"(<syslogPriority>(<=^&lt;)\\d{1,4}(?=>)).*?(<timestamp>(<=>)[A-Za-z] 
{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(<syslogHost>(<=\\s).*?(?=\\s))",
-      "fields": [
-        {
-          "recordType": "kernel",
-          "regex": ".*(<eventInfo>(<=\\]|\\w\\:).*?(?=$))"
-        },
-        {
-          "recordType": "syslog",
-          "regex": 
".*(<processid>(<=PID\\s=\\s).*?(?=\\sLine)).*(<filePath>(<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))
        (<fileName>.*?(?=\")).*(<eventInfo>(<=\").*?(?=$))"
-        }
-      ]
-      ```
-      **Note**: messageHeaderRegex and regex (withing fields) could be 
specified as lists also e.g.
-      ```json
-          "messageHeaderRegex": [
+    * Grok parser: `org.apache.metron.parsers.GrokParser` with possible 
`parserConfig` entries of
+        * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
+        * `patternLabel` : The pattern label to use from the grok statement
+        * `multiLine` : The raw data passed in should be handled as a long 
with multiple lines, with each line to be parsed separately. This setting's 
valid values are 'true' or 'false'.  The default if unset is 'false'. When set 
the parser will handle multiple lines with successfully processed lines emitted 
normally, and lines with errors sent to the error topic.
+        * `timestampField` : The field to use for timestamp
+        * `timeFields` : A list of fields to be treated as time
+        * `dateFormat` : The date format to use to parse the time fields
+        * `timezone` : The timezone to use. `UTC` is default.
+        * The Grok parser supports either 1 line to parse per incoming 
message, or incoming messages with multiple log lines, and will produce a json 
message per line
+    * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible 
`parserConfig` entries of
+        * `timestampFormat` : The date format of the timestamp to use.  If 
unspecified, the parser assumes the timestamp is ms since unix epoch.
+        * `columns` : A map of column names you wish to extract from the CSV 
to their offsets (e.g. `{ 'name' : 1, 'profession' : 3}`  would be a column map 
for extracting the 2nd and 4th columns from a CSV)
+        * `separator` : The column separator, `,` by default.
+    * JSON Map Parser: `org.apache.metron.parsers.json.JSONMapParser` with 
possible `parserConfig` entries of
+        * `mapStrategy` : A strategy to indicate how to handle 
multi-dimensional Maps.  This is one of
+            * `DROP` : Drop fields which contain maps
+            * `UNFOLD` : Unfold inner maps.  So `{ "foo" : { "bar" : 1} }` 
would turn into `{"foo.bar" : 1}`
+            * `ALLOW` : Allow multidimensional maps
+            * `ERROR` : Throw an error when a multidimensional map is 
encountered
+        * `jsonpQuery` : A [JSON Path](#json_path) query string. If present, 
the result of the JSON Path query should be a list of messages. This is useful 
if you have a JSON document which contains a list or array of messages embedded 
in it, and you do not have another means of splitting the message.
+        * `wrapInEntityArray` : `"true" or "false"`. If `jsonQuery` is present 
and this flag is present and set to `"true"`, the incoming message will be 
wrapped in a JSON  entity and array.
+           for example:
+           `{"name":"value"},{"name2","value2}` will be wrapped as `{"message" 
: [{"name":"value"},{"name2","value2}]}`.
+           This is using the default value for `wrapEntityName` if that 
property is not set.
+        * `wrapEntityName` : Sets the name to use when wrapping JSON using 
`wrapInEntityArray`.  The `jsonpQuery` should reference this name.
+        * A field called `timestamp` is expected to exist and, if it does not, 
then current time is inserted.
+    * Regular Expressions Parser
+        * `recordTypeRegex` : A regular expression to uniquely identify a 
record type.
+        * `messageHeaderRegex` : A regular expression used to extract fields 
from a message part which is common across all the messages.
+        * `convertCamelCaseToUnderScore` : If this property is set to true, 
this parser will automatically convert all the camel case property names to 
underscore seperated. For example, following conversions will automatically 
happen:
+
+            ```
+            ipSrcAddr -> ip_src_addr
+            ipDstAddr -> ip_dst_addr
+            ipSrcPort -> ip_src_port
+            ```
+
+            Note this property may be necessary, because java does not support 
underscores in the named group names. So in case your property naming 
conventions requires underscores in property names, use this property.
+
+        * `fields` : A json list of maps contaning a record type to regular 
expression mapping.
+
+        A complete configuration example would look like:
+
+        ```json
+        "convertCamelCaseToUnderScore": true,
+        "recordTypeRegex": "kernel|syslog",
+        "messageHeaderRegex": 
"(<syslogPriority>(<=^&lt;)\\d{1,4}(?=>)).*?(<timestamp>(<=>)[A-Za-z] 
{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(<syslogHost>(<=\\s).*?(?=\\s))",
+        "fields": [
+          {
+            "recordType": "kernel",
+            "regex": ".*(<eventInfo>(<=\\]|\\w\\:).*?(?=$))"
+          },
+          {
+            "recordType": "syslog",
+            "regex": 
".*(<processid>(<=PID\\s=\\s).*?(?=\\sLine)).*(<filePath>(<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))
        (<fileName>.*?(?=\")).*(<eventInfo>(<=\").*?(?=$))"
+          }
+        ]
+        ```
+
+        **Note**: messageHeaderRegex and regex (withing fields) could be 
specified as lists also e.g.
+
+        ```json
+        "messageHeaderRegex": [
           "regular expression 1",
           "regular expression 2"
-          ]
-      ```
-      Where **regular expression 1** are valid regular expressions and may 
have named
-      groups, which would be extracted into fields. This list will be 
evaluated in order until a
-      matching regular expression is found.
-      
-      **messageHeaderRegex** is run on all the messages.
-      Yes, all the messages are expected to contain the fields which are being 
extracted using the **messageHeaderRegex**.
-      **messageHeaderRegex** is a sort of HCF (highest common factor) in all 
messages.
-      
-      **recordTypeRegex** can be a more advanced regular expression containing 
named goups. For example
-  
-      "recordTypeRegex": 
"(&lt;process&gt;(<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
-      
-      Here all the named goups (process in above example) will be extracted as 
fields.
-
-      Though having named group in recordType is completely optional, still 
one could want extract named groups in recordType for following reasons:
-
-      1. Since **recordType** regular expression is already getting matched 
and we are paying the price for a regular expression match already,
-      we can extract certain fields as a by product of this match.
-      2. Most likely the **recordType** field is common across all the 
messages. Hence having it extracted in the recordType (or messageHeaderRegex) 
would
-      reduce the overall complexity of regular expressions in the regex field.
-      
-      **regex** within a field could be a list of regular expressions also. In 
this case all regular expressions in the list will be attempted to match until 
a match is found. Once a full match is found remaining regular expressions are 
ignored.
-  
-      ```json
-          "regex":  [ "record type specific regular expression 1",
-                      "record type specific regular expression 2"]
-
-      ```
-
-      **timesamp**
-
-      Since this parser is a general purpose parser, it will populate the 
timestamp field with current UTC timestamp. Actual timestamp value can be 
overridden later using stellar.
-      For example in case of syslog timestamps, one could use following 
stellar construct to override the timestamp value.
-      Let us say you parsed actual timestamp from the raw log:
-
-      <38>Jun 20 15:01:17 hostName sshd[11672]: Accepted publickey for prod 
from 55.55.55.55 port 66666 ssh2
-
-      syslogTimestamp="Jun 20 15:01:17"
-
-      Then something like below could be used to override the timestamp.
-
-      ```
-      "timestamp_str": "FORMAT('%s%s%s', YEAR(),' ',syslogTimestamp)",
-      "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy MMM dd HH:mm:ss' )"
-      ```
-
-      OR, if you want to factor in the timezone
-
-      ```
-      "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, timestamp_format, 
timezone_name )"
-      ```
+        ]
+        ```
+
+        Where **regular expression 1** are valid regular expressions and may 
have named
+        groups, which would be extracted into fields. This list will be 
evaluated in order until a
+        matching regular expression is found.
+
+        **messageHeaderRegex** is run on all the messages.
+        Yes, all the messages are expected to contain the fields which are 
being extracted using the **messageHeaderRegex**.
+        **messageHeaderRegex** is a sort of HCF (highest common factor) in all 
messages.
+
+        **recordTypeRegex** can be a more advanced regular expression 
containing named goups. For example
+
+        "recordTypeRegex": 
"(&lt;process&gt;(<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))"
+
+        Here all the named goups (process in above example) will be extracted 
as fields.
+
+        Though having named group in recordType is completely optional, still 
one could want extract named groups in recordType for following reasons:
+
+        1. Since **recordType** regular expression is already getting matched 
and we are paying the price for a regular expression match already,
+        we can extract certain fields as a by product of this match.
+        2. Most likely the **recordType** field is common across all the 
messages. Hence having it extracted in the recordType (or messageHeaderRegex) 
would
+        reduce the overall complexity of regular expressions in the regex 
field.
+
+        **regex** within a field could be a list of regular expressions also. 
In this case all regular expressions in the list will be attempted to match 
until a match is found. Once a full match is found remaining regular 
expressions are ignored.
+
+        ```json
+        "regex":  [ "record type specific regular expression 1",
+                    "record type specific regular expression 2"]
+        ```
+
+        **timesamp**
+
+        Since this parser is a general purpose parser, it will populate the 
timestamp field with current UTC timestamp. Actual timestamp value can be 
overridden later using stellar.
+        For example in case of syslog timestamps, one could use following 
stellar construct to override the timestamp value.
+        Let us say you parsed actual timestamp from the raw log:
+
+        `<38>Jun 20 15:01:17 hostName sshd[11672]: Accepted publickey for prod 
from 55.55.55.55 port 66666 ssh2`
+
+        syslogTimestamp="Jun 20 15:01:17"
+
+        Then something like below could be used to override the timestamp.
+
+        ```
+        "timestamp_str": "FORMAT('%s%s%s', YEAR(),' ',syslogTimestamp)",
+        "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, 'yyyy MMM dd HH:mm:ss' 
)"
+        ```
+
+        OR, if you want to factor in the timezone
+
+        ```
+        "timestamp":"TO_EPOCH_TIMESTAMP(timestamp_str, timestamp_format, 
timezone_name )"
+        ```
 
 ## Parser Error Routing
 
@@ -204,15 +206,15 @@ So putting it all together a typical Metron message with all 5-tuple fields pres
 
 ```json
 {
-"message": 
-{"ip_src_addr": xxxx, 
-"ip_dst_addr": xxxx, 
-"ip_src_port": xxxx, 
-"ip_dst_port": xxxx, 
-"protocol": xxxx, 
-"original_string": xxx,
-"additional-field 1": xxx,
-}
+  "message": {
+    "ip_src_addr": xxxx,
+    "ip_dst_addr": xxxx,
+    "ip_src_port": xxxx,
+    "ip_dst_port": xxxx,
+    "protocol": xxxx,
+    "original_string": xxx,
+    "additional-field 1": xxx
+  }
 }
 ```
 
@@ -246,16 +248,19 @@ The document is structured in the following way
 
 * `parserClassName` : The fully qualified classname for the parser to be used.
 * `filterClassName` : The filter to use.  This may be a fully qualified classname of a Class that implements the `org.apache.metron.parsers.interfaces.MessageFilter<JSONObject>` interface.  Message Filters are intended to allow the user to ignore a set of messages via custom logic.  The existing implementations are:
-  * `STELLAR` : Allows you to apply a stellar statement which returns a 
boolean, which will pass every message for which the statement returns `true`.  
The Stellar statement that is to be applied is specified by the `filter.query` 
property in the `parserConfig`.
-Example Stellar Filter which includes messages which contain a the `field1` 
field:
-```
-   {
-    "filterClassName" : "STELLAR"
-   ,"parserConfig" : {
-    "filter.query" : "exists(field1)"
-    }
-   }
-```
+    * `STELLAR` : Allows you to apply a stellar statement which returns a 
boolean, which will pass every message for which the statement returns `true`.  
The Stellar statement that is to be applied is specified by the `filter.query` 
property in the `parserConfig`.
+
+        Example Stellar Filter which includes messages which contain a the 
`field1` field:
+
+        ```
+        {
+          "filterClassName" : "STELLAR",
+          "parserConfig" : {
+            "filter.query" : "exists(field1)"
+          }
+        }
+        ```
+
 * `sensorTopic` : The kafka topic to send the parsed messages to.  If the topic is prefixed and suffixed by `/` 
 then it is assumed to be a regex and will match any topic matching the pattern (e.g. `/bro.*/` would match `bro_cust0`, `bro_cust1` and `bro_cust2`)
 * `readMetadata` : Boolean indicating whether to read metadata or not (The default is raw message strategy dependent).  See below for a discussion about metadata.
@@ -263,26 +268,27 @@ then it is assumed to be a regex and will match any topic matching the pattern (
 * `rawMessageStrategy` : The strategy to use when reading the raw data and metadata.  See below for a discussion about message reading strategies.
 * `rawMessageStrategyConfig` : The raw message strategy configuration map.  See below for a discussion about message reading strategies.
 * `parserConfig` : A JSON Map representing the parser implementation specific configuration. Also include batch sizing and timeout for writer configuration here.
-  * `batchSize` : Integer indicating number of records to batch together 
before sending to the writer. (default to `15`)
-  * `batchTimeout` : The timeout after which a batch will be flushed even if 
batchSize has not been met.  Optional.
-    If unspecified, or set to `0`, it defaults to a system-determined duration 
which is a fraction of the Storm
-    parameter `topology.message.timeout.secs`.  Ignored if batchSize is `1`, 
since this disables batching.
-  * The kafka writer can be configured within the parser config as well.  
(This is all configured a priori, but this is convenient for overriding the 
settings).  See [here](../../metron-writer/README.md#kafka-writer)
+    * `batchSize` : Integer indicating number of records to batch together 
before sending to the writer. (default to `15`)
+    * `batchTimeout` : The timeout after which a batch will be flushed even if 
batchSize has not been met.  Optional.
+      If unspecified, or set to `0`, it defaults to a system-determined 
duration which is a fraction of the Storm
+      parameter `topology.message.timeout.secs`.  Ignored if batchSize is `1`, 
since this disables batching.
+    * The kafka writer can be configured within the parser config as well.  
(This is all configured a priori, but this is convenient for overriding the 
settings).  See [here](../../metron-writer/README.md#kafka-writer)
 * `fieldTransformations` : An array of complex objects representing the transformations to be done on the message generated from the parser before writing out to the kafka topic.
 * `securityProtocol` : The security protocol to use for reading from kafka (this is a string).  This can be overridden on the command line and also specified in the spout config via the `security.protocol` key.  If both are specified, then they are merged and the CLI will take precedence. If multiple sensors are used, any non "PLAINTEXT" value will be used.
 * `cacheConfig` : Cache config for stellar field transformations.   This configures a least frequently used cache.  This is a map with the following keys.  If not explicitly configured (the default), then no cache will be used.
-  * `stellar.cache.maxSize` - The maximum number of elements in the cache. 
Default is to not use a cache.
-  * `stellar.cache.maxTimeRetain` - The maximum amount of time an element is 
kept in the cache (in minutes). Default is to not use a cache.
+    * `stellar.cache.maxSize` - The maximum number of elements in the cache. 
Default is to not use a cache.
+    * `stellar.cache.maxTimeRetain` - The maximum amount of time an element is 
kept in the cache (in minutes). Default is to not use a cache.
 
-  Example of a cache config to contain at max `20000` stellar expressions for 
at most `20` minutes.:
-```
-{
-  "cacheConfig" : {
-    "stellar.cache.maxSize" : 20000,
-    "stellar.cache.maxTimeRetain" : 20
-  }
-}
-```
+        Example of a cache config to contain at max `20000` stellar 
expressions for at most `20` minutes.:
+
+        ```
+        {
+          "cacheConfig" : {
+            "stellar.cache.maxSize" : 20000,
+            "stellar.cache.maxTimeRetain" : 20
+          }
+        }
+        ```
 
 The `fieldTransformations` is a complex object which defines a
 transformation which can be done to a message.  This transformation can 
@@ -298,36 +304,34 @@ For platform specific configs, see the README of the appropriate project. This w
 Metadata is a useful thing to send to Metron and use during enrichment or threat intelligence.  
 Consider the following scenarios:
 * You have multiple telemetry sources of the same type that you want to 
-  * ensure downstream analysts can differentiate
-  * ensure profiles consider independently as they have different seasonality 
or some other fundamental characteristic
+    * ensure downstream analysts can differentiate
+    * ensure profiles consider independently as they have different 
seasonality or some other fundamental characteristic
 
 As such, there are two types of metadata that we seek to support in Metron:
 * Environmental metadata : Metadata about the system at large
-   * Consider the possibility that you have multiple kafka topics being 
processed by one parser and you want to tag the messages with the kafka topic
-   * At the moment, only the kafka topic is kept as the field name.
+    * Consider the possibility that you have multiple kafka topics being 
processed by one parser and you want to tag the messages with the kafka topic
+    * At the moment, only the kafka topic is kept as the field name.
 * Custom metadata: Custom metadata from an individual telemetry source that 
one might want to use within Metron. 
 
 Metadata is controlled by the following parser configs:
-* `rawMessageStrategy` : This is a strategy which indicates how to read
-  data and metadata.  The strategies supported are:
-  * `DEFAULT` : Data is read directly from the kafka record value and 
metadata, if any, is read from the kafka record key.  This strategy defaults to 
not reading metadata and not merging metadata.  This is the default strategy.
-  * `ENVELOPE` : Data from kafka record value is presumed to be a JSON blob. 
One of
-    these fields must contain the raw data to pass to the parser.  All other 
fields should be considered metadata.  The field containing the raw data is 
specified in the `rawMessageStrategyConfig`.  Data held in the kafka key as 
well as the non-data fields in the JSON blob passed into the kafka value are 
considered metadata. Note that the exception to this is that any 
`original_string` field is inherited from the envelope data so that the 
original string contains the envelope data.  If y [...]
+* `rawMessageStrategy` : This is a strategy which indicates how to read data 
and metadata.  The strategies supported are:
+    * `DEFAULT` : Data is read directly from the kafka record value and 
metadata, if any, is read from the kafka record key.  This strategy defaults to 
not reading metadata and not merging metadata.  This is the default strategy.
+    * `ENVELOPE` : Data from kafka record value is presumed to be a JSON blob. 
One of
+      these fields must contain the raw data to pass to the parser.  All other 
fields should be considered metadata.  The field containing the raw data is 
specified in the `rawMessageStrategyConfig`.  Data held in the kafka key as 
well as the non-data fields in the JSON blob passed into the kafka value are 
considered metadata. Note that the exception to this is that any 
`original_string` field is inherited from the envelope data so that the 
original string contains the envelope data.  If [...]
 * `rawMessageStrategyConfig` : The configuration (a map) for the 
`rawMessageStrategy`.  Available configurations are strategy dependent:
-  * `DEFAULT` 
-    * `metadataPrefix` defines the key prefix for metadata (default is 
`metron.metadata`).
-  * `ENVELOPE` 
-    * `metadataPrefix` defines the key prefix for metadata (default is 
`metron.metadata`) 
-    * `messageField` defines the field from the envelope to use as the data.  
All other fields are considered metadata.
+    * `DEFAULT`
+        * `metadataPrefix` defines the key prefix for metadata (default is 
`metron.metadata`).
+    * `ENVELOPE`
+        * `metadataPrefix` defines the key prefix for metadata (default is 
`metron.metadata`)
+        * `messageField` defines the field from the envelope to use as the 
data.  All other fields are considered metadata.
 * `readMetadata` : This is a boolean indicating whether metadata will be read 
and made available to Field 
 transformations (i.e. Stellar field transformations).  The default is
 dependent upon the `rawMessageStrategy`:
-  * `DEFAULT` : default to `false`.
-  * `ENVELOPE` : default to `true`.
+    * `DEFAULT` : default to `false`.
+    * `ENVELOPE` : default to `true`.
 * `mergeMetadata` : This is a boolean indicating whether metadata fields will 
be merged with the message automatically.  That is to say, if this property is 
set to `true` then every metadata field will become part of the messages and, 
consequently, also available for use in field transformations.  The default is 
dependent upon the `rawMessageStrategy`:
-  * `DEFAULT` : default to `false`.
-  * `ENVELOPE` : default to `true`.
-
+    * `DEFAULT` : default to `false`.
+    * `ENVELOPE` : default to `true`.
 
 #### Field Naming
 
@@ -359,119 +363,125 @@ The format of a `fieldTransformation` is as follows:
 The currently implemented fieldTransformations are:
 * `REMOVE` : This transformation removes the specified input fields.  If you want a conditional removal, you can pass a Metron Query Language statement to define the conditions under which you want to remove the fields. 
 
-Consider the following simple configuration which will remove `field1`
-unconditionally:
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "input" : "field1"
-          , "transformation" : "REMOVE"
-          }
-                      ]
-}
-```
+    Consider the following simple configuration which will remove `field1`
+    unconditionally:
 
-Consider the following simple sensor parser configuration which will remove 
`field1`
-whenever `field2` exists and whose corresponding equal to 'foo':
-```
-{
-...
-  "fieldTransformations" : [
-          {
-            "input" : "field1"
-          , "transformation" : "REMOVE"
-          , "config" : {
-              "condition" : "exists(field2) and field2 == 'foo'"
-                       }
-          }
-                      ]
-}
-```
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "input" : "field1"
+              , "transformation" : "REMOVE"
+              }
+                          ]
+    }
+    ```
+
+    Consider the following simple sensor parser configuration which will 
remove `field1`
+    whenever `field2` exists and whose corresponding equal to 'foo':
+
+    ```
+    {
+    ...
+      "fieldTransformations" : [
+              {
+                "input" : "field1"
+              , "transformation" : "REMOVE"
+              , "config" : {
+                  "condition" : "exists(field2) and field2 == 'foo'"
+                           }
+              }
+                          ]
+    }
+    ```
 
 * `SELECT`: This transformation filters the fields in the message to include 
only the configured output fields, and drops any not explicitly included. 
 
-For example: 
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "output" : ["field1", "field2" ] 
-          , "transformation" : "SELECT"
-          }
-                      ]
-}
-```
+    For example:
+
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "output" : ["field1", "field2" ]
+              , "transformation" : "SELECT"
+              }
+                          ]
+    }
+    ```
 
-when applied to a message containing keys field1, field2 and field3, will only 
output the first two. It is also worth noting that two standard fields - 
timestamp and original_source - will always be passed along whether they are 
listed in output or not, since they are considered core required fields.
+    when applied to a message containing keys field1, field2 and field3, will only output the first two. It is also worth noting that two standard fields - timestamp and original_source - will always be passed along whether they are listed in output or not, since they are considered core required fields.
 
 * `IP_PROTOCOL` : This transformation maps IANA protocol numbers to consistent string representations.
 
-Consider the following sensor parser config to map the `protocol` field
-to a textual representation of the protocol:
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "input" : "protocol"
-          , "transformation" : "IP_PROTOCOL"
-          }
-                      ]
-}
-```
+    Consider the following sensor parser config to map the `protocol` field
+    to a textual representation of the protocol:
+
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "input" : "protocol"
+              , "transformation" : "IP_PROTOCOL"
+              }
+                          ]
+    }
+    ```
 
-This transformation would transform `{ "protocol" : 6, "source.type" : "bro", ... }` 
-into `{ "protocol" : "TCP", "source.type" : "bro", ...}`
+    This transformation would transform `{ "protocol" : 6, "source.type" : "bro", ... }`
+    into `{ "protocol" : "TCP", "source.type" : "bro", ...}`
 
-* `STELLAR` : This transformation executes a set of transformations
-  expressed as [Stellar Language](../../metron-common) statements.
+* `STELLAR` : This transformation executes a set of transformations expressed as [Stellar Language](../../metron-common) statements.
 
 * `RENAME` : This transformation allows users to rename a set of fields.  Specifically,
 the config is presumed to be the mapping.  The keys to the config are the existing field names
 and the values for the config map are the associated new field name.
 
-The following config will rename the fields `old_field` and `different_old_field` to
-`new_field` and `different_new_field` respectively:
-```
-{
-...
-    "fieldTransformations" : [
-          {
-            "transformation" : "RENAME",
-          , "config" : {
-            "old_field" : "new_field",
-            "different_old_field" : "different_new_field"
-                       }
-          }
-                      ]
-}
-```
+    The following config will rename the fields `old_field` and `different_old_field` to
+    `new_field` and `different_new_field` respectively:
+
+    ```
+    {
+    ...
+        "fieldTransformations" : [
+              {
+                "transformation" : "RENAME",
+              , "config" : {
+                "old_field" : "new_field",
+                "different_old_field" : "different_new_field"
+                           }
+              }
+                          ]
+    }
+    ```
+
 * `REGEX_SELECT` : This transformation lets users set an output field to one of a set of possibilities based on matching regexes. This transformation is useful when the number or conditions are large enough to make a stellar language match statement unwieldy.
  
-The following config will set the field `logical_source_type` to one of the
-following, dependent upon the value of the `pix_type` field:
-* `cisco-6-302` if `pix_type` starts with either `6-302` or `06-302`
-* `cisco-5-304` if `pix_type` starts with `5-304`
-```
-{
-...
-  "fieldTransformations" : [
+    The following config will set the field `logical_source_type` to one of the
+    following, dependent upon the value of the `pix_type` field:
+    * `cisco-6-302` if `pix_type` starts with either `6-302` or `06-302`
+    * `cisco-5-304` if `pix_type` starts with `5-304`
+
+    ```
     {
-     "transformation" : "REGEX_ROUTING"
-    ,"input" :  "pix_type"
-    ,"output" :  "logical_source_type"
-    ,"config" : {
-      "cisco-6-302" : [ "^6-302.*", "^06-302.*"]
-      "cisco-5-304" : "^5-304.*"
-                }
+    ...
+      "fieldTransformations" : [
+        {
+         "transformation" : "REGEX_ROUTING"
+        ,"input" :  "pix_type"
+        ,"output" :  "logical_source_type"
+        ,"config" : {
+          "cisco-6-302" : [ "^6-302.*", "^06-302.*"]
+          "cisco-5-304" : "^5-304.*"
+                    }
+        }
+                               ]
+    ...
     }
-                           ]
-...  
-}
-```
+    ```
 
 
 ### Assignment to `null`
diff --git a/metron-platform/metron-parsing/metron-parsers-common/parser_arch.png b/metron-platform/metron-parsing/parser_arch.png
similarity index 100%
rename from metron-platform/metron-parsing/metron-parsers-common/parser_arch.png
rename to metron-platform/metron-parsing/parser_arch.png
diff --git a/site-book/bin/generate-md.sh b/site-book/bin/generate-md.sh
index 60549f8..7ebb5f6 100755
--- a/site-book/bin/generate-md.sh
+++ b/site-book/bin/generate-md.sh
@@ -64,7 +64,7 @@ RESOURCE_LIST=(
     metron-deployment/readme-images/enable-kerberos-started.png
     metron-deployment/readme-images/enable-kerberos.png
     metron-platform/metron-job/metron-job_state_statechart_diagram.svg
-    metron-platform/metron-parsing/metron-parsers-common/parser_arch.png
+    metron-platform/metron-parsing/parser_arch.png
     metron-platform/metron-indexing/indexing_arch.png
     metron-platform/metron-enrichment/enrichment_arch.png
     metron-analytics/metron-maas-service/maas_arch.png
@@ -96,8 +96,8 @@ HREF_REWRITE_LIST=(
     metron-platform/metron-enrichment/README.md 's#(enrichment_arch.png)#(../../images/enrichment_arch.png)#g'
     metron-platform/metron-indexing/README.md 's#(indexing_arch.png)#(../../images/indexing_arch.png)#g'
     metron-platform/metron-job/README.md 's#(metron-job_state_statechart_diagram.svg)#(../../images/metron-job_state_statechart_diagram.svg)#g'
-    metron-platform/metron-parsing/metron-parsers-common/README.md 's#(parser_arch.png)#(../../images/parser_arch.png)#g'
-    metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md 's#(../../use-cases/parser_chaining/message_routing_high_level.svg)#(../../images/message_routing_high_level.svg)#g'
+    metron-platform/metron-parsing/README.md 's#(parser_arch.png)#(../../images/parser_arch.png)#g'
+    metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md 's#(../../../use-cases/parser_chaining/message_routing_high_level.svg)#(../../../images/message_routing_high_level.svg)#g'
     metron-analytics/metron-maas-service/README.md 's#(maas_arch.png)#(../../images/maas_arch.png)#g'
     metron-contrib/metron-performance/README.md 's#(performance_measurement.png)#(../../images/performance_measurement.png)#g'
     use-cases/forensic_clustering/README.md 's#(find_alerts.png)#(../../images/find_alerts.png)#g'
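
Note on the `HREF_REWRITE_LIST` entries above: each one pairs a Markdown file with a sed substitution that points an image link at the site-book `images` directory. The sketch below applies the new `metron-platform/metron-parsing/README.md` expression to a single line. The sample line and the variable names are assumptions made for illustration, and how `generate-md.sh` itself invokes sed is not shown in this diff.

```shell
# Hypothetical README line that embeds the image by its bare file name.
line='![Architecture](parser_arch.png)'

# Rewrite expression copied from the HREF_REWRITE_LIST entry for
# metron-platform/metron-parsing/README.md. With sed's default basic
# regular expressions the parentheses match literally.
expr='s#(parser_arch.png)#(../../images/parser_arch.png)#g'

# Apply the substitution to the sample line and show the rewritten link.
result=$(printf '%s\n' "$line" | sed -e "$expr")
printf '%s\n' "$result"
```

Because the image moved up one directory together with the README content, the relative `../../images/` target stays the same and only the file the expression is applied to changes.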
