[
https://issues.apache.org/jira/browse/METRON-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060269#comment-16060269
]
ASF GitHub Bot commented on METRON-1001:
----------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/metron/pull/621
# TESTING PLAN
Testing Instructions beyond the normal smoke test (i.e. letting data
flow through to the indices and checking them).
# Preliminaries
Since I will use the squid topology to pass data through in a controlled
way, we must install squid and generate one point of data:
* `yum install -y squid`
* `service squid start`
* `squidclient http://www.yahoo.com`
Also, set an environment variable to indicate `METRON_HOME`:
* `export METRON_HOME=/usr/metron/0.4.0`
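For reference, a squid access.log entry looks roughly like the sample below (illustrative, not captured from a live run); the whitespace-delimited seventh field is the url that the Grok parser extracts and that `URL_TO_HOST(url)` later consumes:

```shell
# Illustrative squid access.log line (not real captured output).
# The parser splits on whitespace; field 7 is the url.
line='1475022887.362    256 127.0.0.1 TCP_MISS/301 803 GET http://www.yahoo.com/ - DIRECT/98.139.183.24 text/html'
echo "$line" | awk '{print $7}'   # prints: http://www.yahoo.com/
```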
# Deploy the squid parser
* Create the squid kafka topic:
`/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181
--create --topic squid --partitions 1 --replication-factor 1`
* Start via `$METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z
node1:2181 -s squid`
# Test Cases
## Test Case 1: Base Case
* Send squid data through: `cat /var/log/squid/access.log |
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list
node1:6667 --topic squid`
* Validate that the message goes through with no fields prefixed with
`metadata`: `curl -XPOST 'http://localhost:9200/squid*/_search?pretty'`
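Rather than eyeballing the whole response, a grep for `metadata`-prefixed keys can spot-check it; a minimal sketch below operates on an inlined sample hit (hypothetical shape) where in practice you would pipe the curl output instead:

```shell
# Sample search hit (hypothetical shape); in practice, pipe curl output in.
response='{"hits":{"hits":[{"_source":{"action":"TCP_MISS/301","full_hostname":"www.yahoo.com"}}]}}'
if echo "$response" | grep -q '"metadata'; then
  echo "FAIL: metadata-prefixed fields present"
else
  echo "OK: no metadata-prefixed fields"
fi
```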
## Test Case 2: Validate Environmental Metadata is available
* Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on
metadata reading:
```
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "readMetadata": true,
  "parserConfig": {
    "grokPath": "/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": [ "full_hostname", "domain_without_subdomains", "kafka_topic" ],
      "config": {
        "full_hostname": "URL_TO_HOST(url)",
        "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
        "kafka_topic": "TO_UPPER(metron.metadata.topic)"
      }
    }
  ]
}
```
* Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i
$METRON_HOME/config/zookeeper -z node1:2181`
* Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
* Send squid data through: `cat /var/log/squid/access.log |
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list
node1:6667 --topic squid`
* Validate that the message goes through with a `kafka_topic` field of
`SQUID`:
```
curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
{
"_source" : [ "kafka_topic" ]
}
'
```
## Test Case 3: Validate Environmental Metadata is available and can be merged
* Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on
metadata reading and merging:
```
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "readMetadata": true,
  "mergeMetadata": true,
  "parserConfig": {
    "grokPath": "/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": [ "full_hostname", "domain_without_subdomains", "kafka_topic" ],
      "config": {
        "full_hostname": "URL_TO_HOST(url)",
        "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
        "kafka_topic": "TO_UPPER(metron.metadata.topic)"
      }
    }
  ]
}
```
* Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i
$METRON_HOME/config/zookeeper -z node1:2181`
* Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
* Send squid data through: `cat /var/log/squid/access.log |
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list
node1:6667 --topic squid`
* Validate that the message goes through with a `kafka_topic` field of
`SQUID` and `metron:metadata:topic` of `squid`:
```
curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
{
"_source" : [ "kafka_topic", "metron:metadata:topic" ]
}
'
```
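A grep-based spot check of the expected values can replace eyeballing the response; this sketch uses an inlined sample hit (hypothetical shape) rather than live curl output:

```shell
# Sample hit containing the two fields of interest (hypothetical shape).
response='{"_source":{"kafka_topic":"SQUID","metron:metadata:topic":"squid"}}'
echo "$response" | grep -q '"kafka_topic":"SQUID"'           && echo "kafka_topic OK"
echo "$response" | grep -q '"metron:metadata:topic":"squid"' && echo "merged metadata OK"
```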
## Test Case 4: Validate Custom Metadata is available
We're going to send a custom JSON map containing metadata along in the key.
The map will have a single entry, `customer_id`.
* Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on
metadata reading and turn off merging. Also, emit a new field called
`customer_id` in the field transformation:
```
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "readMetadata": true,
  "mergeMetadata": false,
  "parserConfig": {
    "grokPath": "/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": [ "full_hostname", "domain_without_subdomains", "kafka_topic", "customer_id" ],
      "config": {
        "full_hostname": "URL_TO_HOST(url)",
        "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
        "kafka_topic": "TO_UPPER(metron.metadata.topic)",
        "customer_id": "TO_UPPER(metron.metadata.customer_id)"
      }
    }
  ]
}
```
* Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i
$METRON_HOME/config/zookeeper -z node1:2181`
* Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
* Send squid data through: `IFS=$'\n';for i in $(cat
/var/log/squid/access.log);do METADATA="{\"customer_id\" : \"cust2\"}"; echo
$METADATA\;$i;done |
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list
node1:6667 --topic squid --property="parse.key=true" --property
"key.separator=;"`
* Validate that the message goes through with a `kafka_topic` field of
`SQUID` and `customer_id` of `CUST2`:
```
curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
{
"_source" : [ "kafka_topic", "customer_id" ]
}
'
```
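The producer loop above prepends the metadata map as the Kafka message key, separated from the log line by `;` (matching the `key.separator` property). A minimal sketch of one resulting input line, using a placeholder log entry:

```shell
# Placeholder standing in for one access.log entry.
entry='1475022887.362 256 127.0.0.1 TCP_MISS/301 803 GET http://www.yahoo.com/ - DIRECT/98.139.183.24 text/html'
METADATA='{"customer_id" : "cust2"}'
# What the console producer sees: <key json>;<log line>
echo "$METADATA;$entry"
```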
## Test Case 5: Validate Custom Metadata is available and able to be merged
We're going to send a custom JSON map containing metadata along in the key.
The map will have a single entry, `customer_id`.
* Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on
metadata reading and turn back on merging. Also, emit a custom metadata
field called `customer_id` in the field transformation:
```
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "readMetadata": true,
  "mergeMetadata": true,
  "parserConfig": {
    "grokPath": "/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp"
  },
  "fieldTransformations": [
    {
      "transformation": "STELLAR",
      "output": [ "full_hostname", "domain_without_subdomains", "kafka_topic", "customer_id" ],
      "config": {
        "full_hostname": "URL_TO_HOST(url)",
        "domain_without_subdomains": "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
        "kafka_topic": "TO_UPPER(metron.metadata.topic)",
        "customer_id": "TO_UPPER(metron.metadata.customer_id)"
      }
    }
  ]
}
```
* Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i
$METRON_HOME/config/zookeeper -z node1:2181`
* Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
* Send squid data through: `IFS=$'\n';for i in $(cat
/var/log/squid/access.log);do METADATA="{\"customer_id\" : \"cust2\"}"; echo
$METADATA\;$i;done |
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list
node1:6667 --topic squid --property="parse.key=true" --property
"key.separator=;"`
* Validate that the message goes through with a `kafka_topic` field of
`SQUID`, a `metron:metadata:customer_id` of `cust2`, and a `customer_id` of `CUST2`:
```
curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
{
"_source" : [ "kafka_topic", "customer_id", "metron:metadata:customer_id"
]
}
'
```
> Allow metron to ingest parser metadata along with data
> ------------------------------------------------------
>
> Key: METRON-1001
> URL: https://issues.apache.org/jira/browse/METRON-1001
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
> Assignee: Casey Stella
>
> Currently, we only ingest data in Metron. Often, there is valuable metadata
> constructed up-stream of Metron that is relevant to enrichment and cross-cuts
> many data formats. Take, for instance, a multi-tenancy case where multiple
> sources come in and you'd like to tag the data with the customer ID. In this
> case you're stuck finding ways to add the metadata to each data source's
> format. Rather than do that, we should allow metadata to be ingested along
> with the data associated with it.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)