[ 
https://issues.apache.org/jira/browse/METRON-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060269#comment-16060269
 ] 

ASF GitHub Bot commented on METRON-1001:
----------------------------------------

Github user cestella commented on the issue:

    https://github.com/apache/metron/pull/621
  
    # TESTING PLAN
    
    Testing instructions beyond the normal smoke test (i.e., letting data flow through to the indices and checking them).
    
    # Preliminaries
    
    Since I will use the squid topology to pass data through in a controlled way, install squid and generate one data point:
    * `yum install -y squid`
    * `service squid start`
    * `squidclient http://www.yahoo.com`
    
    Also, set an environment variable to indicate `METRON_HOME`:
    * `export METRON_HOME=/usr/metron/0.4.0` 
    
    # Deploy the squid parser
    * Create the squid kafka topic: `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic squid --partitions 1 --replication-factor 1`
    * Start via `$METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s squid`
    
    # Test Cases
    
    ## Test Case 1: Base Case
    * Send squid data through: `cat /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
    * Validate that the message goes through with no fields prefixed with `metadata`: `curl -XPOST 'http://localhost:9200/squid*/_search?pretty'`
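    To make the check concrete, here is a minimal sketch of that validation (the sample `_source` document below is hypothetical, not captured output; in practice you would save the curl response to a file and grep it the same way):
    ```
    # Hypothetical _source pulled from the _search response; no metadata
    # reading is configured yet, so no metadata-prefixed fields should appear.
    SOURCE='{"full_hostname":"www.yahoo.com","action":"TCP_MISS","code":200}'

    if echo "$SOURCE" | grep -q 'metadata'; then
      RESULT="metadata fields present"
    else
      RESULT="no metadata fields"
    fi
    echo "$RESULT"
    ```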
    
    ## Test Case 2: Validate Environmental Metadata is available
    * Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on metadata reading:
    ```
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "readMetadata" : true,
      "parserConfig": {
        "grokPath": "/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations" : [
        {
          "transformation" : "STELLAR",
          "output" : [ "full_hostname", "domain_without_subdomains", "kafka_topic" ],
          "config" : {
            "full_hostname" : "URL_TO_HOST(url)",
            "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
            "kafka_topic" : "TO_UPPER(metron.metadata.topic)"
          }
        }
      ]
    }
    ```
    * Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
    * Send squid data through: `cat /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
    * Validate that the message goes through with a `kafka_topic` field of `SQUID`:
    ```
    curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
    {
      "_source" : [ "kafka_topic" ]
    }
    '
    ```
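    As a hand-worked version of what `TO_UPPER(metron.metadata.topic)` should yield here (assuming the kafka topic name is the only environmental metadata involved):
    ```
    # The parser reads the topic name ("squid") as environmental metadata;
    # the STELLAR transformation upper-cases it into kafka_topic.
    TOPIC="squid"
    KAFKA_TOPIC=$(echo "$TOPIC" | tr '[:lower:]' '[:upper:]')
    echo "$KAFKA_TOPIC"
    ```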
    ## Test Case 3: Validate Environmental Metadata is available and can be merged
    * Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on metadata reading and merging:
    ```
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "readMetadata" : true,
      "mergeMetadata" : true,
      "parserConfig": {
        "grokPath": "/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations" : [
        {
          "transformation" : "STELLAR",
          "output" : [ "full_hostname", "domain_without_subdomains", "kafka_topic" ],
          "config" : {
            "full_hostname" : "URL_TO_HOST(url)",
            "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
            "kafka_topic" : "TO_UPPER(metron.metadata.topic)"
          }
        }
      ]
    }
    ```
    * Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
    * Send squid data through: `cat /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
    * Validate that the message goes through with a `kafka_topic` field of `SQUID` and `metron:metadata:topic` of `squid`:
    ```
    curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
    {
      "_source" : [ "kafka_topic", "metron:metadata:topic" ]
    }
    '
    ```
    ## Test Case 4: Validate Custom Metadata is available
    We're going to send a custom JSON map containing metadata along in the key. The map will have a single entry, `customer_id`.
    * Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on metadata reading and turn off merging. Also, emit a new field called `customer_id` in the field transformation:
    ```
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "readMetadata" : true,
      "mergeMetadata" : false,
      "parserConfig": {
        "grokPath": "/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations" : [
        {
          "transformation" : "STELLAR",
          "output" : [ "full_hostname", "domain_without_subdomains", "kafka_topic", "customer_id" ],
          "config" : {
            "full_hostname" : "URL_TO_HOST(url)",
            "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
            "kafka_topic" : "TO_UPPER(metron.metadata.topic)",
            "customer_id" : "TO_UPPER(metron.metadata.customer_id)"
          }
        }
      ]
    }
    ```
    * Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
    * Send squid data through: `IFS=$'\n';for i in $(cat /var/log/squid/access.log);do METADATA="{\"customer_id\" : \"cust2\"}"; echo $METADATA\;$i;done | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid --property="parse.key=true" --property "key.separator=;"`
    * Validate that the message goes through with a `kafka_topic` field of `SQUID` and `customer_id` of `CUST2`:
    ```
    curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
    {
      "_source" : [ "kafka_topic", "customer_id" ]
    }
    '
    ```
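    For clarity, this is the record format the `for` loop above hands to the console producer when `parse.key=true` and `key.separator=;`: the key is the metadata map, the value is the raw log line. One hand-built record (the squid access entry shown is made up for illustration):
    ```
    METADATA='{"customer_id" : "cust2"}'
    LOG_LINE='1467011157.401    415 127.0.0.1 TCP_MISS/200 337891 GET http://www.yahoo.com/ - DIRECT/98.139.183.24 text/html'
    RECORD="${METADATA};${LOG_LINE}"

    # Split it back apart the way the producer does: everything before the
    # first ";" is the key, the rest is the value.
    KEY="${RECORD%%;*}"
    VALUE="${RECORD#*;}"
    echo "$KEY"
    ```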
    ## Test Case 5: Validate Custom Metadata is available and can be merged
    We're going to send a custom JSON map containing metadata along in the key. The map will have a single entry, `customer_id`.
    * Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on metadata reading and turn merging back on. Also, emit a custom metadata field called `customer_id` in the field transformation:
    ```
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "readMetadata" : true,
      "mergeMetadata" : true,
      "parserConfig": {
        "grokPath": "/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations" : [
        {
          "transformation" : "STELLAR",
          "output" : [ "full_hostname", "domain_without_subdomains", "kafka_topic", "customer_id" ],
          "config" : {
            "full_hostname" : "URL_TO_HOST(url)",
            "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
            "kafka_topic" : "TO_UPPER(metron.metadata.topic)",
            "customer_id" : "TO_UPPER(metron.metadata.customer_id)"
          }
        }
      ]
    }
    ```
    * Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
    * Send squid data through: `IFS=$'\n';for i in $(cat /var/log/squid/access.log);do METADATA="{\"customer_id\" : \"cust2\"}"; echo $METADATA\;$i;done | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid --property="parse.key=true" --property "key.separator=;"`
    * Validate that the message goes through with a `kafka_topic` field of `SQUID`, `metron:metadata:customer_id` of `cust2`, and `customer_id` of `CUST2`:
    ```
    curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
    {
      "_source" : [ "kafka_topic", "customer_id", "metron:metadata:customer_id" 
]
    }
    '
    ```
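    A hand-worked model of the expected indexed document for this merged case (an assumption drawn from the validations above, not from Metron internals): the raw metadata value survives under the `metron:metadata` prefix, while the transformation output is upper-cased alongside it.
    ```
    RAW_CUSTOMER_ID="cust2"
    TRANSFORMED_CUSTOMER_ID=$(echo "$RAW_CUSTOMER_ID" | tr '[:lower:]' '[:upper:]')
    # Both fields should be present in the indexed message:
    printf 'metron:metadata:customer_id=%s\ncustomer_id=%s\n' \
      "$RAW_CUSTOMER_ID" "$TRANSFORMED_CUSTOMER_ID"
    ```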
    



> Allow metron to ingest parser metadata along with data
> ------------------------------------------------------
>
>                 Key: METRON-1001
>                 URL: https://issues.apache.org/jira/browse/METRON-1001
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>            Assignee: Casey Stella
>
> Currently, we only ingest data in Metron.  Often, there is valuable metadata 
> constructed up-stream of Metron that is relevant to enrichment and cross-cuts 
> many data formats.  Take, for instance, a multi-tenancy case where multiple 
> sources come in and you'd like to tag the data with the customer ID.  In this 
> case you're stuck finding ways to add the metadata to each data source's 
> format.  Rather than do that, we should allow metadata to be ingested along 
> with the data associated with it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
