[ https://issues.apache.org/jira/browse/METRON-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936552#comment-15936552 ]
ASF GitHub Bot commented on METRON-793:
---------------------------------------
Github user cestella commented on the issue:
https://github.com/apache/incubator-metron/pull/486
# Testing Plan
## Preliminaries
* Please perform the following tests on the `full-dev` vagrant environment.
* Set an environment variable to indicate `METRON_HOME`:
`export METRON_HOME=/usr/metron/0.3.1`
## Ensure Data Flows into the Indices
Ensure that with a basic full-dev we get data into the elasticsearch indices and into HDFS; a quick check is sketched below.
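One way to verify this, a minimal sketch assuming the full-dev defaults (Elasticsearch listening on localhost:9200 and Metron's HDFS root at `/apps/metron`; adjust if your layout differs):
```
# list the Elasticsearch indices and their document counts;
# the bro, snort and yaf indices should be filling up
curl 'http://localhost:9200/_cat/indices?v'
# look for recently written files under the Metron HDFS root
# (the exact subdirectory layout may differ)
hadoop fs -ls -R /apps/metron
```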
## (Optional) Free Up Space on the virtual machine
First, let's free up some headroom on the virtual machine. If you are
running this on a
multinode cluster, you would not have to do this.
* Stop and disable Metron in Ambari
* Kill monit via `service monit stop`
* Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | grep -v grep | awk '{print $2}'); do kill -9 $i; done`
* Kill yaf via `for i in $(ps -ef | grep yaf | grep -v grep | awk '{print $2}'); do kill -9 $i; done`
* Kill bro via `for i in $(ps -ef | grep bro | grep -v grep | awk '{print $2}'); do kill -9 $i; done`
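Equivalently, if `pkill` is available on the VM (it ships with the procps package on CentOS), each loop collapses to a single command; the `grep -v grep` above is only there to keep the pipeline from targeting its own grep process:
```
# -f matches against the full command line, not just the process name
pkill -9 -f tcpreplay
pkill -9 -f yaf
pkill -9 -f bro
```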
## Test the PCAP topology
The new kafka spout necessitates testing the pcap topology.
### Install and start pycapa
```
# set env vars
export PYCAPA_HOME=/opt/pycapa
export PYTHON27_HOME=/opt/rh/python27/root

# Install these packages via yum (RHEL, CentOS)
yum -y install epel-release centos-release-scl
yum -y install "@Development tools" python27 python27-scldevel python27-python-virtualenv libpcap-devel libselinux-python

# Setup directories
mkdir $PYCAPA_HOME && chmod 755 $PYCAPA_HOME

# Create the virtualenv inside $PYCAPA_HOME so the activate step below finds it
export LD_LIBRARY_PATH="/opt/rh/python27/root/usr/lib64"
cd $PYCAPA_HOME
${PYTHON27_HOME}/usr/bin/virtualenv pycapa-venv

# Copy incubator-metron/metron-sensors/pycapa from the Metron source tree
# into $PYCAPA_HOME on the node you would like to install pycapa on

# Build it
cd ${PYCAPA_HOME}/pycapa
# activate the virtualenv
source ${PYCAPA_HOME}/pycapa-venv/bin/activate
pip install -r requirements.txt
python setup.py install

# Run it
cd ${PYCAPA_HOME}/pycapa-venv/bin
pycapa --producer --topic pcap -i eth1 -k node1:6667
```
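Before moving on, it is worth confirming that packets are actually landing on the `pcap` topic. One option (a sketch using the stock Kafka tooling shipped with HDP) is to watch the topic's latest offsets grow while pycapa runs:
```
# --time -1 requests the latest offset; run this twice a few seconds
# apart and the reported offset should increase while pycapa produces
/usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list node1:6667 --topic pcap --time -1
```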
### Ensure pycapa can write to HDFS
* Ensure that `/apps/metron/pcap` exists and can be written to by the storm user. If not, then:
```
sudo su - hdfs
hadoop fs -mkdir -p /apps/metron/pcap
hadoop fs -chown metron:hadoop /apps/metron/pcap
hadoop fs -chmod 775 /apps/metron/pcap
```
* Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
* Start the pycapa packet capture producer on eth1 via `/usr/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667`
* Watch the topology in the Storm UI and kill the packet capture utility from before once the number of packets ingested is over 3k. Ensure that at least 3 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
* Choose a file (denoted by $FILE) and dump a few of the contents using the pcap_inspector utility via `$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5`
* Choose one of the lines and note the protocol.
* Note that when you run the commands below, the resulting file will be placed in the execution directory where you kicked off the job from.
* Run a Stellar filter query by executing a command similar to the following, with the values noted above (match your start_time format to the date format provided - default is to use millis since epoch):
```
$METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query "protocol == 6" -rpf 500
```
* Verify the MR job finishes successfully. Upon completion, you should see multiple files named with relatively current datestamps in your current directory, e.g. pcap-data-20160617160549737+0000.pcap
* Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.
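If you would rather check the record counts from the command line before opening Wireshark, its `capinfos` utility (assuming the Wireshark CLI tools are installed on your local machine) prints the packet count per file:
```
# -c prints the number of packets in each capture file; the middle
# files should report 500 and the last one 500 or fewer
capinfos -c pcap-data-*.pcap
```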
## Test the Profiler
### Setup
* Ensure that Metron is stopped and put in maintenance mode in Ambari
* Create the profiler hbase table
`echo "create 'profiler', 'P'" | hbase shell`
* Open `~/rand_gen.py` and paste the following:
```
#!/usr/bin/python
import random
import sys
import time

def main():
    # mean, standard deviation and emission interval come from the CLI
    mu = float(sys.argv[1])
    sigma = float(sys.argv[2])
    freq_s = int(sys.argv[3])
    while True:
        out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
        print out
        sys.stdout.flush()
        time.sleep(freq_s)

if __name__ == '__main__':
    main()
```
This will generate random JSON maps with a numeric field called `value` (see the smoke test after this list).
* Set the profiler to use 1 minute tick durations:
  * Edit `$METRON_HOME/config/profiler.properties` to adjust the capture duration by changing `profiler.period.duration=15` to `profiler.period.duration=1`
  * Edit `$METRON_HOME/config/zookeeper/global.json` and add the following properties:
```
"profiler.client.period.duration" : "1",
"profiler.client.period.duration.units" : "MINUTES"
```
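Before wiring the generator to Kafka, a quick local smoke test (the printed values below are illustrative; each run emits different random draws):
```
$ python ~/rand_gen.py 0 1 1
{ "value" : 0.4713175505514164 }
{ "value" : -1.0393127616137571 }
{ "value" : 0.2279878321940387 }
```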
### Deploy the custom parser
* Edit the value parser config at `$METRON_HOME/config/zookeeper/parsers/value.json`:
```
{
  "parserClassName" : "org.apache.metron.parsers.json.JSONMapParser",
  "sensorTopic" : "value",
  "fieldTransformations" : [
    {
      "transformation" : "STELLAR",
      "output" : [ "num_profiles_parser", "mean_parser" ],
      "config" : {
        "num_profiles_parser" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))",
        "mean_parser" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))"
      }
    }
  ]
}
```
* Edit the value enrichment config at `$METRON_HOME/config/zookeeper/enrichments/value.json`:
```
{
  "enrichment" : {
    "fieldMap" : {
      "stellar" : {
        "config" : {
          "num_profiles_enrichment" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))",
          "mean_enrichment" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))"
        }
      }
    }
  }
}
```
* Create the value kafka topic: `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic value --partitions 1 --replication-factor 1`
* Push the configs via `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
* Start via `$METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s value`
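To verify that the parser topology actually came up (a sketch; the `storm` CLI is already on the PATH in full-dev):
```
# the "value" topology should appear with status ACTIVE
storm list
```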
### Start the profiler
* Edit `$METRON_HOME/config/zookeeper/profiler.json` and paste in the following:
```
{
  "profiles": [
    {
      "profile": "stat",
      "foreach": "'global'",
      "onlyif": "true",
      "init" : { },
      "update": {
        "s": "STATS_ADD(s, value)"
      },
      "result": "s"
    }
  ]
}
```
* Start the profiler via `$METRON_HOME/bin/start_profiler_topology.sh`
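Once a couple of profiler periods have elapsed, one way to sanity-check that profile data is landing in HBase (a sketch using the stock hbase shell):
```
# the row count should be non-zero and grow over time
echo "count 'profiler'" | hbase shell
```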
### Test Case
* Set up a profile to accept some synthetic data with a numeric `value` field and persist a stats summary of the data
* Send some synthetic data directly to the profiler: `python ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic value`
* Wait for at least 32 minutes and execute the following via the Stellar REPL (launched, e.g., via `$METRON_HOME/bin/stellar -z node1:2181`):
```
# Grab the profiles from 1 minute ago to 8 minutes ago
LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 1 minute ago to 8 minutes ago')))
# Looks like 7 were returned, great. Now try something more complex.
# Grab the profiles in 5 minute windows every 10 minutes from 2 minutes ago to 32 minutes ago:
#   32 minutes ago til 27 minutes ago should be 5 profiles
#   22 minutes ago til 17 minutes ago should be 5 profiles
#   12 minutes ago til 7 minutes ago should be 5 profiles
# for a total of 15 profiles
LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))
```
For me, the following was the result:
```
```
* Delete any value index that currently exists (if any do) via `curl -XDELETE "http://localhost:9200/value*"`
* Wait for a couple of seconds and run `curl "http://localhost:9200/value*/_search?pretty=true&q=*:*" 2> /dev/null`
* You should see values in the index with non-zero fields:
  * `num_profiles_enrichment` should be 15
  * `num_profiles_parser` should be 15
  * `mean_enrichment` should be a non-zero double
  * `mean_parser` should be a non-zero double
For reference, a sample message for me is:
```
```
> Migrate to storm-kafka-client kafka spout from storm-kafka
> ----------------------------------------------------------
>
> Key: METRON-793
> URL: https://issues.apache.org/jira/browse/METRON-793
> Project: Metron
> Issue Type: Improvement
> Reporter: Casey Stella
>
> In order to eventually support kerberos, the suggested path is to migrate to
> the new kafka spout (org.apache.storm:storm-kafka-client) which uses the new
> consumer API.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)