METRON-1714 Create RPM Packaging for the Batch Profiler (nickwallen) closes apache/metron#1163


Project: http://git-wip-us.apache.org/repos/asf/metron/repo
Commit: http://git-wip-us.apache.org/repos/asf/metron/commit/c7a3dc23
Tree: http://git-wip-us.apache.org/repos/asf/metron/tree/c7a3dc23
Diff: http://git-wip-us.apache.org/repos/asf/metron/diff/c7a3dc23

Branch: refs/heads/master
Commit: c7a3dc230a8fdfbbefcbe3a04c9c5cc05bc74853
Parents: c6d0721
Author: nickwallen <[email protected]>
Authored: Mon Aug 27 17:35:00 2018 -0400
Committer: nickallen <[email protected]>
Committed: Mon Aug 27 17:35:00 2018 -0400

----------------------------------------------------------------------
 .../metron-profiler-spark/README.md             | 219 +++++++++++++++++++
 .../common-services/METRON/CURRENT/metainfo.xml |   5 +-
 .../docker/rpm-docker/SPECS/metron.spec         |  37 +++-
 .../packaging/docker/rpm-docker/pom.xml         |   6 +
 4 files changed, 260 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/metron/blob/c7a3dc23/metron-analytics/metron-profiler-spark/README.md
----------------------------------------------------------------------
diff --git a/metron-analytics/metron-profiler-spark/README.md b/metron-analytics/metron-profiler-spark/README.md
new file mode 100644
index 0000000..0a31263
--- /dev/null
+++ b/metron-analytics/metron-profiler-spark/README.md
@@ -0,0 +1,219 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# Metron Profiler for Spark
+
+This project allows profiles to be executed using [Apache Spark](https://spark.apache.org). This is a port of the Profiler to Spark.
+
+* [Introduction](#introduction)
+* [Getting Started](#getting-started)
+* [Installation](#installation)
+* [Configuring the Profiler](#configuring-the-profiler)
+* [Running the Profiler](#running-the-profiler)
+
+## Introduction
+
+Using the [Streaming Profiler](../metron-profiler/README.md) in [Apache Storm](http://storm.apache.org) allows you to create profiles based on the stream of telemetry being captured, enriched, triaged, and indexed by Metron. This does not allow you to create a profile based on telemetry that was captured in the past.
+
+There are many cases where you might want to produce a profile from telemetry in the past.  This is referred to as "profile seeding".
+
+* As a Security Data Scientist, I want to understand the historical behaviors and trends of a profile so that I can determine if the profile has predictive value for model building.
+
+* As a Security Platform Engineer, I want to generate a profile using archived telemetry when I deploy a new model to production so that models depending on that profile can function on day 1.
+
+The Batch Profiler running in [Apache Spark](https://spark.apache.org) allows you to seed a profile using archived telemetry.
+
+The portion of a profile produced by the Batch Profiler should be indistinguishable from the portion created by the Streaming Profiler.  Consumers of the profile should not care how the profile was generated.  Using the Streaming Profiler together with the Batch Profiler allows you to create a complete profile over a wide range of time.
+
+For an introduction to the Profiler and Profiler concepts, see the [Profiler README](../metron-profiler/README.md).
+
+## Getting Started
+
+
+
+1. Create a profile definition by editing `$METRON_HOME/config/zookeeper/profiler.json` as follows.
+
+    ```
+    cat $METRON_HOME/config/zookeeper/profiler.json
+    {
+      "profiles": [
+        {
+          "profile": "hello-world",
+          "foreach": "'global'",
+          "init":    { "count": "0" },
+          "update":  { "count": "count + 1" },
+          "result":  "count"
+        }
+      ],
+      "timestampField": "timestamp"
+    }
+    ```
+
+1. Ensure that you have archived telemetry available for the Batch Profiler to consume.  By default, Metron will store this in HDFS at `/apps/metron/indexing/indexed/*/*`.
+
+    ```
+    hdfs dfs -cat /apps/metron/indexing/indexed/*/* | wc -l
+    ```
+
+1. Review the Batch Profiler's properties located at `$METRON_HOME/config/batch-profiler.properties`.  See [Configuring the Profiler](#configuring-the-profiler) for more information on these properties.
+
+1. You may want to edit the log4j properties file that sits in the config directory under `${SPARK_HOME}`, or create one if it does not exist.  It may be helpful to turn on `DEBUG` logging for the Profiler by adding the following line.
+
+    ```
+    log4j.logger.org.apache.metron.profiler.spark=DEBUG
+    ```
+
+1. Run the Batch Profiler.
+
+    ```
+    source /etc/default/metron
+    cd $METRON_HOME
+    $METRON_HOME/bin/start_batch_profiler.sh
+    ```
+
+1. Query for the profile data using the [Profiler Client](../metron-profiler-client/README.md).
+
+## Installation
+
+The Batch Profiler package is installed automatically when installing Metron using the Ambari MPack.  See the following notes when installing the Batch Profiler without the Ambari MPack.
+
+### Prerequisites
+
+The Batch Profiler requires Spark version 2.3.0+.
+
+### Packages
+
+#### Build the RPM
+
+1. Build Metron.
+    ```
+    mvn clean package -DskipTests -T2C
+    ```
+
+1. Build the RPMs.
+    ```
+    cd metron-deployment/
+    mvn clean package -Pbuild-rpms
+    ```
+
+1. Retrieve the package.
+    ```
+    find ./ -name "metron-profiler-spark*.rpm"
+    ```
+
+#### Build the DEB
+
+1. Build Metron.
+    ```
+    mvn clean package -DskipTests -T2C
+    ```
+
+1. Build the DEBs.
+    ```
+    cd metron-deployment/
+    mvn clean package -Pbuild-debs
+    ```
+
+1. Retrieve the package.
+    ```
+    find ./ -name "metron-profiler-spark*.deb"
+    ```
+
+## Configuring the Profiler
+
+By default, the configuration for the Batch Profiler is stored in the local filesystem at `$METRON_HOME/config/batch-profiler.properties`.
+
+You can store settings for both the Profiler and Spark in this same file.  Spark will only read settings that start with `spark.`.
+
+| Setting                                                          | Description
+|---                                                               |---
+| [`profiler.batch.input.path`](#profilerbatchinputpath)           | The path to the input data read by the Batch Profiler.
+| [`profiler.batch.input.format`](#profilerbatchinputformat)       | The format of the input data read by the Batch Profiler.
+| [`profiler.period.duration`](#profilerperiodduration)            | The duration of each profile period.
+| [`profiler.period.duration.units`](#profilerperioddurationunits) | The units used to specify the [`profiler.period.duration`](#profilerperiodduration).
+| [`profiler.hbase.salt.divisor`](#profilerhbasesaltdivisor)       | A salt is prepended to the row key to help prevent hot-spotting.
+| [`profiler.hbase.table`](#profilerhbasetable)                    | The name of the HBase table that profiles are written to.
+| [`profiler.hbase.column.family`](#profilerhbasecolumnfamily)     | The column family used to store profiles.
+
+### `profiler.batch.input.path`
+
+*Default*: hdfs://localhost:9000/apps/metron/indexing/indexed/*/*
+
+The path to the input data read by the Batch Profiler.
+
+### `profiler.batch.input.format`
+
+*Default*: text
+
+The format of the input data read by the Batch Profiler.
+
+### `profiler.period.duration`
+
+*Default*: 15
+
+The duration of each profile period.  This value should be defined along with [`profiler.period.duration.units`](#profilerperioddurationunits).
+
+*Important*: To read a profile using the [Profiler Client](metron-analytics/metron-profiler-client), the Profiler Client's `profiler.client.period.duration` property must match this value.  Otherwise, the Profiler Client will be unable to read the profile data.
+
+### `profiler.period.duration.units`
+
+*Default*: MINUTES
+
+The units used to specify the `profiler.period.duration`.  This value should be defined along with [`profiler.period.duration`](#profilerperiodduration).
+
+*Important*: To read a profile using the Profiler Client, the Profiler Client's `profiler.client.period.duration.units` property must match this value.  Otherwise, the [Profiler Client](metron-analytics/metron-profiler-client) will be unable to read the profile data.
+
+### `profiler.hbase.salt.divisor`
+
+*Default*: 1000
+
+A salt is prepended to the row key to help prevent hotspotting.  This constant is used to generate the salt.  This constant should be roughly equal to the number of nodes in the HBase cluster to ensure even distribution of data.
+
+### `profiler.hbase.table`
+
+*Default*: profiler
+
+The name of the HBase table that profile data is written to.  The Profiler expects that the table exists and is writable.  It will not create the table.
+
+### `profiler.hbase.column.family`
+
+*Default*: P
+
+The column family used to store profile data in HBase.
+
+## Running the Profiler
+
+A script located at `$METRON_HOME/bin/start_batch_profiler.sh` has been provided to simplify running the Batch Profiler.  The Batch Profiler may also be started as follows using the `spark-submit` script.
+
+```
+${SPARK_HOME}/bin/spark-submit \
+    --class org.apache.metron.profiler.spark.cli.BatchProfilerCLI \
+    --properties-file ${SPARK_PROPS_FILE} \
+    ${PROFILER_JAR} \
+    --config ${PROFILER_PROPS_FILE} \
+    --profiles ${PROFILES_FILE}
+```
+
+The Batch Profiler accepts the following arguments when run from the command line.
+
+| Argument         | Description
+|---               |---
+| -p, --profiles   | The path to a file containing the profile definitions.
+| -c, --config     | The path to the profiler properties file.
+| -g, --globals    | The path to a properties file containing global properties.
+| -h, --help       | Print the help text.
+

http://git-wip-us.apache.org/repos/asf/metron/blob/c7a3dc23/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml
----------------------------------------------------------------------
diff --git a/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml b/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml
index f83d93b..eae756a 100644
--- a/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml
+++ b/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/metainfo.xml
@@ -404,7 +404,10 @@
               <name>metron-enrichment</name>
             </package>
             <package>
-              <name>metron-profiler</name>
+              <name>metron-profiler-storm</name>
+            </package>
+            <package>
+              <name>metron-profiler-spark</name>
             </package>
             <package>
               <name>metron-indexing</name>

http://git-wip-us.apache.org/repos/asf/metron/blob/c7a3dc23/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec
----------------------------------------------------------------------
diff --git a/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec b/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec
index b308908..94dc951 100644
--- a/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec
+++ b/metron-deployment/packaging/docker/rpm-docker/SPECS/metron.spec
@@ -58,6 +58,7 @@ Source11:       metron-management-%{full_version}-archive.tar.gz
 Source12:       metron-maas-service-%{full_version}-archive.tar.gz
 Source13:       metron-alerts-%{full_version}-archive.tar.gz
 Source14:       metron-performance-%{full_version}-archive.tar.gz
+Source15:       metron-profiler-spark-%{full_version}-archive.tar.gz
 
 %description
 Apache Metron provides a scalable advanced security analytics framework
@@ -95,6 +96,7 @@ tar -xzf %{SOURCE11} -C %{buildroot}%{metron_home}
 tar -xzf %{SOURCE12} -C %{buildroot}%{metron_home}
 tar -xzf %{SOURCE13} -C %{buildroot}%{metron_home}
 tar -xzf %{SOURCE14} -C %{buildroot}%{metron_home}
+tar -xzf %{SOURCE15} -C %{buildroot}%{metron_home}
 
 install %{buildroot}%{metron_home}/bin/metron-management-ui %{buildroot}/etc/init.d/
 install %{buildroot}%{metron_home}/bin/metron-alerts-ui %{buildroot}/etc/init.d/
@@ -379,15 +381,15 @@ This package installs the Metron PCAP files %{metron_home}
 
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-%package        profiler
-Summary:        Metron Profiler
+%package        profiler-storm
+Summary:        Metron Profiler for Storm
 Group:          Applications/Internet
-Provides:       profiler = %{version}
+Provides:       profiler-storm = %{version}
 
-%description    profiler
-This package installs the Metron Profiler %{metron_home}
+%description    profiler-storm
+This package installs the Metron Profiler for Storm %{metron_home}
 
-%files          profiler
+%files          profiler-storm
 %defattr(-,root,root,755)
 %dir %{metron_root}
 %dir %{metron_home}
@@ -536,6 +538,27 @@ This package installs the Metron Alerts UI %{metron_home}
 
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
+%package        profiler-spark
+Summary:        Metron Profiler for Spark
+Group:          Applications/Internet
+Provides:       profiler-spark = %{version}
+
+%description    profiler-spark
+This package installs the Metron Profiler for Spark %{metron_home}
+
+%files          profiler-spark
+%defattr(-,root,root,755)
+%dir %{metron_root}
+%dir %{metron_home}
+%dir %{metron_home}/config
+%{metron_home}/config/batch-profiler.properties
+%dir %{metron_home}/bin
+%{metron_home}/bin/start_batch_profiler.sh
+%dir %{metron_home}/lib
+%attr(0644,root,root) %{metron_home}/lib/metron-profiler-spark-%{full_version}.jar
+
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 %post config
 chkconfig --add metron-management-ui
 chkconfig --add metron-alerts-ui
@@ -545,6 +568,8 @@ chkconfig --del metron-management-ui
 chkconfig --del metron-alerts-ui
 
 %changelog
+* Tue Aug 14 2018 Apache Metron <[email protected]> - 0.5.1
+- Add Profiler for Spark
 * Thu Feb 1 2018 Apache Metron <[email protected]> - 0.4.3
 - Add Solr install script to Solr RPM
 * Tue Sep 25 2017 Apache Metron <[email protected]> - 0.4.2

http://git-wip-us.apache.org/repos/asf/metron/blob/c7a3dc23/metron-deployment/packaging/docker/rpm-docker/pom.xml
----------------------------------------------------------------------
diff --git a/metron-deployment/packaging/docker/rpm-docker/pom.xml b/metron-deployment/packaging/docker/rpm-docker/pom.xml
index ba57079..1ea8d46 100644
--- a/metron-deployment/packaging/docker/rpm-docker/pom.xml
+++ b/metron-deployment/packaging/docker/rpm-docker/pom.xml
@@ -168,6 +168,12 @@
                                     </includes>
                                 </resource>
                                 <resource>
+                                    <directory>${metron_dir}/metron-analytics/metron-profiler-spark/target/</directory>
+                                    <includes>
+                                        <include>*.tar.gz</include>
+                                    </includes>
+                                </resource>
+                                <resource>
                                     <directory>${metron_dir}/metron-interface/metron-rest/target/</directory>
                                     <includes>
                                         <include>*.tar.gz</include>
