JonZeolla commented on a change in pull request #29: METRON-1991: Bro plugin docker scripts should exit nonzero when bro and kafka counts differ URL: https://github.com/apache/metron-bro-plugin-kafka/pull/29#discussion_r260689836
########## File path: docker/scripts/analyze_results.sh ########## @@ -0,0 +1,207 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +shopt -s nocasematch +#set -u # nounset disabled +set -e # errexit +set -E # errtrap +set -o pipefail + +# +# Analyzes the results.csv files to identify issues +# + +function help { + echo " " + echo "usage: ${0}" + echo " --test-directory [REQUIRED] The directory for the tests" + echo " -h/--help Usage information." 
+ echo " " + echo " " +} + +SCRIPT_NAME=$(basename -- "$0") +TEST_DIRECTORY= +declare -A UNEQUAL_RESULTS +declare -a LOG_NAMES +declare -A LOG_OCCURRENCE +declare -A UNIQ_UNEQUAL_RESULTS +declare -r txtDEFAULT='\033[0m' +# shellcheck disable=SC2034 +declare -r txtERROR='\033[0;31m' +# shellcheck disable=SC2034 +declare -r txtWARN='\033[0;33m' + +# Handle command line options +for i in "$@"; do + case $i in + # + # TEST_DIRECTORY + # + # --test-directory + # + --test-directory=*) + TEST_DIRECTORY="${i#*=}" + shift # past argument=value + ;; + + # + # -h/--help + # + -h | --help) + help + exit 0 + shift # past argument with no value + ;; + + # + # Unknown option + # + *) + UNKNOWN_OPTION="${i#*=}" + _echo ERROR "unknown option: $UNKNOWN_OPTION" + help + ;; + esac +done + +if [[ -z "$TEST_DIRECTORY" ]]; then + echo "$TEST_DIRECTORY must be passed" + exit 1 +fi + +echo "Running ${SCRIPT_NAME} with" +echo "TEST_DIRECTORY = $TEST_DIRECTORY" +echo "===================================================" + +## Main functions +function _echo() { + color="txt${1:-DEFAULT}" + case "${1}" in + ERROR) + >&2 echo -e "${!color}${1}> ${2}${txtDEFAULT}" + ;; + WARN) + echo -e "${!color}${1}> ${2}${txtDEFAULT}" + ;; + *) + echo -e "${!color}${1}> ${2}${txtDEFAULT}" + ;; + esac +} + +function count_occurrences_of_each_log_file +{ + # Count the number of occurences of each log name + for LOG_NAME in "${LOG_NAMES[@]}"; do + (( ++LOG_OCCURRENCE["${LOG_NAME}"] )) + done +} + +function check_for_unequal_log_counts +{ + RESULTS_FILE="${1}" + + # Get the pcap folder name from the provided file + # shellcheck disable=SC2001 + PCAP_FOLDER="$( cd "$( dirname "${RESULTS_FILE}" )" >/dev/null 2>&1 && echo "${PWD##*/}")" + + # Check each log line in the provided log file for unequal results + for LOG_NAME in "${LOG_NAMES[@]}"; do + # For each log in the provided results, identify any unequal log counts + UNEQUAL_LOG=$(awk -F\, -v log_name="${LOG_NAME}" '$1 == log_name && $2 != $3 {print $1}' 
"${RESULTS_FILE}") + + # Create a space separated list of unequal logs to simulate a + # multidimensional array + if [[ -n "${UNEQUAL_LOG}" ]]; then + if [[ "${#UNEQUAL_RESULTS[${PCAP_FOLDER}]}" -eq 0 ]]; then + UNEQUAL_RESULTS["${PCAP_FOLDER}"]="${UNEQUAL_LOG}" + else + UNEQUAL_RESULTS["${PCAP_FOLDER}"]+=" ${UNEQUAL_LOG}" + fi + fi + done +} + +function print_unequal_results +{ + # Output a table with the pcap file and log name details where the imbalance + # was detected + { + echo "PCAP FOLDER,LOG NAME" + + for KEY in "${!UNEQUAL_RESULTS[@]}"; do + # This must be done because we are simulating multidimensional arrays due to + # the lack of native bash support + for VALUE in ${UNEQUAL_RESULTS[${KEY}]}; do + echo "${KEY},${VALUE}" + done + done + } | column -t -s ',' +} + +function print_log_comparison_insights +{ + # Load the log to instance count mapping from UNEQUAL_RESULTS into a new + # associative array + # shellcheck disable=SC2046 + declare -A $(echo "${UNEQUAL_RESULTS[@]}" | tr ' ' '\n' | sort | uniq -c | awk '{print "UNIQ_UNEQUAL_RESULTS["$2"]="$1}') + + # Compare each log type's instances of inequality to the total number of + # instances of each log. If they are equal, this indicates that there may be + # a log-type related issue. + # + # For example, if count_occurrences_of_each_log_file identified that there + # were 10 instances of http logs across all of the `results.csv` files, + # ${LOG_OCCURRENCE[http]} should equal 10. If check_for_unequal_log_counts + # independently found 10 instances where the http bro and kafka log counts + # from the `results.csv` files were not equal, ${UNIQ_UNEQUAL_RESULTS[http]} + # would also have 10 entries, causing us to warn the user of that insight. Review comment: Yes. `LOG_NAMES` is an array of all of the log names, pulled from `results.csv`. Technically I could replace it with `LOG_OCCURRENCE` and just iterate the keys. That should also let me turn nounset back on, so I'll do that. 
`LOG_OCCURRENCE` is an associative array that maps each bro log name to the total number of times it appears across all of the `results.csv` files. It is populated by the function `count_occurrences_of_each_log_file`.

`UNEQUAL_RESULTS` keeps track of the specific instances of inequality, mapping each pcap name to the log names whose bro and Kafka counts didn't match in that pcap's `results.csv`.

`UNIQ_UNEQUAL_RESULTS` is the result of sorting and counting the values of `UNEQUAL_RESULTS`, mapping each bro log name to the number of times its counts were unequal.

Based on this, we can do a simple comparison of the total number of occurrences of a log (`LOG_OCCURRENCE`) to the total number of instances where we saw log counts that differed between bro and Kafka (`UNIQ_UNEQUAL_RESULTS`).

I'm working around some limitations of bash here. If I were doing this in Python, it would all be in one big multidimensional array.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]

With regards,
Apache Git Services
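For readers following the review comment above, here is a minimal sketch of how the three associative arrays relate and how the final comparison might look when iterating the keys of `LOG_OCCURRENCE` rather than a separate `LOG_NAMES` array, with nounset enabled. It is not the code from the pull request: the `RESULTS` array, its sample rows, and `SUSPECT_LOGS` are hypothetical stand-ins for the `results.csv` files, and a plain loop is used for counting in place of the script's `sort | uniq -c | awk` pipeline. It assumes bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
set -euo pipefail  # nounset is on; unset keys are read with a default value

# Hypothetical sample data: pcap folder -> "log_name,bro_count,kafka_count"
# rows, standing in for the rows of each results.csv file.
declare -A RESULTS=(
  [pcap_a]="http,10,10 dns,5,4 conn,4,4"
  [pcap_b]="http,7,7 dns,3,2 conn,9,8"
)

declare -A LOG_OCCURRENCE=()        # log name -> number of results.csv files containing it
declare -A UNEQUAL_RESULTS=()       # pcap folder -> space separated list of unequal log names
declare -A UNIQ_UNEQUAL_RESULTS=()  # log name -> number of unequal instances

for PCAP_FOLDER in "${!RESULTS[@]}"; do
  # Unquoted on purpose: split the space separated rows
  for ROW in ${RESULTS[$PCAP_FOLDER]}; do
    IFS=',' read -r LOG_NAME BRO_COUNT KAFKA_COUNT <<< "${ROW}"

    # The :-0 default keeps nounset from failing on a key seen for the first time
    LOG_OCCURRENCE[$LOG_NAME]=$(( ${LOG_OCCURRENCE[$LOG_NAME]:-0} + 1 ))

    if [[ "${BRO_COUNT}" -ne "${KAFKA_COUNT}" ]]; then
      # Append to the space separated list that simulates a second dimension
      current="${UNEQUAL_RESULTS[$PCAP_FOLDER]:-}"
      UNEQUAL_RESULTS[$PCAP_FOLDER]="${current:+${current} }${LOG_NAME}"
      UNIQ_UNEQUAL_RESULTS[$LOG_NAME]=$(( ${UNIQ_UNEQUAL_RESULTS[$LOG_NAME]:-0} + 1 ))
    fi
  done
done

# Iterate the keys of LOG_OCCURRENCE instead of keeping a separate LOG_NAMES array
SUSPECT_LOGS=()
for LOG_NAME in "${!LOG_OCCURRENCE[@]}"; do
  UNEQUAL_COUNT="${UNIQ_UNEQUAL_RESULTS[$LOG_NAME]:-0}"
  if [[ "${UNEQUAL_COUNT}" -eq "${LOG_OCCURRENCE[$LOG_NAME]}" ]]; then
    SUSPECT_LOGS+=("${LOG_NAME}")
    echo "WARN> ${LOG_NAME}: bro and kafka counts differed in every results.csv that contains it"
  fi
done
```

With this sample data only `dns` is flagged, since its counts differ in both pcap folders, while `conn` differs in just one of the two.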
