[ 
https://issues.apache.org/jira/browse/METRON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043292#comment-16043292
 ] 

ASF GitHub Bot commented on METRON-992:
---------------------------------------

Github user mmiklavc commented on a diff in the pull request:

    https://github.com/apache/metron/pull/614#discussion_r120979346
  
    --- Diff: metron-platform/Performance-tuning-guide.md ---
    @@ -0,0 +1,326 @@
    +# Metron Performance Tunining Guide
    +
    +## Overview
    +
    +This document provides guidance from our experiences tuning the Apache 
Metron Storm topologies for maximum performance. You'll find
    +suggestions for optimum configurations under a 1 gbps load along with some 
guidance around the tooling we used to monitor and assess
    +our throughput.
    +
    +In the simplest terms, Metron is a streaming architecture created on top 
of Kafka and three main types of Storm topologies: parsers,
    +enrichment, and indexing. Each parser has it's own topology and there is 
also a highly performant, specialized spout-only topology
    +for streaming PCAP data to HDFS. We found that the architecture can be 
tuned almost exclusively through using a few primary Storm and
    +Kafka parameters along with a few Metron-specific options. You can think 
of the data flow as being similar to water flowing through a
    +pipe, and the majority of these options assist in tweaking the various 
pipe widths in the system.
    +
    +## General Suggestions
    +
    +Note that there is currently no method for specifying the number of tasks 
from the number of executors in Flux topologies (enrichment,
    + indexing). By default, the number of tasks will equal the number of 
executors. Logically, setting the number of tasks equal to the number
    +of executors is sensible. Storm enforces # executors <= # tasks. The 
reason you might set the number of tasks higher than the number of
    +executors is for future performance tuning and rebalancing without the 
need to bring down your topologies. The number of tasks is fixed
    +at topology startup time whereas the number of executors can be increased 
up to a maximum value equal to the number of tasks.
    +
    +We found that the default values for poll.timeout.ms, 
offset.commit.period.ms, and max.uncommitted.offsets worked well in nearly all 
cases.
    +As a general rule, it was optimal to set spout parallelism equal to the 
number of partitions used in your Kafka topic. Any greater
    +parallelism will leave you with idle consumers since Kafka limits the max 
number of consumers to the number of partitions. This is
    +important because Kafka has certain ordering guarantees for message 
delivery per partition that would not be possible if more than
    +one consumer in a given consumer group were able to read from that 
partition.
    +
    +## Tooling
    +
    +Before we get to the actual tooling used to monitor performance, it helps 
to describe what we might actually want to monitor and potential
    +pain points. Prior to switching over to the new Storm Kafka client, which 
leverages the new Kafka consumer API under the hood, offsets
    +were stored in Zookeeper. While the broker hosts are still stored in 
Zookeeper, this is no longer true for the offsets which are now
    +stored in Kafka itself. This is a configurable option, and you may switch 
back to Zookeeper if you choose, but Metron is currently using
    +the new defaults. This is useful to know as you're investigating both 
correctness as well as throughput performance.
    +
    +First we need to setup some environment variables
    +```
    +export BROKERLIST=<your broker comma-delimated list of host:ports>
    +export ZOOKEEPER=<your zookeeper comma-delimated list of host:ports>
    +export KAFKA_HOME=<kafka home dir>
    +export METRON_HOME=<your metron home>
    +export HDP_HOME=<your HDP home>
    +```
    +
    +If you have Kerberos enabled, setup the security protocol
    +```
    +$ cat /tmp/consumergroup.config
    +security.protocol=SASL_PLAINTEXT
    +```
    +
    +Now run the following command for a running topology's consumer group. In 
this example we are using enrichments.
    +```
    +${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
    +    --command-config=/tmp/consumergroup.config \
    +    --describe \
    +    --group enrichments \
    +    --bootstrap-server $BROKERLIST \
    +    --new-consumer
    +```
    +
    +This will return a table with the following output depicting offsets for 
all partitions and consumers associated with the specified
    +consumer group:
    +```
    +GROUP                          TOPIC              PARTITION  
CURRENT-OFFSET  LOG-END-OFFSET  LAG             OWNER
    +enrichments                    enrichments        9          29746066      
  29746067        1               consumer-2_/xxx.xxx.xxx.xxx
    +enrichments                    enrichments        3          29754325      
  29754326        1               consumer-1_/xxx.xxx.xxx.xxx
    +enrichments                    enrichments        43         29754331      
  29754332        1               consumer-6_/xxx.xxx.xxx.xxx
    +...
    +```
    +
    +_Note_: You won't see any output until a topology is actually running 
because the consumer groups only exist while consumers in the
    +spouts are up and running.
    +
    +The primary column we're concerned with paying attention to is the LAG 
column, which is the current delta calculation between the
    +current and end offset for the partition. This tells us how close we are 
to keeping up with incoming data. And, as we found through
    +multiple trials, whether there are any problems with specific consumers 
getting stuck.
    +
    +Taking this one step further, it's probably more useful if we can watch 
the offsets and lags change over time. In order to do this
    +we'll add a "watch" command and set the refresh rate to 10 seconds.
    +
    +```
    +watch -n 10 -d ${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
    +    --command-config=/tmp/consumergroup.config \
    +    --describe \
    +    --group enrichments \
    +    --bootstrap-server $BROKERLIST \
    +    --new-consumer
    +```
    +
    +Every 10 seconds the command will re-run and the screen will be refreshed 
with new information. The most useful bit is that the
    +watch command will highlight the differences from the current output and 
the last output screens.
    +
    +We can also monitor our Storm topologies by using the Storm UI - see 
http://www.malinga.me/reading-and-understanding-the-storm-ui-storm-ui-explained/
    +
    +And lastly, you can leverage some GUI tooling to make creating and 
modifying your Kafka topics a bit easier -
    +see https://github.com/yahoo/kafka-manager
    +
    +## General Knobs and Levers
    --- End diff --
    
    Woops, yes.


> Create performance tuning guide
> -------------------------------
>
>                 Key: METRON-992
>                 URL: https://issues.apache.org/jira/browse/METRON-992
>             Project: Metron
>          Issue Type: Task
>            Reporter: Michael Miklavcic
>            Assignee: Michael Miklavcic
>
> We need a guide to outline general guidelines for tuning the Metron Storm 
> topologies and Kafka topics.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to