[
https://issues.apache.org/jira/browse/METRON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043292#comment-16043292
]
ASF GitHub Bot commented on METRON-992:
---------------------------------------
Github user mmiklavc commented on a diff in the pull request:
https://github.com/apache/metron/pull/614#discussion_r120979346
--- Diff: metron-platform/Performance-tuning-guide.md ---
@@ -0,0 +1,326 @@
+# Metron Performance Tuning Guide
+
+## Overview
+
+This document provides guidance from our experiences tuning the Apache
Metron Storm topologies for maximum performance. You'll find
+suggestions for optimum configurations under a 1 Gbps load along with some
guidance around the tooling we used to monitor and assess
+our throughput.
+
+In the simplest terms, Metron is a streaming architecture created on top
of Kafka and three main types of Storm topologies: parsers,
+enrichment, and indexing. Each parser has its own topology and there is
also a highly performant, specialized spout-only topology
+for streaming PCAP data to HDFS. We found that the architecture can be
tuned almost exclusively through using a few primary Storm and
+Kafka parameters along with a few Metron-specific options. You can think
of the data flow as being similar to water flowing through a
+pipe, and the majority of these options assist in tweaking the various
pipe widths in the system.
+
+## General Suggestions
+
+Note that there is currently no method for specifying the number of tasks
independently of the number of executors in Flux topologies (enrichment,
+ indexing). By default, the number of tasks will equal the number of
executors. Logically, setting the number of tasks equal to the number
+of executors is sensible. Storm enforces # executors <= # tasks. The
reason you might set the number of tasks higher than the number of
+executors is for future performance tuning and rebalancing without the
need to bring down your topologies. The number of tasks is fixed
+at topology startup time whereas the number of executors can be increased
up to a maximum value equal to the number of tasks.
+
+We found that the default values for poll.timeout.ms,
offset.commit.period.ms, and max.uncommitted.offsets worked well in nearly all
cases.
+As a general rule, it was optimal to set spout parallelism equal to the
number of partitions used in your Kafka topic. Any greater
+parallelism will leave you with idle consumers since Kafka limits the max
number of active consumers in a consumer group to the number of partitions. This is
+important because Kafka has certain ordering guarantees for message
delivery per partition that would not be possible if more than
+one consumer in a given consumer group were able to read from that
partition.
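
One way to look up the partition count before choosing a spout parallelism is to read it from the topic description. The following is a minimal sketch, not part of the guide itself: `partition_count` is a hypothetical helper name, and it assumes the `kafka-topics.sh --describe` output contains a `PartitionCount:<n>` field, which may vary between Kafka versions.

```shell
# Hypothetical helper: read `kafka-topics.sh --describe` output on stdin and
# print the first PartitionCount value found. Exits non-zero if none is found.
# Assumption: the summary line contains "PartitionCount:<n>" (optionally with
# a space after the colon).
partition_count() {
  awk '
    found == 0 && match($0, /PartitionCount: *[0-9]+/) {
      s = substr($0, RSTART, RLENGTH)   # e.g. "PartitionCount:10"
      sub(/^[^0-9]*/, "", s)            # strip everything before the digits
      count = s
      found = 1
    }
    END { if (found) print count; else exit 1 }'
}

# Example against a live cluster (assumes KAFKA_HOME and ZOOKEEPER are set):
#   ${KAFKA_HOME}/bin/kafka-topics.sh --zookeeper $ZOOKEEPER \
#     --describe --topic enrichments | partition_count
```

The printed number is the upper bound on useful spout parallelism for that topic.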
+
+## Tooling
+
+Before we get to the actual tooling used to monitor performance, it helps
to describe what we might actually want to monitor and potential
+pain points. Prior to switching over to the new Storm Kafka client, which
leverages the new Kafka consumer API under the hood, offsets
+were stored in Zookeeper. While the broker hosts are still stored in
Zookeeper, this is no longer true for the offsets which are now
+stored in Kafka itself. This is a configurable option, and you may switch
back to Zookeeper if you choose, but Metron is currently using
+the new defaults. This is useful to know as you're investigating both
correctness as well as throughput performance.
+
+First we need to set up some environment variables:
+```
+export BROKERLIST=<your broker comma-delimited list of host:ports>
+export ZOOKEEPER=<your zookeeper comma-delimited list of host:ports>
+export KAFKA_HOME=<kafka home dir>
+export METRON_HOME=<your metron home>
+export HDP_HOME=<your HDP home>
+```
+
+If you have Kerberos enabled, set up the security protocol:
+```
+$ cat /tmp/consumergroup.config
+security.protocol=SASL_PLAINTEXT
+```
+
+Now run the following command for a running topology's consumer group. In
this example we are using enrichments.
+```
+${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
+ --command-config=/tmp/consumergroup.config \
+ --describe \
+ --group enrichments \
+ --bootstrap-server $BROKERLIST \
+ --new-consumer
+```
+
+This will return a table with the following output depicting offsets for
all partitions and consumers associated with the specified
+consumer group:
+```
+GROUP        TOPIC        PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  OWNER
+enrichments  enrichments  9          29746066        29746067        1    consumer-2_/xxx.xxx.xxx.xxx
+enrichments  enrichments  3          29754325        29754326        1    consumer-1_/xxx.xxx.xxx.xxx
+enrichments  enrichments  43         29754331        29754332        1    consumer-6_/xxx.xxx.xxx.xxx
+...
+```
+
+_Note_: You won't see any output until a topology is actually running
because the consumer groups only exist while consumers in the
+spouts are up and running.
+
+The column we care most about is LAG, which is the difference
between the log-end offset and the
+current offset for the partition. This tells us how close we are
to keeping up with incoming data and, as we found through
+multiple trials, whether specific consumers are
getting stuck.
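
To make a stuck or slow consumer easier to spot, the LAG column can be summarized with a small filter. This is only a sketch under stated assumptions: `lag_summary` is a hypothetical helper name, and it assumes the column order in the table above (PARTITION is the third field, LAG the sixth), which differs in other Kafka versions.

```shell
# Hypothetical helper: read `kafka-consumer-groups.sh --describe` output on
# stdin and report the total lag and the partition with the largest lag.
# Assumption: PARTITION is field 3 and LAG is field 6, as in the table above.
lag_summary() {
  awk '
    # Only consider data rows: numeric partition and numeric lag.
    $3 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
      total += $6
      if (worst == "" || $6 + 0 > max) { max = $6 + 0; worst = $3 }
    }
    END {
      if (worst == "") { print "no partition rows found"; exit 1 }
      printf "total_lag=%d max_lag=%d partition=%s\n", total, max, worst
    }'
}

# Example against a live cluster (same command as above, piped to the helper):
#   ${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
#     --command-config=/tmp/consumergroup.config \
#     --describe --group enrichments \
#     --bootstrap-server $BROKERLIST --new-consumer | lag_summary
```

A total that keeps growing suggests the topology is falling behind overall, while one partition with a much larger lag than the rest points at a single consumer.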
+
+Taking this one step further, it's probably more useful if we can watch
the offsets and lags change over time. In order to do this
+we'll add a "watch" command and set the refresh rate to 10 seconds.
+
+```
+watch -n 10 -d ${KAFKA_HOME}/bin/kafka-consumer-groups.sh \
+ --command-config=/tmp/consumergroup.config \
+ --describe \
+ --group enrichments \
+ --bootstrap-server $BROKERLIST \
+ --new-consumer
+```
+
+Every 10 seconds the command will re-run and the screen will be refreshed
with new information. The most useful bit is that the
+watch command will highlight the differences between the current output and
the previous one.
+
+We can also monitor our Storm topologies by using the Storm UI - see
http://www.malinga.me/reading-and-understanding-the-storm-ui-storm-ui-explained/
+
+And lastly, you can leverage some GUI tooling to make creating and
modifying your Kafka topics a bit easier -
+see https://github.com/yahoo/kafka-manager
+
+## General Knobs and Levers
--- End diff --
Woops, yes.
> Create performance tuning guide
> -------------------------------
>
> Key: METRON-992
> URL: https://issues.apache.org/jira/browse/METRON-992
> Project: Metron
> Issue Type: Task
> Reporter: Michael Miklavcic
> Assignee: Michael Miklavcic
>
> We need a guide to outline general guidelines for tuning the Metron Storm
> topologies and Kafka topics.