Repository: kafka-site
Updated Branches:
  refs/heads/asf-site fdd6433bd -> 24d8d665e
http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/ByteArrayDeserializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/ByteArrayDeserializer.html b/090/javadoc/org/apache/kafka/common/serialization/ByteArrayDeserializer.html
index 8f98f03..faa5ea3 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/ByteArrayDeserializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/ByteArrayDeserializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>ByteArrayDeserializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/ByteArraySerializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/ByteArraySerializer.html b/090/javadoc/org/apache/kafka/common/serialization/ByteArraySerializer.html
index bef894a..71681fa 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/ByteArraySerializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/ByteArraySerializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>ByteArraySerializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/Deserializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/Deserializer.html b/090/javadoc/org/apache/kafka/common/serialization/Deserializer.html
index c9f086b..e3358ac 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/Deserializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/Deserializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>Deserializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/IntegerDeserializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/IntegerDeserializer.html b/090/javadoc/org/apache/kafka/common/serialization/IntegerDeserializer.html
index 7e285b7..d6768cb 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/IntegerDeserializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/IntegerDeserializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>IntegerDeserializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/IntegerSerializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/IntegerSerializer.html b/090/javadoc/org/apache/kafka/common/serialization/IntegerSerializer.html
index 0ba5cb0..e1dd089 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/IntegerSerializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/IntegerSerializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>IntegerSerializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/LongDeserializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/LongDeserializer.html b/090/javadoc/org/apache/kafka/common/serialization/LongDeserializer.html
index 5499dd7..0b58719 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/LongDeserializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/LongDeserializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>LongDeserializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/LongSerializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/LongSerializer.html b/090/javadoc/org/apache/kafka/common/serialization/LongSerializer.html
index 1d99e97..49d52e0 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/LongSerializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/LongSerializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>LongSerializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/Serializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/Serializer.html b/090/javadoc/org/apache/kafka/common/serialization/Serializer.html
index 0e88312..69ca220 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/Serializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/Serializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>Serializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/StringDeserializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/StringDeserializer.html b/090/javadoc/org/apache/kafka/common/serialization/StringDeserializer.html
index 0f8947a..fed4830 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/StringDeserializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/StringDeserializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>StringDeserializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/StringSerializer.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/StringSerializer.html b/090/javadoc/org/apache/kafka/common/serialization/StringSerializer.html
index afe335f..eaa4a04 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/StringSerializer.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/StringSerializer.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:05 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:10 PST 2015 -->
 <title>StringSerializer (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/package-frame.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/package-frame.html b/090/javadoc/org/apache/kafka/common/serialization/package-frame.html
index f5b0c55..5e66acc 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/package-frame.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/package-frame.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:06 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:11 PST 2015 -->
 <title>org.apache.kafka.common.serialization (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/package-summary.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/package-summary.html b/090/javadoc/org/apache/kafka/common/serialization/package-summary.html
index e4f7d26..364ac89 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/package-summary.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/package-summary.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:06 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:11 PST 2015 -->
 <title>org.apache.kafka.common.serialization (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/org/apache/kafka/common/serialization/package-tree.html
----------------------------------------------------------------------
diff --git a/090/javadoc/org/apache/kafka/common/serialization/package-tree.html b/090/javadoc/org/apache/kafka/common/serialization/package-tree.html
index 044da2d..5c005b6 100644
--- a/090/javadoc/org/apache/kafka/common/serialization/package-tree.html
+++ b/090/javadoc/org/apache/kafka/common/serialization/package-tree.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:06 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:11 PST 2015 -->
 <title>org.apache.kafka.common.serialization Class Hierarchy (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="../../../../../stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/overview-frame.html
----------------------------------------------------------------------
diff --git a/090/javadoc/overview-frame.html b/090/javadoc/overview-frame.html
index 8106f11..989bbbe 100644
--- a/090/javadoc/overview-frame.html
+++ b/090/javadoc/overview-frame.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:06 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:11 PST 2015 -->
 <title>Overview List (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/overview-summary.html
----------------------------------------------------------------------
diff --git a/090/javadoc/overview-summary.html b/090/javadoc/overview-summary.html
index 12d1e03..5fcda96 100644
--- a/090/javadoc/overview-summary.html
+++ b/090/javadoc/overview-summary.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:06 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:11 PST 2015 -->
 <title>Overview (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/overview-tree.html
----------------------------------------------------------------------
diff --git a/090/javadoc/overview-tree.html b/090/javadoc/overview-tree.html
index d6f3519..714c997 100644
--- a/090/javadoc/overview-tree.html
+++ b/090/javadoc/overview-tree.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:06 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:11 PST 2015 -->
 <title>Class Hierarchy (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/javadoc/serialized-form.html
----------------------------------------------------------------------
diff --git a/090/javadoc/serialized-form.html b/090/javadoc/serialized-form.html
index 29fd75a..1d4f8d3 100644
--- a/090/javadoc/serialized-form.html
+++ b/090/javadoc/serialized-form.html
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (version 1.7.0_80) on Fri Nov 13 08:33:06 PST 2015 -->
+<!-- Generated by javadoc (version 1.7.0_80) on Tue Nov 17 18:38:11 PST 2015 -->
 <title>Serialized Form (clients 0.9.0.0 API)</title>
-<meta name="date" content="2015-11-13">
+<meta name="date" content="2015-11-17">
 <link rel="stylesheet" type="text/css" href="stylesheet.css" title="Style">
 </head>
 <body>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/kafka_config.html
----------------------------------------------------------------------
diff --git a/090/kafka_config.html b/090/kafka_config.html
index c56b29e..4cac226 100644
--- a/090/kafka_config.html
+++ b/090/kafka_config.html
@@ -200,7 +200,7 @@
 <tr>
 <td>num.partitions</td><td>The default number of log partitions per topic</td><td>int</td><td>1</td><td>[1,...]</td><td>medium</td></tr>
 <tr>
-<td>principal.builder.class</td><td>principal builder to generate a java Principal. This config is optional for client.</td><td>class</td><td>class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder</td><td></td><td>medium</td></tr>
+<td>principal.builder.class</td><td>The fully qualified name of a class that implements the PrincipalBuilder interface, which is currently used to build the Principal for connections with the SSL SecurityProtocol. Default is DefaultPrincipalBuilder.</td><td>class</td><td>class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder</td><td></td><td>medium</td></tr>
 <tr>
 <td>producer.purgatory.purge.interval.requests</td><td>The purge interval (in number of requests) of the producer request purgatory</td><td>int</td><td>1000</td><td></td><td>medium</td></tr>
 <tr>

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/migration.html
----------------------------------------------------------------------
diff --git a/090/migration.html b/090/migration.html
index 18ab6d4..2da6a7e 100644
--- a/090/migration.html
+++ b/090/migration.html
@@ -5,9 +5,9 @@
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at
-
+
 http://www.apache.org/licenses/LICENSE-2.0
-
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -16,11 +16,11 @@
 -->
 <!--#include virtual="../includes/header.html" -->
-<h2>Migrating from 0.7.x to 0.8</h2>
+<h2><a id="migration" href="#migration">Migrating from 0.7.x to 0.8</a></h2>
 0.8 is our first (and hopefully last) release with a non-backwards-compatible wire protocol, ZooKeeper layout, and on-disk data format. This was a chance for us to clean up a lot of cruft and start fresh. This means performing a no-downtime upgrade is more painful than normal—you cannot just swap in the new code in-place.
-<h3>Migration Steps</h3>
+<h3><a id="migration_steps" href="#migration_steps">Migration Steps</a></h3>
 <ol>
 <li>Setup a new cluster running 0.8.

http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/ops.html
----------------------------------------------------------------------
diff --git a/090/ops.html b/090/ops.html
index af1c15c..2624527 100644
--- a/090/ops.html
+++ b/090/ops.html
@@ -5,9 +5,9 @@
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at
-
+
 http://www.apache.org/licenses/LICENSE-2.0
-
+
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -17,17 +17,17 @@
 Here is some information on actually running Kafka as a production system based on usage and experience at LinkedIn. Please send us any additional tips you know of.
-<h3><a id="basic_ops">6.1 Basic Kafka Operations</a></h3>
+<h3><a id="basic_ops" href="#basic_ops">6.1 Basic Kafka Operations</a></h3>
 This section will review the most common operations you will perform on your Kafka cluster. All of the tools reviewed in this section are available under the <code>bin/</code> directory of the Kafka distribution and each tool will print details on all possible commandline options if it is run with no arguments.
-
-<h4><a id="basic_ops_add_topic">Adding and removing topics</a></h4>
+
+<h4><a id="basic_ops_add_topic" href="#basic_ops_add_topic">Adding and removing topics</a></h4>
 You have the option of either adding topics manually or having them be created automatically when data is first published to a non-existent topic. If topics are auto-created then you may want to tune the default <a href="#topic-config">topic configurations</a> used for auto-created topics.
 <p>
 Topics are added and modified using the topic tool:
 <pre>
- > bin/kafka-topics.sh --zookeeper zk_host:port/chroot --create --topic my_topic_name
+ > bin/kafka-topics.sh --zookeeper zk_host:port/chroot --create --topic my_topic_name
  --partitions 20 --replication-factor 3 --config x=y
 </pre>
 The replication factor controls how many servers will replicate each message that is written. If you have a replication factor of 3 then up to 2 servers can fail before you will lose access to your data. We recommend you use a replication factor of 2 or 3 so that you can transparently bounce machines without interrupting data consumption.
@@ -36,14 +36,14 @@ The partition count controls how many logs the topic will be sharded into. There
 <p>
 The configurations added on the command line override the default settings the server has for things like the length of time data should be retained. The complete set of per-topic configurations is documented <a href="#topic-config">here</a>.
-<h4><a id="basic_ops_modify_topic">Modifying topics</a></h4>
+<h4><a id="basic_ops_modify_topic" href="#basic_ops_modify_topic">Modifying topics</a></h4>
 You can change the configuration or partitioning of a topic using the same topic tool.
 <p>
 To add partitions you can do
 <pre>
- > bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name
- --partitions 40
+ > bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name
+ --partitions 40
 </pre>
 Be aware that one use case for partitions is to semantically partition data, and adding partitions doesn't change the partitioning of existing data so this may disturb consumers if they rely on that partition. That is if data is partitioned by <code>hash(key) % number_of_partitions</code> then this partitioning will potentially be shuffled by adding partitions but Kafka will not attempt to automatically redistribute data in any way.
 <p>
@@ -64,7 +64,7 @@ Topic deletion option is disabled by default. To enable it set the server config
 <p>
 Kafka does not currently support reducing the number of partitions for a topic or changing the replication factor.
-<h4><a id="basic_ops_restarting">Graceful shutdown</a></h4>
+<h4><a id="basic_ops_restarting" href="#basic_ops_restarting">Graceful shutdown</a></h4>
 The Kafka cluster will automatically detect any broker shutdown or failure and elect new leaders for the partitions on that machine. This will occur whether a server fails or it is brought down intentionally for maintenance or configuration changes. For the later cases Kafka supports a more graceful mechanism for stoping a server then just killing it.
@@ -80,7 +80,7 @@ Syncing the logs will happen automatically happen whenever the server is stopped
 </pre>
 Note that controlled shutdown will only succeed if <i>all</i> the partitions hosted on the broker have replicas (i.e. the replication factor is greater than 1 <i>and</i> at least one of these replicas is alive). This is generally what you want since shutting down the last replica would make that topic partition unavailable.
-<h4><a id="basic_ops_leader_balancing">Balancing leadership</a></h4>
+<h4><a id="basic_ops_leader_balancing" href="#basic_ops_leader_balancing">Balancing leadership</a></h4>
 Whenever a broker stops or crashes leadership for that broker's partitions transfers to other replicas. This means that by default when the broker is restarted it will only be a follower for all its partitions, meaning it will not be used for client reads and writes.
 <p>
@@ -94,7 +94,7 @@ Since running this command can be tedious you can also configure Kafka to do thi
 auto.leader.rebalance.enable=true
 </pre>
-<h4><a id="basic_ops_mirror_maker">Mirroring data between clusters</a></h4>
+<h4><a id="basic_ops_mirror_maker" href="#basic_ops_mirror_maker">Mirroring data between clusters</a></h4>
 We refer to the process of replicating data <i>between</i> Kafka clusters "mirroring" to avoid confusion with the replication that happens amongst the nodes in a single cluster. Kafka comes with a tool for mirroring data between Kafka clusters. The tool reads from a source cluster and writes to a destination cluster, like this:
 <p>
@@ -111,7 +111,7 @@ The source and destination clusters are completely independent entities: they ca
 Here is an example showing how to mirror a single topic (named <i>my-topic</i>) from two input clusters:
 <pre>
  > bin/kafka-run-class.sh kafka.tools.MirrorMaker
- --consumer.config consumer-1.properties --consumer.config consumer-2.properties
+ --consumer.config consumer-1.properties --consumer.config consumer-2.properties
  --producer.config producer.properties --whitelist my-topic
 </pre>
 Note that we specify the list of topics with the <code>--whitelist</code> option. This option allows any regular expression using <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Java-style regular expressions</a>. So you could mirror two topics named <i>A</i> and <i>B</i> using <code>--whitelist 'A|B'</code>. Or you could mirror <i>all</i> topics using <code>--whitelist '*'</code>. Make sure to quote any regular expression to ensure the shell doesn't try to expand it as a file path. For convenience we allow the use of ',' instead of '|' to specify a list of topics.
@@ -120,7 +120,7 @@ Sometime it is easier to say what it is that you <i>don't</i> want. Instead of u
 <p>
 Combining mirroring with the configuration <code>auto.create.topics.enable=true</code> makes it possible to have a replica cluster that will automatically create and replicate all data in a source cluster even as new topics are added.
-<h4><a id="basic_ops_consumer_lag">Checking consumer position</a></h4>
+<h4><a id="basic_ops_consumer_lag" href="#basic_ops_consumer_lag">Checking consumer position</a></h4>
 Sometimes it's useful to see the position of your consumers. We have a tool that will show the position of all consumers in a consumer group as well as how far behind the end of the log they are. To run this tool on a consumer group named <i>my-group</i> consuming a topic named <i>my-topic</i> would look like this:
 <pre>
  > bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect localhost:2181 --group test
@@ -129,13 +129,13 @@ my-group my-topic 0 0 0
 my-group my-topic 1 0 0 0 test_jkreps-mn-1394154521217-1a0be913-0
 </pre>
-<h4><a id="basic_ops_cluster_expansion">Expanding your cluster</a></h4>
+<h4><a id="basic_ops_cluster_expansion" href="#basic_ops_cluster_expansion">Expanding your cluster</a></h4>
 Adding servers to a Kafka cluster is easy, just assign them a unique broker id and start up Kafka on your new servers. However these new servers will not automatically be assigned any data partitions, so unless partitions are moved to them they won't be doing any work until new topics are created. So usually when you add machines to your cluster you will want to migrate some existing data to these machines.
 <p>
 The process of migrating data is manually initiated but fully automated. Under the covers what happens is that Kafka will add the new server as a follower of the partition it is migrating and allow it to fully replicate the existing data in that partition. When the new server has fully replicated the contents of this partition and joined the in-sync replica one of the existing replicas will delete their partition's data.
 <p>
-The partition reassignment tool can be used to move partitions across brokers. An ideal partition distribution would ensure even data load and partition sizes across all brokers.
+The partition reassignment tool can be used to move partitions across brokers. An ideal partition distribution would ensure even data load and partition sizes across all brokers.
 The partition reassignment tool does not have the capability to automatically study the data distribution in a Kafka cluster and move partitions around to attain an even load distribution. As such, the admin has to figure out which topics or partitions should be moved around.
 <p>
 The partition reassignment tool can run in 3 mutually exclusive modes -
 <ul>
@@ -143,8 +143,8 @@ The partition reassignment tool can run in 3 mutually exclusive modes -
 <li>--execute: In this mode, the tool kicks off the reassignment of partitions based on the user provided reassignment plan. (using the --reassignment-json-file option). This can either be a custom reassignment plan hand crafted by the admin or provided by using the --generate option</li>
 <li>--verify: In this mode, the tool verifies the status of the reassignment for all partitions listed during the last --execute. The status can be either of successfully completed, failed or in progress</li>
 </ul>
-<h5>Automatically migrating data to new machines</h5>
-The partition reassignment tool can be used to move some topics off of the current set of brokers to the newly added brokers. This is typically useful while expanding an existing cluster since it is easier to move entire topics to the new set of brokers, than moving one partition at a time. When used to do this, the user should provide a list of topics that should be moved to the new set of brokers and a target list of new brokers. The tool then evenly distributes all partitions for the given list of topics across the new set of brokers. During this move, the replication factor of the topic is kept constant. Effectively the replicas for all partitions for the input list of topics are moved from the old set of brokers to the newly added brokers.
+<h5><a id="basic_ops_automigrate" href="#basic_ops_automigrate">Automatically migrating data to new machines</a></h5>
+The partition reassignment tool can be used to move some topics off of the current set of brokers to the newly added brokers. This is typically useful while expanding an existing cluster since it is easier to move entire topics to the new set of brokers, than moving one partition at a time. When used to do this, the user should provide a list of topics that should be moved to the new set of brokers and a target list of new brokers. The tool then evenly distributes all partitions for the given list of topics across the new set of brokers. During this move, the replication factor of the topic is kept constant. Effectively the replicas for all partitions for the input list of topics are moved from the old set of brokers to the newly added brokers.
 <p>
 For instance, the following example will move all partitions for topics foo1,foo2 to the new set of brokers 5,6. At the end of this move, all partitions for topics foo1 and foo2 will <i>only</i> exist on brokers 5,6
 <p>
@@ -158,7 +158,7 @@ Since, the tool accepts the input list of topics as a json file, you first need
 </pre>
 Once the json file is ready, use the partition reassignment tool to generate a candidate assignment-
 <pre>
-> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-move.json --broker-list "5,6" --generate
+> bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-move.json --broker-list "5,6" --generate
 Current partition replica assignment
 {"version":1,
@@ -216,11 +216,11 @@ Reassignment of partition [foo1,0] completed successfully
 Reassignment of partition [foo1,1] is in progress
 Reassignment of partition [foo1,2] is in progress
 Reassignment of partition [foo2,0] completed successfully
-Reassignment of partition [foo2,1] completed successfully
-Reassignment of partition [foo2,2] completed successfully
+Reassignment of partition [foo2,1] completed successfully
+Reassignment of partition [foo2,2] completed successfully
 </pre>
-<h5>Custom partition assignment and migration</h5>
+<h5><a id="basic_ops_partitionassignment" href="#basic_ops_partitionassignment">Custom partition assignment and migration</a></h5>
 The partition reassignment tool can also be used to selectively move replicas of a partition to a specific set of brokers. When used in this manner, it is assumed that the user knows the reassignment plan and does not require the tool to generate a candidate reassignment, effectively skipping the --generate step and moving straight to the --execute step
 <p>
 For instance, the following example moves partition 0 of topic foo1 to brokers 5,6 and partition 1 of topic foo2 to brokers 2,3
@@ -253,14 +253,14 @@ The --verify option can be used with the tool to check the status of the partiti
 bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file custom-reassignment.json --verify
 Status of partition reassignment:
 Reassignment of partition [foo1,0] completed successfully
-Reassignment of partition [foo2,1] completed successfully
+Reassignment of partition [foo2,1] completed successfully
 </pre>
-<h4><a id="basic_ops_decommissioning_brokers">Decommissioning brokers</a></h4>
+<h4><a id="basic_ops_decommissioning_brokers" href="#basic_ops_decommissioning_brokers">Decommissioning brokers</a></h4>
 The partition reassignment tool does not have the ability to automatically generate a reassignment plan for decommissioning brokers yet. As such, the admin has to come up with a reassignment plan to move the replica for all partitions hosted on the broker to be decommissioned, to the rest of the brokers. This can be relatively tedious as the reassignment needs to ensure that all the replicas are not moved from the decommissioned broker to only one other broker. To make this process effortless, we plan to add tooling support for decommissioning brokers in the future.
-<h4><a id="basic_ops_increase_replication_factor">Increasing replication factor</a></h4>
-Increasing the replication factor of an existing partition is easy. Just specify the extra replicas in the custom reassignment json file and use it with the --execute option to increase the replication factor of the specified partitions.
+<h4><a id="basic_ops_increase_replication_factor" href="#basic_ops_increase_replication_factor">Increasing replication factor</a></h4> +Increasing the replication factor of an existing partition is easy. Just specify the extra replicas in the custom reassignment json file and use it with the --execute option to increase the replication factor of the specified partitions. <p> For instance, the following example increases the replication factor of partition 0 of topic foo from 1 to 3. Before increasing the replication factor, the partition's only replica existed on broker 5. As part of increasing the replication factor, we will add more replicas on brokers 6 and 7. <p> @@ -297,7 +297,7 @@ Topic:foo PartitionCount:1 ReplicationFactor:3 Configs: Topic: foo Partition: 0 Leader: 5 Replicas: 5,6,7 Isr: 5,6,7 </pre> -<h4><a id="quotas">Setting quotas</a></h4> +<h4><a id="quotas" href="#quotas">Setting quotas</a></h4> It is possible to set default quotas that apply to all client-ids by setting these configs on the brokers. By default, each client-id receives an unlimited quota. The following sets the default quota per producer and consumer client-id to 10MB/sec. <pre> quota.producer.default=10485760 @@ -316,7 +316,7 @@ Here's how to describe the quota for a given client. Configs for clients:clientA are producer_byte_rate=1024,consumer_byte_rate=2048 </pre> -<h3><a id="datacenters">6.2 Datacenters</a></h3> +<h3><a id="datacenters" href="#datacenters">6.2 Datacenters</a></h3> Some deployments will need to manage a data pipeline that spans multiple datacenters. Our recommended approach to this is to deploy a local Kafka cluster in each datacenter with application instances in each datacenter interacting only with their local cluster and mirroring between clusters (see the documentation on the <a href="#basic_ops_mirror_maker">mirror maker tool</a> for how to do this). 
<p> @@ -326,13 +326,13 @@ For applications that need a global view of all data you can use mirroring to pr <p> This is not the only possible deployment pattern. It is possible to read from or write to a remote Kafka cluster over the WAN, though obviously this will add whatever latency is required to get the cluster. <p> -Kafka naturally batches data in both the producer and consumer so it can achieve high-throughput even over a high-latency connection. To allow this though it may be necessary to increase the TCP socket buffer sizes for the producer, consumer, and broker using the <code>socket.send.buffer.bytes</code> and <code>socket.receive.buffer.bytes</code> configurations. The appropriate way to set this is documented <a href="http://en.wikipedia.org/wiki/Bandwidth-delay_product">here</a>. +Kafka naturally batches data in both the producer and consumer so it can achieve high-throughput even over a high-latency connection. To allow this though it may be necessary to increase the TCP socket buffer sizes for the producer, consumer, and broker using the <code>socket.send.buffer.bytes</code> and <code>socket.receive.buffer.bytes</code> configurations. The appropriate way to set this is documented <a href="http://en.wikipedia.org/wiki/Bandwidth-delay_product">here</a>. <p> It is generally <i>not</i> advisable to run a <i>single</i> Kafka cluster that spans multiple datacenters over a high-latency link. This will incur very high replication latency both for Kafka writes and ZooKeeper writes, and neither Kafka nor ZooKeeper will remain available in all locations if the network between locations is unavailable. 
-<h3><a id="config">6.3 Kafka Configuration</a></h3> +<h3><a id="config" href="#config">6.3 Kafka Configuration</a></h3> -<h4><a id="clientconfig">Important Client Configurations</a></h4> +<h4><a id="clientconfig" href="#clientconfig">Important Client Configurations</a></h4> The most important producer configurations control <ul> <li>compression</li> @@ -343,7 +343,7 @@ The most important consumer configuration is the fetch size. <p> All configurations are documented in the <a href="#configuration">configuration</a> section. <p> -<h4><a id="prodconfig">A Production Server Config</a></h4> +<h4><a id="prodconfig" href="#prodconfig">A Production Server Config</a></h4> Here is our server production server configuration: <pre> # Replication configurations @@ -389,7 +389,7 @@ producer.purgatory.purge.interval.requests=100 Our client configuration varies a fair amount between different use cases. -<h3><a id="java">Java Version</a></h3> +<h3><a id="java" href="#java">Java Version</a></h3> We're currently running JDK 1.7 u51, and we've switched over to the G1 collector. If you do this (and we highly recommend it), make sure you're on u51. We tried out u21 in testing, but we had a number of problems with the GC implementation in that version. Our tuning looks like this: @@ -405,20 +405,20 @@ For reference, here are the stats on one of LinkedIn's busiest clusters (at peak - 70 MB/sec inbound, 400 MB/sec+ outbound The tuning looks fairly aggressive, but all of the brokers in that cluster have a 90% GC pause time of about 21ms, and they're doing less than 1 young GC per second. - -<h3><a id="hwandos">6.4 Hardware and OS</a></h3> + +<h3><a id="hwandos" href="#hwandos">6.4 Hardware and OS</a></h3> We are using dual quad-core Intel Xeon machines with 24GB of memory. <p> You need sufficient memory to buffer active readers and writers. 
You can do a back-of-the-envelope estimate of memory needs by assuming you want to be able to buffer for 30 seconds and compute your memory need as write_throughput*30. <p> The disk throughput is important. We have 8x7200 rpm SATA drives. In general disk throughput is the performance bottleneck, and more disks is more better. Depending on how you configure flush behavior you may or may not benefit from more expensive disks (if you force flush often then higher RPM SAS drives may be better). -<h4><a id="os">OS</a></h4> +<h4><a id="os" href="#os">OS</a></h4> Kafka should run well on any unix system and has been tested on Linux and Solaris. <p> We have seen a few issues running on Windows and Windows is not currently a well supported platform though we would be happy to change that. <p> -You likely don't need to do much OS-level tuning though there are a few things that will help performance. +You likely don't need to do much OS-level tuning though there are a few things that will help performance. <p> Two configurations that may be important: <ul> @@ -426,7 +426,7 @@ Two configurations that may be important: <li>We upped the max socket buffer size to enable high-performance data transfer between data centers <a href="http://www.psc.edu/index.php/networking/641-tcp-tune">described here</a>. </ul> -<h4><a id="diskandfs">Disks and Filesystem</a></h4> +<h4><a id="diskandfs" href="#diskandfs">Disks and Filesystem</a></h4> We recommend using multiple drives to get good throughput and not sharing the same drives used for Kafka data with application logs or other OS filesystem activity to ensure good latency. You can either RAID these drives together into a single volume or format and mount each drive as its own directory. Since Kafka has replication the redundancy provided by RAID can also be provided at the application level. This choice has several tradeoffs. <p> If you configure multiple data directories partitions will be assigned round-robin to data directories. 
Each partition will be entirely in one of the data directories. If data is not well balanced among partitions this can lead to load imbalance between disks. @@ -435,7 +435,7 @@ RAID can potentially do better at balancing load between disks (although it does <p> Another potential benefit of RAID is the ability to tolerate disk failures. However our experience has been that rebuilding the RAID array is so I/O intensive that it effectively disables the server, so this does not provide much real availability improvement. -<h4><a id="appvsosflush">Application vs. OS Flush Management</a></h4> +<h4><a id="appvsosflush" href="#appvsosflush">Application vs. OS Flush Management</a></h4> Kafka always immediately writes all data to the filesystem and supports the ability to configure the flush policy that controls when data is forced out of the OS cache and onto disk using the and flush. This flush policy can be controlled to force data to disk after a period of time or after a certain number of messages has been written. There are several choices in this configuration. <p> Kafka must eventually call fsync to know that data was flushed. When recovering from a crash for any log segment not known to be fsync'd Kafka will check the integrity of each message by checking its CRC and also rebuild the accompanying offset index file as part of the recovery process executed on startup. @@ -448,7 +448,7 @@ The drawback of using application level flush settings are that this is less eff <p> In general you don't need to do any low-level tuning of the filesystem, but in the next few sections we will go over some of this in case it is useful. 
-<h4><a id="linuxflush">Understanding Linux OS Flush Behavior</a></h4> +<h4><a id="linuxflush" href="#linuxflush">Understanding Linux OS Flush Behavior</a></h4> In Linux, data written to the filesystem is maintained in <a href="http://en.wikipedia.org/wiki/Page_cache">pagecache</a> until it must be written out to disk (due to an application-level fsync or the OS's own flush policy). The flushing of data is done by a set of background threads called pdflush (or in post 2.6.32 kernels "flusher threads"). <p> @@ -467,7 +467,7 @@ Using pagecache has several advantages over an in-process cache for storing data <li>It automatically uses all the free memory on the machine </ul> -<h4><a id="ext4">Ext4 Notes</a></h4> +<h4><a id="ext4" href="#ext4">Ext4 Notes</a></h4> Ext4 may or may not be the best filesystem for Kafka. Filesystems like XFS supposedly handle locking during fsync better. We have only tried Ext4, though. <p> It is not necessary to tune these settings, however those wanting to optimize performance have a few knobs that will help: @@ -478,8 +478,8 @@ It is not necessary to tune these settings, however those wanting to optimize pe <li>nobh: This setting controls additional ordering guarantees when using data=writeback mode. This should be safe with Kafka as we do not depend on write ordering and improves throughput and latency. <li>delalloc: Delayed allocation means that the filesystem avoid allocating any blocks until the physical write occurs. This allows ext4 to allocate a large extent instead of smaller pages and helps ensure the data is written sequentially. This feature is great for throughput. It does seem to involve some locking in the filesystem which adds a bit of latency variance. </ul> - -<h3><a id="monitoring">6.6 Monitoring</a></h3> + +<h3><a id="monitoring" href="#monitoring">6.6 Monitoring</a></h3> Kafka uses Yammer Metrics for metrics reporting in both the server and the client. 
This can be configured to report stats using pluggable stats reporters to hook up to your monitoring system. <p> @@ -628,7 +628,7 @@ We pay particular we do graphing and alerting on the following metrics: </tr> </tbody></table> -<h4><a id="new_producer_monitoring">New producer monitoring</a></h4> +<h4><a id="new_producer_monitoring" href="#new_producer_monitoring">New producer monitoring</a></h4> The following metrics are available on new producer instances. @@ -889,15 +889,15 @@ We recommend monitor GC time and other stats and various server stats such as CP On the client side, we recommend monitor the message/byte rate (global and per topic), request rate/size/time, and on the consumer side, max lag in messages among all partitions and min fetch request rate. For a consumer to keep up, max lag needs to be less than a threshold and min fetch rate needs to be larger than 0. -<h4>Audit</h4> +<h4><a id="basic_ops_audit" href="#basic_ops_audit">Audit</a></h4> The final alerting we do is on the correctness of the data delivery. We audit that every message that is sent is consumed by all consumers and measure the lag for this to occur. For important topics we alert if a certain completeness is not achieved in a certain time period. The details of this are discussed in KAFKA-260. -<h3><a id="zk">6.7 ZooKeeper</a></h3> +<h3><a id="zk" href="#zk">6.7 ZooKeeper</a></h3> -<h4><a id="zkversion">Stable version</a></h4> +<h4><a id="zkversion" href="#zkversion">Stable version</a></h4> At LinkedIn, we are running ZooKeeper 3.3.*. Version 3.3.3 has known serious issues regarding ephemeral node deletion and session expirations. After running into those issues in production, we upgraded to 3.3.4 and have been running that smoothly for over a year now. 
-<h4><a id="zkops">Operationalizing ZooKeeper</a></h4> +<h4><a id="zkops" href="#zkops">Operationalizing ZooKeeper</a></h4> Operationally, we do the following for a healthy ZooKeeper installation: <ul> <li>Redundancy in the physical/hardware/network layout: try not to put them all in the same rack, decent (but don't go nuts) hardware, try to keep redundant power and network paths, etc.</li> http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/producer_config.html ---------------------------------------------------------------------- diff --git a/090/producer_config.html b/090/producer_config.html index 2b1809d..7ddd2cc 100644 --- a/090/producer_config.html +++ b/090/producer_config.html @@ -82,8 +82,6 @@ <tr> <td>metrics.sample.window.ms</td><td>The number of samples maintained to compute metrics.</td><td>long</td><td>30000</td><td>[0,...]</td><td>low</td></tr> <tr> -<td>principal.builder.class</td><td>principal builder to generate a java Principal. This config is optional for client.</td><td>class</td><td>class org.apache.kafka.common.security.auth.DefaultPrincipalBuilder</td><td></td><td>low</td></tr> -<tr> <td>reconnect.backoff.ms</td><td>The amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all requests sent by the consumer to the broker.</td><td>long</td><td>50</td><td>[0,...]</td><td>low</td></tr> <tr> <td>retry.backoff.ms</td><td>The amount of time to wait before attempting to retry a failed fetch request to a given topic partition. 
This avoids repeated fetching-and-failing in a tight loop.</td><td>long</td><td>100</td><td>[0,...]</td><td>low</td></tr> http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/quickstart.html ---------------------------------------------------------------------- diff --git a/090/quickstart.html b/090/quickstart.html index 268ed34..14f7518 100644 --- a/090/quickstart.html +++ b/090/quickstart.html @@ -5,9 +5,9 @@ The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -15,11 +15,11 @@ limitations under the License. --> -<h3><a id="quickstart">1.3 Quick Start</a></h3> +<h3><a id="quickstart" href="#quickstart">1.3 Quick Start</a></h3> This tutorial assumes you are starting fresh and have no existing Kafka or ZooKeeper data. -<h4> Step 1: Download the code </h4> +<h4><a id="quickstart_download" href="#quickstart_download">Step 1: Download the code</a></h4> <a href="https://www.apache.org/dyn/closer.cgi?path=/kafka/0.9.0.0/kafka_2.11-0.9.0.0.tgz" title="Kafka downloads">Download</a> the 0.9.0.0 release and un-tar it. @@ -28,7 +28,7 @@ This tutorial assumes you are starting fresh and have no existing Kafka or ZooKe > <b>cd kafka_2.11-0.9.0.0</b> </pre> -<h4>Step 2: Start the server</h4> +<h4><a id="quickstart_startserver" href="#quickstart_startserver">Step 2: Start the server</a></h4> <p> Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance. @@ -47,7 +47,7 @@ Now start the Kafka server: ... 
</pre> -<h4>Step 3: Create a topic</h4> +<h4><a id="quickstart_createtopic" href="#quickstart_createtopic">Step 3: Create a topic</a></h4> Let's create a topic named "test" with a single partition and only one replica: <pre> @@ -61,19 +61,19 @@ test </pre> Alternatively, instead of manually creating topics you can also configure your brokers to auto-create topics when a non-existent topic is published to. -<h4>Step 4: Send some messages</h4> +<h4><a id="quickstart_send" href="#quickstart_send">Step 4: Send some messages</a></h4> Kafka comes with a command line client that will take input from a file or from standard input and send it out as messages to the Kafka cluster. By default each line will be sent as a separate message. <p> Run the producer and then type a few messages into the console to send to the server. <pre> -> <b>bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test</b> +> <b>bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test</b> <b>This is a message</b> <b>This is another message</b> </pre> -<h4>Step 5: Start a consumer</h4> +<h4><a id="quickstart_consume" href="#quickstart_consume">Step 5: Start a consumer</a></h4> Kafka also has a command line consumer that will dump out messages to standard output. @@ -86,27 +86,27 @@ This is another message If you have each of the above commands running in a different terminal then you should now be able to type messages into the producer terminal and see them appear in the consumer terminal. </p> <p> -All of the command line tools have additional options; running the command with no arguments will display usage information documenting them in more detail. +All of the command line tools have additional options; running the command with no arguments will display usage information documenting them in more detail. 
</p> -<h4>Step 6: Setting up a multi-broker cluster</h4> +<h4><a id="quickstart_multibroker" href="#quickstart_multibroker">Step 6: Setting up a multi-broker cluster</a></h4> So far we have been running against a single broker, but that's no fun. For Kafka, a single broker is just a cluster of size one, so nothing much changes other than starting a few more broker instances. But just to get feel for it, let's expand our cluster to three nodes (still all on our local machine). <p> First we make a config file for each of the brokers: <pre> -> <b>cp config/server.properties config/server-1.properties</b> +> <b>cp config/server.properties config/server-1.properties</b> > <b>cp config/server.properties config/server-2.properties</b> </pre> Now edit these new files and set the following properties: <pre> - + config/server-1.properties: broker.id=1 port=9093 log.dir=/tmp/kafka-logs-1 - + config/server-2.properties: broker.id=2 port=9094 @@ -138,7 +138,7 @@ Here is an explanation of output. The first line gives a summary of all the part <li>"leader" is the node responsible for all reads and writes for the given partition. Each node will be the leader for a randomly selected portion of the partitions. <li>"replicas" is the list of nodes that replicate the log for this partition regardless of whether they are the leader or even if they are currently alive. <li>"isr" is the set of "in-sync" replicas. This is the subset of the replicas list that is currently alive and caught-up to the leader. -</ul> +</ul> Note that in my example node 1 is the leader for the only partition of the topic. <p> We can run the same command on the original topic we created to see where it is: @@ -155,7 +155,7 @@ Let's publish a few messages to our new topic: ... 
<b>my test message 1</b> <b>my test message 2</b> -<b>^C</b> +<b>^C</b> </pre> Now let's consume these messages: <pre> @@ -189,7 +189,7 @@ my test message 2 </pre> -<h4>Step 7: Use Kafka Connect to import/export data</h4> +<h4><a id="quickstart_kafkaconnect" href="#quickstart_kafkaconnect">Step 7: Use Kafka Connect to import/export data</a></h4> Writing data from the console and writing it back to the console is a convenient place to start, but you'll probably want to use data from other sources or export data from Kafka to other systems. For many systems, instead of writing custom http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/security.html ---------------------------------------------------------------------- diff --git a/090/security.html b/090/security.html index 57fe874..da9c3c6 100644 --- a/090/security.html +++ b/090/security.html @@ -1,4 +1,4 @@ -<!-- + Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. @@ -15,7 +15,7 @@ limitations under the License. --> -<h3><a id="security_overview">7.1 Security Overview</a></h3> +<h3><a id="security_overview" href="#security_overview">7.1 Security Overview</a></h3> In release 0.9.0.0, the Kafka community added a number of features that, used either separately or together, increases security in a Kafka cluster. The following security measures are currently supported: <ol> <li>Authenticating clients (Producers and consumers) connections to brokers, using either SSL or SASL (Kerberos)</li> @@ -28,11 +28,11 @@ In release 0.9.0.0, the Kafka community added a number of features that, used ei The guides below explain how to configure and use the security features in both clients and brokers. 
-<h3><a id="security_ssl">7.2 Encryption and Authentication using SSL</a></h3> +<h3><a id="security_ssl" href="#security_ssl">7.2 Encryption and Authentication using SSL</a></h3> Apache Kafka allows clients to connect over SSL. By default SSL is disabled but can be turned on as needed. <ol> - <li><h4>Generate SSL key and certificate for each Kafka broker</h4> + <li><h4><a id="security_ssl_key" href="#security_ssl_key">Generate SSL key and certificate for each Kafka broker</a></h4> The first step of deploying HTTPS is to generate the key and the certificate for each machine in the cluster. You can use Java's keytool utility to accomplish this task. We will generate the key into a temporary keystore initially so that we can export and sign it later with CA. <pre>$ keytool -keystore server.keystore.jks -alias localhost -validity {validity} -genkey</pre> @@ -44,7 +44,7 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but </ol> Ensure that common name (CN) matches exactly with the fully qualified domain name (FQDN) of the server. The client compares the CN with the DNS domain name to ensure that it is indeed connecting to the desired server, not the malicious one.</li> - <li><h4>Creating your own CA</h4> + <li><h4><a id="security_ssl_ca" href="#security_ssl_ca">Creating your own CA</a></h4> After the first step, each machine in the cluster has a public-private key pair, and a certificate to identify the machine. The certificate, however, is unsigned, which means that an attacker can create such a certificate to pretend to be any machine.<p> Therefore, it is important to prevent forged certificates by signing them for each machine in the cluster. A certificate authority (CA) is responsible for signing certificates. CA works likes a government that issues passports - the government stamps (signs) each passport so that the passport becomes difficult to forge. Other governments verify the stamps to ensure the passport is authentic. 
Similarly, the CA signs the certificates, and the cryptography guarantees that a signed certificate is computationally difficult to forge. Thus, as long as the CA is a genuine and trusted authority, the clients have high assurance that they are connecting to the authentic machines. <pre>openssl req <b>-new</b> -x509 -keyout ca-key -out ca-cert -days 365</pre> @@ -59,7 +59,7 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but In contrast to the keystore in step 1 that stores each machine's own identity, the truststore of a client stores all the certificates that the client should trust. Importing a certificate into one's truststore also means that trusting all certificates that are signed by that certificate. As the analogy above, trusting the government (CA) also means that trusting all passports (certificates) that it has issued. This attribute is called the chains of trust, and it is particularly useful when deploying SSL on a large Kafka cluster. You can sign all certificates in the cluster with a single CA, and have all machines share the same truststore that trusts the CA. That way all machines can authenticate all other machines.</li> - <li><h4>Signing the certificate</h4> + <li><h4><a id="security_ssl_signing" href="#security_ssl_signing">Signing the certificate</a></h4> The next step is to sign all certificates generated by step 1 with the CA generated in step 2. First, you need to export the certificate from the keystore: <pre>keytool -keystore server.keystore.jks -alias localhost -certreq -file cert-file</pre> @@ -97,7 +97,7 @@ Apache Kafka allows clients to connect over SSL. 
By default SSL is disabled but keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert keytool -keystore server.keystore.jks -alias localhost -import -file cert-signed </pre></li> - <li><h4><a name="config_broker">Configuring Kafka Broker</a></h4> + <li><h4><a id="security_configbroker" href="#security_configbroker">Configuring Kafka Broker</a></h4> Kafka Broker comes with the feature of listening on multiple ports thanks to [KAFKA-1809](https://issues.apache.org/jira/browse/KAFKA-1809). We need to configure the following property in server.properties, which must have one or more comma-separated values: <pre>listeners</pre> @@ -143,7 +143,7 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but </pre> If the certificate does not show up or if there are any other error messages than your keystore is not setup properly.</li> - <li><h4>Configuring Kafka Clients</h4>h4> + <li><h4><a id="security_configclients" href="#security_configclients">Configuring Kafka Clients</a></h4>h4> SSL is supported only for new Kafka Producer & Consumer, the older API is not supported. The configs for SSL will be same for both producer & consumer.<br> If client authentication is not required in the broker, then the following is a minimal configuration example: <pre> @@ -175,10 +175,10 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but </pre> </li> </ol> -<h3><a id="security_sasl">7.3 Authentication using SASL</a></h3> +<h3><a id="security_sasl" href="#security_sasl">7.3 Authentication using SASL</a></h3> <ol> - <li><h4>Prerequisites</h4><br> + <li><h4><a id="security_sasl_prereq" href="#security_sasl_prereq">Prerequisites</a></h4><br> <ol> <li><b>Kerberos</b><br> If your organization is already using a Kerberos server (for example, by using Active Directory), there is no need to install a new server just for Kafka. 
Otherwise you will need to install one, your Linux vendor likely has packages for Kerberos and a short guide on how to install and configure it (<a href="https://help.ubuntu.com/community/Kerberos">Ubuntu</a>, <a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/installing-kerberos.html">Redhat</a>). Note that if you are using Oracle Java, you will need to download JCE policy files for your Java version and copy them to $JAVA_HOME/jre/lib/security.</li> @@ -223,7 +223,7 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but <li>In KafkaServer and KafkaClient sections we've "serviceName" this should match principal name with which kafka broker is running. In the above example principal="kafka/kafka1.hostname....@domain.com" so we've "kafka" which is matching the principalName.</li> </ol> </li> - <li><b><a name="jaas_client">Creating Client Side JAAS Config</a></b><br> + <li><h4><a id="security_sasl_jaas" href="#security_sasl_jaas">Creating Client Side JAAS Config</a></h4> Clients (producers, consumers, connect workers, etc) will authenticate to the cluster with their own principal (usually with the same name as the user used for running the client), so obtain or create these principals as needed. Then create a JAAS file as follows: <pre> KafkaClient { @@ -237,7 +237,7 @@ Apache Kafka allows clients to connect over SSL. 
By default SSL is disabled but </pre> </li> </ol></li> - <li><h4>Configuring Kafka Brokers</h4> + <li><h4><a id="security_sasl_brokerconfig" href="#security_sasl_brokerconfig">Configuring Kafka Brokers</a></h4> <ol> <li>Pass the name of the jaas file you created in <a href="#jaas_config_file">Creating JAAS Config File"</a> as a JVM parameter to the kafka broker: <pre>-Djava.security.auth.login.config=/etc/kafka/kafka_jaas.conf</pre></li> <li>Make sure the keytabs configured in the kafka_jaas.conf are readable by the linux user who is starting kafka broker.</li> @@ -248,11 +248,11 @@ Apache Kafka allows clients to connect over SSL. By default SSL is disabled but </ol> </li> - <li><h4>Configuring Kafka Clients</h4> + <li><h4><a id="security_sasl_clientconfig" href="#security_sasl_clientconfig">Configuring Kafka Clients</a></h4> SASL authentication is only supported for new kafka producer and consumer, the older API is not supported.>br> To configure SASL authentication on the clients: <ol> - <li>pass the name of the jaas file you created in <a href="#jaas_client">Creating Client Side JAAS Config"</a> as a JVM parameter to the client JVM: + <li>pass the name of the jaas file you created in <a href="#security_sasl_jaas">Creating Client Side JAAS Config"</a> as a JVM parameter to the client JVM: <pre>-Djava.security.auth.login.config=/etc/kafka/kafka_client_jaas.conf</pre></li> <li>Make sure the keytabs configured in the kafka_client_jaas.conf are readable by the linux user who is starting kafka client.</li> <li>Configure the following property in producer.properties or consumer.properties: @@ -260,12 +260,15 @@ Apache Kafka allows clients to connect over SSL. 
By default SSL is disabled but </ol></li> </ol> -<h3><a id="security_authz">7.4 Authorization and ACLs</a></h3> +<h3><a id="security_authz" href="#security_authz">7.4 Authorization and ACLs</a></h3> Kafka ships with a pluggable Authorizer and an out-of-box authorizer implementation that uses zookeeper to store all the acls. Kafka acls are defined in the general format of "Principal P is [Allowed/Denied] Operation O From Host H On Resource R". You can read more about the acl structure on KIP-11. In order to add, remove or list acls you can use the Kafka authorizer CLI. By default, if a Resource R has no associated acls, no one other than super users is allowed to access R. If you want to change that behavior, you can include the following in broker.properties. <pre>allow.everyone.if.no.acl.found=true</pre> One can also add super users in broker.properties like the following. <pre>super.users=User:Bob;User:Alice</pre> -<h4>Command Line Interface</h4> +By default, the SSL user name will be of the form "CN=writeuser,OU=Unknown,O=Unknown,L=Unknown,ST=Unknown,C=Unknown". One can change that by setting a customized PrincipalBuilder in broker.properties like the following. +<pre>principal.builder.class=CustomizedPrincipalBuilderClass</pre> +By default, the SASL user name will be the primary part of the Kerberos principal. One can change that by setting <code>sasl.kerberos.principal.to.local.rules</code> to a customized rule in broker.properties. +<h4><a id="security_authz_cli" href="#security_authz_cli">Command Line Interface</a></h4> Kafka Authorization management CLI can be found under bin directory with all the other CLIs. The CLI script is called <b>kafka-acls.sh</b>.
Following lists all the options that the script supports: <p></p> <table class="data-table"> @@ -336,20 +339,20 @@ Kafka Authorization management CLI can be found under bin directory with all the <td>Principal</td> </tr> <tr> - <td>--allow-hosts</td> - <td>Comma separated list of hosts from which principals listed in --allow-principals will have access.</td> + <td>--allow-host</td> + <td>Host from which principals listed in --allow-principals will have access.</td> <td> if --allow-principals is specified defaults to * which translates to "all hosts"</td> <td>Host</td> </tr> <tr> - <td>--deny-hosts</td> - <td>Comma separated list of hosts from which principals listed in --deny-principals will be denied access.</td> + <td>--deny-host</td> + <td>Host from which principals listed in --deny-principals will be denied access.</td> <td>if --deny-principals is specified defaults to * which translates to "all hosts"</td> <td>Host</td> </tr> <tr> - <td>--operations</td> - <td>Comma separated list of operations.<br> + <td>--operation</td> + <td>Operation that will be allowed or denied.<br> Valid values are : Read, Write, Create, Delete, Alter, Describe, ClusterAction, All</td> <td>All</td> <td>Operation</td> @@ -369,18 +372,18 @@ Kafka Authorization management CLI can be found under bin directory with all the </tr> </tbody></table> -<h4>Examples</h4> +<h4><a id="security_authz_examples" href="#security_authz_examples">Examples</a></h4> <ul> <li><b>Adding Acls</b><br> Suppose you want to add an acl "Principals User:Bob and User:Alice are allowed to perform Operation Read and Write on Topic Test-Topic from Host1 and Host2". 
You can do that by executing the CLI with following options: - <pre>bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Bob --allow-principal User:Alice --allow-hosts Host1,Host2 --operations Read,Write --topic Test-topic</pre> - By default all principals that don't have an explicit acl that allows access for an operation to a resource are denied. In rare cases where an allow acl is defined that allows access to all but some principal we will have to use the --deny-principals and --deny-host option. For example, if we want to allow all users to Read from Test-topic but only deny User:BadBob from host bad-host we can do so using following commands: - <pre>bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:* --allow-hosts * --deny-principal User:BadBob --deny-hosts bad-host --operations Read--topic Test-topic</pre> + <pre>bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:Bob --allow-principal User:Alice --allow-host Host1 --allow-host Host2 --operation Read --operation Write --topic Test-topic</pre> + By default all principals that don't have an explicit acl that allows access for an operation to a resource are denied. In rare cases where an allow acl is defined that allows access to all but some principal we will have to use the --deny-principal and --deny-host option. 
For example, if we want to allow all users to Read from Test-topic but only deny User:BadBob from host bad-host we can do so using following commands: + <pre>bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --add --allow-principal User:* --allow-host * --deny-principal User:BadBob --deny-host bad-host --operation Read --topic Test-topic</pre> Above examples add acls to a topic by specifying --topic [topic-name] as the resource option. Similarly user can add acls to cluster by specifying --cluster and to a consumer group by specifying --consumer-group [group-name].</li> <li><b>Removing Acls</b><br> Removing acls is pretty much the same. The only difference is instead of --add option users will have to specify --remove option. To remove the acls added by the first example above we can execute the CLI with following options: - <pre> bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --remove --allow-principal User:Bob --allow-principal User:Alice --allow-hosts Host1,Host2 --operations Read,Write --topic Test-topic </pre></li> + <pre> bin/kafka-acls.sh --authorizer kafka.security.auth.SimpleAclAuthorizer --authorizer-properties zookeeper.connect=localhost:2181 --remove --allow-principal User:Bob --allow-principal User:Alice --allow-host Host1 --allow-host Host2 --operation Read --operation Write --topic Test-topic </pre></li> <li><b>List Acls</b><br> We can list acls for any resource by specifying the --list option with the resource. To list all acls for Test-topic we can execute the CLI with following options: @@ -395,8 +398,8 @@ Suppose you want to add an acl "Principals User:Bob and User:Alice are allowed t In order to remove a principal from producer or consumer role we just need to pass --remove option.
</li> </ul> -<h3><a id="zk_authz">7.5 ZooKeeper Authentication</a></h3> -<h4><a id="zk_authz_new">7.5.1 New clusters</a></h4> +<h3><a id="zk_authz" href="#zk_authz">7.5 ZooKeeper Authentication</a></h3> +<h4><a id="zk_authz_new" href="#zk_authz_new">7.5.1 New clusters</a></h4> To enable ZooKeeper authentication on brokers, there are two necessary steps: <ol> <li> Create a JAAS login file and set the appropriate system property to point to it as described above</li> @@ -405,7 +408,7 @@ To enable ZooKeeper authentication on brokers, there are two necessary steps: The metadata stored in ZooKeeper is such that only brokers will be able to modify the corresponding znodes, but znodes are world readable. The rationale behind this decision is that the data stored in ZooKeeper is not sensitive, but inappropriate manipulation of znodes can cause cluster disruption. -<h4><a id="zk_authz_migration">7.5.2 Migrating clusters</a></h4> +<h4><a id="zk_authz_migration" href="#zk_authz_migration">7.5.2 Migrating clusters</a></h4> If you are running a version of Kafka that does not support security or simply with security disabled, and you want to make the cluster secure, then you need to execute the following steps to enable ZooKeeper authentication with minimal disruption to your operations: <ol> <li>Perform a rolling restart setting the JAAS login file, which enables brokers to authenticate. At the end of the rolling restart, brokers are able to manipulate znodes with strict ACLs, but they will not create znodes with those ACLs</li> @@ -426,7 +429,7 @@ Here is an example of how to run the migration tool: <pre> ./bin/zookeeper-security-migration --help </pre> -<h4><a id="zk_authz_ensemble">7.5.3 Migrating the ZooKeeper ensemble</a></h4> +<h4><a id="zk_authz_ensemble" href="#zk_authz_ensemble">7.5.3 Migrating the ZooKeeper ensemble</a></h4> It is also necessary to enable authentication on the ZooKeeper ensemble.
To do it, we need to perform a rolling restart of the server and set a few properties. Please refer to the ZooKeeper documentation for more detail: <ol> <li><a href="http://zookeeper.apache.org/doc/r3.4.6/zookeeperProgrammers.html#sc_ZooKeeperAccessControl">Apache ZooKeeper documentation</a></li> http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/upgrade.html ---------------------------------------------------------------------- diff --git a/090/upgrade.html b/090/upgrade.html index 3cf5aa1..704ec4f 100644 --- a/090/upgrade.html +++ b/090/upgrade.html @@ -15,9 +15,9 @@ limitations under the License. --> -<h3><a id="upgrade">1.5 Upgrading From Previous Versions</a></h3> +<h3><a id="upgrade" href="#upgrade">1.5 Upgrading From Previous Versions</a></h3> -<h4>Upgrading from 0.8.0, 0.8.1.X or 0.8.2.X to 0.9.0.0</h4> +<h4><a id="upgrade_9" href="#upgrade_9">Upgrading from 0.8.0, 0.8.1.X or 0.8.2.X to 0.9.0.0</a></h4> 0.9.0.0 has an inter-broker protocol change from previous versions. For a rolling upgrade: <ol> @@ -31,7 +31,7 @@ Note: If you are willing to accept downtime, you can simply take all the brokers Note: Bumping the protocol version and restarting can be done any time after the brokers were upgraded. It does not have to be immediately after. -<h5>Potential breaking changes in 0.9.0.0</h5> +<h5><a id="upgrade_9_breaking" href="#upgrade_9_breaking">Potential breaking changes in 0.9.0.0</a></h5> <ul> <li> Java 1.6 is no longer supported. </li> @@ -52,14 +52,14 @@ Note: Bumping the protocol version and restarting can be done any time after the <li> The kafka.tools.ProducerPerformance class has been deprecated. Going forward, please use org.apache.kafka.tools.ProducerPerformance for this functionality (kafka-producer-perf-test.sh will also be changed to use the new class). </li> </ul> -<h4>Upgrading from 0.8.1 to 0.8.2</h4> +<h4><a id="upgrade_82" href="#upgrade_82">Upgrading from 0.8.1 to 0.8.2</a></h4> 0.8.2 is fully compatible with 0.8.1. 
The upgrade can be done one broker at a time by simply bringing it down, updating the code, and restarting it. -<h4>Upgrading from 0.8.0 to 0.8.1</h4> +<h4><a id="upgrade_81" href="#upgrade_81">Upgrading from 0.8.0 to 0.8.1</a></h4> 0.8.1 is fully compatible with 0.8. The upgrade can be done one broker at a time by simply bringing it down, updating the code, and restarting it. -<h4>Upgrading from 0.7</h4> +<h4><a id="upgrade_7" href="#upgrade_7">Upgrading from 0.7</a></h4> Release 0.7 is incompatible with newer releases. Major changes were made to the API, ZooKeeper data structures, and protocol, and configuration in order to add replication (Which was missing in 0.7). The upgrade from 0.7 to later versions requires a <a href="https://cwiki.apache.org/confluence/display/KAFKA/Migrating+from+0.7+to+0.8">special tool</a> for migration. This migration can be done without downtime. http://git-wip-us.apache.org/repos/asf/kafka-site/blob/24d8d665/090/uses.html ---------------------------------------------------------------------- diff --git a/090/uses.html b/090/uses.html index aa87d07..f769bed 100644 --- a/090/uses.html +++ b/090/uses.html @@ -5,9 +5,9 @@ The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at - + http://www.apache.org/licenses/LICENSE-2.0 - + Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @@ -15,11 +15,11 @@ limitations under the License. --> -<h3><a id="uses">1.2 Use Cases</a></h3> +<h3><a id="uses" href="#uses">1.2 Use Cases</a></h3> Here is a description of a few of the popular use cases for Apache Kafka. 
For an overview of a number of these areas in action, see <a href="http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying">this blog post</a>. -<h4>Messaging</h4> +<h4><a id="uses_messaging" href="#uses_messaging">Messaging</a></h4> Kafka works well as a replacement for a more traditional message broker. Message brokers are used for a variety of reasons (to decouple processing from data producers, to buffer unprocessed messages, etc). In comparison to most messaging systems Kafka has better throughput, built-in partitioning, replication, and fault-tolerance which makes it a good solution for large scale message processing applications. <p> @@ -27,30 +27,30 @@ In our experience messaging uses are often comparatively low-throughput, but may <p> In this domain Kafka is comparable to traditional messaging systems such as <a href="http://activemq.apache.org">ActiveMQ</a> or <a href="https://www.rabbitmq.com">RabbitMQ</a>. -<h4>Website Activity Tracking</h4> +<h4><a id="uses_website" href="#uses_website">Website Activity Tracking</a></h4> The original use case for Kafka was to be able to rebuild a user activity tracking pipeline as a set of real-time publish-subscribe feeds. This means site activity (page views, searches, or other actions users may take) is published to central topics with one topic per activity type. These feeds are available for subscription for a range of use cases including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting. <p> Activity tracking is often very high volume as many activity messages are generated for each user page view. -<h4>Metrics</h4> +<h4><a id="uses_metrics" href="#uses_metrics">Metrics</a></h4> Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data. 
-<h4>Log Aggregation</h4> +<h4><a id="uses_logs" href="#uses_logs">Log Aggregation</a></h4> Many people use Kafka as a replacement for a log aggregation solution. Log aggregation typically collects physical log files off servers and puts them in a central place (a file server or HDFS perhaps) for processing. Kafka abstracts away the details of files and gives a cleaner abstraction of log or event data as a stream of messages. This allows for lower-latency processing and easier support for multiple data sources and distributed data consumption. In comparison to log-centric systems like Scribe or Flume, Kafka offers equally good performance, stronger durability guarantees due to replication, and much lower end-to-end latency. -<h4>Stream Processing</h4> +<h4><a id="uses_streamprocessing" href="#uses_streamprocessing">Stream Processing</a></h4> Many users end up doing stage-wise processing of data where data is consumed from topics of raw data and then aggregated, enriched, or otherwise transformed into new Kafka topics for further consumption. For example a processing flow for article recommendation might crawl article content from RSS feeds and publish it to an "articles" topic; further processing might help normalize or deduplicate this content to a topic of cleaned article content; a final stage might attempt to match this content to users. This creates a graph of real-time data flow out of the individual topics. <a href="https://storm.apache.org/">Storm</a> and <a href="http://samza.apache.org/">Samza</a> are popular frameworks for implementing these kinds of transformations. -<h4>Event Sourcing</h4> +<h4><a id="uses_eventsourcing" href="#uses_eventsourcing">Event Sourcing</a></h4> <a href="http://martinfowler.com/eaaDev/EventSourcing.html">Event sourcing</a> is a style of application design where state changes are logged as a time-ordered sequence of records. 
Kafka's support for very large stored log data makes it an excellent backend for an application built in this style. -<h4>Commit Log</h4> +<h4><a id="uses_commitlog" href="#uses_commitlog">Commit Log</a></h4> Kafka can serve as a kind of external commit-log for a distributed system. The log helps replicate data between nodes and acts as a re-syncing mechanism for failed nodes to restore their data. The <a href="/documentation.html#compaction">log compaction</a> feature in Kafka helps support this usage. In this usage Kafka is similar to the <a href="http://zookeeper.apache.org/bookkeeper/">Apache BookKeeper</a> project.