Repository: kafka
Updated Branches:
  refs/heads/trunk 9f33bfe19 -> e11946b09


MINOR: fixing typos in docs

This commit contains minor grammatical fixes.  Some of the changes are just 
removing rogue commas, which can be hard to see in the diff.

This contribution is my original work and I license the work to the project 
under the project's open source license.

Author: Samuel Julius Hecht <[email protected]>

Reviewers: Gwen Shapira

Closes #721 from samjhecht/minor-docs-edits


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/e11946b0
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/e11946b0
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/e11946b0

Branch: refs/heads/trunk
Commit: e11946b097ed617b6de78909dbd9f6e44765c1fd
Parents: 9f33bfe
Author: Samuel Julius Hecht <[email protected]>
Authored: Wed Dec 30 17:02:58 2015 -0800
Committer: Gwen Shapira <[email protected]>
Committed: Wed Dec 30 17:02:58 2015 -0800

----------------------------------------------------------------------
 docs/design.html         | 30 +++++++++++++++---------------
 docs/implementation.html |  2 +-
 2 files changed, 16 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/e11946b0/docs/design.html
----------------------------------------------------------------------
diff --git a/docs/design.html b/docs/design.html
index 10e8f9d..7cefb24 100644
--- a/docs/design.html
+++ b/docs/design.html
@@ -38,7 +38,7 @@ Kafka relies heavily on the filesystem for storing and caching messages. There i
 <p>
 The key fact about disk performance is that the throughput of hard drives has been diverging from the latency of a disk seek for the last decade. As a result the performance of linear writes on a <a href="http://en.wikipedia.org/wiki/Non-RAID_drive_architectures">JBOD</a> configuration with six 7200rpm SATA RAID-5 array is about 600MB/sec but the performance of random writes is only about 100k/sec&mdash;a difference of over 6000X. These linear reads and writes are the most predictable of all usage patterns, and are heavily optimized by the operating system. A modern operating system provides read-ahead and write-behind techniques that prefetch data in large block multiples and group smaller logical writes into large physical writes. A further discussion of this issue can be found in this <a href="http://queue.acm.org/detail.cfm?id=1563874">ACM Queue article</a>; they actually find that <a href="http://deliveryimages.acm.org/10.1145/1570000/1563874/jacobs3.jpg">sequential disk access can in some cases be faster than random memory access!</a>
 <p>
-To compensate for this performance divergence modern operating systems have become increasingly aggressive in their use of main memory for disk caching. A modern OS will happily divert <i>all</i> free memory to disk caching with little performance penalty when the memory is reclaimed. All disk reads and writes will go through this unified cache. This feature cannot easily be turned off without using direct I/O, so even if a process maintains an in-process cache of the data, this data will likely be duplicated in OS pagecache, effectively storing everything twice.
+To compensate for this performance divergence, modern operating systems have become increasingly aggressive in their use of main memory for disk caching. A modern OS will happily divert <i>all</i> free memory to disk caching with little performance penalty when the memory is reclaimed. All disk reads and writes will go through this unified cache. This feature cannot easily be turned off without using direct I/O, so even if a process maintains an in-process cache of the data, this data will likely be duplicated in OS pagecache, effectively storing everything twice.
 <p>
 Furthermore we are building on top of the JVM, and anyone who has spent any time with Java memory usage knows two things:
 <ol>
@@ -58,7 +58,7 @@ The persistent data structure used in messaging systems are often a per-consumer
 <p>
 Intuitively a persistent queue could be built on simple reads and appends to files as is commonly the case with logging solutions. This structure has the advantage that all operations are O(1) and reads do not block writes or each other. This has obvious performance advantages since the performance is completely decoupled from the data size&mdash;one server can now take full advantage of a number of cheap, low-rotational speed 1+TB SATA drives. Though they have poor seek performance, these drives have acceptable performance for large reads and writes and come at 1/3 the price and 3x the capacity.
 <p>
-Having access to virtually unlimited disk space without any performance penalty means that we can provide some features not usually found in a messaging system. For example, in Kafka, instead of attempting to deleting messages as soon as they are consumed, we can retain messages for a relative long period (say a week). This leads to a great deal of flexibility for consumers, as we will describe.
+Having access to virtually unlimited disk space without any performance penalty means that we can provide some features not usually found in a messaging system. For example, in Kafka, instead of attempting to delete messages as soon as they are consumed, we can retain messages for a relatively long period (say a week). This leads to a great deal of flexibility for consumers, as we will describe.
 
 <h3><a id="maximizingefficiency" href="#maximizingefficiency">4.3 Efficiency</a></h3>
 <p>
@@ -106,7 +106,7 @@ Kafka supports GZIP and Snappy compression protocols. More details on compressio
 
 <h4><a id="design_loadbalancing" href="#design_loadbalancing">Load balancing</a></h4>
 <p>
-The producer sends data directly to the broker that is the leader for the partition without any intervening routing tier. To help the producer do this all Kafka nodes can answer a request for metadata about which servers are alive and where the leaders for the partitions of a topic are at any given time to allow the producer to appropriate direct its requests.
+The producer sends data directly to the broker that is the leader for the partition without any intervening routing tier. To help the producer do this all Kafka nodes can answer a request for metadata about which servers are alive and where the leaders for the partitions of a topic are at any given time to allow the producer to appropriately direct its requests.
 <p>
 The client controls which partition it publishes messages to. This can be done at random, implementing a kind of random load balancing, or it can be done by some semantic partitioning function. We expose the interface for semantic partitioning by allowing the user to specify a key to partition by and using this to hash to a partition (there is also an option to override the partition function if need be). For example if the key chosen was a user id then all data for a given user would be sent to the same partition. This in turn will allow consumers to make locality assumptions about their consumption. This style of partitioning is explicitly designed to allow locality-sensitive processing in consumers.
 
@@ -114,7 +114,7 @@ The client controls which partition it publishes messages to. This can be done a
 <p>
 Batching is one of the big drivers of efficiency, and to enable batching the Kafka producer will attempt to accumulate data in memory and to send out larger batches in a single request. The batching can be configured to accumulate no more than a fixed number of messages and to wait no longer than some fixed latency bound (say 64k or 10 ms). This allows the accumulation of more bytes to send, and few larger I/O operations on the servers. This buffering is configurable and gives a mechanism to trade off a small amount of additional latency for better throughput.
 <p>
-Details on <a href="#producerconfigs">configuration</a> and <a href="http://kafka.apache.org/082/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html">api</a> for the producer can be found elsewhere in the documentation.
+Details on <a href="#producerconfigs">configuration</a> and the <a href="http://kafka.apache.org/082/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html">api</a> for the producer can be found elsewhere in the documentation.
 
 <h3><a id="theconsumer" href="#theconsumer">4.5 The Consumer</a></h3>
 
@@ -122,22 +122,22 @@ The Kafka consumer works by issuing "fetch" requests to the brokers leading the
 
 <h4><a id="design_pull" href="#design_pull">Push vs. pull</a></h4>
 <p>
-An initial question we considered is whether consumers should pull data from brokers or brokers should push data to the consumer. In this respect Kafka follows a more traditional design, shared by most messaging systems, where data is pushed to the broker from the producer and pulled from the broker by the consumer. Some logging-centric systems, such as <a href="http://github.com/facebook/scribe">Scribe</a> and <a href="http://flume.apache.org/">Apache Flume</a> follow a very different push based path where data is pushed downstream. There are pros and cons to both approaches. However a push-based system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred. The goal is generally for the consumer to be able to consume at the maximum possible rate; unfortunately in a push system this means the consumer tends to be overwhelmed when its rate of consumption falls below the rate of production (a denial of service attack, in essence). A pull-based system has the nicer property that the consumer simply falls behind and catches up when it can. This can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems. Previous attempts at building systems in this fashion led us to go with a more traditional pull model.
+An initial question we considered is whether consumers should pull data from brokers or brokers should push data to the consumer. In this respect Kafka follows a more traditional design, shared by most messaging systems, where data is pushed to the broker from the producer and pulled from the broker by the consumer. Some logging-centric systems, such as <a href="http://github.com/facebook/scribe">Scribe</a> and <a href="http://flume.apache.org/">Apache Flume</a>, follow a very different push-based path where data is pushed downstream. There are pros and cons to both approaches. However, a push-based system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred. The goal is generally for the consumer to be able to consume at the maximum possible rate; unfortunately, in a push system this means the consumer tends to be overwhelmed when its rate of consumption falls below the rate of production (a denial of service attack, in essence). A pull-based system has the nicer property that the consumer simply falls behind and catches up when it can. This can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems. Previous attempts at building systems in this fashion led us to go with a more traditional pull model.
 <p>
-Another advantage of a pull-based system is that it lends itself to aggressive batching of data sent to the consumer. A push-based system must choose to either send a request immediately or accumulate more data and then send it later without knowledge of whether the downstream consumer will be able to immediately process it. If tuned for low latency this will result in sending a single message at a time only for the transfer to end up being buffered anyway, which is wasteful. A pull-based design fixes this as the consumer always pulls all available messages after its current position in the log (or up to some configurable max size). So one gets optimal batching without introducing unnecessary latency.
+Another advantage of a pull-based system is that it lends itself to aggressive batching of data sent to the consumer. A push-based system must choose to either send a request immediately or accumulate more data and then send it later without knowledge of whether the downstream consumer will be able to immediately process it. If tuned for low latency, this will result in sending a single message at a time only for the transfer to end up being buffered anyway, which is wasteful. A pull-based design fixes this as the consumer always pulls all available messages after its current position in the log (or up to some configurable max size). So one gets optimal batching without introducing unnecessary latency.
 <p>
 The deficiency of a naive pull-based system is that if the broker has no data the consumer may end up polling in a tight loop, effectively busy-waiting for data to arrive. To avoid this we have parameters in our pull request that allow the consumer request to block in a "long poll" waiting until data arrives (and optionally waiting until a given number of bytes is available to ensure large transfer sizes).
 <p>
 You could imagine other possible designs which would be only pull, end-to-end. The producer would locally write to a local log, and brokers would pull from that with consumers pulling from them. A similar type of "store-and-forward" producer is often proposed. This is intriguing but we felt not very suitable for our target use cases which have thousands of producers. Our experience running persistent data systems at scale led us to feel that involving thousands of disks in the system across many applications would not actually make things more reliable and would be a nightmare to operate. And in practice we have found that we can run a pipeline with strong SLAs at large scale without a need for producer persistence.
 
 <h4><a id="design_consumerposition" href="#design_consumerposition">Consumer Position</a></h4>
-Keeping track of <i>what</i> has been consumed, is, surprisingly, one of the key performance points of a messaging system.
+Keeping track of <i>what</i> has been consumed is, surprisingly, one of the key performance points of a messaging system.
 <p>
-Most messaging systems keep metadata about what messages have been consumed on the broker. That is, as a message is handed out to a consumer, the broker either records that fact locally immediately or it may wait for acknowledgement from the consumer. This is a fairly intuitive choice, and indeed for a single machine server it is not clear where else this state could go. Since the data structure used for storage in many messaging systems scale poorly, this is also a pragmatic choice--since the broker knows what is consumed it can immediately delete it, keeping the data size small.
+Most messaging systems keep metadata about what messages have been consumed on the broker. That is, as a message is handed out to a consumer, the broker either records that fact locally immediately or it may wait for acknowledgement from the consumer. This is a fairly intuitive choice, and indeed for a single machine server it is not clear where else this state could go. Since the data structures used for storage in many messaging systems scale poorly, this is also a pragmatic choice--since the broker knows what is consumed it can immediately delete it, keeping the data size small.
 <p>
-What is perhaps not obvious, is that getting the broker and consumer to come into agreement about what has been consumed is not a trivial problem. If the broker records a message as <b>consumed</b> immediately every time it is handed out over the network, then if the consumer fails to process the message (say because it crashes or the request times out or whatever) that message will be lost. To solve this problem, many messaging systems add an acknowledgement feature which means that messages are only marked as <b>sent</b> not <b>consumed</b> when they are sent; the broker waits for a specific acknowledgement from the consumer to record the message as <b>consumed</b>. This strategy fixes the problem of losing messages, but creates new problems. First of all, if the consumer processes the message but fails before it can send an acknowledgement then the message will be consumed twice. The second problem is around performance, now the broker must keep multiple states about every single message (first to lock it so it is not given out a second time, and then to mark it as permanently consumed so that it can be removed). Tricky problems must be dealt with, like what to do with messages that are sent but never acknowledged.
+What is perhaps not obvious is that getting the broker and consumer to come into agreement about what has been consumed is not a trivial problem. If the broker records a message as <b>consumed</b> immediately every time it is handed out over the network, then if the consumer fails to process the message (say because it crashes or the request times out or whatever) that message will be lost. To solve this problem, many messaging systems add an acknowledgement feature which means that messages are only marked as <b>sent</b> not <b>consumed</b> when they are sent; the broker waits for a specific acknowledgement from the consumer to record the message as <b>consumed</b>. This strategy fixes the problem of losing messages, but creates new problems. First of all, if the consumer processes the message but fails before it can send an acknowledgement then the message will be consumed twice. The second problem is around performance: now the broker must keep multiple states about every single message (first to lock it so it is not given out a second time, and then to mark it as permanently consumed so that it can be removed). Tricky problems must be dealt with, like what to do with messages that are sent but never acknowledged.
 <p>
-Kafka handles this differently. Our topic is divided into a set of totally ordered partitions, each of which is consumed by one consumer at any given time. This means that the position of consumer in each partition is just a single integer, the offset of the next message to consume. This makes the state about what has been consumed very small, just one number for each partition. This state can be periodically checkpointed. This makes the equivalent of message acknowledgements very cheap.
+Kafka handles this differently. Our topic is divided into a set of totally ordered partitions, each of which is consumed by one consumer at any given time. This means that the position of a consumer in each partition is just a single integer, the offset of the next message to consume. This makes the state about what has been consumed very small, just one number for each partition. This state can be periodically checkpointed. This makes the equivalent of message acknowledgements very cheap.
 <p>
 There is a side benefit of this decision. A consumer can deliberately <i>rewind</i> back to an old offset and re-consume data. This violates the common contract of a queue, but turns out to be an essential feature for many consumers. For example, if the consumer code has a bug and is discovered after some messages are consumed, the consumer can re-consume those messages once the bug is fixed.
 
@@ -164,7 +164,7 @@ Now that we understand a little about how producers and consumers work, let's di
 
 It's worth noting that this breaks down into two problems: the durability guarantees for publishing a message and the guarantees when consuming a message.
 <p>
-Many systems claim to provide "exactly once" delivery semantics, but it is important to read the fine print, most of these claims are misleading (i.e. they don't translate to the case where consumers or producers can fail, or cases where there are multiple consumer processes, or cases where data written to disk can be lost).
+Many systems claim to provide "exactly once" delivery semantics, but it is important to read the fine print: most of these claims are misleading (i.e. they don't translate to the case where consumers or producers can fail, cases where there are multiple consumer processes, or cases where data written to disk can be lost).
 <p>
 Kafka's semantics are straight-forward. When publishing a message we have a notion of the message being "committed" to the log. Once a published message is committed it will not be lost as long as one broker that replicates the partition to which this message was written remains "alive". The definition of alive as well as a description of which types of failures we attempt to handle will be described in more detail in the next section. For now let's assume a perfect, lossless broker and try to understand the guarantees to the producer and consumer. If a producer attempts to publish a message and experiences a network error it cannot be sure if this error happened before or after the message was committed. This is similar to the semantics of inserting into a database table with an autogenerated key.
 <p>
@@ -294,7 +294,7 @@ In each of these cases one needs primarily to handle the real-time feed of chang
 
 This style of usage of a log is described in more detail in <a href="http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying">this blog post</a>.
 <p>
-The general idea is quite simple. If we had infinite log retention, and we logged each change in the above cases, then we would have captured the state of the system at each time from when it first began. Using this complete log we could restore to any point in time by replaying the first N records in the log. This hypothetical complete log is not very practical for systems that update a single record many times as the log will grow without bound even for a stable dataset. The simple log retention mechanism which throws away old updates will bound space but the log is no longer a way to restore the current state&mdash;now restoring from the beginning of the log no longer recreates the current state as old updates may not be captured at all.
+The general idea is quite simple. If we had infinite log retention, and we logged each change in the above cases, then we would have captured the state of the system at each time from when it first began. Using this complete log, we could restore to any point in time by replaying the first N records in the log. This hypothetical complete log is not very practical for systems that update a single record many times as the log will grow without bound even for a stable dataset. The simple log retention mechanism which throws away old updates will bound space but the log is no longer a way to restore the current state&mdash;now restoring from the beginning of the log no longer recreates the current state as old updates may not be captured at all.
 <p>
 Log compaction is a mechanism to give finer-grained per-record retention, rather than the coarser-grained time-based retention. The idea is to selectively remove records where we have a more recent update with the same primary key. This way the log is guaranteed to have at least the last state for each key.
 <p>
@@ -324,7 +324,7 @@ Log compaction guarantees the following:
 <li>Ordering of messages is always maintained.  Compaction will never re-order messages, just remove some.
 <li>The offset for a message never changes.  It is the permanent identifier for a position in the log.
 <li>Any read progressing from offset 0 will see at least the final state of all records in the order they were written. All delete markers for deleted records will be seen provided the reader reaches the head of the log in a time period less than the topic's delete.retention.ms setting (the default is 24 hours). This is important as delete marker removal happens concurrently with read (and thus it is important that we not remove any delete marker prior to the reader seeing it).
-<li>Any consumer progressing from the start of the log, will see at least the <em>final</em> state of all records in the order they were written.  All delete markers for deleted records will be seen provided the consumer reaches the head of the log in a time period less than the topic's <code>delete.retention.ms</code> setting (the default is 24 hours).  This is important as delete marker removal happens concurrently with read, and thus it is important that we do not remove any delete marker prior to the consumer seeing it.
+<li>Any consumer progressing from the start of the log will see at least the <em>final</em> state of all records in the order they were written.  All delete markers for deleted records will be seen provided the consumer reaches the head of the log in a time period less than the topic's <code>delete.retention.ms</code> setting (the default is 24 hours).  This is important as delete marker removal happens concurrently with read, and thus it is important that we do not remove any delete marker prior to the consumer seeing it.
 </ol>
 
 <h4><a id="design_compactiondetails" href="#design_compactiondetails">Log Compaction Details</a></h4>
@@ -367,10 +367,10 @@ It is possible for producers and consumers to produce/consume very high volumes
     This quota is defined on a per-broker basis. Each client can publish/fetch a maximum of X bytes/sec per broker before it gets throttled. We decided that defining these quotas per broker is much better than having a fixed cluster wide bandwidth per client because that would require a mechanism to share client quota usage among all the brokers. This can be harder to get right than the quota implementation itself!
 </p>
 <p>
-    How does a broker react when it detects a quota violation? In our solution, the broker does not return an error rather it attempts to slow down a client exceeding its quota. It computes the amount of delay needed to bring a guilty client under it's quota and delays the response for that time. This approach keeps the quota violation transparent to clients (outside of client side metrics). This also keeps them from having to implement any special backoff and retry behavior which can get tricky. In fact, bad client behavior (retry without backoff) can exacerbate the very problem quotas are trying to solve.
+    How does a broker react when it detects a quota violation? In our solution, the broker does not return an error; rather, it attempts to slow down a client exceeding its quota. It computes the amount of delay needed to bring a guilty client under its quota and delays the response for that time. This approach keeps the quota violation transparent to clients (outside of client-side metrics). This also keeps them from having to implement any special backoff and retry behavior which can get tricky. In fact, bad client behavior (retry without backoff) can exacerbate the very problem quotas are trying to solve.
 </p>
 <p>
-Client byte rate is measured over multiple small windows (for e.g. 30 windows of 1 second each) in order to detect and correct quota violations quickly. Typically, having large measurement windows (for e.g. 10 windows of 30 seconds each) leads to large bursts of traffic followed by long delays which is not great in terms of user experience.
+Client byte rate is measured over multiple small windows (e.g. 30 windows of 1 second each) in order to detect and correct quota violations quickly. Typically, having large measurement windows (e.g. 10 windows of 30 seconds each) leads to large bursts of traffic followed by long delays, which is not great in terms of user experience.
 </p>
 <h4><a id="design_quotasoverrides" href="#design_quotasoverrides">Quota overrides</a></h4>
 <p>
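
As an aside for readers of the batching hunk above: the "fixed number of messages"/"fixed latency bound (say 64k or 10 ms)" knobs correspond to the Java producer's batch.size and linger.ms settings. A minimal sketch, assuming the 0.9-era org.apache.kafka.clients.producer API; the broker address and topic name below are placeholders:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BatchingProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", "65536"); // accumulate up to ~64KB per partition batch...
        props.put("linger.ms", "10");     // ...or wait at most 10 ms for more records to arrive

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < 1000; i++) {
            // Sends are asynchronous; records accumulate into in-memory batches.
            producer.send(new ProducerRecord<>("my-topic", Integer.toString(i), "value-" + i));
        }
        producer.close(); // flushes any remaining batched records
    }
}

Raising linger.ms trades a bounded amount of extra latency for larger batches and fewer I/O operations on the servers, which is exactly the trade-off the paragraph describes.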
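Likewise, the consumer-position section above observes that a consumer's position is a single integer per partition, and that a consumer can deliberately rewind and re-consume. A minimal sketch of rewinding, again assuming the 0.9 Java consumer; topic, partition, group id, and broker address are placeholders:

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RewindSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "rewind-demo");             // placeholder group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        TopicPartition partition = new TopicPartition("my-topic", 0);
        consumer.assign(Arrays.asList(partition));

        // The position is just one integer per partition; "rewinding" is
        // resetting that integer and fetching again from the old offset.
        consumer.seek(partition, 0L);

        ConsumerRecords<String, String> records = consumer.poll(1000);
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
        consumer.close();
    }
}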

http://git-wip-us.apache.org/repos/asf/kafka/blob/e11946b0/docs/implementation.html
----------------------------------------------------------------------
diff --git a/docs/implementation.html b/docs/implementation.html
index 9ae7d4e..234d8d7 100644
--- a/docs/implementation.html
+++ b/docs/implementation.html
@@ -194,7 +194,7 @@ The use of the message offset as the message id is unusual. Our original idea wa
 <img src="images/kafka_log.png">
 <h4><a id="impl_writes" href="#impl_writes">Writes</a></h4>
 <p>
-The log allows serial appends which always go to the last file. This file is rolled over to a fresh file when it reaches a configurable size (say 1GB). The log takes two configuration parameter <i>M</i> which gives the number of messages to write before forcing the OS to flush the file to disk, and <i>S</i> which gives a number of seconds after which a flush is forced. This gives a durability guarantee of losing at most <i>M</i> messages or <i>S</i> seconds of data in the event of a system crash.
+The log allows serial appends which always go to the last file. This file is rolled over to a fresh file when it reaches a configurable size (say 1GB). The log takes two configuration parameters: <i>M</i>, which gives the number of messages to write before forcing the OS to flush the file to disk, and <i>S</i>, which gives a number of seconds after which a flush is forced. This gives a durability guarantee of losing at most <i>M</i> messages or <i>S</i> seconds of data in the event of a system crash.
 <h4><a id="impl_reads" href="#impl_reads">Reads</a></h4>
 <p>
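
For reference, the <i>M</i> and <i>S</i> parameters in the Writes hunk map onto the broker settings log.flush.interval.messages and log.flush.interval.ms (note the latter is expressed in milliseconds rather than seconds). A sketch with illustrative values only, not recommendations:

# server.properties (illustrative values)
# M: force a flush after this many messages are appended to a log
log.flush.interval.messages=10000
# S: force a flush after this many milliseconds
log.flush.interval.ms=1000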
