Author: jkreps
Date: Fri Apr 4 23:51:20 2014
New Revision: 1584942
URL: http://svn.apache.org/r1584942
Log:
Misc. tweaks to the producer config documentation.
Modified:
kafka/site/081/configuration.html
Modified: kafka/site/081/configuration.html
URL:
http://svn.apache.org/viewvc/kafka/site/081/configuration.html?rev=1584942&r1=1584941&r2=1584942&view=diff
==============================================================================
--- kafka/site/081/configuration.html (original)
+++ kafka/site/081/configuration.html Fri Apr 4 23:51:20 2014
@@ -718,21 +718,21 @@ We are working on a replacement for our
<th>Description</th>
</tr>
<tr>
- <td>bootstrap.servers</td><td>list</td><td></td><td>high</td><td>A list
of host/port pairs to use for establishing the initial connection to the Kafka
cluster. Data will be load balanced over all servers irrespective of which
servers are specified here for bootstrapping—this list only impacts the
initial hosts used to discover the full set of servers. This list should be in
the form <code>host1:port1,host2:port2,...</code>. Since these servers are just
used for the initial connection to discover the full cluster membership (which
may change dynamically), this list need not contain the full set of servers
(you may want more than one, though, in case a server is down).</td></tr>
+ <td>bootstrap.servers</td><td>list</td><td></td><td>high</td><td>A list
of host/port pairs to use for establishing the initial connection to the Kafka
cluster. Data will be load balanced over all servers irrespective of which
servers are specified here for bootstrapping—this list only impacts the
initial hosts used to discover the full set of servers. This list should be in
the form <code>host1:port1,host2:port2,...</code>. Since these servers are just
used for the initial connection to discover the full cluster membership (which
may change dynamically), this list need not contain the full set of servers
(you may want more than one, though, in case a server is down). If no server in
this list is available, sending data will fail until one becomes
available.</td></tr>
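
A minimal sketch of putting the bootstrap list to use with the new Java
producer client (org.apache.kafka.clients.producer); the broker hostnames are
hypothetical, and depending on the client version key/value serializer
settings may also be required:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;

    public class ProducerBootstrap {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Two entries so startup survives one broker being down; the
            // client discovers the rest of the cluster from these hosts.
            props.put("bootstrap.servers",
                      "broker1.example.com:9092,broker2.example.com:9092");
            KafkaProducer producer = new KafkaProducer(props);
            // ... send records ...
            producer.close();
        }
    }
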
<tr>
- <td>acks</td><td>string</td><td>1</td><td>high</td><td>The number of
acknowledgments the producer requires before considering a request complete.
This controls the durability of records that are sent. The following settings
are commonly useful: <ul> <li><code>acks=0</code> If set to zero then the
producer will not wait for any acknowledgment from the server at all. The
record will be immediately added to the socket buffer and considered sent. No
guarantee can be made that the server has received the record in this case, and
the <code>retries</code> configuration will not take effect (as the client
won't generally know of any failures). The offset given back for each message
will always be set to -1. <li><code>acks=1</code> This will mean the leader
will write the record to its local log but will respond without awaiting full
acknowledgement from all followers. In this case should the leader fail
immediately after acknowledging the record but before the followers have
replicated it then the record will be lost. <li><code>acks=all</code> This means the
leader will wait for the full set of in-sync replicas to acknowledge the
record. This guarantees that the record will not be lost as long as at least
one in-sync replica remains alive. This is the strongest available guarantee.
<li>Other settings such as <code>acks=2</code> are also possible, and will
require the given number of acknowledgements but this is generally less
useful.</td></tr>
+ <td>acks</td><td>string</td><td>1</td><td>high</td><td>The number of
acknowledgments the producer requires the leader to have received before
considering a request complete. This controls the durability of records that
are sent. The following settings are common: <ul> <li><code>acks=0</code> If
set to zero then the producer will not wait for any acknowledgment from the
server at all. The record will be immediately added to the socket buffer and
considered sent. No guarantee can be made that the server has received the
record in this case, and the <code>retries</code> configuration will not take
effect (as the client won't generally know of any failures). The offset given
back for each record will always be set to -1. <li><code>acks=1</code> This
will mean the leader will write the record to its local log but will respond
without awaiting full acknowledgement from all followers. In this case should
the leader fail immediately after acknowledging the record but before the
followers
have replicated it then the record will be lost. <li><code>acks=all</code>
This means the leader will wait for the full set of in-sync replicas to
acknowledge the record. This guarantees that the record will not be lost as
long as at least one in-sync replica remains alive. This is the strongest
available guarantee. <li>Other settings such as <code>acks=2</code> are also
possible, and will require the given number of acknowledgements, but this is
generally less useful.</td></tr>
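
To make the tradeoff concrete, a short sketch (reusing the props object and
producer from the sketch above; record stands for a ProducerRecord built
elsewhere, and Callback/RecordMetadata come from the same client package):

    props.put("acks", "all");  // wait for all in-sync replicas (strongest)
    // props.put("acks", "1"); // default: leader-only acknowledgement
    // props.put("acks", "0"); // fire-and-forget

    producer.send(record, new Callback() {
        public void onCompletion(RecordMetadata metadata, Exception exception) {
            // Under acks=0 metadata.offset() is always -1, since the
            // client never hears back from the server.
        }
    });
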
<tr>
<td>buffer.memory</td><td>long</td><td>33554432</td><td>high</td><td>The total
bytes of memory the producer can use to buffer records waiting to be sent to
the server. If records are sent faster than they can be delivered to the server
the producer will either block or throw an exception based on the preference
specified by <code>block.on.buffer.full</code>. <p>This setting should
correspond roughly to the total memory the producer will use, but is not a hard
bound since not all memory the producer uses is used for buffering. Some
additional memory will be used for compression (if compression is enabled) as
well as for maintaining in-flight requests.</td></tr>
<tr>
<td>compression.type</td><td>string</td><td>none</td><td>high</td><td>The
compression type for all data generated by the producer. The default is none
(i.e. no compression). Valid values are <code>none</code>, <code>gzip</code>,
or <code>snappy</code>. Compression is of full batches of data, so the
efficacy of batching will also impact the compression ratio (more batching
means better compression).</td></tr>
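
For example (continuing the same hypothetical props object), since whole
batches are compressed together, the compression setting pairs naturally with
the batching settings below:

    props.put("compression.type", "snappy"); // or "gzip"; default is "none"
    // More batching means larger compression windows and better ratios.
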
<tr>
- <td>retries</td><td>int</td><td>0</td><td>high</td><td>Setting a value
greater than zero will cause the client to resend any record whose send fails
with a potentially transient error. Note that this retry is no different than
if the client resent the message upon receiving the error. Allowing retries
will potentially change the ordering of messages because if two messages are
sent to a single partition, and the first fails and is retried but the second
succeeds, then the second message may appear first.</td></tr>
+ <td>retries</td><td>int</td><td>0</td><td>high</td><td>Setting a value
greater than zero will cause the client to resend any record whose send fails
with a potentially transient error. Note that this retry is no different than
if the client resent the record upon receiving the error. Allowing retries will
potentially change the ordering of records because if two records are sent to a
single partition, and the first fails and is retried but the second succeeds,
then the second record may appear first.</td></tr>
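
A sketch of the ordering caveat (same hypothetical props object):

    // With retries on, if record A fails and is retried while record B
    // (sent later to the same partition) succeeds, B may appear before A.
    props.put("retries", 3); // default is 0: a failed send is not retried
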
<tr>
- <td>batch.size</td><td>int</td><td>16384</td><td>medium</td><td>The
producer will attempt to batch records together into fewer requests whenever
multiple records are being sent to the same partition. This helps performance
on both the client and the server. This configuration controls the default
batch size in bytes. <p>No attempt will be made to batch records larger than
this size. <p>Requests sent to brokers will contain multiple batches, one for
each partition there is data for. <p>A small batch size will make batching less
common and may reduce throughput (a batch size of zero will disable batching
entirely). A very large batch size may use memory a bit more wastefully as we
will always allocate a buffer of the specified batch size in anticipation of
additional messages.</td></tr>
+ <td>batch.size</td><td>int</td><td>16384</td><td>medium</td><td>The
producer will attempt to batch records together into fewer requests whenever
multiple records are being sent to the same partition. This helps performance
on both the client and the server. This configuration controls the default
batch size in bytes. <p>No attempt will be made to batch records larger than
this size. <p>Requests sent to brokers will contain multiple batches, one for
each partition with data available to be sent. <p>A small batch size will make
batching less common and may reduce throughput (a batch size of zero will
disable batching entirely). A very large batch size may use memory a bit more
wastefully as we will always allocate a buffer of the specified batch size in
anticipation of additional records.</td></tr>
<tr>
<td>client.id</td><td>string</td><td></td><td>medium</td><td>The id
string to pass to the server when making requests. The purpose of this is to be
able to track the source of requests beyond just ip/port by allowing a logical
application name to be included with the request. The application can set any
string it wants as this has no functional purpose other than in logging and
metrics.</td></tr>
<tr>
- <td>linger.ms</td><td>long</td><td>0</td><td>medium</td><td>The
producer groups together any records that arrive in between request sends.
Normally this occurs only under load when records arrive faster than they can
be sent out. However in some circumstances the client may want to reduce the
number of requests even under moderate load. This setting accomplishes this by
adding a small amount of artificial delay—that is, rather than
immediately sending out a record the producer will wait for up to the given
delay to allow other records to be sent so that the sends can be batched
together. This can be thought of as analogous to Nagle's algorithm in TCP. This
setting gives the upper bound on the delay for batching: once we get
<code>batch.size</code> worth of records for a partition it will be sent
immediately regardless of this setting, however if we have fewer than this many
bytes accumulated for this partition we will 'linger' for the specified time
waiting for more records to show up. This setting defaults to 0 (i.e. no
delay).</td></tr>
+ <td>linger.ms</td><td>long</td><td>0</td><td>medium</td><td>The
producer groups together any records that arrive in between request
transmissions into a single batched request. Normally this occurs only under
load when records arrive faster than they can be sent out. However in some
circumstances the client may want to reduce the number of requests even under
moderate load. This setting accomplishes this by adding a small amount of
artificial delay—that is, rather than immediately sending out a record
the producer will wait for up to the given delay to allow other records to be
sent so that the sends can be batched together. This can be thought of as
analogous to Nagle's algorithm in TCP. This setting gives the upper bound on
the delay for batching: once we get <code>batch.size</code> worth of records
for a partition it will be sent immediately regardless of this setting, however
if we have fewer than this many bytes accumulated for this partition we will
'linger' for the specified time waiting for more records to show up. This
setting defaults to 0 (i.e. no delay). Setting <code>linger.ms=5</code>, for
example, would have the effect of reducing the number of requests sent but
would add up to 5ms of latency to records sent in the absence of load.</td></tr>
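
A sketch combining the two batching knobs (same hypothetical props object;
the 5ms figure mirrors the example above):

    props.put("batch.size", 16384); // max bytes per per-partition batch
    props.put("linger.ms", 5);      // wait up to 5ms for a batch to fill;
                                    // a full batch is sent immediately
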
<tr>
<td>max.request.size</td><td>int</td><td>1048576</td><td>medium</td><td>The
maximum size of a request. This is also effectively a cap on the maximum record
size. Note that the server has its own cap on record size which may be
different from this. This setting will limit the number of record batches the
producer will send in a single request to avoid sending huge requests.</td></tr>
<tr>
@@ -740,17 +740,15 @@ We are working on a replacement for our
<tr>
<td>send.buffer.bytes</td><td>int</td><td>131072</td><td>medium</td><td>The
size of the TCP send buffer to use when sending data</td></tr>
<tr>
- <td>timeout.ms</td><td>int</td><td>30000</td><td>medium</td><td>The
configuration controls the maximum amount of time the server will wait for
acknowledgments from followers to meet the acknowledgment requirements the
producer has specified with the <code>acks</code> configuration. If the
requested number of acknowledgments are not met when the timeout ellipses an
error will be returned. This timeout is measured on the server side and does
not include the network latency of the request.</td></tr>
+ <td>timeout.ms</td><td>int</td><td>30000</td><td>medium</td><td>The
configuration controls the maximum amount of time the server will wait for
acknowledgments from followers to meet the acknowledgment requirements the
producer has specified with the <code>acks</code> configuration. If the
requested number of acknowledgments are not met when the timeout elapses an
error will be returned. This timeout is measured on the server side and does
not include the network latency of the request.</td></tr>
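
For instance, the server-side wait under acks=all might be configured as
follows (same hypothetical props object; the value shown is the default):

    props.put("acks", "all");       // require all in-sync replicas
    props.put("timeout.ms", 30000); // server-side wait for follower acks;
                                    // network latency is not counted
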
<tr>
-
<td>block.on.buffer.full</td><td>boolean</td><td>true</td><td>low</td><td>When
our memory buffer is exhausted we must either stop accepting new records
(block) or throw errors. By default this setting is true and we block, however
in some scenarios blocking is not desirable and it is better to immediately
give an error. Setting this to <code>false</code> will accomplish
that.</td></tr>
- <tr>
-
<td>metadata.fetch.backoff.ms</td><td>long</td><td>50</td><td>low</td><td>The
minimum amount of time between metadata refreshes. The client refreshes
metadata whenever it realizes its internal metadata is out of sync with the
actual leadership of partitions. This configuration specifies a backoff to
prevent metadata refreshes from happening too frequently.</td></tr>
+
<td>block.on.buffer.full</td><td>boolean</td><td>true</td><td>low</td><td>When
our memory buffer is exhausted we must either stop accepting new records
(block) or throw errors. By default this setting is true and we block, however
in some scenarios blocking is not desirable and it is better to immediately
give an error. Setting this to <code>false</code> will accomplish that: the
producer will throw a <code>BufferExhaustedException</code> if a record is
sent and the
buffer space is full.</td></tr>
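
A sketch of the non-blocking mode (same hypothetical props object and
producer; BufferExhaustedException is assumed to live in the producer
package):

    props.put("buffer.memory", 33554432L);    // 32 MB of buffer (default)
    props.put("block.on.buffer.full", false); // error instead of blocking
    try {
        producer.send(record);
    } catch (BufferExhaustedException e) {
        // Buffer is full and blocking is disabled: drop the record,
        // back off, or surface the error to the caller.
    }
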
<tr>
<td>metadata.fetch.timeout.ms</td><td>long</td><td>60000</td><td>low</td><td>The
first time data is sent to a topic we must fetch metadata about that topic to
know which servers host the topic's partitions. This configuration controls the
maximum amount of time we will block waiting for the metadata fetch to succeed
before throwing an exception back to the client.</td></tr>
<tr>
-
<td>metadata.max.age.ms</td><td>long</td><td>300000</td><td>low</td><td>The
period of time in milliseconds after which we force a refresh of metadata even
if we haven't seen any leadership changes to proactively discover any new
brokers or partitions.</td></tr>
+
<td>metadata.max.age.ms</td><td>long</td><td>300000</td><td>low</td><td>The
period of time in milliseconds after which we force a refresh of metadata even
if we haven't seen any partition leadership changes to proactively discover
any new brokers or partitions.</td></tr>
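
The two metadata settings side by side (same hypothetical props object; the
values shown are the defaults):

    props.put("metadata.fetch.timeout.ms", 60000L); // max block on first
                                                    // send to a new topic
    props.put("metadata.max.age.ms", 300000L);      // force a refresh every
                                                    // 5 minutes regardless
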
<tr>
- <td>metric.reporters</td><td>list</td><td>[]</td><td>low</td><td>A list
of classes to use as metrics reporters. Implementing the
<code>MetricReporter</code> interface allows plugging in classes that will be
notified of new metric creation.</td></tr>
+ <td>metric.reporters</td><td>list</td><td>[]</td><td>low</td><td>A list
of classes to use as metrics reporters. Implementing the
<code>MetricReporter</code> interface allows plugging in classes that will be
notified of new metric creation. The JmxReporter is always included to register
JMX statistics.</td></tr>
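
For example (same hypothetical props object; com.example.MyMetricsReporter
stands in for any class implementing the reporter interface):

    // JmxReporter is always active; listed reporters are added on top.
    props.put("metric.reporters", "com.example.MyMetricsReporter");
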
<tr>
<td>metrics.num.samples</td><td>int</td><td>2</td><td>low</td><td>The
number of samples maintained to compute metrics.</td></tr>
<tr>