Modified: websites/staging/flume/trunk/content/FlumeUserGuide.html
==============================================================================
--- websites/staging/flume/trunk/content/FlumeUserGuide.html (original)
+++ websites/staging/flume/trunk/content/FlumeUserGuide.html Mon Jun 1
19:50:47 2015
@@ -7,7 +7,7 @@
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
- <title>Flume 1.5.2 User Guide — Apache Flume</title>
+ <title>Flume 1.6.0 User Guide — Apache Flume</title>
<link rel="stylesheet" href="_static/flume.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
@@ -26,7 +26,7 @@
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="Apache Flume" href="index.html" />
<link rel="up" title="Documentation" href="documentation.html" />
- <link rel="next" title="Flume 1.5.2 Developer Guide"
href="FlumeDeveloperGuide.html" />
+ <link rel="next" title="Flume 1.6.0 Developer Guide"
href="FlumeDeveloperGuide.html" />
<link rel="prev" title="Documentation" href="documentation.html" />
</head>
<body>
@@ -59,8 +59,8 @@
<div class="bodywrapper">
<div class="body">
- <div class="section" id="flume-1-5-2-user-guide">
-<h1>Flume 1.5.2 User Guide<a class="headerlink" href="#flume-1-5-2-user-guide"
title="Permalink to this headline">¶</a></h1>
+ <div class="section" id="flume-1-6-0-user-guide">
+<h1>Flume 1.6.0 User Guide<a class="headerlink" href="#flume-1-6-0-user-guide"
title="Permalink to this headline">¶</a></h1>
<div class="section" id="introduction">
<h2>Introduction<a class="headerlink" href="#introduction" title="Permalink to
this headline">¶</a></h2>
<div class="section" id="overview">
@@ -248,6 +248,42 @@ OK</pre>
</div>
<p>Congratulations - you’ve successfully configured and deployed a Flume
agent! Subsequent sections cover agent configuration in much more detail.</p>
</div>
+<div class="section" id="zookeeper-based-configuration">
+<h4>Zookeeper based Configuration<a class="headerlink"
href="#zookeeper-based-configuration" title="Permalink to this
headline">¶</a></h4>
+<p>Flume supports Agent configurations via Zookeeper. <em>This is an
experimental feature.</em> The configuration file needs to be uploaded
+to Zookeeper, under a configurable prefix. The configuration file is
stored in the Zookeeper node data.
+Following is how the Zookeeper node tree would look for agents a1 and
a2</p>
+<div class="highlight-properties"><pre>- /flume
+ |- /a1 [Agent config file]
+ |- /a2 [Agent config file]</pre>
+</div>
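+<p>For reference, one possible way to get the configuration file into
+Zookeeper is via the Zookeeper CLI (the <tt class="docutils literal"><span
class="pre">zkCli.sh</span></tt> invocation and the flume.conf file name
+below are illustrative, not something Flume ships or mandates):</p>
+<blockquote>
+<div>$ zkCli.sh -server zkhost:2181 create /flume/a1 "$(cat
flume.conf)"</div></blockquote>
+<p>Note that the parent node (/flume here) must already exist before the
+agent node can be created under it.</p>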
+<p>Once the configuration file is uploaded, start the agent with the
following options</p>
+<blockquote>
+<div>$ bin/flume-ng agent --conf conf -z zkhost:2181,zkhost1:2181 -p
/flume --name a1 -Dflume.root.logger=INFO,console</div></blockquote>
+<table border="1" class="docutils">
+<colgroup>
+<col width="17%" />
+<col width="15%" />
+<col width="68%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Argument Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>z</strong></td>
+<td>–</td>
+<td>Zookeeper connection string. Comma separated list of hostname:port</td>
+</tr>
+<tr class="row-odd"><td><strong>p</strong></td>
+<td>/flume</td>
+<td>Base Path in Zookeeper to store Agent configurations</td>
+</tr>
+</tbody>
+</table>
+</div>
<div class="section" id="installing-third-party-plugins">
<h4>Installing third-party plugins<a class="headerlink"
href="#installing-third-party-plugins" title="Permalink to this
headline">¶</a></h4>
<p>Flume has a fully plugin-based architecture. While Flume ships with many
@@ -726,7 +762,7 @@ Required properties are in <strong>bold<
<td>false</td>
<td>Set this to true to enable ipFiltering for netty</td>
</tr>
-<tr class="row-even"><td>ipFilter.rules</td>
+<tr class="row-even"><td>ipFilterRules</td>
<td>–</td>
<td>Define N netty ipFilter pattern rules with this config.</td>
</tr>
@@ -741,12 +777,12 @@ Required properties are in <strong>bold<
<span class="na">a1.sources.r1.port</span> <span class="o">=</span> <span
class="s">4141</span>
</pre></div>
</div>
-<p>Example of ipFilter.rules</p>
-<p>ipFilter.rules defines N netty ipFilters separated by a comma a pattern
rule must be in this format.</p>
+<p>Example of ipFilterRules</p>
+<p>ipFilterRules defines N netty ipFilters, separated by a comma. A pattern
rule must be in this format.</p>
<p><’allow’ or ’deny’>:<’ip’ or
‘name’ for computer name>:<pattern>
or
allow/deny:ip/name:pattern</p>
-<p>example: ipFilter.rules=allow:ip:127.*,allow:name:localhost,deny:ip:*</p>
+<p>example: ipFilterRules=allow:ip:127.*,allow:name:localhost,deny:ip:*</p>
<p>Note that the first rule to match will apply, as the example below shows
for a client on the localhost</p>
<p>This will allow the client on localhost and deny clients from any other ip:
“allow:name:localhost,deny:ip:*”
This will deny the client on localhost and allow clients from any other ip:
“deny:name:localhost,allow:ip:*”</p>
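+<p>Putting the rules together, an Avro source that accepts only local
+connections could be configured as follows (the agent and source names are
+illustrative):</p>
+<div class="highlight-properties"><pre>a1.sources.r1.type = avro
+a1.sources.r1.bind = 0.0.0.0
+a1.sources.r1.port = 4141
+a1.sources.r1.ipFilter = true
+a1.sources.r1.ipFilterRules = allow:name:localhost,deny:ip:*</pre>
+</div>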
@@ -756,12 +792,15 @@ This will deny the client on localhost b
<p>Listens on Thrift port and receives events from external Thrift client
streams.
When paired with the built-in ThriftSink on another (previous hop) Flume agent,
it can create tiered collection topologies.
+The Thrift source can be configured to start in secure mode by enabling
kerberos authentication.
+agent-principal and agent-keytab are the properties used by the
+Thrift source to authenticate to the kerberos KDC.
Required properties are in <strong>bold</strong>.</p>
<table border="1" class="docutils">
<colgroup>
-<col width="23%" />
-<col width="14%" />
-<col width="64%" />
+<col width="5%" />
+<col width="3%" />
+<col width="91%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Property Name</th>
@@ -806,6 +845,38 @@ Required properties are in <strong>bold<
<td> </td>
<td> </td>
</tr>
+<tr class="row-odd"><td>ssl</td>
+<td>false</td>
+<td>Set this to true to enable SSL encryption. You must also specify a
“keystore” and a “keystore-password”.</td>
+</tr>
+<tr class="row-even"><td>keystore</td>
+<td>–</td>
+<td>This is the path to a Java keystore file. Required for SSL.</td>
+</tr>
+<tr class="row-odd"><td>keystore-password</td>
+<td>–</td>
+<td>The password for the Java keystore. Required for SSL.</td>
+</tr>
+<tr class="row-even"><td>keystore-type</td>
+<td>JKS</td>
+<td>The type of the Java keystore. This can be “JKS” or
“PKCS12”.</td>
+</tr>
+<tr class="row-odd"><td>exclude-protocols</td>
+<td>SSLv3</td>
+<td>Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be
excluded in addition to the protocols specified.</td>
+</tr>
+<tr class="row-even"><td>kerberos</td>
+<td>false</td>
+<td>Set to true to enable kerberos authentication. In kerberos mode,
agent-principal and agent-keytab are required for successful authentication.
The Thrift source in secure mode will accept connections only from Thrift
clients that have kerberos enabled and are successfully authenticated to the
kerberos KDC.</td>
+</tr>
+<tr class="row-odd"><td>agent-principal</td>
+<td>–</td>
+<td>The kerberos principal used by the Thrift Source to authenticate to the
kerberos KDC.</td>
+</tr>
+<tr class="row-even"><td>agent-keytab</td>
+<td>–</td>
+<td>The keytab location used by the Thrift Source in combination with the
agent-principal to authenticate to the kerberos KDC.</td>
+</tr>
</tbody>
</table>
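+<p>As an illustration of the kerberos options above, a Thrift source running
+in secure mode might be configured as follows (the principal and keytab
+values are hypothetical):</p>
+<div class="highlight-properties"><pre>a1.sources.r1.type = thrift
+a1.sources.r1.channels = c1
+a1.sources.r1.bind = 0.0.0.0
+a1.sources.r1.port = 4464
+a1.sources.r1.kerberos = true
+a1.sources.r1.agent-principal = flume/_HOST@EXAMPLE.COM
+a1.sources.r1.agent-keytab = /etc/flume/conf/flume.keytab</pre>
+</div>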
<p>Example for agent named a1:</p>
@@ -873,19 +944,23 @@ latter produces a single event and exits
<td>20</td>
<td>The max number of lines to read and send to the channel at a time</td>
</tr>
-<tr class="row-even"><td>selector.type</td>
+<tr class="row-even"><td>batchTimeout</td>
+<td>3000</td>
+<td>Amount of time (in milliseconds) to wait, if the buffer size was not
reached, before data is pushed downstream</td>
+</tr>
+<tr class="row-odd"><td>selector.type</td>
<td>replicating</td>
<td>replicating or multiplexing</td>
</tr>
-<tr class="row-odd"><td>selector.*</td>
+<tr class="row-even"><td>selector.*</td>
<td> </td>
<td>Depends on the selector.type value</td>
</tr>
-<tr class="row-even"><td>interceptors</td>
+<tr class="row-odd"><td>interceptors</td>
<td>–</td>
<td>Space-separated list of interceptors</td>
</tr>
-<tr class="row-odd"><td>interceptors.*</td>
+<tr class="row-even"><td>interceptors.*</td>
<td> </td>
<td> </td>
</tr>
@@ -931,9 +1006,9 @@ allows the ‘command’ to use
loops, conditionals etc. In the absence of the ‘shell’ config, the
‘command’ will be
invoked directly. Common values for ‘shell’ : ‘/bin/sh
-c’, ‘/bin/ksh -c’,
‘cmd /c’, ‘powershell -Command’, etc.</p>
-<div class="highlight-properties"><div class="highlight"><pre><span
class="na">agent_foo.sources.tailsource-1.type</span> <span class="o">=</span>
<span class="s">exec</span>
-<span class="na">agent_foo.sources.tailsource-1.shell</span> <span
class="o">=</span> <span class="s">/bin/bash -c</span>
-<span class="na">agent_foo.sources.tailsource-1.command</span> <span
class="o">=</span> <span class="s">for i in /path/*.txt; do cat $i; done</span>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.sources.tailsource-1.type</span> <span class="o">=</span> <span
class="s">exec</span>
+<span class="na">a1.sources.tailsource-1.shell</span> <span class="o">=</span>
<span class="s">/bin/bash -c</span>
+<span class="na">a1.sources.tailsource-1.command</span> <span
class="o">=</span> <span class="s">for i in /path/*.txt; do cat $i; done</span>
</pre></div>
</div>
</div>
@@ -1199,86 +1274,13 @@ Defaults to parsing each line as an even
</tbody>
</table>
<p>Example for an agent named agent-1:</p>
-<div class="highlight-properties"><div class="highlight"><pre><span
class="na">agent-1.channels</span> <span class="o">=</span> <span
class="s">ch-1</span>
-<span class="na">agent-1.sources</span> <span class="o">=</span> <span
class="s">src-1</span>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.channels</span> <span class="o">=</span> <span
class="s">ch-1</span>
+<span class="na">a1.sources</span> <span class="o">=</span> <span
class="s">src-1</span>
-<span class="na">agent-1.sources.src-1.type</span> <span class="o">=</span>
<span class="s">spooldir</span>
-<span class="na">agent-1.sources.src-1.channels</span> <span
class="o">=</span> <span class="s">ch-1</span>
-<span class="na">agent-1.sources.src-1.spoolDir</span> <span
class="o">=</span> <span class="s">/var/log/apache/flumeSpool</span>
-<span class="na">agent-1.sources.src-1.fileHeader</span> <span
class="o">=</span> <span class="s">true</span>
-</pre></div>
-</div>
-</div>
-<div class="section" id="twitter-1-firehose-source-experimental">
-<h4>Twitter 1% firehose Source (experimental)<a class="headerlink"
href="#twitter-1-firehose-source-experimental" title="Permalink to this
headline">¶</a></h4>
-<div class="admonition warning">
-<p class="first admonition-title">Warning</p>
-<p class="last">This source is hightly experimental and may change between
minor versions of Flume.
-Use at your own risk.</p>
-</div>
-<p>Experimental source that connects via Streaming API to the 1% sample twitter
-firehose, continously downloads tweets, converts them to Avro format and
-sends Avro events to a downstream Flume sink. Requires the consumer and
-access tokens and secrets of a Twitter developer account.
-Required properties are in <strong>bold</strong>.</p>
-<table border="1" class="docutils">
-<colgroup>
-<col width="18%" />
-<col width="9%" />
-<col width="72%" />
-</colgroup>
-<thead valign="bottom">
-<tr class="row-odd"><th class="head">Property Name</th>
-<th class="head">Default</th>
-<th class="head">Description</th>
-</tr>
-</thead>
-<tbody valign="top">
-<tr class="row-even"><td><strong>channels</strong></td>
-<td>–</td>
-<td> </td>
-</tr>
-<tr class="row-odd"><td><strong>type</strong></td>
-<td>–</td>
-<td>The component type name, needs to be <tt class="docutils literal"><span
class="pre">org.apache.flume.source.twitter.TwitterSource</span></tt></td>
-</tr>
-<tr class="row-even"><td><strong>consumerKey</strong></td>
-<td>–</td>
-<td>OAuth consumer key</td>
-</tr>
-<tr class="row-odd"><td><strong>consumerSecret</strong></td>
-<td>–</td>
-<td>OAuth consumer secret</td>
-</tr>
-<tr class="row-even"><td><strong>accessToken</strong></td>
-<td>–</td>
-<td>OAuth access token</td>
-</tr>
-<tr class="row-odd"><td><strong>accessTokenSecret</strong></td>
-<td>–</td>
-<td>OAuth toekn secret</td>
-</tr>
-<tr class="row-even"><td>maxBatchSize</td>
-<td>1000</td>
-<td>Maximum number of twitter messages to put in a single batch</td>
-</tr>
-<tr class="row-odd"><td>maxBatchDurationMillis</td>
-<td>1000</td>
-<td>Maximum number of milliseconds to wait before closing a batch</td>
-</tr>
-</tbody>
-</table>
-<p>Example for agent named a1:</p>
-<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.sources</span> <span class="o">=</span> <span class="s">r1</span>
-<span class="na">a1.channels</span> <span class="o">=</span> <span
class="s">c1</span>
-<span class="na">a1.sources.r1.type</span> <span class="o">=</span> <span
class="s">org.apache.flume.source.twitter.TwitterSource</span>
-<span class="na">a1.sources.r1.channels</span> <span class="o">=</span> <span
class="s">c1</span>
-<span class="na">a1.sources.r1.consumerKey</span> <span class="o">=</span>
<span class="s">YOUR_TWITTER_CONSUMER_KEY</span>
-<span class="na">a1.sources.r1.consumerSecret</span> <span class="o">=</span>
<span class="s">YOUR_TWITTER_CONSUMER_SECRET</span>
-<span class="na">a1.sources.r1.accessToken</span> <span class="o">=</span>
<span class="s">YOUR_TWITTER_ACCESS_TOKEN</span>
-<span class="na">a1.sources.r1.accessTokenSecret</span> <span
class="o">=</span> <span class="s">YOUR_TWITTER_ACCESS_TOKEN_SECRET</span>
-<span class="na">a1.sources.r1.maxBatchSize</span> <span class="o">=</span>
<span class="s">10</span>
-<span class="na">a1.sources.r1.maxBatchDurationMillis</span> <span
class="o">=</span> <span class="s">200</span>
+<span class="na">a1.sources.src-1.type</span> <span class="o">=</span> <span
class="s">spooldir</span>
+<span class="na">a1.sources.src-1.channels</span> <span class="o">=</span>
<span class="s">ch-1</span>
+<span class="na">a1.sources.src-1.spoolDir</span> <span class="o">=</span>
<span class="s">/var/log/apache/flumeSpool</span>
+<span class="na">a1.sources.src-1.fileHeader</span> <span class="o">=</span>
<span class="s">true</span>
</pre></div>
</div>
<div class="section" id="event-deserializers">
@@ -1381,6 +1383,155 @@ inefficient compared to <tt class="docut
</div>
</div>
</div>
+<div class="section" id="twitter-1-firehose-source-experimental">
+<h4>Twitter 1% firehose Source (experimental)<a class="headerlink"
href="#twitter-1-firehose-source-experimental" title="Permalink to this
headline">¶</a></h4>
+<div class="admonition warning">
+<p class="first admonition-title">Warning</p>
+<p class="last">This source is highly experimental and may change between
minor versions of Flume.
+Use at your own risk.</p>
+</div>
+<p>Experimental source that connects via the Streaming API to the 1% sample
twitter
+firehose, continuously downloads tweets, converts them to Avro format and
+sends Avro events to a downstream Flume sink. Requires the consumer and
+access tokens and secrets of a Twitter developer account.
+Required properties are in <strong>bold</strong>.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="18%" />
+<col width="9%" />
+<col width="72%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>channels</strong></td>
+<td>–</td>
+<td> </td>
+</tr>
+<tr class="row-odd"><td><strong>type</strong></td>
+<td>–</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span
class="pre">org.apache.flume.source.twitter.TwitterSource</span></tt></td>
+</tr>
+<tr class="row-even"><td><strong>consumerKey</strong></td>
+<td>–</td>
+<td>OAuth consumer key</td>
+</tr>
+<tr class="row-odd"><td><strong>consumerSecret</strong></td>
+<td>–</td>
+<td>OAuth consumer secret</td>
+</tr>
+<tr class="row-even"><td><strong>accessToken</strong></td>
+<td>–</td>
+<td>OAuth access token</td>
+</tr>
+<tr class="row-odd"><td><strong>accessTokenSecret</strong></td>
+<td>–</td>
+<td>OAuth token secret</td>
+</tr>
+<tr class="row-even"><td>maxBatchSize</td>
+<td>1000</td>
+<td>Maximum number of twitter messages to put in a single batch</td>
+</tr>
+<tr class="row-odd"><td>maxBatchDurationMillis</td>
+<td>1000</td>
+<td>Maximum number of milliseconds to wait before closing a batch</td>
+</tr>
+</tbody>
+</table>
+<p>Example for agent named a1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.sources</span> <span class="o">=</span> <span class="s">r1</span>
+<span class="na">a1.channels</span> <span class="o">=</span> <span
class="s">c1</span>
+<span class="na">a1.sources.r1.type</span> <span class="o">=</span> <span
class="s">org.apache.flume.source.twitter.TwitterSource</span>
+<span class="na">a1.sources.r1.channels</span> <span class="o">=</span> <span
class="s">c1</span>
+<span class="na">a1.sources.r1.consumerKey</span> <span class="o">=</span>
<span class="s">YOUR_TWITTER_CONSUMER_KEY</span>
+<span class="na">a1.sources.r1.consumerSecret</span> <span class="o">=</span>
<span class="s">YOUR_TWITTER_CONSUMER_SECRET</span>
+<span class="na">a1.sources.r1.accessToken</span> <span class="o">=</span>
<span class="s">YOUR_TWITTER_ACCESS_TOKEN</span>
+<span class="na">a1.sources.r1.accessTokenSecret</span> <span
class="o">=</span> <span class="s">YOUR_TWITTER_ACCESS_TOKEN_SECRET</span>
+<span class="na">a1.sources.r1.maxBatchSize</span> <span class="o">=</span>
<span class="s">10</span>
+<span class="na">a1.sources.r1.maxBatchDurationMillis</span> <span
class="o">=</span> <span class="s">200</span>
+</pre></div>
+</div>
+</div>
+<div class="section" id="kafka-source">
+<h4>Kafka Source<a class="headerlink" href="#kafka-source" title="Permalink to
this headline">¶</a></h4>
+<p>Kafka Source is an Apache Kafka consumer that reads messages from a Kafka
topic.
+If you have multiple Kafka sources running, you can configure them with the
same Consumer Group
+so each will read a unique set of partitions for the topic.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="21%" />
+<col width="8%" />
+<col width="71%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>channels</strong></td>
+<td>–</td>
+<td> </td>
+</tr>
+<tr class="row-odd"><td><strong>type</strong></td>
+<td>–</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span
class="pre">org.apache.flume.source.kafka.KafkaSource</span></tt></td>
+</tr>
+<tr class="row-even"><td><strong>zookeeperConnect</strong></td>
+<td>–</td>
+<td>URI of ZooKeeper used by Kafka cluster</td>
+</tr>
+<tr class="row-odd"><td><strong>groupId</strong></td>
+<td>flume</td>
+<td>Unique identifier of the consumer group. Setting the same id in multiple
sources or agents
+indicates that they are part of the same consumer group</td>
+</tr>
+<tr class="row-even"><td><strong>topic</strong></td>
+<td>–</td>
+<td>Kafka topic we’ll read messages from. At this time, this is a
single topic only.</td>
+</tr>
+<tr class="row-odd"><td>batchSize</td>
+<td>1000</td>
+<td>Maximum number of messages written to Channel in one batch</td>
+</tr>
+<tr class="row-even"><td>batchDurationMillis</td>
+<td>1000</td>
+<td>Maximum time (in ms) before a batch is written to the Channel.
+The batch is written whenever the first of size or time is reached.</td>
+</tr>
+<tr class="row-odd"><td>Other Kafka Consumer Properties</td>
+<td>–</td>
+<td>These properties are used to configure the Kafka Consumer. Any consumer
property supported
+by Kafka can be used. The only requirement is to prepend the property name
with the prefix <tt class="docutils literal"><span
class="pre">kafka.</span></tt>.
+For example: kafka.consumer.timeout.ms.
+Check the <cite>Kafka documentation
<https://kafka.apache.org/08/configuration.html#consumerconfigs></cite>
for details</td>
+</tr>
+</tbody>
+</table>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">The Kafka Source overrides two Kafka consumer parameters:
+auto.commit.enable is set to “false” by the source, and every
batch is committed. For improved performance
+this can be set to “true”; however, this can lead to loss of data.
+consumer.timeout.ms is set to 10ms, so when we check Kafka for new data we
wait at most 10ms for the data to arrive.
+Setting this to a higher value can reduce CPU utilization (we’ll poll
Kafka in less of a tight loop), but also means
+higher latency in writing batches to the channel (since we’ll wait
longer for data to arrive).</p>
+</div>
+<p>Example for agent named tier1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">tier1.sources.source1.type</span> <span class="o">=</span> <span
class="s">org.apache.flume.source.kafka.KafkaSource</span>
+<span class="na">tier1.sources.source1.channels</span> <span
class="o">=</span> <span class="s">channel1</span>
+<span class="na">tier1.sources.source1.zookeeperConnect</span> <span
class="o">=</span> <span class="s">localhost:2181</span>
+<span class="na">tier1.sources.source1.topic</span> <span class="o">=</span>
<span class="s">test1</span>
+<span class="na">tier1.sources.source1.groupId</span> <span class="o">=</span>
<span class="s">flume</span>
+<span class="na">tier1.sources.source1.kafka.consumer.timeout.ms</span> <span
class="o">=</span> <span class="s">100</span>
+</pre></div>
+</div>
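+<p>For example, to trade the batch-level commit described in the note above
+for throughput, the consumer parameter could be overridden as follows
+(shown for the same hypothetical tier1 agent, and with the data-loss caveat
+noted above):</p>
+<div class="highlight-properties"><pre>tier1.sources.source1.kafka.auto.commit.enable = true</pre>
+</div>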
+</div>
<div class="section" id="netcat-source">
<h4>NetCat Source<a class="headerlink" href="#netcat-source" title="Permalink
to this headline">¶</a></h4>
<p>A netcat-like source that listens on a given port and turns each line of
text
@@ -1553,9 +1704,14 @@ of characters separated by a newline (&#
<td>Maximum size of a single event line, in bytes</td>
</tr>
<tr class="row-odd"><td>keepFields</td>
-<td>false</td>
-<td>Setting this to true will preserve the Priority,
-Timestamp and Hostname in the body of the event.</td>
+<td>none</td>
+<td>Setting this to ‘all’ will preserve the Priority,
+Timestamp and Hostname in the body of the event.
+A space-separated list of fields to include
+is allowed as well. Currently, the following
+fields can be included: priority, version,
+timestamp, hostname. The values ‘true’ and ‘false’
+have been deprecated in favor of ‘all’ and ‘none’.</td>
</tr>
<tr class="row-even"><td>selector.type</td>
<td> </td>
@@ -1628,9 +1784,14 @@ basis.</p>
<td>Maximum size of a single event line, in bytes.</td>
</tr>
<tr class="row-odd"><td>keepFields</td>
-<td>false</td>
-<td>Setting this to true will preserve the
-Priority, Timestamp and Hostname in the body of the event.</td>
+<td>none</td>
+<td>Setting this to ‘all’ will preserve the
+Priority, Timestamp and Hostname in the body of the event.
+A space-separated list of fields to include
+is allowed as well. Currently, the following
+fields can be included: priority, version,
+timestamp, hostname. The values ‘true’ and ‘false’
+have been deprecated in favor of ‘all’ and ‘none’.</td>
</tr>
<tr class="row-even"><td>portHeader</td>
<td>–</td>
@@ -1904,6 +2065,58 @@ for list of events can be created by:</p
</table>
</div>
</div>
+<div class="section" id="stress-source">
+<h4>Stress Source<a class="headerlink" href="#stress-source" title="Permalink
to this headline">¶</a></h4>
+<p>StressSource is an internal load-generating source implementation which is
very useful for
+stress tests. It allows the user to configure the size of the Event payload,
with empty headers.
+The user can configure the total number of events to be sent, as well as the
maximum number
+of successful events to be delivered.</p>
+<p>Required properties are in <strong>bold</strong>.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="18%" />
+<col width="10%" />
+<col width="72%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>type</strong></td>
+<td>–</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span
class="pre">org.apache.flume.source.StressSource</span></tt></td>
+</tr>
+<tr class="row-odd"><td>size</td>
+<td>500</td>
+<td>Payload size of each Event. Unit:<strong>byte</strong></td>
+</tr>
+<tr class="row-even"><td>maxTotalEvents</td>
+<td>-1</td>
+<td>Maximum number of Events to be sent</td>
+</tr>
+<tr class="row-odd"><td>maxSuccessfulEvents</td>
+<td>-1</td>
+<td>Maximum number of Events successfully sent</td>
+</tr>
+<tr class="row-even"><td>batchSize</td>
+<td>1</td>
+<td>Number of Events to be sent in one batch</td>
+</tr>
+</tbody>
+</table>
+<p>Example for agent named <strong>a1</strong>:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.sources</span> <span class="o">=</span> <span
class="s">stresssource-1</span>
+<span class="na">a1.channels</span> <span class="o">=</span> <span
class="s">memoryChannel-1</span>
+<span class="na">a1.sources.stresssource-1.type</span> <span
class="o">=</span> <span class="s">org.apache.flume.source.StressSource</span>
+<span class="na">a1.sources.stresssource-1.size</span> <span
class="o">=</span> <span class="s">10240</span>
+<span class="na">a1.sources.stresssource-1.maxTotalEvents</span> <span
class="o">=</span> <span class="s">1000000</span>
+<span class="na">a1.sources.stresssource-1.channels</span> <span
class="o">=</span> <span class="s">memoryChannel-1</span>
+</pre></div>
+</div>
+</div>
<div class="section" id="legacy-sources">
<h4>Legacy Sources<a class="headerlink" href="#legacy-sources"
title="Permalink to this headline">¶</a></h4>
<p>The legacy sources allow a Flume 1.x agent to receive events from Flume
0.9.4
@@ -2103,9 +2316,9 @@ For deployment of Scribe please follow t
Required properties are in <strong>bold</strong>.</p>
<table border="1" class="docutils">
<colgroup>
-<col width="13%" />
+<col width="17%" />
<col width="10%" />
-<col width="77%" />
+<col width="73%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Property Name</th>
@@ -2122,15 +2335,19 @@ Required properties are in <strong>bold<
<td>1499</td>
<td>Port that Scribe should connect to</td>
</tr>
-<tr class="row-even"><td>workerThreads</td>
+<tr class="row-even"><td>maxReadBufferBytes</td>
+<td>16384000</td>
+<td>Thrift Default FrameBuffer Size</td>
+</tr>
+<tr class="row-odd"><td>workerThreads</td>
<td>5</td>
<td>Number of handling threads in Thrift</td>
</tr>
-<tr class="row-odd"><td>selector.type</td>
+<tr class="row-even"><td>selector.type</td>
<td> </td>
<td> </td>
</tr>
-<tr class="row-even"><td>selector.*</td>
+<tr class="row-odd"><td>selector.*</td>
<td> </td>
<td> </td>
</tr>
@@ -2198,24 +2415,30 @@ required.</p>
<tr class="row-odd"><td>%d</td>
<td>day of month (01)</td>
</tr>
-<tr class="row-even"><td>%D</td>
+<tr class="row-even"><td>%e</td>
+<td>day of month without padding (1)</td>
+</tr>
+<tr class="row-odd"><td>%D</td>
<td>date; same as %m/%d/%y</td>
</tr>
-<tr class="row-odd"><td>%H</td>
+<tr class="row-even"><td>%H</td>
<td>hour (00..23)</td>
</tr>
-<tr class="row-even"><td>%I</td>
+<tr class="row-odd"><td>%I</td>
<td>hour (01..12)</td>
</tr>
-<tr class="row-odd"><td>%j</td>
+<tr class="row-even"><td>%j</td>
<td>day of year (001..366)</td>
</tr>
-<tr class="row-even"><td>%k</td>
+<tr class="row-odd"><td>%k</td>
<td>hour ( 0..23)</td>
</tr>
-<tr class="row-odd"><td>%m</td>
+<tr class="row-even"><td>%m</td>
<td>month (01..12)</td>
</tr>
+<tr class="row-odd"><td>%n</td>
+<td>month without padding (1..12)</td>
+</tr>
<tr class="row-even"><td>%M</td>
<td>minute (00..59)</td>
</tr>
@@ -2251,9 +2474,9 @@ this automatically is to use the Timesta
</div>
<table border="1" class="docutils">
<colgroup>
-<col width="14%" />
-<col width="8%" />
-<col width="79%" />
+<col width="12%" />
+<col width="6%" />
+<col width="82%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Name</th>
@@ -2356,69 +2579,329 @@ This number should be increased if many
<td>–</td>
<td>Kerberos keytab for accessing secure HDFS</td>
</tr>
-<tr class="row-even"><td>hdfs.proxyUser</td>
-<td> </td>
-<td> </td>
+<tr class="row-even"><td>hdfs.proxyUser</td>
+<td> </td>
+<td> </td>
+</tr>
+<tr class="row-odd"><td>hdfs.round</td>
+<td>false</td>
+<td>Should the timestamp be rounded down (if true, affects all time based
escape sequences except %t)</td>
+</tr>
+<tr class="row-even"><td>hdfs.roundValue</td>
+<td>1</td>
+<td>Rounded down to the highest multiple of this (in the unit configured using
<tt class="docutils literal"><span class="pre">hdfs.roundUnit</span></tt>),
less than current time.</td>
+</tr>
+<tr class="row-odd"><td>hdfs.roundUnit</td>
+<td>second</td>
+<td>The unit of the round down value - <tt class="docutils literal"><span
class="pre">second</span></tt>, <tt class="docutils literal"><span
class="pre">minute</span></tt> or <tt class="docutils literal"><span
class="pre">hour</span></tt>.</td>
+</tr>
+<tr class="row-even"><td>hdfs.timeZone</td>
+<td>Local Time</td>
+<td>Name of the timezone that should be used for resolving the directory path,
e.g. America/Los_Angeles.</td>
+</tr>
+<tr class="row-odd"><td>hdfs.useLocalTimeStamp</td>
+<td>false</td>
+<td>Use the local time (instead of the timestamp from the event header) while
replacing the escape sequences.</td>
+</tr>
+<tr class="row-even"><td>hdfs.closeTries</td>
+<td>0</td>
+<td>Number of times the sink must try renaming a file, after initiating a
close attempt. If set to 1, this sink will not re-try a failed rename
+(due to, for example, NameNode or DataNode failure), and may leave the file in
an open state with a .tmp extension.
+If set to 0, the sink will try to rename the file until the file is eventually
renamed (there is no limit on the number of times it would try).
+The file may still remain open if the close call fails but the data will be
intact and in this case, the file will be closed only after a Flume
restart.</td>
+</tr>
+<tr class="row-odd"><td>hdfs.retryInterval</td>
+<td>180</td>
+<td>Time in seconds between consecutive attempts to close a file. Each close
call costs multiple RPC round-trips to the Namenode,
+so setting this too low can cause a lot of load on the name node. If set to 0
or less, the sink will not
+attempt to close the file if the first attempt fails, and may leave the file
open or with a ”.tmp” extension.</td>
+</tr>
+<tr class="row-even"><td>serializer</td>
+<td><tt class="docutils literal"><span class="pre">TEXT</span></tt></td>
+<td>Other possible options include <tt class="docutils literal"><span
class="pre">avro_event</span></tt> or the
+fully-qualified class name of an implementation of the
+<tt class="docutils literal"><span
class="pre">EventSerializer.Builder</span></tt> interface.</td>
+</tr>
+<tr class="row-odd"><td>serializer.*</td>
+<td> </td>
+<td> </td>
+</tr>
+</tbody>
+</table>
+<p>Example for agent named a1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.sinks</span> <span class="o">=</span> <span
class="s">k1</span>
+<span class="na">a1.sinks.k1.type</span> <span class="o">=</span> <span
class="s">hdfs</span>
+<span class="na">a1.sinks.k1.channel</span> <span class="o">=</span> <span
class="s">c1</span>
+<span class="na">a1.sinks.k1.hdfs.path</span> <span class="o">=</span> <span
class="s">/flume/events/%y-%m-%d/%H%M/%S</span>
+<span class="na">a1.sinks.k1.hdfs.filePrefix</span> <span class="o">=</span>
<span class="s">events-</span>
+<span class="na">a1.sinks.k1.hdfs.round</span> <span class="o">=</span> <span
class="s">true</span>
+<span class="na">a1.sinks.k1.hdfs.roundValue</span> <span class="o">=</span>
<span class="s">10</span>
+<span class="na">a1.sinks.k1.hdfs.roundUnit</span> <span class="o">=</span>
<span class="s">minute</span>
+</pre></div>
+</div>
+<p>The above configuration will round down the timestamp to the last 10th
minute. For example, an event with
+timestamp 11:54:34 AM, June 12, 2012 will cause the hdfs path to become <tt
class="docutils literal"><span
class="pre">/flume/events/2012-06-12/1150/00</span></tt>.</p>
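+<p>For the close-retry behaviour described in the table above, a hedged sketch
+(the values shown are illustrative, not recommendations):</p>
+<div class="highlight-properties"><pre># retry a failed close/rename up to 5 times, waiting 120 seconds between attempts
a1.sinks.k1.hdfs.closeTries = 5
a1.sinks.k1.hdfs.retryInterval = 120</pre>
+</div>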
+</div>
+<div class="section" id="hive-sink">
+<h4>Hive Sink<a class="headerlink" href="#hive-sink" title="Permalink to this
headline">¶</a></h4>
+<p>This sink streams events containing delimited text or JSON data directly
into a Hive table or partition.
+Events are written using Hive transactions. As soon as a set of events is
committed to Hive, it becomes
+immediately visible to Hive queries. Partitions to which Flume will stream
can either be pre-created
+or, optionally, Flume can create them if they are missing. Fields from
incoming event data are mapped to
+corresponding columns in the Hive table. <strong>This sink is provided as a
preview feature and is not recommended
+for use in production.</strong></p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="15%" />
+<col width="8%" />
+<col width="77%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>channel</strong></td>
+<td>–</td>
+<td> </td>
+</tr>
+<tr class="row-odd"><td><strong>type</strong></td>
+<td>–</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span
class="pre">hive</span></tt></td>
+</tr>
+<tr class="row-even"><td><strong>hive.metastore</strong></td>
+<td>–</td>
+<td>Hive metastore URI (eg thrift://a.b.com:9083 )</td>
+</tr>
+<tr class="row-odd"><td><strong>hive.database</strong></td>
+<td>–</td>
+<td>Hive database name</td>
+</tr>
+<tr class="row-even"><td><strong>hive.table</strong></td>
+<td>–</td>
+<td>Hive table name</td>
+</tr>
+<tr class="row-odd"><td>hive.partition</td>
+<td>–</td>
+<td>Comma separated list of partition values identifying the partition to write
to. May contain escape
+sequences. E.g: If the table is partitioned by (continent: string, country
:string, time : string)
+then ‘Asia,India,2014-02-26-01-21’ will indicate
continent=Asia,country=India,time=2014-02-26-01-21</td>
+</tr>
+<tr class="row-even"><td>hive.txnsPerBatchAsk</td>
+<td>100</td>
+<td>Hive grants a <em>batch of transactions</em> instead of single
transactions to streaming clients like Flume.
+This setting configures the number of desired transactions per Transaction
Batch. Data from all
+transactions in a single batch end up in a single file. Flume will write a
maximum of batchSize events
+in each transaction in the batch. This setting in conjunction with batchSize
provides control over the
+size of each file. Note that eventually Hive will transparently compact these
files into larger files.</td>
+</tr>
+<tr class="row-odd"><td>heartBeatInterval</td>
+<td>240</td>
+<td>(In seconds) Interval between consecutive heartbeats sent to Hive to keep
unused transactions from expiring.
+Set this value to 0 to disable heartbeats.</td>
+</tr>
+<tr class="row-even"><td>autoCreatePartitions</td>
+<td>true</td>
+<td>Flume will automatically create the necessary Hive partitions to stream
to</td>
+</tr>
+<tr class="row-odd"><td>batchSize</td>
+<td>15000</td>
+<td>Max number of events written to Hive in a single Hive transaction</td>
+</tr>
+<tr class="row-even"><td>maxOpenConnections</td>
+<td>500</td>
+<td>Allow only this number of open connections. If this number is exceeded,
the least recently used connection is closed.</td>
+</tr>
+<tr class="row-odd"><td>callTimeout</td>
+<td>10000</td>
+<td>(In milliseconds) Timeout for Hive & HDFS I/O operations, such as
openTxn, write, commit, abort.</td>
+</tr>
+<tr class="row-even"><td><strong>serializer</strong></td>
+<td> </td>
+<td>The serializer is responsible for parsing out fields from the event and mapping
them to columns in the Hive table.
+Choice of serializer depends upon the format of the data in the event.
Supported serializers: DELIMITED and JSON</td>
+</tr>
+<tr class="row-odd"><td>roundUnit</td>
+<td>minute</td>
+<td>The unit of the round down value - <tt class="docutils literal"><span
class="pre">second</span></tt>, <tt class="docutils literal"><span
class="pre">minute</span></tt> or <tt class="docutils literal"><span
class="pre">hour</span></tt>.</td>
+</tr>
+<tr class="row-even"><td>roundValue</td>
+<td>1</td>
+<td>Rounded down to the highest multiple of this (in the unit configured using
hive.roundUnit), less than current time</td>
+</tr>
+<tr class="row-odd"><td>timeZone</td>
+<td>Local Time</td>
+<td>Name of the timezone that should be used for resolving the escape
sequences in partition, e.g. America/Los_Angeles.</td>
+</tr>
+<tr class="row-even"><td>useLocalTimeStamp</td>
+<td>false</td>
+<td>Use the local time (instead of the timestamp from the event header) while
replacing the escape sequences.</td>
+</tr>
+</tbody>
+</table>
+<p>Following serializers are provided for Hive sink:</p>
+<p><strong>JSON</strong>: Handles UTF-8 encoded JSON (strict syntax) events and
requires no configuration. Object names
+in the JSON are mapped directly to columns with the same name in the Hive
table.
+Internally uses org.apache.hive.hcatalog.data.JsonSerDe but is independent of
the Serde of the Hive table.
+This serializer requires HCatalog to be installed.</p>
+<p><strong>DELIMITED</strong>: Handles simple delimited textual events.
+Internally uses LazySimpleSerde but is independent of the Serde of the Hive
table.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="22%" />
+<col width="10%" />
+<col width="68%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>serializer.delimiter</td>
+<td>,</td>
+<td>(Type: string) The field delimiter in the incoming data. To use special
+characters, surround them with double quotes like “\t”</td>
+</tr>
+<tr class="row-odd"><td><strong>serializer.fieldnames</strong></td>
+<td>–</td>
+<td>The mapping from input fields to columns in hive table. Specified as a
+comma separated list (no spaces) of hive table columns names, identifying
+the input fields in order of their occurrence. To skip fields leave the
+column name unspecified. E.g. ‘time,,ip,message’ indicates that the 1st,
3rd
+and 4th fields in the input map to the time, ip and message columns in the Hive
table.</td>
+</tr>
+<tr class="row-even"><td>serializer.serdeSeparator</td>
+<td>Ctrl-A</td>
+<td>(Type: character) Customizes the separator used by the underlying serde. There
+can be a gain in efficiency if the fields in serializer.fieldnames are in the
+same order as the table columns, serializer.delimiter is the same as
+serializer.serdeSeparator, and the number of fields in serializer.fieldnames
+is less than or equal to the number of table columns, as the fields in the incoming
+event body then do not need to be reordered to match the order of the table columns.
+Use single quotes for special characters like ‘\t’.
+Ensure input fields do not contain this character. NOTE: If
serializer.delimiter
+is a single character, preferably set this to the same character.</td>
+</tr>
+</tbody>
+</table>
+<p>The following are the escape sequences supported:</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="10%" />
+<col width="90%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Alias</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>%{host}</td>
+<td>Substitute value of event header named “host”. Arbitrary
header names are supported.</td>
+</tr>
+<tr class="row-odd"><td>%t</td>
+<td>Unix time in milliseconds</td>
+</tr>
+<tr class="row-even"><td>%a</td>
+<td>locale’s short weekday name (Mon, Tue, ...)</td>
+</tr>
+<tr class="row-odd"><td>%A</td>
+<td>locale’s full weekday name (Monday, Tuesday, ...)</td>
+</tr>
+<tr class="row-even"><td>%b</td>
+<td>locale’s short month name (Jan, Feb, ...)</td>
+</tr>
+<tr class="row-odd"><td>%B</td>
+<td>locale’s long month name (January, February, ...)</td>
+</tr>
+<tr class="row-even"><td>%c</td>
+<td>locale’s date and time (Thu Mar 3 23:05:25 2005)</td>
+</tr>
+<tr class="row-odd"><td>%d</td>
+<td>day of month (01)</td>
+</tr>
+<tr class="row-even"><td>%D</td>
+<td>date; same as %m/%d/%y</td>
+</tr>
+<tr class="row-odd"><td>%H</td>
+<td>hour (00..23)</td>
+</tr>
+<tr class="row-even"><td>%I</td>
+<td>hour (01..12)</td>
+</tr>
+<tr class="row-odd"><td>%j</td>
+<td>day of year (001..366)</td>
</tr>
-<tr class="row-odd"><td>hdfs.round</td>
-<td>false</td>
-<td>Should the timestamp be rounded down (if true, affects all time based
escape sequences except %t)</td>
+<tr class="row-even"><td>%k</td>
+<td>hour ( 0..23)</td>
</tr>
-<tr class="row-even"><td>hdfs.roundValue</td>
-<td>1</td>
-<td>Rounded down to the highest multiple of this (in the unit configured using
<tt class="docutils literal"><span class="pre">hdfs.roundUnit</span></tt>),
less than current time.</td>
+<tr class="row-odd"><td>%m</td>
+<td>month (01..12)</td>
</tr>
-<tr class="row-odd"><td>hdfs.roundUnit</td>
-<td>second</td>
-<td>The unit of the round down value - <tt class="docutils literal"><span
class="pre">second</span></tt>, <tt class="docutils literal"><span
class="pre">minute</span></tt> or <tt class="docutils literal"><span
class="pre">hour</span></tt>.</td>
+<tr class="row-even"><td>%M</td>
+<td>minute (00..59)</td>
</tr>
-<tr class="row-even"><td>hdfs.timeZone</td>
-<td>Local Time</td>
-<td>Name of the timezone that should be used for resolving the directory path,
e.g. America/Los_Angeles.</td>
+<tr class="row-odd"><td>%p</td>
+<td>locale’s equivalent of am or pm</td>
</tr>
-<tr class="row-odd"><td>hdfs.useLocalTimeStamp</td>
-<td>false</td>
-<td>Use the local time (instead of the timestamp from the event header) while
replacing the escape sequences.</td>
+<tr class="row-even"><td>%s</td>
+<td>seconds since 1970-01-01 00:00:00 UTC</td>
</tr>
-<tr class="row-even"><td>hdfs.closeTries</td>
-<td>0</td>
-<td>Number of times the sink must try to close a file. If set to 1, this sink
will not re-try a failed close
-(due to, for example, NameNode or DataNode failure), and may leave the file in
an open state with a .tmp extension.
-If set to 0, the sink will try to close the file until the file is eventually
closed
-(there is no limit on the number of times it would try).</td>
+<tr class="row-odd"><td>%S</td>
+<td>second (00..59)</td>
</tr>
-<tr class="row-odd"><td>hdfs.retryInterval</td>
-<td>180</td>
-<td>Time in seconds between consecutive attempts to close a file. Each close
call costs multiple RPC round-trips to the Namenode,
-so setting this too low can cause a lot of load on the name node. If set to 0
or less, the sink will not
-attempt to close the file if the first attempt fails, and may leave the file
open or with a ”.tmp” extension.</td>
+<tr class="row-even"><td>%y</td>
+<td>last two digits of year (00..99)</td>
</tr>
-<tr class="row-even"><td>serializer</td>
-<td><tt class="docutils literal"><span class="pre">TEXT</span></tt></td>
-<td>Other possible options include <tt class="docutils literal"><span
class="pre">avro_event</span></tt> or the
-fully-qualified class name of an implementation of the
-<tt class="docutils literal"><span
class="pre">EventSerializer.Builder</span></tt> interface.</td>
+<tr class="row-odd"><td>%Y</td>
+<td>year (2010)</td>
</tr>
-<tr class="row-odd"><td>serializer.*</td>
-<td> </td>
-<td> </td>
+<tr class="row-even"><td>%z</td>
+<td>+hhmm numeric timezone (for example, -0400)</td>
</tr>
</tbody>
</table>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">For all of the time related escape sequences, a header with
the key
+“timestamp” must exist among the headers of the event (unless <tt
class="docutils literal"><span class="pre">useLocalTimeStamp</span></tt> is set
to <tt class="docutils literal"><span class="pre">true</span></tt>). One way to
add
+this automatically is to use the TimestampInterceptor.</p>
+</div>
+<p>Example Hive table :</p>
+<div class="highlight-properties"><pre>create table weblogs ( id int , msg
string )
+ partitioned by (continent string, country string, time string)
+ clustered by (id) into 5 buckets
+ stored as orc;</pre>
+</div>
<p>Example for agent named a1:</p>
<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.channels</span> <span class="o">=</span> <span class="s">c1</span>
+<span class="na">a1.channels.c1.type</span> <span class="o">=</span> <span
class="s">memory</span>
<span class="na">a1.sinks</span> <span class="o">=</span> <span
class="s">k1</span>
-<span class="na">a1.sinks.k1.type</span> <span class="o">=</span> <span
class="s">hdfs</span>
+<span class="na">a1.sinks.k1.type</span> <span class="o">=</span> <span
class="s">hive</span>
<span class="na">a1.sinks.k1.channel</span> <span class="o">=</span> <span
class="s">c1</span>
-<span class="na">a1.sinks.k1.hdfs.path</span> <span class="o">=</span> <span
class="s">/flume/events/%y-%m-%d/%H%M/%S</span>
-<span class="na">a1.sinks.k1.hdfs.filePrefix</span> <span class="o">=</span>
<span class="s">events-</span>
-<span class="na">a1.sinks.k1.hdfs.round</span> <span class="o">=</span> <span
class="s">true</span>
-<span class="na">a1.sinks.k1.hdfs.roundValue</span> <span class="o">=</span>
<span class="s">10</span>
-<span class="na">a1.sinks.k1.hdfs.roundUnit</span> <span class="o">=</span>
<span class="s">minute</span>
+<span class="na">a1.sinks.k1.hive.metastore</span> <span class="o">=</span>
<span class="s">thrift://127.0.0.1:9083</span>
+<span class="na">a1.sinks.k1.hive.database</span> <span class="o">=</span>
<span class="s">logsdb</span>
+<span class="na">a1.sinks.k1.hive.table</span> <span class="o">=</span> <span
class="s">weblogs</span>
+<span class="na">a1.sinks.k1.hive.partition</span> <span class="o">=</span>
<span class="s">asia,%{country},%y-%m-%d-%H-%M</span>
+<span class="na">a1.sinks.k1.useLocalTimeStamp</span> <span class="o">=</span>
<span class="s">false</span>
+<span class="na">a1.sinks.k1.round</span> <span class="o">=</span> <span
class="s">true</span>
+<span class="na">a1.sinks.k1.roundValue</span> <span class="o">=</span> <span
class="s">10</span>
+<span class="na">a1.sinks.k1.roundUnit</span> <span class="o">=</span> <span
class="s">minute</span>
+<span class="na">a1.sinks.k1.serializer</span> <span class="o">=</span> <span
class="s">DELIMITED</span>
+<span class="na">a1.sinks.k1.serializer.delimiter</span> <span
class="o">=</span> <span class="s">"\t"</span>
+<span class="na">a1.sinks.k1.serializer.serdeSeparator</span> <span
class="o">=</span> <span class="s">'\t'</span>
+<span class="na">a1.sinks.k1.serializer.fieldnames</span> <span
class="o">=</span><span class="s">id,,msg</span>
</pre></div>
</div>
<p>The above configuration will round down the timestamp to the last 10th
minute. For example, an event with
-timestamp 11:54:34 AM, June 12, 2012 will cause the hdfs path to become <tt
class="docutils literal"><span
class="pre">/flume/events/2012-06-12/1150/00</span></tt>.</p>
+timestamp header set to 11:54:34 AM, June 12, 2012 and ‘country’
header set to ‘india’ will evaluate to the
+partition
(continent=’asia’,country=’india’,time=‘2012-06-12-11-50’).
The serializer is configured to
+accept tab-separated input containing three fields and to skip the second
field.</p>
</div>
<div class="section" id="logger-sink">
<h4>Logger Sink<a class="headerlink" href="#logger-sink" title="Permalink to
this headline">¶</a></h4>
@@ -2426,9 +2909,9 @@ timestamp 11:54:34 AM, June 12, 2012 wil
Required properties are in <strong>bold</strong>.</p>
<table border="1" class="docutils">
<colgroup>
-<col width="21%" />
+<col width="20%" />
<col width="10%" />
-<col width="69%" />
+<col width="70%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Property Name</th>
@@ -2445,6 +2928,10 @@ Required properties are in <strong>bold<
<td>–</td>
<td>The component type name, needs to be <tt class="docutils literal"><span
class="pre">logger</span></tt></td>
</tr>
+<tr class="row-even"><td>maxBytesToLog</td>
+<td>16</td>
+<td>Maximum number of bytes of the Event body to log</td>
+</tr>
</tbody>
</table>
<p>Example for agent named a1:</p>
@@ -2560,13 +3047,18 @@ Required properties are in <strong>bold<
<p>This sink forms one half of Flume’s tiered collection support. Flume
events
sent to this sink are turned into Thrift events and sent to the configured
hostname / port pair. The events are taken from the configured Channel in
-batches of the configured batch size.
+batches of the configured batch size.</p>
+<p>The Thrift sink can be configured to start in secure mode by enabling kerberos
authentication.
+To communicate with a Thrift source started in secure mode, the Thrift sink
should also
+operate in secure mode. client-principal and client-keytab are the properties
used by the
+Thrift sink to authenticate to the kerberos KDC. The server-principal
represents the
+principal of the Thrift source this sink is configured to connect to in secure
mode.
Required properties are in <strong>bold</strong>.</p>
<table border="1" class="docutils">
<colgroup>
-<col width="9%" />
+<col width="7%" />
<col width="2%" />
-<col width="89%" />
+<col width="91%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Property Name</th>
@@ -2607,6 +3099,42 @@ Required properties are in <strong>bold<
<td>none</td>
<td>Amount of time (s) before the connection to the next hop is reset. This
will force the Thrift Sink to reconnect to the next hop. This will allow the
sink to connect to hosts behind a hardware load-balancer when new hosts are
added without having to restart the agent.</td>
</tr>
+<tr class="row-even"><td>ssl</td>
+<td>false</td>
+<td>Set to true to enable SSL for this ThriftSink. When configuring SSL, you
can optionally set a “truststore”,
“truststore-password” and “truststore-type”.</td>
+</tr>
+<tr class="row-odd"><td>truststore</td>
+<td>–</td>
+<td>The path to a custom Java truststore file. Flume uses the certificate
authority information in this file to determine whether the remote Thrift
Source’s SSL authentication credentials should be trusted. If not
specified, the default Java JSSE certificate authority files (typically
“jssecacerts” or “cacerts” in the Oracle JRE) will be
used.</td>
+</tr>
+<tr class="row-even"><td>truststore-password</td>
+<td>–</td>
+<td>The password for the specified truststore.</td>
+</tr>
+<tr class="row-odd"><td>truststore-type</td>
+<td>JKS</td>
+<td>The type of the Java truststore. This can be “JKS” or other
supported Java truststore type.</td>
+</tr>
+<tr class="row-even"><td>exclude-protocols</td>
+<td>SSLv3</td>
+<td>Space-separated list of SSL/TLS protocols to exclude</td>
+</tr>
+<tr class="row-odd"><td>kerberos</td>
+<td>false</td>
+<td>Set to true to enable kerberos authentication. In kerberos mode,
client-principal, client-keytab and server-principal are required for
successful authentication and communication with a kerberos-enabled Thrift
Source.</td>
+</tr>
+<tr class="row-even"><td>client-principal</td>
+<td>–</td>
+<td>The kerberos principal used by the Thrift Sink to authenticate to the
kerberos KDC.</td>
+</tr>
+<tr class="row-odd"><td>client-keytab</td>
+<td>–</td>
+<td>The keytab location used by the Thrift Sink in combination with the
client-principal to authenticate to the kerberos KDC.</td>
+</tr>
+<tr class="row-even"><td>server-principal</td>
+<td>–</td>
+<td>The kerberos principal of the Thrift Source to which the Thrift Sink is
configured to connect.</td>
+</tr>
</tbody>
</table>
<p>Example for agent named a1:</p>
@@ -3092,11 +3620,13 @@ the more powerful ElasticSearchIndexRequ
</tr>
<tr class="row-odd"><td>indexName</td>
<td>flume</td>
-<td>The name of the index which the date will be appended to. Example
‘flume’ -> ‘flume-yyyy-MM-dd’</td>
+<td>The name of the index which the date will be appended to. Example
‘flume’ -> ‘flume-yyyy-MM-dd’.
+Arbitrary header substitution is supported, e.g. %{header} replaces with the
value of the named event header</td>
</tr>
<tr class="row-even"><td>indexType</td>
<td>logs</td>
-<td>The type to index the document to, defaults to ‘log’</td>
+<td>The type to index the document to, defaults to ‘log’.
+Arbitrary header substitution is supported, e.g. %{header} replaces with the
value of the named event header</td>
</tr>
<tr class="row-odd"><td>clusterName</td>
<td>elasticsearch</td>
@@ -3125,6 +3655,12 @@ either class are accepted but ElasticSea
</tr>
</tbody>
</table>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">Header substitution is a handy way to use the value of an event
header to dynamically decide the indexName and indexType to use when storing
the event.
+Exercise caution when using this feature, as the event submitter then has
control of the indexName and indexType.
+Furthermore, if the elasticsearch REST client is used then the event submitter
has control of the URL path used.</p>
+</div>
<p>Example for agent named a1:</p>
<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.channels</span> <span class="o">=</span> <span class="s">c1</span>
<span class="na">a1.sinks</span> <span class="o">=</span> <span
class="s">k1</span>
@@ -3140,18 +3676,12 @@ either class are accepted but ElasticSea
</pre></div>
</div>
</div>
-<div class="section" id="kite-dataset-sink-experimental">
-<h4>Kite Dataset Sink (experimental)<a class="headerlink"
href="#kite-dataset-sink-experimental" title="Permalink to this
headline">¶</a></h4>
-<div class="admonition warning">
-<p class="first admonition-title">Warning</p>
-<p class="last">This source is experimental and may change between minor
versions of Flume.
-Use at your own risk.</p>
-</div>
-<p>Experimental sink that writes events to a <a class="reference external"
href="http://kitesdk.org/docs/current/kite-data/guide.html">Kite Dataset</a>.
+<div class="section" id="kite-dataset-sink">
+<h4>Kite Dataset Sink<a class="headerlink" href="#kite-dataset-sink"
title="Permalink to this headline">¶</a></h4>
+<p>Experimental sink that writes events to a <a class="reference external"
href="http://kitesdk.org/docs/current/guide/">Kite Dataset</a>.
This sink will deserialize the body of each incoming event and store the
-resulting record in a Kite Dataset. It determines target Dataset by opening a
-repository URI, <tt class="docutils literal"><span
class="pre">kite.repo.uri</span></tt>, and loading a Dataset by name,
-<tt class="docutils literal"><span
class="pre">kite.dataset.name</span></tt>.</p>
+resulting record in a Kite Dataset. It determines the target Dataset by loading
+a dataset by its URI.</p>
<p>The only supported serialization is avro, and the record schema must be
passed
in the event headers, using either <tt class="docutils literal"><span
class="pre">flume.avro.schema.literal</span></tt> with the JSON
schema representation or <tt class="docutils literal"><span
class="pre">flume.avro.schema.url</span></tt> with a URL where the schema
@@ -3164,9 +3694,9 @@ has been exceeded. However, this delay w
cases, the delay is negligible.</p>
<table border="1" class="docutils">
<colgroup>
-<col width="26%" />
-<col width="8%" />
-<col width="66%" />
+<col width="28%" />
+<col width="7%" />
+<col width="65%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Property Name</th>
@@ -3183,13 +3713,24 @@ cases, the delay is neglegible.</p>
<td>–</td>
<td>Must be org.apache.flume.sink.kite.DatasetSink</td>
</tr>
-<tr class="row-even"><td><strong>kite.repo.uri</strong></td>
+<tr class="row-even"><td><strong>kite.dataset.uri</strong></td>
+<td>–</td>
+<td>URI of the dataset to open</td>
+</tr>
+<tr class="row-odd"><td>kite.repo.uri</td>
+<td>–</td>
+<td>URI of the repository to open
+(deprecated; use kite.dataset.uri instead)</td>
+</tr>
+<tr class="row-even"><td>kite.dataset.namespace</td>
<td>–</td>
-<td>URI of the repository to open</td>
+<td>Namespace of the Dataset where records will be written
+(deprecated; use kite.dataset.uri instead)</td>
</tr>
-<tr class="row-odd"><td><strong>kite.dataset.name</strong></td>
+<tr class="row-odd"><td>kite.dataset.name</td>
<td>–</td>
-<td>Name of the Dataset where records will be written</td>
+<td>Name of the Dataset where records will be written
+(deprecated; use kite.dataset.uri instead)</td>
</tr>
<tr class="row-even"><td>kite.batchSize</td>
<td>100</td>
@@ -3199,15 +3740,56 @@ cases, the delay is neglegible.</p>
<td>30</td>
<td>Maximum wait time (seconds) before data files are released</td>
</tr>
-<tr class="row-even"><td>auth.kerberosPrincipal</td>
+<tr class="row-even"><td>kite.flushable.commitOnBatch</td>
+<td>true</td>
+<td>If <tt class="docutils literal"><span class="pre">true</span></tt>, the
Flume transaction will be committed and the
+writer will be flushed on each batch of <tt class="docutils literal"><span
class="pre">kite.batchSize</span></tt>
+records. This setting only applies to flushable datasets. When
+<tt class="docutils literal"><span class="pre">true</span></tt>, it’s
possible for temp files with committed data to be
+left in the dataset directory. These files need to be recovered
+by hand for the data to be visible to DatasetReaders.</td>
+</tr>
+<tr class="row-odd"><td>kite.syncable.syncOnBatch</td>
+<td>true</td>
+<td>Controls whether the sink will also sync data when committing
+the transaction. This setting only applies to syncable datasets.
+Syncing guarantees that data will be written to stable storage
+on the remote system, while flushing only guarantees that data
+has left Flume’s client buffers. When the
+<tt class="docutils literal"><span
class="pre">kite.flushable.commitOnBatch</span></tt> property is set to <tt
class="docutils literal"><span class="pre">false</span></tt>,
+this property must also be set to <tt class="docutils literal"><span
class="pre">false</span></tt>.</td>
+</tr>
+<tr class="row-even"><td>kite.entityParser</td>
+<td>avro</td>
+<td>Parser that turns Flume <tt class="docutils literal"><span
class="pre">Events</span></tt> into Kite entities.
+Valid values are <tt class="docutils literal"><span
class="pre">avro</span></tt> and the fully-qualified class name
+of an implementation of the <tt class="docutils literal"><span
class="pre">EntityParser.Builder</span></tt> interface.</td>
+</tr>
+<tr class="row-odd"><td>kite.failurePolicy</td>
+<td>retry</td>
+<td>Policy that handles non-recoverable errors such as a missing
+<tt class="docutils literal"><span class="pre">Schema</span></tt> in the <tt
class="docutils literal"><span class="pre">Event</span></tt> header. The
default value, <tt class="docutils literal"><span class="pre">retry</span></tt>,
+will fail the current batch and try again, which matches the old
+behavior. Other valid values are <tt class="docutils literal"><span
class="pre">save</span></tt>, which will write the
+raw <tt class="docutils literal"><span class="pre">Event</span></tt> to the
<tt class="docutils literal"><span
class="pre">kite.error.dataset.uri</span></tt> dataset, and the
+fully-qualified class name of an implementation of the
+<tt class="docutils literal"><span
class="pre">FailurePolicy.Builder</span></tt> interface.</td>
+</tr>
+<tr class="row-even"><td>kite.error.dataset.uri</td>
+<td>–</td>
+<td>URI of the dataset where failed events are saved when
+<tt class="docutils literal"><span class="pre">kite.failurePolicy</span></tt>
is set to <tt class="docutils literal"><span class="pre">save</span></tt>.
<strong>Required</strong> when
+the <tt class="docutils literal"><span
class="pre">kite.failurePolicy</span></tt> is set to <tt class="docutils
literal"><span class="pre">save</span></tt>.</td>
+</tr>
+<tr class="row-odd"><td>auth.kerberosPrincipal</td>
<td>–</td>
<td>Kerberos user principal for secure authentication to HDFS</td>
</tr>
-<tr class="row-odd"><td>auth.kerberosKeytab</td>
+<tr class="row-even"><td>auth.kerberosKeytab</td>
<td>–</td>
<td>Kerberos keytab location (local FS) for the principal</td>
</tr>
-<tr class="row-even"><td>auth.proxyUser</td>
+<tr class="row-odd"><td>auth.proxyUser</td>
<td>–</td>
<td>The effective user for HDFS actions, if different from
the kerberos principal</td>
@@ -3215,6 +3797,84 @@ the kerberos principal</td>
</tbody>
</table>
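+<p>As a hedged sketch, a Kite Dataset Sink might be configured with the newer
+<tt class="docutils literal"><span class="pre">kite.dataset.uri</span></tt> property
+instead of the deprecated repository/namespace/name triple (the URI below is
+hypothetical):</p>
+<div class="highlight-properties"><pre>a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = org.apache.flume.sink.kite.DatasetSink
a1.sinks.k1.channel = c1
# hypothetical dataset URI; adjust to your repository layout
a1.sinks.k1.kite.dataset.uri = dataset:hdfs://namenode:8020/datasets/default/events
a1.sinks.k1.kite.batchSize = 100</pre>
+</div>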
</div>
+<div class="section" id="kafka-sink">
+<h4>Kafka Sink<a class="headerlink" href="#kafka-sink" title="Permalink to
this headline">¶</a></h4>
+<p>This is a Flume Sink implementation that can publish data to a
+<a class="reference external" href="http://kafka.apache.org/">Kafka</a> topic.
One of the objectives is to integrate Flume
+with Kafka so that pull-based processing systems can process the data coming
+through various Flume sources. This currently supports Kafka 0.8.x series of
releases.</p>
+<p>Required properties are marked in bold font.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="20%" />
+<col width="12%" />
+<col width="68%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>type</strong></td>
+<td>–</td>
+<td>Must be set to <tt class="docutils literal"><span
class="pre">org.apache.flume.sink.kafka.KafkaSink</span></tt></td>
+</tr>
+<tr class="row-odd"><td><strong>brokerList</strong></td>
+<td>–</td>
+<td>List of brokers Kafka-Sink will connect to, to get the list of topic
partitions.
+This can be a partial list of brokers, but we recommend at least two for HA.
+The format is a comma-separated list of hostname:port pairs.</td>
+</tr>
+<tr class="row-even"><td>topic</td>
+<td>default-flume-topic</td>
+<td>The topic in Kafka to which the messages will be published. If this
parameter is configured,
+messages will be published to this topic.
+If the event header contains a “topic” field, the event will be
published to that topic
+overriding the topic configured here.</td>
+</tr>
+<tr class="row-odd"><td>batchSize</td>
+<td>100</td>
+<td>How many messages to process in one batch. Larger batches improve
throughput while adding latency.</td>
+</tr>
+<tr class="row-even"><td>requiredAcks</td>
+<td>1</td>
+<td>How many replicas must acknowledge a message before it is considered
successfully written.
+Accepted values are 0 (never wait for acknowledgement), 1 (wait for leader
only), and -1 (wait for all replicas).
+Set this to -1 to avoid data loss in some cases of leader failure.</td>
+</tr>
+<tr class="row-odd"><td>Other Kafka Producer Properties</td>
+<td>–</td>
+<td>These properties are used to configure the Kafka Producer. Any producer
property supported
+by Kafka can be used. The only requirement is to prepend the property name
with the prefix <tt class="docutils literal"><span
class="pre">kafka.</span></tt>.
+For example: kafka.producer.type</td>
+</tr>
+</tbody>
+</table>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">Kafka Sink uses the <tt class="docutils literal"><span
class="pre">topic</span></tt> and <tt class="docutils literal"><span
class="pre">key</span></tt> properties from the FlumeEvent headers to send
events to Kafka.
+If <tt class="docutils literal"><span class="pre">topic</span></tt> exists in
the headers, the event will be sent to that specific topic, overriding the
topic configured for the Sink.
+If <tt class="docutils literal"><span class="pre">key</span></tt> exists in
+the headers, the key will be used by Kafka to partition the data among the
+topic partitions. Events with the same key will be sent to the same partition.
+If the key is null, events will be sent to random partitions.</p>
+</div>
+<p>An example configuration of a Kafka sink is given below. Properties starting
+with the prefix <tt class="docutils literal"><span
+class="pre">kafka.</span></tt> are used when instantiating
+the Kafka producer. The properties that are passed when creating the Kafka
+producer are not limited to the properties given in this example.
+It is also possible to include your custom properties here and access them
+inside the preprocessor through the Flume Context object passed in as a method
+argument.</p>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.sinks.k1.type</span> <span class="o">=</span> <span
class="s">org.apache.flume.sink.kafka.KafkaSink</span>
+<span class="na">a1.sinks.k1.topic</span> <span class="o">=</span> <span
class="s">mytopic</span>
+<span class="na">a1.sinks.k1.brokerList</span> <span class="o">=</span> <span
class="s">localhost:9092</span>
+<span class="na">a1.sinks.k1.requiredAcks</span> <span class="o">=</span>
<span class="s">1</span>
+<span class="na">a1.sinks.k1.batchSize</span> <span class="o">=</span> <span
class="s">20</span>
+<span class="na">a1.sinks.k1.channel</span> <span class="o">=</span> <span
class="s">c1</span>
+</pre></div>
+</div>
+</div>
<div class="section" id="custom-sink">
<h4>Custom Sink<a class="headerlink" href="#custom-sink" title="Permalink to
this headline">¶</a></h4>
<p>A custom sink is your own implementation of the Sink interface. A custom
@@ -3412,14 +4072,100 @@ READ_COMMITTED, SERIALIZABLE, REPEATABLE
</pre></div>
</div>
</div>
+<div class="section" id="kafka-channel">
+<h4>Kafka Channel<a class="headerlink" href="#kafka-channel" title="Permalink
to this headline">¶</a></h4>
+<p>The events are stored in a Kafka cluster (which must be installed
+separately). Kafka provides high availability and
+replication, so if an agent or a Kafka broker crashes, the events are
+immediately available to other sinks.</p>
+<p>The Kafka channel can be used for multiple scenarios:</p>
+<ul class="simple">
+<li>With a Flume source and sink - it provides a reliable and highly available
+channel for events</li>
+<li>With a Flume source and interceptor but no sink - it allows writing Flume
+events into a Kafka topic, for use by other apps</li>
+<li>With a Flume sink but no source - it is a low-latency, fault-tolerant way to
+send events from Kafka to Flume sinks such as HDFS, HBase or Solr</li>
+</ul>
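+<p>As a sketch of the last scenario (agent, channel and topic names below are
+illustrative), a channel that drains a topic written by non-Flume producers
+into an HDFS sink might look like the following. Note that
+parseAsFlumeEvent is set to false because the producers are not Flume
+sources:</p>
+<div class="highlight-properties"><div class="highlight"><pre>a1.channels = c1
+a1.sinks = k1
+a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
+a1.channels.c1.brokerList = kafka-1:9092,kafka-2:9092
+a1.channels.c1.zookeeperConnect = zk-1:2181
+a1.channels.c1.topic = app-events
+a1.channels.c1.parseAsFlumeEvent = false
+a1.sinks.k1.type = hdfs
+a1.sinks.k1.channel = c1
+a1.sinks.k1.hdfs.path = /flume/kafka-events
+</pre></div>
+</div>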
+<p>Required properties are in <strong>bold</strong>.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="14%" />
+<col width="16%" />
+<col width="70%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>type</strong></td>
+<td>–</td>
+<td>The component type name, needs to be <tt class="docutils literal"><span
class="pre">org.apache.flume.channel.kafka.KafkaChannel</span></tt></td>
+</tr>
+<tr class="row-odd"><td><strong>brokerList</strong></td>
+<td>–</td>
+<td>List of brokers in the Kafka cluster used by the channel.
+This can be a partial list of brokers, but we recommend at least two for HA.
+The format is a comma-separated list of hostname:port entries.</td>
+</tr>
+<tr class="row-even"><td><strong>zookeeperConnect</strong></td>
+<td>–</td>
+<td>URI of the ZooKeeper ensemble used by the Kafka cluster.
+The format is a comma-separated list of hostname:port entries. If a chroot is
+used, it is added once at the end.
+For example: zookeeper-1:2181,zookeeper-2:2182,zookeeper-3:2181/kafka</td>
+</tr>
+<tr class="row-odd"><td>topic</td>
+<td>flume-channel</td>
+<td>Kafka topic which the channel will use</td>
+</tr>
+<tr class="row-even"><td>groupId</td>
+<td>flume</td>
+<td>Consumer group ID the channel uses to register with Kafka.
+Multiple channels must use the same topic and group to ensure that when one
+agent fails, another can get the data.
+Note that having non-channel consumers with the same ID can lead to data
+loss.</td>
+</tr>
+<tr class="row-odd"><td>parseAsFlumeEvent</td>
+<td>true</td>
+<td>Set to true if the channel should expect Avro datums with the FlumeEvent
+schema. This should be true if a Flume source is writing to the channel,
+and false if other producers are writing into the topic that the channel is
+using.
+Flume source messages to Kafka can be parsed outside of Flume by using
+the org.apache.flume.source.avro.AvroFlumeEvent class provided by the
+flume-ng-sdk artifact.</td>
+</tr>
+<tr class="row-even"><td>readSmallestOffset</td>
+<td>false</td>
+<td>When set to true, the channel will read all data in the topic, starting
+from the oldest event; when false, it will read only events written after the
+channel started.
+When “parseAsFlumeEvent” is true, this will be false: the Flume
+source will start prior to the sinks, and this
+guarantees that events sent by the source before the sinks start will not be
+lost.</td>
+</tr>
+<tr class="row-odd"><td>Other Kafka Properties</td>
+<td>–</td>
+<td>These properties are used to configure the Kafka Producer and Consumer
used by the channel.
+Any property supported by Kafka can be used.
+The only requirement is to prepend the property name with the prefix <tt
class="docutils literal"><span class="pre">kafka.</span></tt>.
+For example: kafka.producer.type</td>
+</tr>
+</tbody>
+</table>
+<div class="admonition note">
+<p class="first admonition-title">Note</p>
+<p class="last">Due to the way the channel is load balanced, there may be
+duplicate events when the agent first starts up.</p>
+</div>
+<p>Example for agent named a1:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.channels.channel1.type</span> <span class="o">=</span> <span
class="s">org.apache.flume.channel.kafka.KafkaChannel</span>
+<span class="na">a1.channels.channel1.capacity</span> <span class="o">=</span>
<span class="s">10000</span>
+<span class="na">a1.channels.channel1.transactionCapacity</span> <span
class="o">=</span> <span class="s">1000</span>
+<span class="na">a1.channels.channel1.brokerList</span><span
class="o">=</span><span class="s">kafka-2:9092,kafka-3:9092</span>
+<span class="na">a1.channels.channel1.topic</span><span
class="o">=</span><span class="s">channel1</span>
+<span class="na">a1.channels.channel1.zookeeperConnect</span><span
class="o">=</span><span class="s">kafka-1:2181</span>
+</pre></div>
+</div>
+</div>
<div class="section" id="file-channel">
<h4>File Channel<a class="headerlink" href="#file-channel" title="Permalink to
this headline">¶</a></h4>
<p>Required properties are in <strong>bold</strong>.</p>
<table border="1" class="docutils">
<colgroup>
-<col width="21%" />
-<col width="14%" />
-<col width="65%" />
+<col width="20%" />
+<col width="13%" />
+<col width="67%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Property Name Default</th>
@@ -3480,31 +4226,35 @@ READ_COMMITTED, SERIALIZABLE, REPEATABLE
<td>false</td>
<td>Expert: Replay without using queue</td>
</tr>
-<tr class="row-odd"><td>encryption.activeKey</td>
+<tr class="row-odd"><td>checkpointOnClose</td>
+<td>true</td>
+<td>Controls if a checkpoint is created when the channel is closed. Creating a
checkpoint on close speeds up subsequent startup of the file channel by
avoiding replay.</td>
+</tr>
+<tr class="row-even"><td>encryption.activeKey</td>
<td>–</td>
<td>Key name used to encrypt new data</td>
</tr>
-<tr class="row-even"><td>encryption.cipherProvider</td>
+<tr class="row-odd"><td>encryption.cipherProvider</td>
<td>–</td>
<td>Cipher provider type, supported types: AESCTRNOPADDING</td>
</tr>
-<tr class="row-odd"><td>encryption.keyProvider</td>
+<tr class="row-even"><td>encryption.keyProvider</td>
<td>–</td>
<td>Key provider type, supported types: JCEKSFILE</td>
</tr>
-<tr class="row-even"><td>encryption.keyProvider.keyStoreFile</td>
+<tr class="row-odd"><td>encryption.keyProvider.keyStoreFile</td>
<td>–</td>
<td>Path to the keystore file</td>
</tr>
-<tr class="row-odd"><td>encrpytion.keyProvider.keyStorePasswordFile</td>
+<tr class="row-even"><td>encryption.keyProvider.keyStorePasswordFile</td>
<td>–</td>
<td>Path to the keystore password file</td>
</tr>
-<tr class="row-even"><td>encryption.keyProvider.keys</td>
+<tr class="row-odd"><td>encryption.keyProvider.keys</td>
<td>–</td>
<td>List of all keys (e.g. history of the activeKey setting)</td>
</tr>
-<tr class="row-odd"><td>encyption.keyProvider.keys.*.passwordFile</td>
+<tr class="row-even"><td>encryption.keyProvider.keys.*.passwordFile</td>
<td>–</td>
<td>Path to the optional key password file</td>
</tr>
@@ -3910,7 +4660,12 @@ that so long as one is available events
<p>The failover mechanism works by relegating failed sinks to a pool where
they are assigned a cool down period, increasing with sequential failures
before they are retried. Once a sink successfully sends an event, it is
-restored to the live pool.</p>
+restored to the live pool. The sinks have a priority associated with them:
+the larger the number, the higher the priority. If a sink fails while sending
+an event, the sink with the next highest priority is tried next for sending
+events.
+For example, a sink with priority 100 is activated before a sink with priority
+80. If no priority is specified, the priority is determined based on the order
+in which
+the sinks are specified in the configuration.</p>
<p>To configure, set a sink groups processor to <tt class="docutils
literal"><span class="pre">failover</span></tt> and set
priorities for all individual sinks. All specified priorities must
be unique. Furthermore, upper limit to failover time can be set
@@ -3918,9 +4673,9 @@ be unique. Furthermore, upper limit to f
<p>Required properties are in <strong>bold</strong>.</p>
<table border="1" class="docutils">
<colgroup>
-<col width="26%" />
-<col width="9%" />
-<col width="66%" />
+<col width="23%" />
+<col width="8%" />
+<col width="70%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Property Name</th>
@@ -3939,11 +4694,12 @@ be unique. Furthermore, upper limit to f
</tr>
<tr
class="row-even"><td><strong>processor.priority.<sinkName></strong></td>
<td>–</td>
-<td><sinkName> must be one of the sink instances associated with the
current sink group</td>
+<td>Priority value. <sinkName> must be one of the sink instances
+associated with the current sink group.
+A sink with a higher priority value (larger absolute value) is activated
+earlier.</td>
</tr>
<tr class="row-odd"><td>processor.maxpenalty</td>
<td>30000</td>
-<td>(in millis)</td>
+<td>The maximum backoff period for the failed Sink (in millis)</td>
</tr>
</tbody>
</table>
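+<p>For example, a failover sink group for an agent named a1 (sink names are
+illustrative) could be configured as:</p>
+<div class="highlight-properties"><div class="highlight"><pre>a1.sinkgroups = g1
+a1.sinkgroups.g1.sinks = k1 k2
+a1.sinkgroups.g1.processor.type = failover
+a1.sinkgroups.g1.processor.priority.k1 = 5
+a1.sinkgroups.g1.processor.priority.k2 = 10
+a1.sinkgroups.g1.processor.maxpenalty = 10000
+</pre></div>
+</div>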
@@ -4347,6 +5103,61 @@ MorphlineInterceptor can also help to im
</pre></div>
</div>
</div>
+<div class="section" id="search-and-replace-interceptor">
+<h4>Search and Replace Interceptor<a class="headerlink"
href="#search-and-replace-interceptor" title="Permalink to this
headline">¶</a></h4>
+<p>This interceptor provides simple string-based search-and-replace
functionality
+based on Java regular expressions. Backtracking / group capture is also
available.
+This interceptor uses the same rules as in the Java Matcher.replaceAll()
method.</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="17%" />
+<col width="7%" />
+<col width="76%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Property Name</th>
+<th class="head">Default</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td><strong>type</strong></td>
+<td>–</td>
+<td>The component type name has to be <tt class="docutils literal"><span
class="pre">search_replace</span></tt></td>
+</tr>
+<tr class="row-odd"><td>searchPattern</td>
+<td>–</td>
+<td>The pattern to search for and replace.</td>
+</tr>
+<tr class="row-even"><td>replaceString</td>
+<td>–</td>
+<td>The replacement string.</td>
+</tr>
+<tr class="row-odd"><td>charset</td>
+<td>UTF-8</td>
+<td>The charset of the event body. Assumed by default to be UTF-8.</td>
+</tr>
+</tbody>
+</table>
+<p>Example configuration:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.sources.avroSrc.interceptors</span> <span class="o">=</span>
<span class="s">search-replace</span>
+<span class="na">a1.sources.avroSrc.interceptors.search-replace.type</span>
<span class="o">=</span> <span class="s">search_replace</span>
+
+<span class="c"># Remove leading alphanumeric characters in an event
body.</span>
+<span
class="na">a1.sources.avroSrc.interceptors.search-replace.searchPattern</span>
<span class="o">=</span> <span class="s">^[A-Za-z0-9_]+</span>
+<span
class="na">a1.sources.avroSrc.interceptors.search-replace.replaceString</span>
<span class="o">=</span>
+</pre></div>
+</div>
+<p>Another example:</p>
+<div class="highlight-properties"><div class="highlight"><pre><span
class="na">a1.sources.avroSrc.interceptors</span> <span class="o">=</span>
<span class="s">search-replace</span>
+<span class="na">a1.sources.avroSrc.interceptors.search-replace.type</span>
<span class="o">=</span> <span class="s">search_replace</span>
+
+<span class="c"># Use grouping operators to reorder and munge words on a
line.</span>
+<span
class="na">a1.sources.avroSrc.interceptors.search-replace.searchPattern</span>
<span class="o">=</span> <span class="s">The quick brown ([a-z]+) jumped over
the lazy ([a-z]+)</span>
+<span
class="na">a1.sources.avroSrc.interceptors.search-replace.replaceString</span>
<span class="o">=</span> <span class="s">The hungry $2 ate the careless
$1</span>
+</pre></div>
+</div>
+</div>
<div class="section" id="regex-filtering-interceptor">
<h4>Regex Filtering Interceptor<a class="headerlink"
href="#regex-filtering-interceptor" title="Permalink to this
headline">¶</a></h4>
<p>This interceptor filters events selectively by interpreting the event body
as text and matching the text against a configured regular expression.
@@ -4509,7 +5320,7 @@ polling rather than terminating.</p>
<h2>Log4J Appender<a class="headerlink" href="#log4j-appender"
title="Permalink to this headline">¶</a></h2>
<p>Appends Log4j events to a flume agent’s avro source. A client using
this
appender must have the flume-ng-sdk in the classpath (eg,
-flume-ng-sdk-1.5.2.jar).
+flume-ng-sdk-1.6.0.jar).
Required properties are in <strong>bold</strong>.</p>
<table border="1" class="docutils">
<colgroup>
@@ -4589,7 +5400,7 @@ then the schema will be included as a Fl
<h2>Load Balancing Log4J Appender<a class="headerlink"
href="#load-balancing-log4j-appender" title="Permalink to this
headline">¶</a></h2>
<p>Appends Log4j events to a list of flume agent’s avro source. A client
using this
appender must have the flume-ng-sdk in the classpath (eg,
-flume-ng-sdk-1.5.2.jar). This appender supports a round-robin and random
+flume-ng-sdk-1.6.0.jar). This appender supports a round-robin and random
scheme for performing the load balancing. It also supports a configurable
backoff
timeout so that down agents are removed temporarily from the set of hosts
Required properties are in <strong>bold</strong>.</p>
@@ -4674,15 +5485,27 @@ send the events.</td>
</div>
<div class="section" id="security">
<h2>Security<a class="headerlink" href="#security" title="Permalink to this
headline">¶</a></h2>
-<p>The HDFS sink supports Kerberos authentication if the underlying HDFS is
-running in secure mode. Please refer to the HDFS Sink section for
-configuring the HDFS sink Kerberos-related options.</p>
+<p>The HDFS sink, HBase sink, Thrift source, Thrift sink and Kite Dataset sink
all support
+Kerberos authentication. Please refer to the corresponding sections for
+configuring the Kerberos-related options.</p>
+<p>The Flume agent authenticates to the Kerberos KDC as a single principal,
+which is
+used by the different components that require Kerberos authentication. The
+principal and
+keytab configured for the Thrift source, Thrift sink, HDFS sink, HBase sink and
+Kite Dataset sink
+should be the same; otherwise the component will fail to start.</p>
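+<p>For example, a minimal Kerberos setup for an HDFS sink (the principal and
+keytab path below are illustrative) looks like:</p>
+<div class="highlight-properties"><div class="highlight"><pre>a1.sinks.k1.type = hdfs
+a1.sinks.k1.hdfs.path = /flume/events
+a1.sinks.k1.hdfs.kerberosPrincipal = flume/_HOST@EXAMPLE.COM
+a1.sinks.k1.hdfs.kerberosKeytab = /etc/security/keytabs/flume.keytab
+</pre></div>
+</div>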
</div>
<div class="section" id="monitoring">
<h2>Monitoring<a class="headerlink" href="#monitoring" title="Permalink to
this headline">¶</a></h2>
<p>Monitoring in Flume is still a work in progress. Changes can happen very
often.
Several Flume components report metrics to the JMX platform MBean server. These
metrics can be queried using Jconsole.</p>
+<div class="section" id="jmx-reporting">
+<h3>JMX Reporting<a class="headerlink" href="#jmx-reporting" title="Permalink
to this headline">¶</a></h3>
+<p>JMX Reporting can be enabled by specifying JMX parameters in the JAVA_OPTS
+environment variable using
+flume-env.sh, for example:</p>
+<blockquote>
+<div>export JAVA_OPTS="-Dcom.sun.management.jmxremote
+-Dcom.sun.management.jmxremote.port=5445
+-Dcom.sun.management.jmxremote.authenticate=false
+-Dcom.sun.management.jmxremote.ssl=false"</div></blockquote>
+<p>NOTE: The sample above disables security. To enable security, please refer
+to <a class="reference external"
+href="http://docs.oracle.com/javase/6/docs/technotes/guides/management/agent.html">http://docs.oracle.com/javase/6/docs/technotes/guides/management/agent.html</a></p>
+</div>
<div class="section" id="ganglia-reporting">
<h3>Ganglia Reporting<a class="headerlink" href="#ganglia-reporting"
title="Permalink to this headline">¶</a></h3>
<p>Flume can also report these metrics to
@@ -4881,6 +5704,39 @@ metrics as long values.</p>
</div>
</div>
</div>
+<div class="section" id="tools">
+<h2>Tools<a class="headerlink" href="#tools" title="Permalink to this
headline">¶</a></h2>
+<div class="section" id="file-channel-integrity-tool">
+<h3>File Channel Integrity Tool<a class="headerlink"
href="#file-channel-integrity-tool" title="Permalink to this
headline">¶</a></h3>
+<p>The File Channel Integrity tool verifies the integrity of individual events
+in the File channel
+and removes corrupted events.</p>
+<p>The tool can be run as follows:</p>
+<div class="highlight-none"><div class="highlight"><pre>$ bin/flume-ng tool --conf ./conf FCINTEGRITYTOOL -l ./datadir
+</pre></div>
+</div>
+<p>where datadir is the comma-separated list of data directories to be
+verified.</p>
+<p>The following options are available:</p>
+<table border="1" class="docutils">
+<colgroup>
+<col width="25%" />
+<col width="75%" />
+</colgroup>
+<thead valign="bottom">
+<tr class="row-odd"><th class="head">Option Name</th>
+<th class="head">Description</th>
+</tr>
+</thead>
+<tbody valign="top">
+<tr class="row-even"><td>h/help</td>
+<td>Displays help</td>
+</tr>
+<tr class="row-odd"><td><strong>l/dataDirs</strong></td>
+<td>Comma-separated list of data directories which the tool must verify</td>
+</tr>
+</tbody>
+</table>
+</div>
+</div>
<div class="section" id="topology-design-considerations">
<h2>Topology Design Considerations<a class="headerlink"
href="#topology-design-considerations" title="Permalink to this
headline">¶</a></h2>
<p>Flume is very flexible and allows a large range of possible deployment
@@ -5303,7 +6159,7 @@ can be leveraged to move the Flume agent
<h3><a href="index.html">This Page</a></h3>
<ul>
-<li><a class="reference internal" href="#">Flume 1.5.2 User Guide</a><ul>
+<li><a class="reference internal" href="#">Flume 1.6.0 User Guide</a><ul>
<li><a class="reference internal" href="#introduction">Introduction</a><ul>
<li><a class="reference internal" href="#overview">Overview</a></li>
<li><a class="reference internal" href="#system-requirements">System
Requirements</a></li>
@@ -5322,6 +6178,7 @@ can be leveraged to move the Flume agent
<li><a class="reference internal" href="#wiring-the-pieces-together">Wiring
the pieces together</a></li>
<li><a class="reference internal" href="#starting-an-agent">Starting an
agent</a></li>
<li><a class="reference internal" href="#a-simple-example">A simple
example</a></li>
+<li><a class="reference internal"
href="#zookeeper-based-configuration">Zookeeper based Configuration</a></li>
<li><a class="reference internal"
href="#installing-third-party-plugins">Installing third-party plugins</a><ul>
<li><a class="reference internal" href="#the-plugins-d-directory">The
plugins.d directory</a></li>
<li><a class="reference internal"
href="#directory-layout-for-plugins">Directory layout for plugins</a></li>
@@ -5354,8 +6211,7 @@ can be leveraged to move the Flume agent
<li><a class="reference internal" href="#converter">Converter</a></li>
</ul>
</li>
-<li><a class="reference internal" href="#spooling-directory-source">Spooling
Directory Source</a></li>
-<li><a class="reference internal"
href="#twitter-1-firehose-source-experimental">Twitter 1% firehose Source
(experimental)</a><ul>
+<li><a class="reference internal" href="#spooling-directory-source">Spooling
Directory Source</a><ul>
<li><a class="reference internal" href="#event-deserializers">Event
Deserializers</a><ul>
<li><a class="reference internal" href="#line">LINE</a></li>
<li><a class="reference internal" href="#avro">AVRO</a></li>
@@ -5364,6 +6220,8 @@ can be leveraged to move the Flume agent
</li>
</ul>
</li>
+<li><a class="reference internal"
href="#twitter-1-firehose-source-experimental">Twitter 1% firehose Source
(experimental)</a></li>
+<li><a class="reference internal" href="#kafka-source">Kafka Source</a></li>
<li><a class="reference internal" href="#netcat-source">NetCat Source</a></li>
<li><a class="reference internal" href="#sequence-generator-source">Sequence
Generator Source</a></li>
<li><a class="reference internal" href="#syslog-sources">Syslog Sources</a><ul>
@@ -5377,6 +6235,7 @@ can be leveraged to move the Flume agent
<li><a class="reference internal" href="#blobhandler">BlobHandler</a></li>
</ul>
</li>
+<li><a class="reference internal" href="#stress-source">Stress Source</a></li>
<li><a class="reference internal" href="#legacy-sources">Legacy Sources</a><ul>
<li><a class="reference internal" href="#avro-legacy-source">Avro Legacy
Source</a></li>
<li><a class="reference internal" href="#thrift-legacy-source">Thrift Legacy
Source</a></li>
@@ -5388,6 +6247,7 @@ can be leveraged to move the Flume agent
</li>
<li><a class="reference internal" href="#flume-sinks">Flume Sinks</a><ul>
<li><a class="reference internal" href="#hdfs-sink">HDFS Sink</a></li>
+<li><a class="reference internal" href="#hive-sink">Hive Sink</a></li>
<li><a class="reference internal" href="#logger-sink">Logger Sink</a></li>
<li><a class="reference internal" href="#avro-sink">Avro Sink</a></li>
<li><a class="reference internal" href="#thrift-sink">Thrift Sink</a></li>
@@ -5401,13 +6261,15 @@ can be leveraged to move the Flume agent
</li>
<li><a class="reference internal"
href="#morphlinesolrsink">MorphlineSolrSink</a></li>
<li><a class="reference internal"
href="#elasticsearchsink">ElasticSearchSink</a></li>
-<li><a class="reference internal" href="#kite-dataset-sink-experimental">Kite
Dataset Sink (experimental)</a></li>
+<li><a class="reference internal" href="#kite-dataset-sink">Kite Dataset
Sink</a></li>
+<li><a class="reference internal" href="#kafka-sink">Kafka Sink</a></li>
<li><a class="reference internal" href="#custom-sink">Custom Sink</a></li>
</ul>
</li>
<li><a class="reference internal" href="#flume-channels">Flume Channels</a><ul>
<li><a class="reference internal" href="#memory-channel">Memory
Channel</a></li>
<li><a class="reference internal" href="#jdbc-channel">JDBC Channel</a></li>
+<li><a class="reference internal" href="#kafka-channel">Kafka Channel</a></li>
<li><a class="reference internal" href="#file-channel">File Channel</a></li>
[... 29 lines stripped ...]