Updated Branches: refs/heads/flume-1.4 ff6c64e65 -> 7d6947d73
FLUME-1910. Add thrift RPC documentation. (Hari Shreedharan via Mike Percy) Project: http://git-wip-us.apache.org/repos/asf/flume/repo Commit: http://git-wip-us.apache.org/repos/asf/flume/commit/7d6947d7 Tree: http://git-wip-us.apache.org/repos/asf/flume/tree/7d6947d7 Diff: http://git-wip-us.apache.org/repos/asf/flume/diff/7d6947d7 Branch: refs/heads/flume-1.4 Commit: 7d6947d73f05565a0d6c10d461d9b60e1b1482af Parents: ff6c64e Author: Mike Percy <[email protected]> Authored: Wed Apr 3 12:57:39 2013 -0700 Committer: Mike Percy <[email protected]> Committed: Wed Apr 3 12:57:39 2013 -0700 ---------------------------------------------------------------------- flume-ng-doc/sphinx/FlumeDeveloperGuide.rst | 46 +++++++----- flume-ng-doc/sphinx/FlumeUserGuide.rst | 86 ++++++++++++++++++++-- 2 files changed, 107 insertions(+), 25 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/flume/blob/7d6947d7/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst ---------------------------------------------------------------------- diff --git a/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst b/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst index 02b0088..71afa4e 100644 --- a/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst +++ b/flume-ng-doc/sphinx/FlumeDeveloperGuide.rst @@ -160,15 +160,15 @@ by using a convenience implementation such as the SimpleEvent class, or by using ``EventBuilder``\ 's overloaded ``withBody()`` static helper methods. -Avro RPC default client -''''''''''''''''''''''' +RPC clients - Avro and Thrift +''''''''''''''''''''''''''''' -As of Flume 1.1.0, Avro is the only supported RPC protocol. The -``NettyAvroRpcClient`` implements the ``RpcClient`` interface. The client needs -to create this object with the host and port of the target Flume agent, and can -then use the ``RpcClient`` to send data into the agent. The following example -shows how to use the Flume Client SDK API within a user's data-generating -application: +As of Flume 1.4.0, Avro is the default RPC protocol. The +``NettyAvroRpcClient`` and ``ThriftRpcClient`` implement the ``RpcClient`` +interface. The client needs to create this object with the host and port of +the target Flume agent, and canthen use the ``RpcClient`` to send data into +the agent. The following example shows how to use the Flume Client SDK API +within a user's data-generating application: .. code-block:: java @@ -206,6 +206,8 @@ application: this.hostname = hostname; this.port = port; this.client = RpcClientFactory.getDefaultInstance(hostname, port); + // Use the following method to create a thrift client (instead of the above line): + // this.client = RpcClientFactory.getThriftInstance(hostname, port); } public void sendDataToFlume(String data) { @@ -220,6 +222,8 @@ application: client.close(); client = null; client = RpcClientFactory.getDefaultInstance(hostname, port); + // Use the following method to create a thrift client (instead of the above line): + // this.client = RpcClientFactory.getThriftInstance(hostname, port); } } @@ -230,7 +234,8 @@ application: } -The remote Flume agent needs to have an ``AvroSource`` listening on some port. +The remote Flume agent needs to have an ``AvroSource`` (or a +``ThriftSource`` if you are using a Thrift client) listening on some port. Below is an example Flume agent configuration that's waiting for a connection from MyApp: @@ -244,18 +249,21 @@ from MyApp: a1.sources.r1.channels = c1 a1.sources.r1.type = avro + # For using a thrift source set the following instead of the above line. + # a1.source.r1.type = thrift a1.sources.r1.bind = 0.0.0.0 a1.sources.r1.port = 41414 a1.sinks.k1.channel = c1 a1.sinks.k1.type = logger -For more flexibility, the default Flume client implementation -(``NettyAvroRpcClient``) can be configured with these properties: +For more flexibility, the default Flume client implementations +(``NettyAvroRpcClient`` and ``ThriftRpcClient``) can be configured with these +properties: .. code-block:: properties - client.type = default + client.type = default (for avro) or thrift (for thrift) hosts = h1 # default client accepts only 1 host # (additional hosts will be ignored) @@ -274,7 +282,8 @@ Failover Client This class wraps the default Avro RPC client to provide failover handling capability to clients. This takes a whitespace-separated list of <host>:<port> -representing the Flume agents that make-up a failover group. If thereâs a +representing the Flume agents that make-up a failover group. The Failover RPC +Client currently does not support thrift. If thereâs a communication error with the currently selected host (i.e. agent) agent, then the failover client automatically fails-over to the next host in the list. For example: @@ -306,7 +315,7 @@ For more flexibility, the failover Flume client implementation client.type = default_failover - hosts = h1 h2 h3 # at least one is required, but 2 or + hosts = h1 h2 h3 # at least one is required, but 2 or # more makes better sense hosts.h1 = host1.example.org:41414 @@ -324,7 +333,7 @@ For more flexibility, the failover Flume client implementation # once to send the Event, and if it # fails then there will be no failover # to a second client, so this value - # causes the failover client to + # causes the failover client to # degenerate into just a default client. # It makes sense to set this value to at # least the number of hosts that you @@ -339,7 +348,7 @@ For more flexibility, the failover Flume client implementation LoadBalancing RPC client '''''''''''''''''''''''' -The Flume Client SDK also supports an RpcClient which load-balances among +The Flume Client SDK also supports an RpcClient which load-balances among multiple hosts. This type of client takes a whitespace-separated list of <host>:<port> representing the Flume agents that make-up a load-balancing group. This client can be configured with a load balancing strategy that either @@ -347,7 +356,8 @@ randomly selects one of the configured hosts, or selects a host in a round-robin fashion. You can also specify your own custom class that implements the ``LoadBalancingRpcClient$HostSelector`` interface so that a custom selection order is used. In that case, the FQCN of the custom class needs to be specified -as the value of the ``host-selector`` property. +as the value of the ``host-selector`` property. The LoadBalancing RPC Client +currently does not support thrift. If ``backoff`` is enabled then the client will temporarily blacklist hosts that fail, causing them to be excluded from being selected as a failover @@ -578,7 +588,7 @@ processing its own configuration settings. For example: // Process the myProp value (e.g. validation) - // Store myProp for later retrieval by process() method + // Store myProp for later retrieval by process() method this.myProp = myProp; } http://git-wip-us.apache.org/repos/asf/flume/blob/7d6947d7/flume-ng-doc/sphinx/FlumeUserGuide.rst ---------------------------------------------------------------------- diff --git a/flume-ng-doc/sphinx/FlumeUserGuide.rst b/flume-ng-doc/sphinx/FlumeUserGuide.rst index 600a360..060c0ac 100644 --- a/flume-ng-doc/sphinx/FlumeUserGuide.rst +++ b/flume-ng-doc/sphinx/FlumeUserGuide.rst @@ -58,7 +58,10 @@ A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. For example, an Avro Flume source can be used to receive Avro events from Avro clients or other Flume agents in the flow -that send events from an Avro sink. When a Flume source receives an event, it +that send events from an Avro sink. A similar flow can be defined using +a Thrift Flume Source to receive events from a Thrift Sink or a Flume +Thrift Rpc Client or Thrift clients written in any language generated from +the Flume thrift protocol.When a Flume source receives an event, it stores it into one or more channels. The channel is a passive store that keeps the event until it's consumed by a Flume sink. The file channel is one example -- it is backed by the local filesystem. The sink removes the event @@ -292,6 +295,7 @@ Flume supports the following mechanisms to read data from popular log stream types, such as: #. Avro +#. Thrift #. Syslog #. Netcat @@ -319,9 +323,10 @@ dozen of agents that write to HDFS cluster. :alt: A fan-in flow using Avro RPC to consolidate events in one place This can be achieved in Flume by configuring a number of first tier agents with -an avro sink, all pointing to an avro source of single agent. This source on -the second tier agent consolidates the received events into a single channel -which is consumed by a sink to its final destination. +an avro sink, all pointing to an avro source of single agent (Again you could +use the thrift sources/sinks/clients in such a scenario). This source +on the second tier agent consolidates the received events into a single +channel which is consumed by a sink to its final destination. Multiplexing the flow --------------------- @@ -481,9 +486,9 @@ config to do that: Configuring a multi agent flow ------------------------------ -To setup a multi-tier flow, you need to have an avro sink of first hop pointing -to avro source of the next hop. This will result in the first Flume agent -forwarding events to the next Flume agent. For example, if you are +To setup a multi-tier flow, you need to have an avro/thrift sink of first hop +pointing to avro/thrift source of the next hop. This will result in the first +Flume agent forwarding events to the next Flume agent. For example, if you are periodically sending files (1 file per event) using avro client to a local Flume agent, then this local agent can forward it to another agent that has the mounted for storage. @@ -693,6 +698,39 @@ Example for agent named a1: a1.sources.r1.bind = 0.0.0.0 a1.sources.r1.port = 4141 +Thrift Source +~~~~~~~~~~~~~ + +Listens on Thrift port and receives events from external Thrift client streams. +When paired with the built-in ThriftSink on another (previous hop) Flume agent, +it can create tiered collection topologies. +Required properties are in **bold**. + +================== =========== =================================================== +Property Name Default Description +================== =========== =================================================== +**channels** -- +**type** -- The component type name, needs to be ``thrift`` +**bind** -- hostname or IP address to listen on +**port** -- Port # to bind to +threads -- Maximum number of worker threads to spawn +selector.type +selector.* +interceptors -- Space separated list of interceptors +interceptors.* +================== =========== =================================================== + +Example for agent named a1: + +.. code-block:: properties + + a1.sources = r1 + a1.channels = c1 + a1.sources.r1.type = thrift + a1.sources.r1.channels = c1 + a1.sources.r1.bind = 0.0.0.0 + a1.sources.r1.port = 4141 + Exec Source ~~~~~~~~~~~ @@ -762,6 +800,7 @@ allows the 'command' to use features from the shell such as wildcards, back tick loops, conditionals etc. In the absence of the 'shell' config, the 'command' will be invoked directly. Common values for 'shell' : '/bin/sh -c', '/bin/ksh -c', 'cmd /c', 'powershell -Command', etc. + .. code-block:: properties agent_foo.sources.tailsource-1.type = exec agent_foo.sources.tailsource-1.shell = /bin/bash -c @@ -1479,6 +1518,39 @@ Example for agent named a1: a1.sinks.k1.hostname = 10.10.10.10 a1.sinks.k1.port = 4545 +Thrift Sink +~~~~~~~~~~~ + +This sink forms one half of Flume's tiered collection support. Flume events +sent to this sink are turned into Thrift events and sent to the configured +hostname / port pair. The events are taken from the configured Channel in +batches of the configured batch size. +Required properties are in **bold**. + +========================== ======= ============================================== +Property Name Default Description +========================== ======= ============================================== +**channel** -- +**type** -- The component type name, needs to be ``thrift``. +**hostname** -- The hostname or IP address to bind to. +**port** -- The port # to listen on. +batch-size 100 number of event to batch together for send. +connect-timeout 20000 Amount of time (ms) to allow for the first (handshake) request. +request-timeout 20000 Amount of time (ms) to allow for requests after the first. +connection-reset-interval none Amount of time (s) before the connection to the next hop is reset. This will force the Thrift Sink to reconnect to the next hop. This will allow the sink to connect to hosts behind a hardware load-balancer when news hosts are added without having to restart the agent. +========================== ======= ============================================== + +Example for agent named a1: + +.. code-block:: properties + + a1.channels = c1 + a1.sinks = k1 + a1.sinks.k1.type = thrift + a1.sinks.k1.channel = c1 + a1.sinks.k1.hostname = 10.10.10.10 + a1.sinks.k1.port = 4545 + IRC Sink ~~~~~~~~
