Could it be as simple as changing FlumeUtils to accept a list of host/port pairs to start the RPC servers on?
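
Something roughly like this is what I have in mind. This is only a sketch, not the current API: MultiFlumeUtils, createStreams, and the host/port values are made up; the only real call is the existing FlumeUtils.createStream.

// Hypothetical helper (not part of FlumeUtils today): start one Flume
// receiver -- and therefore one Avro RPC server -- per host/port pair.
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.flume.{FlumeUtils, SparkFlumeEvent}

object MultiFlumeUtils {
  def createStreams(
      ssc: StreamingContext,
      addresses: Seq[(String, Int)],
      storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
    ): DStream[SparkFlumeEvent] = {
    // One receiver per address; union them into a single DStream.
    val streams = addresses.map { case (host, port) =>
      FlumeUtils.createStream(ssc, host, port, storageLevel)
    }
    ssc.union(streams)
  }
}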

On 4/7/14, 12:58 PM, Christophe Clapp wrote:
Based on the source code here:
https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala

It looks like in its current version, FlumeUtils does not support starting an Avro RPC server on more than one worker.
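
For reference, this is roughly how it gets used today (host, port, and app name below are placeholders). The single receiver, and with it the single Avro RPC server, ends up on whichever worker Spark schedules it on:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val conf = new SparkConf().setAppName("FlumeSingleReceiver")
val ssc = new StreamingContext(conf, Seconds(10))

// One receiver, one Avro RPC server; the Flume Avro sink has to point at
// this exact host/port, so only one worker receives events.
val events = FlumeUtils.createStream(ssc, "worker1.example.com", 4141)
events.count().print()

ssc.start()
ssc.awaitTermination()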

- Christophe

On 4/7/14, 12:23 PM, Michael Ernest wrote:
You can configure your sinks to write to one or more Avro sources in a
load-balanced configuration.

https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
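
For example, an agent config along these lines (hostnames, ports, and channel names are placeholders) round-robins events across two Avro sinks, each pointing at a different receiver address:

# Two Avro sinks behind a load-balancing sink processor (round robin).
agent1.sinks = k1 k2
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.backoff = true

agent1.sinks.k1.type = avro
agent1.sinks.k1.channel = c1
agent1.sinks.k1.hostname = worker1.example.com
agent1.sinks.k1.port = 4141

agent1.sinks.k2.type = avro
agent1.sinks.k2.channel = c1
agent1.sinks.k2.hostname = worker2.example.com
agent1.sinks.k2.port = 4141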

mfe


On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp
<christo...@christophe.cc> wrote:

Hi,

From my testing of Spark Streaming with Flume, it seems that only one Spark
worker node runs a Flume Avro RPC server to receive messages at any given
time, as opposed to every Spark worker running an Avro RPC server. Is this
the case? Our use case would benefit from balancing the load across workers
because of our volume of messages. We would put a load balancer in front of
the Spark workers running the Avro RPC servers, essentially round-robinning
the messages across all of them.

If this is something that is currently not supported, I'd be interested in
contributing to the code to make it happen.

- Christophe




