Could it be as simple as changing FlumeUtils to accept a list of host/port pairs to start the RPC servers on?
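
Something roughly like this is what I have in mind. This is only a sketch, not the current API: MultiFlumeUtils, createStreams, and the host/port values are made up; the only real call is the existing FlumeUtils.createStream.

// Hypothetical helper (not part of FlumeUtils today): start one Flume
// receiver -- and therefore one Avro RPC server -- per host/port pair.
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.flume.{FlumeUtils, SparkFlumeEvent}

object MultiFlumeUtils {
  def createStreams(
      ssc: StreamingContext,
      addresses: Seq[(String, Int)],
      storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER_2
    ): DStream[SparkFlumeEvent] = {
    // One receiver per address; union them into a single DStream.
    val streams = addresses.map { case (host, port) =>
      FlumeUtils.createStream(ssc, host, port, storageLevel)
    }
    ssc.union(streams)
  }
}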

On 4/7/14, 12:58 PM, Christophe Clapp wrote:
Based on the source code here:
https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala

It looks like in its current version, FlumeUtils does not support starting an Avro RPC server on more than one worker.
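
For reference, this is roughly how it gets used today (host, port, and app name below are placeholders). The single receiver, and with it the single Avro RPC server, ends up on whichever worker Spark schedules it on:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val conf = new SparkConf().setAppName("FlumeSingleReceiver")
val ssc = new StreamingContext(conf, Seconds(10))

// One receiver, one Avro RPC server; the Flume Avro sink has to point at
// this exact host/port, so only one worker receives events.
val events = FlumeUtils.createStream(ssc, "worker1.example.com", 4141)
events.count().print()

ssc.start()
ssc.awaitTermination()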

- Christophe

On 4/7/14, 12:23 PM, Michael Ernest wrote:
You can configure your sinks to write to one or more Avro sources in a
load-balanced configuration.

https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
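
For example, an agent config along these lines (hostnames, ports, and channel names are placeholders) round-robins events across two Avro sinks, each pointing at a different receiver address:

# Two Avro sinks behind a load-balancing sink processor (round robin).
agent1.sinks = k1 k2
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2
agent1.sinkgroups.g1.processor.type = load_balance
agent1.sinkgroups.g1.processor.selector = round_robin
agent1.sinkgroups.g1.processor.backoff = true

agent1.sinks.k1.type = avro
agent1.sinks.k1.channel = c1
agent1.sinks.k1.hostname = worker1.example.com
agent1.sinks.k1.port = 4141

agent1.sinks.k2.type = avro
agent1.sinks.k2.channel = c1
agent1.sinks.k2.hostname = worker2.example.com
agent1.sinks.k2.port = 4141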

mfe


On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp
<christo...@christophe.cc> wrote:

Hi,

From my testing of Spark Streaming with Flume, it seems that only one Spark
worker node runs a Flume Avro RPC server to receive messages at any given
time, as opposed to every Spark worker running an Avro RPC server. Is this
the case? Our use case would benefit from balancing the load across workers
because of our volume of messages. We would put a load balancer in front of
the Spark workers running the Avro RPC servers, essentially round-robinning
the messages across all of them.

If this is something that is currently not supported, I'd be interested in
contributing to the code to make it happen.

- Christophe




