Hi,
From my testing of Spark Streaming with Flume, it seems that only one
Spark worker node runs a Flume Avro RPC server to receive messages at
any given time, rather than every Spark worker running its own Avro RPC
server. Is this the case? Given our message volume, our use case would
benefit from balancing the load across workers. We would put a load
balancer in front of the Spark workers running the Avro RPC servers,
essentially round-robining the messages across all of them.
If this is not currently supported, I'd be interested in contributing
the code to make it happen.
- Christophe
- Spark Streaming and Flume Avro RPC Servers Christophe Clapp