[ 
https://issues.apache.org/jira/browse/SPARK-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sunshangchun updated SPARK-2201:
--------------------------------

    Description: 
Currently:
FlumeUtils.createStream(ssc, "localhost", port); 
This means that only one Flume receiver can work with FlumeInputDStream, so the solution is not scalable. 
I use ZooKeeper to solve this problem.
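
For reference, this is roughly how the current push-based stream is set up (just a sketch; the app name, batch interval and port 9999 are placeholders, not part of this proposal):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object SingleReceiverExample {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("FlumeSingleReceiver"), Seconds(2))

    // A single receiver bound to one host and port; every Flume agent
    // has to point its Avro sink at this one address.
    val events = FlumeUtils.createStream(ssc, "localhost", 9999)
    events.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}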

Spark Flume receivers register themselves under a ZooKeeper path when started, and a 
Flume agent gets the physical hosts and pushes events to them.
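
A rough sketch of the registration side, using a plain ZooKeeper client. The class name, session timeout and base path (e.g. /spark/flume-receivers, assumed to already exist) are placeholders, not the final API:

import org.apache.zookeeper.{CreateMode, WatchedEvent, Watcher, ZooDefs, ZooKeeper}

// Hypothetical helper: each receiver publishes its host:port as an ephemeral
// sequential znode, so the entry disappears automatically if the receiver dies.
class ReceiverRegistrar(zkQuorum: String, basePath: String) {
  private val zk = new ZooKeeper(zkQuorum, 5000, new Watcher {
    override def process(event: WatchedEvent): Unit = ()
  })

  // basePath is assumed to exist; returns the actual znode path created.
  def register(host: String, port: Int): String =
    zk.create(basePath + "/receiver-", (host + ":" + port).getBytes("UTF-8"),
      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL)
}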

Some work needs to be done here: 
1. Receivers create temporary (ephemeral) nodes in ZooKeeper, and listeners just watch those nodes, as sketched above.
2. When Spark FlumeReceivers start, they acquire a physical host (the local host's IP and an idle port) and register themselves with ZooKeeper.
3. A new Flume sink. In its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner (see the sketch after this list).
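
A sketch of the round-robin selection the new sink could use; again only illustrative (the class name and helper methods are made up), but it shows the idea of reading the live receivers from the same ZooKeeper path and rotating through them per batch:

import java.util.concurrent.atomic.AtomicInteger
import scala.collection.JavaConverters._
import org.apache.zookeeper.ZooKeeper

// Hypothetical helper used by the sink: list the receivers currently
// registered in ZooKeeper and hand them out in round-robin order.
class RoundRobinReceivers(zk: ZooKeeper, basePath: String) {
  private val counter = new AtomicInteger(0)

  // Each child znode's data is the "host:port" string written at registration.
  private def liveReceivers(): Seq[String] =
    zk.getChildren(basePath, false).asScala.map { child =>
      new String(zk.getData(basePath + "/" + child, false, null), "UTF-8")
    }

  // host:port of the next receiver, or None if nothing is registered yet.
  def pickNext(): Option[String] = {
    val hosts = liveReceivers()
    if (hosts.isEmpty) None
    else Some(hosts(Math.floorMod(counter.getAndIncrement(), hosts.size)))
  }
}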


  was:
Currently only one Flume receiver can work with FlumeInputDStream, and I am 
willing to do some work to improve it. My ideas are described as follows: 

An IP and port denote a physical host, and a logical host consists of one or 
more physical hosts.

In our case, Spark Flume receivers bind themselves to a logical host when 
started, and a Flume agent gets the physical hosts and pushes events to them.
Two classes are introduced: LogicalHostRouter supplies a mapping between logical 
hosts and physical hosts, and LogicalHostRouterListener makes changes to that 
relation watchable.

Some work needs to be done here: 
1. LogicalHostRouter and LogicalHostRouterListener can be implemented with ZooKeeper: when a physical host starts, it creates a temporary node in ZooKeeper, and listeners just watch those nodes.
2. When Spark FlumeReceivers start, they acquire a physical host (the local host's IP and an idle port) and register themselves with ZooKeeper.
3. A new Flume sink. In its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner.

Is this a feasible plan? Thanks.


        Summary: Improve FlumeInputDStream's stability and make it scalable  
(was: Improve FlumeInputDStream's stability)

> Improve FlumeInputDStream's stability and make it scalable
> ----------------------------------------------------------
>
>                 Key: SPARK-2201
>                 URL: https://issues.apache.org/jira/browse/SPARK-2201
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: sunshangchun
>
> Currently:
> FlumeUtils.createStream(ssc, "localhost", port); 
> This means that only one Flume receiver can work with FlumeInputDStream, so 
> the solution is not scalable. 
> I use ZooKeeper to solve this problem.
> Spark Flume receivers register themselves under a ZooKeeper path when started, 
> and a Flume agent gets the physical hosts and pushes events to them.
> Some work needs to be done here: 
> 1. Receivers create temporary (ephemeral) nodes in ZooKeeper, and listeners just watch those nodes.
> 2. When Spark FlumeReceivers start, they acquire a physical host (the local host's IP and an idle port) and register themselves with ZooKeeper.
> 3. A new Flume sink. In its appendEvents method, it gets the physical hosts and pushes data to them in a round-robin manner.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
