[ 
https://issues.apache.org/jira/browse/SPARK-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983971#comment-13983971
 ] 

Hari Shreedharan commented on SPARK-1645:
-----------------------------------------

Yep, that is correct. I'd like to contribute to the design as much as possible 
- so perhaps we can work on the design document together. Once we start looking 
into this, we will definitely have to proceed on multiple fronts so we can get 
more of these features committed faster.

> Improve Spark Streaming compatibility with Flume
> ------------------------------------------------
>
>                 Key: SPARK-1645
>                 URL: https://issues.apache.org/jira/browse/SPARK-1645
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>            Reporter: Hari Shreedharan
>
> Currently the following issues affect Spark Streaming and Flume compatibility:
> * If a Spark worker goes down, it needs to be restarted on the same node, 
> else Flume cannot send data to it. We can fix this by adding a Flume receiver 
> that polls Flume, and a Flume sink that supports this.
> * Receiver sends acks to Flume before the driver knows about the data. The 
> new receiver should also handle this case.
> * Data loss when driver goes down - This is true for any streaming ingest, 
> not just Flume. I will file a separate jira for this and we should work on it 
> there. This is a longer term project and requires considerable development 
> work.
> I intend to start working on these soon. Any input is appreciated. (It'd be 
> great if someone can add me as a contributor on jira, so I can assign the 
> jira to myself).
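
To make the first two points in the description concrete, here is a minimal, 
self-contained sketch of the pull model: the receiver polls the sink, stores 
the batch, and only then acks. It is plain Python and does not use the Flume 
or Spark APIs; PollableSink, PollingReceiver, poll, ack, nack and run_once 
are hypothetical names chosen for illustration, not part of either project.

```python
from collections import deque


class PollableSink:
    """Hypothetical stand-in for a Flume sink that buffers events until polled."""

    def __init__(self, events):
        self._pending = deque(events)  # events not yet handed to any receiver
        self._in_flight = {}           # batch_id -> events awaiting an ack
        self._next_id = 0

    def poll(self, max_events):
        """Hand out up to max_events; they stay in flight until acked."""
        count = min(max_events, len(self._pending))
        batch = [self._pending.popleft() for _ in range(count)]
        batch_id = self._next_id
        self._next_id += 1
        self._in_flight[batch_id] = batch
        return batch_id, batch

    def ack(self, batch_id):
        """The receiver confirms the batch is stored, so it can be dropped."""
        del self._in_flight[batch_id]

    def nack(self, batch_id):
        """The receiver failed; put the batch back at the front, order kept."""
        self._pending.extendleft(reversed(self._in_flight.pop(batch_id)))

    def remaining(self):
        return len(self._pending)


class PollingReceiver:
    """Hypothetical receiver that pulls from the sink instead of being pushed to."""

    def __init__(self, sink, store):
        self.sink = sink
        self.store = store  # callable that persists a batch; may raise

    def run_once(self, max_events=2):
        batch_id, batch = self.sink.poll(max_events)
        try:
            self.store(batch)          # store first ...
        except Exception:
            self.sink.nack(batch_id)   # ... so a failure never loses events
            return False
        self.sink.ack(batch_id)        # ack only after the data is stored
        return True
```

Because the receiver initiates the connection, a replacement receiver on any 
node can pick up where a failed one stopped, and events that were never acked 
are handed out again instead of being lost.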



--
This message was sent by Atlassian JIRA
(v6.2#6252)
