[ 
https://issues.apache.org/jira/browse/SPARK-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984800#comment-13984800
 ] 

Tathagata Das commented on SPARK-1645:
--------------------------------------

But this does not solve the scenario where the whole worker running the 
receiver dies. If the worker dies, then the receiver and the sink are both 
gone, and the Flume source has nowhere to send the data, does it?

As far as I understand, the only way to deal with a worker failure is to 
configure a pool of workers as sinks. If one of the sinks stops working because 
its worker failed, the second sink on the second worker can still receive 
data. Am I missing something?
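
For illustration, a pool of sinks like this could be expressed on the Flume 
side with Flume's failover sink processor. This is only a sketch: the agent 
name (a1), channel name (c1), sink names (k1, k2), hostnames and port are 
hypothetical, and it assumes each Spark worker runs a receiver listening for 
Avro events on that port.

```properties
# Hypothetical agent "a1" with two Avro sinks, one per Spark worker.
a1.sinks = k1 k2

# Primary sink: points at the receiver on the first worker (assumed host/port).
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = worker1.example.com
a1.sinks.k1.port = 9999

# Backup sink: points at the receiver on the second worker (assumed host/port).
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = worker2.example.com
a1.sinks.k2.port = 9999

# Group both sinks under a failover processor. The sink with the higher
# priority is used first; if it fails, events go to the other sink.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper bound (in milliseconds) on the backoff applied to a failed sink.
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

With a setup along these lines, events stay in the channel until one of the 
sinks delivers them, so losing a single worker should not by itself leave 
Flume with nowhere to send data.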

> Improve Spark Streaming compatibility with Flume
> ------------------------------------------------
>
>                 Key: SPARK-1645
>                 URL: https://issues.apache.org/jira/browse/SPARK-1645
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>            Reporter: Hari Shreedharan
>
> Currently the following issues affect Spark Streaming and Flume compatibility:
> * If a Spark worker goes down, it needs to be restarted on the same node, 
> else Flume cannot send data to it. We can fix this by adding a Flume receiver 
> that polls Flume, and a Flume sink that supports this.
> * Receiver sends acks to Flume before the driver knows about the data. The 
> new receiver should also handle this case.
> * Data loss when driver goes down - This is true for any streaming ingest, 
> not just Flume. I will file a separate jira for this and we should work on it 
> there. This is a longer term project and requires considerable development 
> work.
> I intend to start working on these soon. Any input is appreciated. (It'd be 
> great if someone can add me as a contributor on jira, so I can assign the 
> jira to myself).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
