[
https://issues.apache.org/jira/browse/SPARK-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983799#comment-13983799
]
Tathagata Das commented on SPARK-1645:
--------------------------------------
Hey [~hshreedharan] Thanks for jotting these down. Once Spark 1.0 is out of the
picture I will start working on a design document for this. The more I think
about it, the more all these issues (reliable storing of data, driver failure
recovery using Tachyon, etc.) seem intricately linked with each other, and we
need to do a proper design of the solution before jumping into this. For Spark
Streaming this is the highest priority issue, so we will start working on it
soon after Spark 1.0.
Also, this is a massive task involving many subtasks. I may actually break this
JIRA up into multiple sub-JIRAs, some of which we can start working on
independently.
> Improve Spark Streaming compatibility with Flume
> ------------------------------------------------
>
> Key: SPARK-1645
> URL: https://issues.apache.org/jira/browse/SPARK-1645
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Reporter: Hari Shreedharan
>
> Currently the following issues affect Spark Streaming and Flume compatibility:
> * If a Spark worker goes down, it needs to be restarted on the same node,
> else Flume cannot send data to it. We can fix this by adding a Flume receiver
> that polls Flume, and a Flume sink that supports this.
> * The receiver sends acks to Flume before the driver knows about the data.
> The new receiver should also handle this case.
> * Data loss when the driver goes down - This is true for any streaming
> ingest, not just Flume. I will file a separate JIRA for this and we should
> work on it there. This is a longer term project and requires considerable
> development work.
> I intend to start working on these soon. Any input is appreciated. (It'd be
> great if someone can add me as a contributor on JIRA, so I can assign this
> JIRA to myself.)
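The pull-based receiver/sink handshake proposed in the first bullet can be
sketched as a toy in-memory model. This is only an illustration of the
intended semantics, not the actual Flume sink or Spark receiver API: all class
and method names below are hypothetical, and the in-memory queue stands in for
Flume's channel/transaction machinery and the Avro RPC between sink and
receiver. The key property is that the sink keeps ownership of a batch until
the receiver acks it, so a receiver restarted on a different node can re-poll
without losing data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a pull-based Flume sink: events are buffered and
// only removed once the polling receiver acknowledges them.
class PollingSinkSketch {
    private final Deque<String> buffer = new ArrayDeque<>();
    // Batches handed out but not yet acked, keyed by transaction id.
    private final Map<Long, List<String>> pending = new HashMap<>();
    private long nextTxn = 0;

    void put(String event) {
        buffer.addLast(event);
    }

    // Receiver pulls a batch; the sink keeps it in `pending` until acked,
    // mirroring an open Flume transaction.
    Map.Entry<Long, List<String>> poll(int max) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < max && !buffer.isEmpty()) {
            batch.add(buffer.pollFirst());
        }
        long txn = ++nextTxn;
        pending.put(txn, batch);
        return Map.entry(txn, batch);
    }

    // Ack only after the receiver has reliably stored the batch; this is
    // where the second bullet's concern applies (don't ack before the data
    // is safe from the driver's point of view).
    void ack(long txn) {
        pending.remove(txn);
    }

    // Receiver died before storing the batch: requeue it, in order, so a
    // receiver restarted elsewhere can poll it again.
    void nack(long txn) {
        List<String> batch = pending.remove(txn);
        if (batch != null) {
            for (int i = batch.size() - 1; i >= 0; i--) {
                buffer.addFirst(batch.get(i));
            }
        }
    }
}
```

Under this scheme the failure case in the first bullet disappears: since the
receiver connects to the sink (rather than the sink pushing to a fixed
receiver address), the receiver can come back on any node and simply resume
polling, and any un-acked batch is redelivered.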
--
This message was sent by Atlassian JIRA
(v6.2#6252)