[
https://issues.apache.org/jira/browse/SPARK-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986220#comment-13986220
]
Tathagata Das edited comment on SPARK-1645 at 4/30/14 11:38 PM:
----------------------------------------------------------------
This makes sense from the integration point of view. However, I wonder whether
it makes things more complex from the POV of Flume's deployment configuration.
For example, if someone already has a Flume system set up, then in the
current situation the configuration change to add a new sink seems standard
and easy to understand. In the proposed model, since Flume's data-pushing
node has to run a sink, how much more complicated does this configuration
step become?
was (Author: tdas):
This makes sense from the integration point of view. However, I wonder whether
it makes things more complex from the POV of Flume's deployment configuration.
For example, if someone already has a Flume system set up, then in the
current situation the configuration change to add a new sink seems standard
and easy. In the proposed model, since Flume's data-pushing node has
to run a sink, how much more complicated does this configuration process get?
> Improve Spark Streaming compatibility with Flume
> ------------------------------------------------
>
> Key: SPARK-1645
> URL: https://issues.apache.org/jira/browse/SPARK-1645
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Reporter: Hari Shreedharan
>
> Currently the following issues affect Spark Streaming and Flume compatibility:
> * If a Spark worker goes down, it needs to be restarted on the same node,
> else Flume cannot send data to it. We can fix this by adding a Flume receiver
> that polls Flume, and a Flume sink that supports this.
> * Receiver sends acks to Flume before the driver knows about the data. The
> new receiver should also handle this case.
> * Data loss when driver goes down - This is true for any streaming ingest,
> not just Flume. I will file a separate jira for this and we should work on it
> there. This is a longer term project and requires considerable development
> work.
> I intend to start working on these soon. Any input is appreciated. (It'd be
> great if someone can add me as a contributor on jira, so I can assign the
> jira to myself).
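
For readers following the proposal, the poll-then-ack idea in the first two bullets can be sketched roughly as below. This is only an illustrative simulation in plain Python, not Flume or Spark code: `PollableSink`, `receive_once`, and the `store` callback are hypothetical names invented for this sketch. It assumes the sink keeps each handed-out batch "in flight" until the receiver confirms it has been stored, and puts the batch back if the receiver reports a failure.

```python
from collections import deque


class PollableSink:
    """Hypothetical stand-in for a sink that holds events until a poller acks them."""

    def __init__(self, events):
        self._pending = deque(events)  # events not yet handed out
        self._in_flight = {}           # batch_id -> events awaiting an ack
        self._next_id = 0

    def poll(self, max_events):
        """Hand out up to max_events; they stay in flight until acked or nacked."""
        count = min(max_events, len(self._pending))
        batch = [self._pending.popleft() for _ in range(count)]
        batch_id = self._next_id
        self._next_id += 1
        self._in_flight[batch_id] = batch
        return batch_id, batch

    def ack(self, batch_id):
        """The poller stored the batch; the sink can drop it for good."""
        del self._in_flight[batch_id]

    def nack(self, batch_id):
        """The poller failed; put the batch back at the front, preserving order."""
        self._pending.extendleft(reversed(self._in_flight.pop(batch_id)))


def receive_once(sink, store, max_events=2):
    """Poll one batch, store it, and only then ack (nack if storing fails)."""
    batch_id, batch = sink.poll(max_events)
    try:
        store(batch)
    except Exception:
        sink.nack(batch_id)
        return False
    sink.ack(batch_id)
    return True
```

Because the receiver initiates the connection in this model, it does not matter which node it is restarted on, and an ack is only sent after `store` has returned. A real implementation would also need timeouts for batches whose receiver disappears without sending a nack, which this sketch leaves out.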
--
This message was sent by Atlassian JIRA
(v6.2#6252)