[ 
https://issues.apache.org/jira/browse/SPARK-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989521#comment-14989521
 ] 

Rodrigo Boavida commented on SPARK-10420:
-----------------------------------------

Hi all,
I touched base about this with Luc, Julian and Prakash at Spark Summit EU.

A Reactive Streams (RS) implementation is extremely important for a distributed 
system: it lets consumers pull data at their own pace without risking 
overwhelming available resources or hitting failure conditions such as OOM. My 
company has a strong interest in this feature, as we are implementing a new 
streaming architecture. 
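To make the back-pressure idea concrete, here is a minimal, dependency-free Scala sketch of the Reactive Streams demand protocol. The real interfaces live in `org.reactivestreams`; the names and the fixed-capacity buffer policy below are illustrative only. The key property is that the upstream only emits what the subscriber has explicitly requested, so a slow consumer throttles its source instead of buffering until OOM:

```scala
import scala.collection.mutable

// Illustrative stand-in for org.reactivestreams.Subscription.
trait Subscription { def request(n: Long): Unit }

// A synchronous, in-memory "publisher" that emits only on demand.
final class DemandPublisher(source: Iterator[Int]) {
  def subscribe(sub: DemandSubscriber): Unit = {
    val subscription = new Subscription {
      def request(n: Long): Unit = {
        var remaining = n
        while (remaining > 0 && source.hasNext) {
          sub.onNext(source.next())
          remaining -= 1
        }
      }
    }
    sub.onSubscribe(subscription)
  }
}

// A subscriber with a bounded buffer: it never requests more than it can hold.
final class DemandSubscriber(capacity: Int) {
  private val buf = mutable.Queue.empty[Int]
  private var sub: Subscription = _
  def onSubscribe(s: Subscription): Unit = { sub = s; s.request(capacity.toLong) }
  def onNext(elem: Int): Unit = buf.enqueue(elem)
  // Processing a batch frees capacity, which is signalled back as new demand.
  def drain(): List[Int] = {
    val out = buf.toList
    buf.clear()
    if (out.nonEmpty) sub.request(out.size.toLong)
    out
  }
}

object BackPressureDemo {
  def main(args: Array[String]): Unit = {
    val sub = new DemandSubscriber(capacity = 3)
    new DemandPublisher((1 to 10).iterator).subscribe(sub)
    println(sub.drain()) // first 3 elements only; the rest await new demand
    println(sub.drain()) // next 3, released by the demand drain() signalled
  }
}
```

An executor-side receiver built on this protocol would size its demand to the resources it actually has, which is the whole point of the RS proposal.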

The current implementation of Spark Streaming has the Receiver as the entry 
point for data, which can easily become a constraint: failures or slowdowns in 
the Receiver's JVM can dramatically impact the whole Spark Streaming 
application. When the data source allows the feed to be partitioned (as with 
Kafka, or Akka with sharding), making the receiver a built-in executor concept, 
where each executor owns its own slice of the feed, leverages the parallel 
processing nature of Spark. This is already done with the Kafka Direct Stream.
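For reference, the planning step that makes the Kafka Direct Stream receiver-less can be sketched in a few lines of plain Scala. This is a hypothetical simplification, not Spark's actual API (`OffsetRange` and `planBatch` are illustrative names): the driver computes a bounded per-partition offset range for each batch, and each executor then pulls only its own slice directly from the source.

```scala
// One partition's slice of work for a batch: [fromOffset, untilOffset).
final case class OffsetRange(partition: Int, fromOffset: Long, untilOffset: Long)

object DirectBatchPlanner {
  // current = where the previous batch ended per partition;
  // latest  = the head of each partition at planning time;
  // maxPerPartition = rate cap, which is where back-pressure plugs in.
  def planBatch(current: Map[Int, Long],
                latest: Map[Int, Long],
                maxPerPartition: Long): Seq[OffsetRange] =
    current.toSeq.sortBy(_._1).map { case (p, from) =>
      OffsetRange(p, from, math.min(latest.getOrElse(p, from), from + maxPerPartition))
    }

  def main(args: Array[String]): Unit = {
    // Partition 0 is far behind; the cap keeps the batch bounded anyway.
    val plan = planBatch(
      current = Map(0 -> 0L, 1 -> 5L),
      latest  = Map(0 -> 100L, 1 -> 7L),
      maxPerPartition = 10L)
    plan.foreach(println)
  }
}
```

Because the driver only plans ranges and executors fetch their own slices, no single JVM sits on the data path, which is exactly the property the comment argues for.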

I would be very interested in this feature, in particular a receiver-less, 
direct-stream approach (like the Kafka one) where each executor subscribes 
directly, depending on the type of streaming context. For example, an 
Akka-based RS stream could use a sharding function passed in through the 
streaming context. 

This would also potentially address the currently known issue with Mesos and 
dynamic allocation, where Mesos can bring down the executor on which the 
Receiver is running, stopping the whole stream. If there is no Receiver, the 
stream would not have to stop, and it would be up to each streaming context to 
define how the streaming slices are redistributed; with an Akka-based RS 
streaming context, I assume this could be achieved easily.

I intend to create two tickets: one for a generic RS abstraction layer on the 
executors, and another for a specific Akka-based RS streaming context. I will 
tag a few people on both.

Any comments are appreciated!

Tnks,
Rod

> Implementing Reactive Streams based Spark Streaming Receiver
> ------------------------------------------------------------
>
>                 Key: SPARK-10420
>                 URL: https://issues.apache.org/jira/browse/SPARK-10420
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Nilanjan Raychaudhuri
>            Priority: Minor
>
> Hello TD,
> This is probably the last bit of the back-pressure story, implementing 
> ReactiveStreams-based Spark Streaming receivers. After discussing this 
> with my Typesafe team we came up with the following design document
> https://docs.google.com/document/d/1lGQKXfNznd5SPuQigvCdLsudl-gcvWKuHWr0Bpn3y30/edit?usp=sharing
> Could you please take a look at this when you get a chance?
> Thanks
> Nilanjan



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
