[
https://issues.apache.org/jira/browse/SPARK-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989521#comment-14989521
]
Rodrigo Boavida commented on SPARK-10420:
-----------------------------------------
Hi all,
I've touched base about this with Luc, Julian and Prakash in the Spark Summit
EU.
RS implementation is extremely important for a distributed system to keep up
data consumption at its own pace without risking overwhelming available
resources and failure conditions such as OOM. My company has extreme interest
in this feature, given we are implementing a new streaming architecture.
The current implementation of Spark Streaming has the Receiver as entry point
for data which can easily become a constraint as corresponding failures or slow
downs on corresponding JVM can impact the whole spark streaming application
dramatically. When the data stream allows to partition the feed (like Kafka or
Akka with Sharding) making the receiver concept a built in executor concept
where each executor has it's own slice of the feed. leverages the parallel
processing nature of Spark. This is already done with Kafka Direct Stream.
I would be very interested in this feature, in particular without a receiver
and with a direct stream approach (like the Kafka one) where each executor
could subscribe directly, based on whatever is the type of streaming context.
For example: for an Akka based RS stream, given an sharding function passed in
the streaming context.
This would also potentially address the currently known issue regarding Mesos
and dynamic allocation, where Mesos could bring down the executor on which the
Receiver is running thus stopping the whole stream - if there is no Receiver,
the stream wouldn't have to stop and it's up to each streaming context to
define how streaming slices would be redistributed. For example: with an Akka
based RS streaming context I assume this could be easily achieved.
I intend to create two tickets - one for generic RS abstraction layer on the
executors and another for specific Akka RS based Streaming context. Will tag a
few people on both,
I will appreciate any comments given!
Tnks,
Rod
> Implementing Reactive Streams based Spark Streaming Receiver
> ------------------------------------------------------------
>
> Key: SPARK-10420
> URL: https://issues.apache.org/jira/browse/SPARK-10420
> Project: Spark
> Issue Type: Improvement
> Components: Streaming
> Reporter: Nilanjan Raychaudhuri
> Priority: Minor
>
> Hello TD,
> This is probably the last bit of the back-pressure story, implementing
> ReactiveStreams based Spark streaming receivers. After discussing about this
> with my Typesafe team we came up with the following design document
> https://docs.google.com/document/d/1lGQKXfNznd5SPuQigvCdLsudl-gcvWKuHWr0Bpn3y30/edit?usp=sharing
> Could you please take a look at this when you get a chance?
> Thanks
> Nilanjan
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]