We had a similar issue while working on one of our use cases, where we were
processing at a moderate throughput (around 500 MB/s). When the processing
time exceeded the batch duration, it started throwing BlockNotFound
exceptions. I made a workaround for that issue, which is explained here:
http://apache-spark-developers-list.1001551.n3.nabble.com/SparkStreaming-Workaround-for-BlockNotFound-Exceptions-td12096.html

Basically, instead of generating blocks blindly, I made the receiver sleep
whenever the scheduling delay grows too large (specifically, when it exceeds
3 times the batch duration). This prototype is working nicely and the speed
is encouraging: it is processing at 500 MB/s without any failures so far.
A rough sketch of the idea is below.
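
For the curious, here is a minimal sketch of the idea, not the actual patch:
a StreamingListener records the latest scheduling delay, and a custom receiver
backs off instead of storing blocks while that delay is above 3x the batch
duration. The DelayTracker object, ThrottledReceiver class and fetchNextRecord()
are made up for illustration, and the sketch assumes the listener and receiver
run in the same JVM (e.g. local mode); on a real cluster the delay would have
to be shipped to the executors some other way.

  import java.util.concurrent.atomic.AtomicLong
  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.receiver.Receiver
  import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

  // Driver-side: remember the scheduling delay of the last completed batch.
  object DelayTracker {
    val schedulingDelayMs = new AtomicLong(0L)
  }

  class DelayListener extends StreamingListener {
    override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
      DelayTracker.schedulingDelayMs.set(batch.batchInfo.schedulingDelay.getOrElse(0L))
    }
  }

  // Receiver-side: sleep instead of generating blocks while the scheduler is behind.
  class ThrottledReceiver(batchDurationMs: Long)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {

    def onStart(): Unit = {
      new Thread("throttled-receiver") {
        override def run(): Unit = {
          while (!isStopped()) {
            // Back off while the scheduling delay exceeds 3x the batch duration.
            while (DelayTracker.schedulingDelayMs.get() > 3 * batchDurationMs) {
              Thread.sleep(batchDurationMs)
            }
            store(fetchNextRecord())
          }
        }
      }.start()
    }

    def onStop(): Unit = { }

    // Placeholder for pulling one record from the real source.
    private def fetchNextRecord(): String = "record"
  }

You would wire it up with ssc.addStreamingListener(new DelayListener) and
ssc.receiverStream(new ThrottledReceiver(batchDurationMs)).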


Thanks
Best Regards

On Fri, May 8, 2015 at 8:11 PM, François Garillot <
francois.garil...@typesafe.com> wrote:

> Hi guys,
>
> We[1] are doing a bit of work on Spark Streaming, to help it face
> situations where the throughput of data on an InputStream may (momentarily)
> overwhelm the Receivers' memory.
>
> The JIRA & design doc is here:
> https://issues.apache.org/jira/browse/SPARK-7398
>
> We'd sure appreciate your comments !
>
> --
> François Garillot
> [1]: Typesafe & some helpful collaborators on benchmarking 'at scale'
>
